Regex for finding an unterminated string


I need to search for lines in a CSV file that end in an unterminated, double-quoted string.

For example:

1,2,a,b,"dog","rabbit

would match whereas

1,2,a,b,"dog","rabbit","cat bird"
1,2,a,b,"dog",rabbit

would not.

I have very limited experience with regular expressions, and the only thing I could think of is something like

"[^"]*$

However, that matches the last quote to the end of the line.

How would this be done?

4 Answers

#1


5  

Assuming quotes can't be escaped, you need to test the parity of quotes (making sure that there's an even number of them instead of odd). Regular expressions are great for that:

^(([^"]*"){2})*[^"]*$

That will match all lines with an even number of quotes. To find the lines with an odd number, you can invert the result, or you can add another [^"]*" part at the beginning:

^[^"]*"(([^"]*"){2})*[^"]*$
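As a rough check, the odd-parity pattern can be run against the three lines from the question. This is only a sketch: the answer does not name a regex engine, so Python's re module is an assumption here, and the names ODD_QUOTES and has_unterminated_string are made up for illustration.

```python
import re

# Odd number of double quotes: one quote, then zero or more pairs of quotes.
# Non-capturing groups are used, but the pattern is otherwise the one above.
ODD_QUOTES = re.compile(r'^[^"]*"(?:(?:[^"]*"){2})*[^"]*$')


def has_unterminated_string(line: str) -> bool:
    """Return True if the line contains an odd number of double quotes."""
    return ODD_QUOTES.match(line) is not None


samples = [
    '1,2,a,b,"dog","rabbit',              # three quotes
    '1,2,a,b,"dog","rabbit","cat bird"',  # six quotes
    '1,2,a,b,"dog",rabbit',               # two quotes
]

for line in samples:
    print(has_unterminated_string(line), line)
```

Only the first sample has an odd number of quotes, so it is the only one reported as unterminated.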

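Similarly, if your regex engine supports reluctant (lazy) quantifiers instead of only greedy ones, a simpler-looking expression is possible. Be careful, though: because . can itself match a quote, backtracking lets these patterns match lines of either parity, so the [^"] versions above are the dependable ones: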

^((.*"){2})*.*$         #even
^.*"((.*"){2})*.*$      #odd
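It may be worth testing the dot-based patterns before relying on them, since . matches a quote character too. The sketch below compares the dot-based "odd" pattern (as written, and with lazy quantifiers) against the character-class version on a balanced line. Python's re module and all variable names are assumptions for illustration.

```python
import re

# Dot-based "odd" pattern exactly as written above (greedy quantifiers).
DOT_ODD_GREEDY = re.compile(r'^.*"((.*"){2})*.*$')
# The same pattern with reluctant (lazy) quantifiers.
DOT_ODD_LAZY = re.compile(r'^.*?"((.*?"){2})*.*?$')
# Character-class "odd" pattern from earlier in the answer.
CLASS_ODD = re.compile(r'^[^"]*"(([^"]*"){2})*[^"]*$')

# Two quotes, so an odd-parity test should reject this line.
balanced = '1,2,a,b,"dog",rabbit'

greedy_hit = DOT_ODD_GREEDY.match(balanced) is not None
lazy_hit = DOT_ODD_LAZY.match(balanced) is not None
class_hit = CLASS_ODD.match(balanced) is not None

print("dot, greedy:", greedy_hit)
print("dot, lazy:  ", lazy_hit)
print("char class: ", class_hit)
```

Both dot-based variants accept the balanced line, because the trailing .* is free to swallow the second quote. Only the character-class pattern rejects it.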

Now, if quotes can be escaped, it's a different question entirely, but the approach would be similar: determine the parity of unescaped quotes.

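One possible way to count only unescaped quotes is sketched below. It assumes a backslash is the escape character, which the question does not state, and the names TOKEN and has_unterminated_string_escaped are hypothetical. The regex consumes each backslash together with the character it escapes, so an escaped quote is never counted on its own.

```python
import re

# Scan left to right: either a backslash plus the escaped character,
# or a bare (unescaped) double quote.
TOKEN = re.compile(r'\\.|"')


def has_unterminated_string_escaped(line: str) -> bool:
    """Return True if the line has an odd number of unescaped double quotes.

    Assumes backslash escaping; this is an illustration, not a CSV parser.
    """
    unescaped = [tok for tok in TOKEN.findall(line) if tok == '"']
    return len(unescaped) % 2 == 1


print(has_unterminated_string_escaped(r'1,"dog \"rex\"","cat'))  # odd
print(has_unterminated_string_escaped(r'1,"dog \"rex\""'))       # even
```

If the file instead escapes a quote by doubling it (""), each escaped quote adds two to the count, so the plain parity patterns above should still apply unchanged.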

#2


4  

Assuming that the strings cannot contain ", you need to match a string that has an odd number of quotes, like this:

([^"]*("[^"]*")?)*"

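Note that the nested quantifiers make this vulnerable to catastrophic backtracking, so a crafted input could be used for a regular-expression denial-of-service (ReDoS) attack.

This matches zero or more repetitions of an unquoted run optionally followed by a quoted string, and then one more (dangling) quote.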

#3


1  

Try this one:

".+[^"](,|$)

This matches a quote (anywhere in the line), followed greedily by one or more characters, and then a character other than a quote immediately before a comma or the end of the line.

The intended net effect is that it only matches lines with dangling quoted strings. However, because .+ can also consume quotes, it matches some balanced lines as well, such as 1,2,a,b,"dog",rabbit from the question.

I think it's even immune to 'nested expandos attacks' (we do live in a very dangerous world ...)

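To see how this pattern behaves on the three lines from the question, a small check along these lines can help. Python's re module is an assumption (the answer names no engine), and DANGLING, samples and results are illustrative names.

```python
import re

# Pattern from the answer above, copied as written.
DANGLING = re.compile(r'".+[^"](,|$)')

samples = [
    '1,2,a,b,"dog","rabbit',              # unterminated last field
    '1,2,a,b,"dog","rabbit","cat bird"',  # balanced
    '1,2,a,b,"dog",rabbit',               # balanced, unquoted last field
]

# search() is used because the pattern is not anchored to the line start.
results = [DANGLING.search(line) is not None for line in samples]

for line, matched in zip(samples, results):
    print(matched, line)
```

The first and third lines match and the second does not, so the third line is a false positive: .+ runs across the closing quote of "dog" and the final t satisfies [^"] before the end of the line.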

#4


0  

To avoid "nested expandos":

egrep -v '^[^"]*("[^"]*"[^"]*)*[^"]*$' my_file
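For environments without egrep, roughly the same filter can be sketched in Python. The pattern is the one from the command above; the function name unterminated_lines is hypothetical, and the in-memory sample stands in for my_file.

```python
import re
from typing import Iterable, Iterator

# Same pattern as the egrep command: the whole line consists of
# unquoted text and complete "..." pairs.
BALANCED = re.compile(r'^[^"]*("[^"]*"[^"]*)*[^"]*$')


def unterminated_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield lines that do NOT match the balanced pattern (like grep -v)."""
    for line in lines:
        stripped = line.rstrip("\n")
        if BALANCED.match(stripped) is None:
            yield stripped


sample = [
    '1,2,a,b,"dog","rabbit\n',
    '1,2,a,b,"dog","rabbit","cat bird"\n',
    '1,2,a,b,"dog",rabbit\n',
]

found = list(unterminated_lines(sample))
for line in found:
    print(line)
```

To run it on a real file, an open file object can be passed instead of the list, for example unterminated_lines(open("my_file")), since iterating a file yields its lines.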
