I need to search for lines in a CSV file that end in an unterminated, double-quoted string.
For example:
1,2,a,b,"dog","rabbit
would match whereas
1,2,a,b,"dog","rabbit","cat bird"
1,2,a,b,"dog",rabbit
would not.
I have very limited experience with regular expressions, and the only thing I could think of is something like
"[^"]*$
However, that matches from the last quote to the end of the line, whether or not that quote is a closing one.
How would this be done?
5
Assuming quotes can't be escaped, you need to test the parity of quotes (making sure that there's an even number of them instead of odd). Regular expressions are great for that:
^(([^"]*"){2})*[^"]*$
That will match all lines with an even number of quotes. You can invert the result for all strings with an odd number. Or you can just add another ([^"]*") part at the beginning:
^[^"]*"(([^"]*"){2})*[^"]*$
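As a quick sanity check, here is a minimal Python sketch that runs the odd-parity expression against the three sample lines from the question. The names ODD_QUOTES and has_odd_quotes are made up for illustration, and the capturing groups are written as non-capturing ones, which does not change what matches.

```python
import re

# Odd number of double quotes: one quote, then any number of quote pairs.
ODD_QUOTES = re.compile(r'^[^"]*"(?:(?:[^"]*"){2})*[^"]*$')

samples = [
    '1,2,a,b,"dog","rabbit',
    '1,2,a,b,"dog","rabbit","cat bird"',
    '1,2,a,b,"dog",rabbit',
]


def has_odd_quotes(line: str) -> bool:
    """Return True when the line contains an odd number of double quotes."""
    return ODD_QUOTES.match(line) is not None


for line in samples:
    print(has_odd_quotes(line), line)
```

Only the first sample should be reported as having an odd number of quotes.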
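Similarly, if your regex flavor has reluctant (lazy) quantifiers instead of only greedy ones, you might be tempted by a simpler-looking expression that uses . in place of [^"]. Be careful, though: . can itself match a quote, so once backtracking kicks in, the first expression below matches any line and the second matches any line containing at least one quote, whether the quantifiers are greedy or reluctant. The [^"] versions above are the ones that actually test parity. For reference, the dot-based forms are: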
^((.*"){2})*.*$ #even
^.*"((.*"){2})*.*$ #odd
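It is worth checking the dot-based forms before relying on them. The sketch below, with made-up names EVEN_DOT and ODD_DOT, writes them with reluctant quantifiers (.*?) in Python and feeds each one a line of the opposite parity. Because . is free to match a quote, both still report a match.

```python
import re

# Dot-based variants with reluctant quantifiers (.*? instead of .*).
EVEN_DOT = re.compile(r'^((.*?"){2})*.*$')
ODD_DOT = re.compile(r'^.*?"((.*?"){2})*.*$')

two_quotes = '1,2,a,b,"dog",rabbit'
three_quotes = '1,2,a,b,"dog","rabbit'

# Each pattern is tested against a line it is NOT supposed to match.
even_hits_odd_line = EVEN_DOT.match(three_quotes) is not None
odd_hits_even_line = ODD_DOT.match(two_quotes) is not None

print(even_hits_odd_line, odd_hits_even_line)
```

Both flags come out True, which is why the negated character class [^"] is the safer building block here.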
Now, if quotes can be escaped, it's a different question entirely, but the approach would be similar: determine the parity of unescaped quotes.
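To make the "parity of unescaped quotes" idea concrete, here is a small sketch that assumes a backslash escape (\"), which is only one possible convention and not something the question specifies. The function name has_unterminated_quote is hypothetical. With the CSV convention of doubling the quote ("") inside a field, each escaped quote adds two characters, so the plain parity test already gives the same answer.

```python
def has_unterminated_quote(line: str, escape: str = "\\") -> bool:
    """Return True if the line ends inside a double-quoted string.

    Assumes (hypothetically) that `escape` followed by any character
    is an escape sequence, so an escaped quote does not toggle state.
    """
    in_quotes = False
    skip_next = False
    for ch in line:
        if skip_next:
            # Character right after the escape: never counts as a quote.
            skip_next = False
        elif ch == escape:
            skip_next = True
        elif ch == '"':
            in_quotes = not in_quotes
    return in_quotes


print(has_unterminated_quote('1,2,a,b,"dog","rabbit'))
print(has_unterminated_quote(r'a,"say \"hi\""'))
```

A plain loop is used instead of a regex here because the state (inside or outside quotes, escaped or not) is easier to read that way.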
4
Assuming that the strings cannot contain ", you need to match a string that has an odd number of quotes, like this:
([^"]*("[^"]*")?)*"
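Note that this is vulnerable to catastrophic backtracking (a regular-expression denial-of-service, or ReDoS, attack) because of the nested quantifiers.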
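This matches zero or more groups, each an unquoted run optionally followed by a complete quoted string, and then one more quote with no partner. As written it is unanchored, so to test a whole line you would presumably wrap it as ^([^"]*("[^"]*")?)*"[^"]*$.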
1
Try this one:
".+[^"](,|$)
This matches a quote (anywhere in the line), followed greedily by one or more of any character, then a character that is not a quote, sitting right before a comma or the end of the line.
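The net effect is meant to be that it only matches lines with dangling quoted strings. Note, though, that .+ can also run across quotes, so it also matches 1,2,a,b,"dog",rabbit from the question, which should not match.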
I think it's even immune to 'nested expandos attacks' (we do live in a very dangerous world ...)
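Before adopting this one, it may help to run it against the three sample lines from the question. The sketch below uses Python's re.search, since the pattern is unanchored; the name DANGLING is made up for illustration.

```python
import re

# Quote, one or more of anything, a non-quote, then a comma or end of line.
DANGLING = re.compile(r'".+[^"](,|$)')

samples = [
    '1,2,a,b,"dog","rabbit',              # should match
    '1,2,a,b,"dog","rabbit","cat bird"',  # should not match
    '1,2,a,b,"dog",rabbit',               # should not match
]

results = [DANGLING.search(line) is not None for line in samples]
print(results)
```

The third line is reported as a match as well, because .+ is allowed to cross the closing quote of "dog", so this pattern flags some lines whose quotes are all terminated.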
0
To avoid "nested expandos":
egrep -v '^[^"]*("[^"]*"[^"]*)*[^"]*$' my_file
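If egrep is not at hand, roughly the same filter can be sketched in Python: match the even-quote pattern and keep the lines that do not match, which is what -v does. The helper name unbalanced_lines is hypothetical, and the io.StringIO object stands in for my_file so the sketch runs on its own.

```python
import io
import re

# Same expression as the egrep command: lines with an even number of quotes.
EVEN_QUOTES = re.compile(r'^[^"]*("[^"]*"[^"]*)*[^"]*$')


def unbalanced_lines(handle):
    """Yield (line_number, line) for lines with an odd number of quotes."""
    for number, raw in enumerate(handle, start=1):
        line = raw.rstrip("\n")
        if not EVEN_QUOTES.match(line):  # inverted match, like egrep -v
            yield number, line


# Stand-in for the real file, using the sample lines from the question.
sample_file = io.StringIO(
    '1,2,a,b,"dog","rabbit\n'
    '1,2,a,b,"dog","rabbit","cat bird"\n'
    '1,2,a,b,"dog",rabbit\n'
)

hits = list(unbalanced_lines(sample_file))
print(hits)
```

With a real file you would pass the object returned by open() instead of the StringIO stand-in.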