(Repost) shlex: Parse Shell-Style Syntaxes
Original: https://pythoncaff.com/docs/pymotw/shlex-parse-shell-style-syntaxes/171 (a collaboratively translated article)
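## Parsing Quoted Strings

A common problem when working with input text is identifying a sequence of quoted words and treating it as a single entity. Splitting the text on quotes does not always give the expected result, especially when the quotes are nested. Take the following text as an example:

    This string has embedded "double quotes" and
    'single quotes' in it, and even "a 'nested example'".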
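A naive approach is to build a regular expression that finds the parts of the text outside the quotes and separates them from the text inside the quotes, or vice versa. That is tedious to implement and error-prone, because single quotes are easily confused with apostrophes and the input may contain typos. A better solution is to use a true parser, such as the one provided by the shlex module. Here is a simple example that prints the tokens identified in an input file.

    # shlex_example.py
    import shlex
    import sys

    if len(sys.argv) != 2:
        print('Please specify one filename on the command line.')
        sys.exit(1)

    filename = sys.argv[1]
    with open(filename, 'r') as f:
        body = f.read()
    print('ORIGINAL: {!r}'.format(body))
    print()

    print('TOKENS:')
    lexer = shlex.shlex(body)
    for token in lexer:
        print('{!r}'.format(token))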
When run on data with embedded quotes, the parser produces the list of expected tokens.

    $ python3 shlex_example.py quotes.txt

    ORIGINAL: 'This string has embedded "double quotes" and\n\'single quotes\' in it, and even "a \'nested example\'".\n'

    TOKENS:
    'This'
    'string'
    'has'
    'embedded'
    '"double quotes"'
    'and'
    "'single quotes'"
    'in'
    'it'
    ','
    'and'
    'even'
    '"a \'nested example\'"'
    '.'
Isolated quotes such as apostrophes are handled the same way. Consider this input text:

    This string has an embedded apostrophe, doesn't it?
The token containing the apostrophe is identified correctly.

    $ python3 shlex_example.py apostrophe.txt

    ORIGINAL: "This string has an embedded apostrophe, doesn't it?"

    TOKENS:
    'This'
    'string'
    'has'
    'an'
    'embedded'
    'apostrophe'
    ','
    "doesn't"
    'it'
    '?'
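To make the contrast with a simpler approach concrete, the following sketch tokenizes the quotes.txt text two ways. It is an illustration only: the text is inlined as a string rather than read from a file, and the variable names are hypothetical.

```python
import shlex

# The quotes.txt text from above, inlined so the sketch is self-contained.
text = (
    'This string has embedded "double quotes" and\n'
    "'single quotes' in it, and even \"a 'nested example'\".\n"
)

# Splitting on whitespace breaks every quoted phrase into separate pieces.
by_whitespace = text.split()

# The shlex parser keeps each quoted phrase together as one token.
by_shlex = list(shlex.shlex(text))

print(by_whitespace[4:6])
print(by_shlex[4:7])
```

The whitespace split leaves the opening and closing quotes attached to different words, while the parser reports each quoted phrase, including the nested one, as a single token.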
Translated by siegel_seele; reviewed by woclass.
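## Making Safe Strings for Shells

The quote() function performs the inverse operation, escaping existing quotes and adding missing quotes so that strings are safe to use in shell commands.

    # shlex_quote.py
    import shlex

    examples = [
        "Embedded'SingleQuote",
        'Embedded"DoubleQuote',
        'Embedded Space',
        '~SpecialCharacter',
        r'Back\slash',
    ]

    for s in examples:
        print('ORIGINAL : {}'.format(s))
        print('QUOTED   : {}'.format(shlex.quote(s)))
        print()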
It is still usually safer to use a list of arguments when using subprocess.Popen, but in situations where that is not possible quote() provides some protection by ensuring that special characters and white space are quoted properly.

    $ python3 shlex_quote.py

    ORIGINAL : Embedded'SingleQuote
    QUOTED   : 'Embedded'"'"'SingleQuote'

    ORIGINAL : Embedded"DoubleQuote
    QUOTED   : 'Embedded"DoubleQuote'

    ORIGINAL : Embedded Space
    QUOTED   : 'Embedded Space'

    ORIGINAL : ~SpecialCharacter
    QUOTED   : '~SpecialCharacter'

    ORIGINAL : Back\slash
    QUOTED   : 'Back\slash'
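As a rough check of what quote() buys, the sketch below embeds an awkward value in a command string and then parses that string back with shlex.split(). The filename and the ls command are hypothetical and nothing is executed; the point is only that the quoted value survives as one argument.

```python
import shlex

# Hypothetical, deliberately awkward filename used only for illustration.
filename = "notes; rm -rf ~ it's.txt"

# Quote the untrusted value before embedding it in a shell command string.
command = 'ls -l {}'.format(shlex.quote(filename))
print(command)

# Parsing the command with POSIX rules recovers the original filename as a
# single argument, so the ';' and '~' stay inside one quoted word.
arguments = shlex.split(command)
print(arguments)
```

Strings that contain no special characters are returned by quote() unchanged, so it can be applied uniformly to every argument.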
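## Embedded Comments

Since the parser is intended to be used with command languages, it needs to handle comments. By default, any text following a # is considered part of a comment and ignored. Only single-character comment prefixes are supported, and the set of comment characters can be configured through the commenters property.

    $ python3 shlex_example.py comments.txt

    ORIGINAL: 'This line is recognized.\n# But this line is ignored.\nAnd this line is processed.'

    TOKENS:
    'This'
    'line'
    'is'
    'recognized'
    '.'
    'And'
    'this'
    'line'
    'is'
    'processed'
    '.'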
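The comment handling can be adjusted. The sketch below, with made-up sample text, changes the commenters attribute so that a semicolon starts a comment instead of #, and also shows the comments argument of shlex.split(), which leaves comment processing off unless asked.

```python
import shlex

# Sample text invented for this sketch.
text = 'first ; this part is skipped\nsecond # kept'

# Treat ';' as the only comment character; '#' becomes an ordinary token.
lexer = shlex.shlex(text)
lexer.commenters = ';'
tokens = list(lexer)
print(tokens)

# shlex.split() ignores comment characters by default...
print(shlex.split('keep # these words'))

# ...and drops the rest of the line after '#' when comments=True.
print(shlex.split('keep # these words', comments=True))
```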
## Splitting Strings into Tokens

To split an existing string into component tokens, the convenience function split() is a simple wrapper around the parser.

    # shlex_split.py
    import shlex

    text = """This text has "quoted parts" inside it."""
    print('ORIGINAL: {!r}'.format(text))
    print()

    print('TOKENS:')
    print(shlex.split(text))
The result is a list.

    $ python3 shlex_split.py

    ORIGINAL: 'This text has "quoted parts" inside it.'

    TOKENS:
    ['This', 'text', 'has', 'quoted parts', 'inside', 'it.']
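A common use of split() is turning a command line typed as one string into an argument list. The sketch below uses a hypothetical grep command line and also shows shlex.join(), which is available in Python 3.8 and later and goes in the opposite direction.

```python
import shlex

# Hypothetical command line, used only for illustration.
command_line = 'grep -r "hello world" ./src'

# Split into an argument list; the quoted phrase stays together and the
# surrounding quotes are removed.
arguments = shlex.split(command_line)
print(arguments)

# Rebuild a shell-safe string from the list (Python 3.8+).
rebuilt = shlex.join(arguments)
print(rebuilt)
```

The rebuilt string may use different quote characters than the input, but splitting it again gives the same argument list.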
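## Including Other Sources of Tokens

The shlex class includes several configuration properties that control its behavior. The source property enables re-use of code or configuration by allowing one token stream to include another, similar to the Bourne shell source operator.

    # shlex_source.py
    import shlex

    text = "This text says to source quotes.txt before continuing."
    print('ORIGINAL: {!r}'.format(text))
    print()

    lexer = shlex.shlex(text)
    lexer.wordchars += '.'
    lexer.source = 'source'

    print('TOKENS:')
    for token in lexer:
        print('{!r}'.format(token))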
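The string "source quotes.txt" in the original text receives special handling. Because the source property of the lexer is set to "source", when the keyword is encountered the file named by the next token is automatically included. To make the filename appear as a single token, the . character is added to the characters allowed in words; otherwise "quotes.txt" would become three tokens: "quotes", ".", and "txt". The output looks like this.

    $ python3 shlex_source.py

    ORIGINAL: 'This text says to source quotes.txt before continuing.'

    TOKENS:
    'This'
    'text'
    'says'
    'to'
    'This'
    'string'
    'has'
    'embedded'
    '"double quotes"'
    'and'
    "'single quotes'"
    'in'
    'it'
    ','
    'and'
    'even'
    '"a \'nested example\'"'
    '.'
    'before'
    'continuing.'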
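The source feature uses a method called sourcehook() to load the additional source, so a subclass of shlex can provide an alternate implementation that loads data from locations other than files.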
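A subclass along those lines might look like the following sketch. The class name InMemorySourceLexer and the snippets dictionary are hypothetical; the only library behavior relied on is that sourcehook() receives the name following the source keyword and returns a (name, stream) pair.

```python
import io
import shlex


class InMemorySourceLexer(shlex.shlex):
    """Hypothetical lexer that resolves 'source NAME' from a dictionary."""

    def __init__(self, text, snippets):
        super().__init__(text)
        self.snippets = snippets
        self.wordchars += '.'    # keep names such as extra.txt in one token
        self.source = 'source'   # keyword that triggers inclusion

    def sourcehook(self, newfile):
        # Return the name and a file-like object holding the included text,
        # instead of opening a file on disk.
        return newfile, io.StringIO(self.snippets[newfile])


snippets = {'extra.txt': 'included words'}
lexer = InMemorySourceLexer('before source extra.txt after', snippets)
tokens = list(lexer)
print(tokens)
```

A name missing from the dictionary raises KeyError here; a fuller implementation would decide how to report that.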
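## Controlling the Parser

An earlier example demonstrated changing the wordchars value to control which characters are included in words. It is also possible to set the quotes attribute to use additional or alternative quote characters. Each quote must be a single character, so it is not possible to have different open and close quotes (no parsing on parentheses, for example).

    # shlex_table.py
    import shlex

    text = """|Col 1||Col 2||Col 3|"""
    print('ORIGINAL: {!r}'.format(text))
    print()

    lexer = shlex.shlex(text)
    lexer.quotes = '|'

    print('TOKENS:')
    for token in lexer:
        print('{!r}'.format(token))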
In this example, each table cell is wrapped in vertical bars.

    $ python3 shlex_table.py

    ORIGINAL: '|Col 1||Col 2||Col 3|'

    TOKENS:
    '|Col 1|'
    '|Col 2|'
    '|Col 3|'
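Instead of replacing the quote characters, the quotes attribute can also be extended. The sketch below, using made-up input, adds the backtick to the default quote characters so that a backtick-delimited phrase is kept as one token.

```python
import shlex

# Made-up input containing a backtick-delimited phrase.
text = 'run `echo hello world` now'

lexer = shlex.shlex(text)
lexer.quotes += '`'   # keep the default ' and " and add the backtick

tokens = list(lexer)
print(tokens)
```

Because the lexer is in its default non-POSIX mode, the backticks remain part of the token, just as the vertical bars do in the table example.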
It is also possible to control the whitespace characters used to split words.

    # shlex_whitespace.py
    import shlex
    import sys

    if len(sys.argv) != 2:
        print('Please specify one filename on the command line.')
        sys.exit(1)

    filename = sys.argv[1]
    with open(filename, 'r') as f:
        body = f.read()
    print('ORIGINAL: {!r}'.format(body))
    print()

    print('TOKENS:')
    lexer = shlex.shlex(body)
    lexer.whitespace += '.,'
    for token in lexer:
        print('{!r}'.format(token))
If the example in shlex_example.py is modified to treat the period and comma as whitespace, the results change.

    $ python3 shlex_whitespace.py quotes.txt

    ORIGINAL: 'This string has embedded "double quotes" and\n\'single quotes\' in it, and even "a \'nested example\'".\n'

    TOKENS:
    'This'
    'string'
    'has'
    'embedded'
    '"double quotes"'
    'and'
    "'single quotes'"
    'in'
    'it'
    'and'
    'even'
    '"a \'nested example\'"'
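The same setting can be used to split simple delimiter-separated records. The sketch below is an assumption-laden illustration, not a replacement for the csv module: it treats the comma as whitespace and enables POSIX mode so that quotes are stripped from the quoted field.

```python
import shlex

# Made-up record; the quoted field contains the delimiter itself.
record = 'name,"Smith, John",42'

lexer = shlex.shlex(record, posix=True)
lexer.whitespace += ','   # commas now separate tokens like spaces do

fields = list(lexer)
print(fields)
```

The comma inside the quoted field is preserved because whitespace characters are only treated as separators outside quotes.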
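## Error Handling

When the parser encounters the end of its input before all quoted strings are closed, it raises ValueError. When that happens, it is useful to examine some of the properties the parser maintains while processing the input. The token attribute holds the buffer of text not yet included in a valid token, and the error_leader() method produces a message prefix in a style similar to Unix compilers.

    # shlex_errors.py
    import shlex

    text = """This line is ok.
    This line has an "unfinished quote.
    This line is ok, too.
    """

    print('ORIGINAL: {!r}'.format(text))
    print()

    lexer = shlex.shlex(text)

    print('TOKENS:')
    try:
        for token in lexer:
            print('{!r}'.format(token))
    except ValueError as err:
        first_line_of_error = lexer.token.splitlines()[0]
        print('ERROR: {} {}'.format(lexer.error_leader(), err))
        print('following {!r}'.format(first_line_of_error))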
The example produces this output.

    $ python3 shlex_errors.py

    ORIGINAL: 'This line is ok.\nThis line has an "unfinished quote.\nThis line is ok, too.\n'

    TOKENS:
    'This'
    'line'
    'is'
    'ok'
    '.'
    'This'
    'line'
    'has'
    'an'
    ERROR: "None", line 4:  No closing quotation
    following '"unfinished quote.'
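When the input comes from a user, it is often convenient to turn the exception into a value the caller can inspect. The helper below is a minimal sketch; the name try_split and its return convention are hypothetical.

```python
import shlex


def try_split(text):
    """Return (tokens, error); error is None when parsing succeeds."""
    try:
        return shlex.split(text), None
    except ValueError as err:
        # Raised, for example, when a quoted string is never closed.
        return [], str(err)


print(try_split('echo "all good"'))
print(try_split('echo "unterminated'))
```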
## POSIX vs. Non-POSIX Parsing

The default behavior for the parser is to use a backwards-compatible style that is not POSIX-compliant. For POSIX behavior, set the posix argument when constructing the parser.

    # shlex_posix.py
    import shlex

    # Raw strings keep the backslashes literal without relying on
    # unrecognized escape sequences such as \e.
    examples = [
        'Do"Not"Separate',
        '"Do"Separate',
        r'Escaped \e Character not in quotes',
        r'Escaped "\e" Character in double quotes',
        r"Escaped '\e' Character in single quotes",
        r"Escaped '\'' \"\'\" single quote",
        r'Escaped "\"" \'\"\' double quote',
        "\"'Strip extra layer of quotes'\"",
    ]

    for s in examples:
        print('ORIGINAL : {!r}'.format(s))
        print('non-POSIX: ', end='')

        non_posix_lexer = shlex.shlex(s, posix=False)
        try:
            print('{!r}'.format(list(non_posix_lexer)))
        except ValueError as err:
            print('error({})'.format(err))

        print('POSIX    : ', end='')
        posix_lexer = shlex.shlex(s, posix=True)
        try:
            print('{!r}'.format(list(posix_lexer)))
        except ValueError as err:
            print('error({})'.format(err))

        print()
Here are a few examples of the differences in parsing behavior.

    $ python3 shlex_posix.py

    ORIGINAL : 'Do"Not"Separate'
    non-POSIX: ['Do"Not"Separate']
    POSIX    : ['DoNotSeparate']

    ORIGINAL : '"Do"Separate'
    non-POSIX: ['"Do"', 'Separate']
    POSIX    : ['DoSeparate']

    ORIGINAL : 'Escaped \\e Character not in quotes'
    non-POSIX: ['Escaped', '\\', 'e', 'Character', 'not', 'in', 'quotes']
    POSIX    : ['Escaped', 'e', 'Character', 'not', 'in', 'quotes']

    ORIGINAL : 'Escaped "\\e" Character in double quotes'
    non-POSIX: ['Escaped', '"\\e"', 'Character', 'in', 'double', 'quotes']
    POSIX    : ['Escaped', '\\e', 'Character', 'in', 'double', 'quotes']

    ORIGINAL : "Escaped '\\e' Character in single quotes"
    non-POSIX: ['Escaped', "'\\e'", 'Character', 'in', 'single', 'quotes']
    POSIX    : ['Escaped', '\\e', 'Character', 'in', 'single', 'quotes']

    ORIGINAL : 'Escaped \'\\\'\' \\"\\\'\\" single quote'
    non-POSIX: error(No closing quotation)
    POSIX    : ['Escaped', '\\ \\"\\"', 'single', 'quote']

    ORIGINAL : 'Escaped "\\"" \\\'\\"\\\' double quote'
    non-POSIX: error(No closing quotation)
    POSIX    : ['Escaped', '"', '\'"\'', 'double', 'quote']

    ORIGINAL : '"\'Strip extra layer of quotes\'"'
    non-POSIX: ['"\'Strip extra layer of quotes\'"']
    POSIX    : ["'Strip extra layer of quotes'"]
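The most visible difference is how quotes are treated, which the short sketch below isolates. The tokenize helper is a hypothetical name introduced only for this illustration.

```python
import shlex


def tokenize(text, posix):
    """Return all tokens from text using the requested parsing mode."""
    return list(shlex.shlex(text, posix=posix))


text = 'say "hello world"'

# Non-POSIX mode keeps the quote characters as part of the token.
print(tokenize(text, posix=False))

# POSIX mode removes the quotes, as a shell would.
print(tokenize(text, posix=True))
```

Note that shlex.split() uses POSIX mode by default, while constructing shlex.shlex() directly does not, so the two can give different results for the same input unless posix is passed explicitly.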
This article was first published on PythonCaff. All translations in this article are for learning and exchange purposes only; when reposting, be sure to credit the translator, the source, and the link to this article. (Editor: 李大同)