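Question: shlex.split still not supporting unicode?

According to the documentation, shlex should support Unicode as of Python 2.7.3. However, when I run the code below, I get: UnicodeEncodeError: 'ascii' codec can't encode characters in position 184-189: ordinal not in range(128)

Am I doing something wrong?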

import shlex

command_full = u'software.py -fileA="sequence.fasta" -fileB="新建文本文档.fasta.txt" -output_dir="..." -FORMtitle="tst"'

shlex.split(command_full)

The exact error is as follows:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 275, in split
    lex = shlex(s, posix=posix)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/shlex.py", line 25, in __init__
    instream = StringIO(instream)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 44-49: ordinal not in range(128)

This is the output on my Mac, using Python from MacPorts. I get exactly the same error on an Ubuntu machine with the "native" Python 2.7.3.

2018-01-08 15:57

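It does not support unicode() objects; even when given a unicode() object, it cannot handle anything other than ASCII characters. - Martijn Pieters♦

@MartijnPieters Is this a bug or expected behavior? I could not find any mention of this limitation in the documentation. - petr

A known bug, I'd say; see this issue. - Martijn Pieters♦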

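Answer:

The shlex.split() code wraps both unicode() and str() instances in a StringIO() object, which can only handle Latin-1 bytes (so not the full Unicode code point range).

If you still want to use shlex.split(), you will have to encode the input first (UTF-8 should work). What the module's maintainers meant is that unicode() objects are now accepted, just not anything outside the Latin-1 range of code points.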
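The traceback in the question names the 'ascii' codec. As a rough illustration (this is not the shlex internals, only an explicit encode of the same kind of text), the sketch below checks which codecs can represent the file name from the question. The helper name `can_encode` is made up for this example.

```python
# Part of the command from the question, written with escapes so the
# source file itself stays pure ASCII.
text = u'-fileB="\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt"'


def can_encode(value, codec):
    """Return True if `value` can be encoded with `codec` (hypothetical helper)."""
    try:
        value.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Try the codecs mentioned in this thread.
results = {codec: can_encode(text, codec)
           for codec in ('ascii', 'iso-8859-1', 'utf-8')}
print(results)
```

For this particular file name only UTF-8 succeeds, since the Chinese characters fall outside both the ASCII and the Latin-1 range.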

Encoding, splitting, and decoding gives me:

>>> map(lambda s: s.decode('UTF8'), shlex.split(command_full.encode('utf8')))
[u'software.py', u'-fileA=sequence.fasta', u'-fileB=\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt', u'-output_dir=...', u'-FORMtitle=tst']
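The encode, split, decode round trip above could be wrapped in a small helper. This is a minimal sketch, not part of shlex: the name `split_unicode` and its `encoding` parameter are assumptions made for illustration. It also assumes that on Python 3 no workaround is needed, because shlex.split() accepts text strings directly there.

```python
import shlex
import sys


def split_unicode(command, encoding='utf-8'):
    """Split a text command line with shlex and return text tokens.

    Hypothetical helper: on Python 2 it applies the encode/split/decode
    workaround; on Python 3 it simply delegates to shlex.split().
    """
    if sys.version_info[0] >= 3:
        # Python 3: str is already Unicode text and shlex handles it.
        return shlex.split(command)
    # Python 2: shlex works on byte strings, so encode first and decode
    # each resulting token back to unicode.
    encoded = command.encode(encoding)
    return [token.decode(encoding) for token in shlex.split(encoded)]


command_full = (u'software.py -fileA="sequence.fasta" '
                u'-fileB="\u65b0\u5efa\u6587\u672c\u6587\u6863.fasta.txt"')
tokens = split_unicode(command_full)
print(tokens)
```

A list comprehension is used instead of map() so that the result is a list on both Python 2 and Python 3.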

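A now-closed Python issue tried to address this, but the module is very byte-stream oriented, and no new patch has materialized. For now, encoding with iso-8859-1 or UTF-8 is the best I can come up with.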

2018-01-08 16:07

Actually, there has been a patch for over five years. Last year I got tired of copying ushlex into every project and put it on PyPI:

https://pypi.python.org/pypi/ushlex/

2018-05-13 09:02
