Question: How to write a BOM marker to a file in Ruby

I have some working, if hacky, code that adds a BOM marker to a new file:

  #writing
  File.open name, 'w', 0644 do |file|
    file.write "\uFEFF"
    file.write @data
  end

  #reading
  File.open name, 'r:bom|utf-8' do |file|
    file.read
  end
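To show what this workaround actually produces, here is a small check. It is a sketch that assumes Ruby 2.0 or later, and `write_with_manual_bom` is a hypothetical helper name that simply wraps the question's writing code. The single character "\uFEFF" becomes the three bytes EF BB BF on disk, and the `r:bom|utf-8` mode strips them again when reading.

```ruby
require 'tmpdir'

# Hypothetical helper wrapping the question's approach: write U+FEFF
# explicitly, then the payload, into a UTF-8 file.
def write_with_manual_bom(name, data)
  File.open(name, 'w:utf-8', 0644) do |file|
    file.write "\uFEFF"   # stored as EF BB BF in a UTF-8 file
    file.write data
  end
end

Dir.mktmpdir do |dir|
  name = File.join(dir, 'bom_demo.txt')
  write_with_manual_bom(name, 'some content öäü')

  # The raw bytes start with the UTF-8 BOM
  p File.binread(name).bytes.first(3)        # => [239, 187, 191]

  # 'r:bom|utf-8' consumes the BOM while reading
  p File.open(name, 'r:bom|utf-8', &:read)    # => "some content öäü"
end
```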

Is there a way to add the mark automatically, without writing the cryptic "\uFEFF" before the data? Something like

  File.open name, 'w:bom' # this mode has no effect

perhaps?

Alas, I think your manual approach is the way to go. At least, I don't know a better way:

http://blog.grayproductions.net/articles/miscellaneous_m17n_details

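Quoting JEG2's article:

Ruby 1.9 won't automatically add a BOM to your data, so you're going to need to take care of that if you want one. Luckily, it's not too tough. The basic idea is just to print the bytes needed at the beginning of a file.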
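Following that idea, one way to avoid hard-coding the bytes for each encoding is to let Ruby transcode U+FEFF into whatever encoding the file is opened with. This is a minimal sketch, assuming Ruby 2.0 or later. `write_with_bom` is a hypothetical helper, not part of Ruby's API.

```ruby
require 'tmpdir'

# Hypothetical helper: writes text to path in the given UTF encoding,
# preceded by U+FEFF. Ruby transcodes "\uFEFF" into the file's external
# encoding, so the matching BOM bytes are written for UTF-8, UTF-16BE/LE
# and UTF-32BE/LE alike.
def write_with_bom(path, text, encoding = 'UTF-8')
  File.open(path, "w:#{encoding}") do |file|
    file.write "\uFEFF"
    file.write text
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'file_utf16le.txt')
  write_with_bom(path, 'some content öäü', 'UTF-16LE')
  p File.binread(path).bytes.first(2)   # => [255, 254]
end
```

The design choice here is to write a character rather than raw bytes: because the string "\uFEFF" is UTF-8 and the file has its own external encoding, Ruby's normal write conversion produces the right byte order for each encoding.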

This answer led to a new gem: file_with_bom

I had a similar problem in the past, and I extended File.open with additional encoding variants for w-mode:

# Requires Ruby 2.0+ (String#b, keyword splat)
class File
  # Byte sequences of the BOM (U+FEFF) in each UTF encoding
  BOM_LIST_hex = {
    Encoding::UTF_8    => "\xEF\xBB\xBF".b,
    Encoding::UTF_16BE => "\xFE\xFF".b,
    Encoding::UTF_16LE => "\xFF\xFE".b,
    Encoding::UTF_32BE => "\x00\x00\xFE\xFF".b,
    Encoding::UTF_32LE => "\xFF\xFE\x00\x00".b,
  }.freeze

  # Write-mode files opened without an explicit encoding report nil,
  # so fall back to the default external encoding
  def utf_bom_hex(encoding = external_encoding || Encoding.default_external)
    BOM_LIST_hex[encoding]
  end

  class << self
    alias :open_old :open

    def open(filename, mode_string = nil, *rest, **options, &block)
      # BOM handling is requested with :bom => true or a -bom flag in mode_string
      bom_wanted = options.delete(:bom)
      if mode_string.is_a?(String) && mode_string =~ /-bom/i
        mode_string = mode_string.sub(/-bom/i, '')  # don't mutate the caller's string
        bom_wanted = true
      end

      args = mode_string.nil? ? [filename] : [filename, mode_string, *rest]
      f = open_old(*args, **options)
      bom = f.utf_bom_hex
      if bom_wanted && bom
        case mode_string
        # r:bom|utf-8 is already standard since 1.9.2
        when /\Ar/ # read mode -> skip the BOM
          f.rewind if f.read(bom.bytesize) != bom  # no BOM found: back to position 0
        when /\Aw/ # write mode -> prepend the BOM
          # Tag the bytes with the file's encoding so Ruby does not try to transcode binary data
          f.write(bom.dup.force_encoding(f.external_encoding || Encoding.default_external))
        end
      end

      return f unless block
      begin
        yield f
      ensure
        f.close
      end
    end
  end
end # File
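Each entry in such a table is simply U+FEFF encoded in the target encoding, so the bytes can be derived instead of typed by hand, which guards against typos in hand-written byte strings. A sketch, assuming Ruby 2.4+ for `unpack1`. `BOM_BYTES` is a hypothetical constant name used only here.

```ruby
# Derive each BOM by encoding U+FEFF rather than writing the bytes by hand.
BOM_BYTES = [
  Encoding::UTF_8,
  Encoding::UTF_16BE, Encoding::UTF_16LE,
  Encoding::UTF_32BE, Encoding::UTF_32LE,
].map { |enc| [enc, "\uFEFF".encode(enc).b] }.to_h.freeze

BOM_BYTES.each do |enc, bytes|
  puts format('%-9s %s', enc, bytes.unpack1('H*'))
end
# UTF-8     efbbbf
# UTF-16BE  feff
# UTF-16LE  fffe
# UTF-32BE  0000feff
# UTF-32LE  fffe0000
```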

Test code:

EXAMPLE_TEXT = 'some content öäü'
File.open("file_utf16le.txt", "w:utf-16le-bom"){|f| f << EXAMPLE_TEXT }
File.open("file_utf16le.txt", "r:utf-16le-bom:utf-8"){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8", :bom => true){|f| p f.read }
File.open("file_utf16le.txt", "r:utf-16le:utf-8"){|f| p f.read }

File.open("file_utf8.txt", "w:utf-8", :bom => true){|f| f << EXAMPLE_TEXT }
File.open("file_utf8.txt", "r:utf-8", :bom => true){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8-bom"){|f| p f.read }
File.open("file_utf8.txt", "r:utf-8"){|f| p f.read }

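Some remarks:

The code dates from Ruby 1.9.x (but it still works).
I used -bom as the BOM indicator (Ruby 1.9 uses the bom| prefix, as in r:bom|utf-8).

Some things that could be done better:

Use the bom| syntax instead of -bom
Use the standard r:bom|utf-8 mode for reading
Make it work with both Ruby 1.8 and 1.9

Maybe tomorrow I'll find some time to refactor my code and provide it as a gem.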

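Please don't use a BOM with UTF-8! It is neither required nor recommended by the Unicode Consortium. - tchrist

Thanks for the guidance. I took a different approach to the problem and modified my template engine to respect Encoding.default_external. - ujifgc

"No, a BOM can be used as a signature no matter how the Unicode text is transformed: UTF-16, UTF-8, or UTF-32." unicode.org/faq/utf_bom.html - Jan

Unless you're in the Windows world, where a BOM is applied to ASCII files to identify them as UTF-8. - jtruelove

@tchrist Actually, in some cases the Unicode Consortium does recommend a BOM. See unicode.org/faq/utf_bom.html#bom10. The cases are: 1. conforming to certain protocols (e.g. Microsoft .txt files); 2. specifying the encoding or byte order of a text stream in protocols that allow it, where it would otherwise be unclear. - Dave Burt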

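You don't want a BOM in a UTF-8 file! - tchrist

@tchrist While you're right, the OP obviously does. - Michael Kohl

Could you explain why we don't want to do that? :) @tchrist - fab

@fab Because UTF-8 doesn't need it. With UTF-16, the BOM indicates whether the file is big-endian or little-endian: there is one BOM for each, and the BOM is mandatory. UTF-8 has no such two variants, so the BOM is unnecessary. stackoverflow.com/questions/2223882/... - barlop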
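The byte-order point can be made concrete: the first few bytes of a file are enough to tell the UTF families and their endianness apart. A minimal sketch; `detect_bom` and `BOM_SIGNATURES` are hypothetical names. UTF-32LE has to be checked before UTF-16LE because its BOM (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).

```ruby
# Hypothetical signature table, longest signatures first so that
# FF FE 00 00 is not mistaken for the UTF-16LE BOM FF FE.
BOM_SIGNATURES = [
  [Encoding::UTF_32LE, "\xFF\xFE\x00\x00".b],
  [Encoding::UTF_32BE, "\x00\x00\xFE\xFF".b],
  [Encoding::UTF_8,    "\xEF\xBB\xBF".b],
  [Encoding::UTF_16LE, "\xFF\xFE".b],
  [Encoding::UTF_16BE, "\xFE\xFF".b],
].freeze

# Returns [encoding, bom_length] for the BOM at the start of the data,
# or nil if none is found. Note: UTF-16LE text whose first character after
# the BOM is U+0000 is indistinguishable from UTF-32LE by this test.
def detect_bom(data)
  head = data.b
  BOM_SIGNATURES.each do |enc, sig|
    return [enc, sig.bytesize] if head.start_with?(sig)
  end
  nil
end

p detect_bom("\uFEFFabc".encode('UTF-16BE'))   # => [#<Encoding:UTF-16BE>, 2]
p detect_bom("\uFEFFabc".encode('UTF-16LE'))   # => [#<Encoding:UTF-16LE>, 2]
p detect_bom('no bom')                          # => nil
```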

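Thanks for the suggestion, but it's overkill for my project. - ujifgc

+1 for the gem. I think you could contact the Ruby or Rubinius folks and get it merged into the official distribution. - Aleksander Pohl

@AleksanderPohl The gem is published: rubygems.org/gems/file_with_bom. I hope I didn't overlook anything. - knut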
