#1
Hi,
In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>"" on Sat, 5 Jan 2008 02:40:00 +0900, "Jesse P." <j.prabawa@gmail.com> writes:

|I'm working with some UTF-8 data and basically if I run this:
|
|require 'rexml/document'
|data = "<name>\302</name>"
|doc = REXML::Document.new(data)

"<name>\302</name>" is not a valid UTF-8 byte sequence. The rest is up to you, once you recognize that you are working on non-UTF-8 data.

matz.
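A quick way to confirm matz's point is sketched below. It assumes Ruby 2.0 or later, where strings carry an encoding and literals are tagged with the UTF-8 source encoding (this is not something discussed in the thread). The escape \302 is the single byte 0xC2, which in UTF-8 can only begin a two-byte character and must be followed by a continuation byte in the range 0x80 to 0xBF. Here it is followed by "<" (0x3C), so the sequence is malformed even though the string claims to be UTF-8.

```ruby
# Minimal check, assuming Ruby 2.0+ (string literals default to UTF-8).
data = "<name>\302</name>"

puts data.encoding         # the literal is tagged UTF-8...
puts data.valid_encoding?  # ...but its bytes do not form valid UTF-8

# "\302" is byte 0xC2, the lead byte of a two-byte sequence; the byte after
# it would have to be a continuation byte in 0x80..0xBF, but it is '<' (0x3C).
lead, following = data.byteslice(6, 2).bytes
puts format("lead 0x%02X, next 0x%02X", lead, following)
puts((0x80..0xBF).cover?(following))
```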
#2
Hi Matz,
Thanks for your reply. So I guess my problem is this:

1. I get an XML document that is declared to be valid UTF-8, but
2. when I process some of the values, as you pointed out, some of them are not valid UTF-8, and
3. this causes a lot of problems when the document is parsed by REXML.

For a string of characters (e.g. an XML file), is there any way I can detect just the non-UTF-8 characters and convert them to UTF-8? That way I can make sure that what REXML processes is valid UTF-8, without unnecessarily processing characters in the string that are already valid UTF-8.

Best regards,
Jesse

On Jan 5, 10:41 pm, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:
> "<name>\302</name>" is not a valid UTF-8 byte sequence. The rest is
> up to you, once you recognize that you are working on non-UTF-8 data.
>
> matz.
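One possible approach to Jesse's question is sketched below for modern Ruby (2.1 or later, where String#scrub exists); it is not a method proposed in the thread. The helper names invalid_utf8_sequences and repair_utf8 are hypothetical. The key assumption is that every invalid byte is really an ISO-8859-1 (Latin-1) character; that is a guess about the data, not something the thread establishes. Valid UTF-8 is left untouched, because scrub only calls its block for the invalid byte sequences.

```ruby
# Sketch: locate and repair invalid UTF-8 bytes, leaving valid text untouched.
# ASSUMPTION: every invalid byte is really an ISO-8859-1 (Latin-1) character.

# Hypothetical helper: returns [byte_offset, byte_values] for each invalid
# byte in +str+.
def invalid_utf8_sequences(str)
  offset = 0
  found = []
  str.each_char do |ch|
    # In a broken UTF-8 string, each_char yields each invalid byte on its own.
    found << [offset, ch.bytes] unless ch.valid_encoding?
    offset += ch.bytesize
  end
  found
end

# Hypothetical helper: returns a valid UTF-8 copy of +str+ in which only the
# invalid bytes are reinterpreted as Latin-1.
def repair_utf8(str)
  utf8 = str.dup.force_encoding(Encoding::UTF_8)
  return utf8 if utf8.valid_encoding?

  # scrub yields only the invalid byte sequences; valid characters pass through.
  utf8.scrub do |bad|
    bad.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
  end
end

data = "<name>\302</name>"
p invalid_utf8_sequences(data)            # => [[6, [194]]]
fixed = repair_utf8(data)
p fixed.valid_encoding?                   # => true
p fixed == "<name>\u00C2</name>"          # => true
```

The Latin-1 guess is the design choice most open to question: if the stray bytes are really the remains of truncated UTF-8 characters, reinterpreting them silently produces wrong text. Calling scrub with a fixed replacement such as "\uFFFD" instead of a block would at least make the damage visible.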