#1
Hi,
In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
    on Sun, 6 Jan 2008 03:00:04 +0900, "Jesse P." <j.prabawa@gmail.com> writes:

|Thanks for your [...]. So I guess my problem is this:
|1. I get an XML that is declared to be valid UTF-8, but
|2. when I process some of the values, as you pointed out, some is not
|valid UTF-8, and
|3. causes a lot of problems when parsed by REXML.
|
|For a string of characters (e.g. some xml file), is there anyway I can
|detect just the non UTF-8 characters and convert them to UTF-8?

I guess you have to define what you want to do with this broken UTF-8
data first. As long as you treat the data as UTF-8, it is impossible
to treat it correctly. You can either

 * fix the data before reading it via REXML
 * parse data as Latin-1 or some other single byte encoding
 * replace the broken data with some valid UTF-8 sequence

But YMMV.

    matz.
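A minimal sketch of the detection step and of the last two options listed above, assuming a Ruby recent enough to have `String#force_encoding` (1.9 or later) and `String#scrub` (2.1 or later), neither of which existed when this thread was written. The helper names `valid_utf8?`, `latin1_to_utf8`, and `scrub_utf8` are hypothetical and are not part of REXML; the sample input is the `<name>\302</name>` string from the subject line (octal `\302` is the byte `0xC2`).

```ruby
# Hypothetical helpers for illustration; not part of REXML or of this thread.

# Detect: tag a copy of the bytes as UTF-8 and ask Ruby whether they are well formed.
def valid_utf8?(bytes)
  bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# "Parse data as Latin-1": every byte is a valid Latin-1 character, so
# transcoding to UTF-8 never fails. It is only correct if the data
# really was Latin-1 to begin with.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# "Replace the broken data": keep the valid UTF-8 and substitute a
# replacement string for each invalid byte sequence.
def scrub_utf8(bytes, replacement = "?")
  bytes.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

# 0xC2 is a UTF-8 lead byte with no continuation byte after it.
raw = "<name>\xC2</name>".b

puts valid_utf8?(raw)
puts latin1_to_utf8(raw)
puts scrub_utf8(raw)
```

Either cleaned string is valid UTF-8 and can then be handed to `REXML::Document.new`. Which one is right depends on where the data came from: the Latin-1 route keeps every byte but guesses at its meaning, while the scrub route is lossy but makes no guess.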
#2
Thanks Matz
On Jan 6, 3:01 am, Yukihiro Matsumoto <m...@ruby-lang.org> wrote:
> Hi,
>
> In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
>     on Sun, 6 Jan 2008 03:00:04 +0900, "Jesse P." <j.prab...@gmail.com> writes:
>
> |Thanks for your [...]. So I guess my problem is this:
> |1. I get an XML that is declared to be valid UTF-8, but
> |2. when I process some of the values, as you pointed out, some is not
> |valid UTF-8, and
> |3. causes a lot of problems when parsed by REXML.
> |
> |For a string of characters (e.g. some xml file), is there anyway I can
> |detect just the non UTF-8 characters and convert them to UTF-8?
>
> I guess you have to define what you want to do with this broken UTF-8
> data first. As long as you treat the data as UTF-8, it is impossible
> to treat it correctly. You can either
>
> * fix the data before reading it via REXML
> * parse data as Latin-1 or some other single byte encoding
> * replace the broken data with some valid UTF-8 sequence
>
> But YMMV.
>
> matz.