Re: REXML::Document could not parse UTF-8 "<name>\302</name>"

Posted 5 Jan 2008, 19:01 (GMT+1)   #1
Yukihiro Matsumoto

Hi,

In message "Re: REXML::Document could not parse UTF-8 "<name>\302</name>""
on Sun, 6 Jan 2008 03:00:04 +0900, "Jesse P." <j.prabawa@gmail.com> writes:

|Thanks for your reply. So I guess my problem is this:
|1. I get an XML that is declared to be valid UTF-8, but
|2. when I process some of the values, as you pointed out, some are not
|valid UTF-8, and
|3. this causes a lot of problems when parsed by REXML.
|
|For a string of characters (e.g. some XML file), is there any way I can
|detect just the non-UTF-8 characters and convert them to UTF-8?

I guess you have to define what you want to do with this broken UTF-8
data first. As long as you treat the data as UTF-8, it is impossible
to treat it correctly. You can either

* fix the data before reading it via REXML
* parse data as Latin-1 or some other single byte encoding
* replace the broken data with some valid UTF-8 sequence

But YMMV.
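
As a rough illustration of the options listed above, here is a minimal sketch in Ruby. It assumes a newer Ruby than the one current when this thread was written: String#force_encoding, String#valid_encoding? and String#encode need Ruby 1.9 or later, and String#scrub needs Ruby 2.1 or later. The helper names broken_utf8?, scrub_utf8 and latin1_to_utf8 are hypothetical, made up for this example, and are not part of REXML or the standard library.

```ruby
# Hypothetical helpers; the names are illustrative only.

# Detect: tag the bytes as UTF-8 and ask Ruby whether they are well formed.
def broken_utf8?(bytes)
  !bytes.dup.force_encoding(Encoding::UTF_8).valid_encoding?
end

# Replace every invalid byte sequence with a valid UTF-8 replacement
# (U+FFFD REPLACEMENT CHARACTER by default).
def scrub_utf8(bytes, replacement = "\uFFFD")
  bytes.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

# Treat the bytes as Latin-1 (ISO-8859-1), a single byte encoding in which
# every byte is a character, then transcode the result to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

# The payload from the subject line: \302 (hex C2) starts a two-byte
# UTF-8 sequence but is followed by "<", so the sequence is incomplete.
raw = "<name>\xC2</name>".b

puts broken_utf8?(raw)          # is the input malformed as UTF-8?
puts scrub_utf8(raw).inspect    # broken byte replaced by U+FFFD
puts latin1_to_utf8(raw).inspect # byte C2 read as the Latin-1 character U+00C2
```

Either cleaned string is valid UTF-8 and could then be handed to REXML::Document.new. Which one is right depends on what the broken bytes were meant to be: the Latin-1 route only gives sensible text if the stray bytes really were Latin-1, while scrubbing keeps the document parseable but loses the original character.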

matz.

Posted 6 Jan 2008, 13:29 (GMT+1)   #2
Jesse P.

Thanks Matz


