Can't parse &#11; -- Is this a bug in the SAX xml parser? Or am I doing something wrong?

11/11/2007, 02:05   #1
Joshua Beall

Hi All,

Consider the following test code:

$xml = "<DATA>&#11;</DATA>";
$parser = xml_parser_create();
$result = xml_parse($parser,$xml);
$errorcode = xml_get_error_code($parser);
$errormsg = xml_error_string($errorcode);
$ln = xml_get_current_line_number($parser);
$cn = xml_get_current_column_number($parser);

var_dump($result);
echo "Error parsing XML document, '$errormsg' : Line $ln, Column $cn";




This will output:

int(0)
Error parsing XML document, 'Invalid character' : Line 1, Column 12



A return code of int(0) indicates failure. If you replace &#11; with
&#10;, it works with no error.

I don't understand why it's failing on &#11; -- isn't it perfectly
valid to include that numeric entity in the text content of an XML
node? Is this a bug? Or am I doing something wrong?

11/11/2007, 02:41   #2
Joshua Beall

Sorry, I neglected to say: I'm running PHP 5.2.4.

11/11/2007, 07:27   #3
John Dunlop

Joshua Beall:

> $xml = "<DATA>&#11;</DATA>";


The character referred to by that character reference does not match
the production Char.

http://www.w3.org/TR/REC-xml/#charsets

--
Jock
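
The rule John cites can be checked mechanically. Below is a minimal sketch of the XML 1.0 Char production as a PHP predicate; the function name isXml10Char is made up for illustration, and the type declarations assume PHP 7 or later rather than the 5.2.4 mentioned earlier in the thread.

```php
<?php
// Hypothetical helper (not part of PHP's xml extension).
// Returns true when the code point matches the XML 1.0 production
// Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
function isXml10Char(int $cp): bool
{
    return $cp === 0x9 || $cp === 0xA || $cp === 0xD
        || ($cp >= 0x20 && $cp <= 0xD7FF)
        || ($cp >= 0xE000 && $cp <= 0xFFFD)
        || ($cp >= 0x10000 && $cp <= 0x10FFFF);
}

var_dump(isXml10Char(11)); // vertical tab, &#11;: bool(false)
var_dump(isXml10Char(10)); // line feed, &#10;: bool(true)
```

A character reference such as &#11; is only well-formed if the referenced code point passes this test, which is why swapping in &#10; makes the original example parse.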

11/11/2007, 12:46   #4
Joshua Beall

On Nov 11, 2:27 am, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > $xml = "<DATA>&#11;</DATA>";
>
> The character referred to by that character reference does not match
> the production Char.
>
> http://www.w3.org/TR/REC-xml/#charsets
>
> --
> Jock


I'm not sure I completely follow; doesn't the fact that character 0x0B
(decimal 11) is outside the range of acceptable characters simply mean
that you have to encode it as &#11;? If not, then how should I encode
the vertical tab character (character code 0x0B) to put it in an XML
document?

-Josh

11/11/2007, 16:09   #5
John Dunlop

Joshua Beall:

> I'm not sure I completely follow; doesn't the fact that character 0x0B
> (decimal 11) is outside the range of acceptable characters simply mean
> that you have to encode it as &#11;?


No, the character referred to by a character reference must match the
production Char.

> If not, then how should I encode the vertical tab character (character
> code 0x0B) to put it in an XML document?


XML 1.1, although the spec discourages using that character.

--
Jock

11/11/2007, 18:54   #6
Joshua Beall

John Dunlop wrote:
> > If not, then how should I encode the vertical tab character (character
> > code 0x0B) to put it in an XML document?
>
> XML 1.1, although the spec discourages using that character.


So there's no way to do this in XML 1.0?

Let me give a more complete description of my circumstance, and if you
have any suggestions on what I should do, I'd be grateful.

I'm trying to parse an XML export of a FileMaker Pro 8 database. FMP
is putting raw vertical tabs in the output at various places.

Before passing the XML document to the PHP SAX parser, I'm checking
through the document for any characters with a character code below 32
and trying to encode them. I didn't realize this was the wrong way to
go about this.

Now I can't change the fact that for some reason FMP is sometimes
going to be spitting out these vertical tab characters; apparently it
internally uses vertical tabs for something.

So is there any way to parse this document? Or is the only thing I can
do to strip out any illegal characters that can't be encoded before I
parse it?

-Josh
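
The stripping approach Josh describes could look roughly like the sketch below. It assumes the export is UTF-8 or another ASCII-compatible encoding (not UTF-16), and the helper name stripXml10ControlChars and the sample string are invented for illustration. Tab, line feed and carriage return are legal in XML 1.0, so they are left alone.

```php
<?php
// Hypothetical helper: drop the C0 control characters that XML 1.0 forbids.
// Keeps tab (0x09), LF (0x0A) and CR (0x0D), which are legal.
// Assumption: input is UTF-8 or another ASCII-compatible encoding, so these
// byte values never occur inside a multi-byte character.
function stripXml10ControlChars(string $xml, string $replacement = ''): string
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', $replacement, $xml);
}

// Made-up sample with a raw vertical tab, standing in for the FMP export.
$raw = "<DATA>first\x0Bsecond</DATA>";
$clean = stripXml10ControlChars($raw);

$parser = xml_parser_create();
$result = xml_parse($parser, $clean, true); // true: this is the final chunk

var_dump($clean);
var_dump($result);
```

If the vertical tabs carry meaning in the export, passing a replacement such as "\n" instead of the empty string keeps a visible separator rather than silently joining the text.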

11/11/2007, 19:24   #7
John Dunlop

Joshua Beall:

> So there's no way to do this in XML 1.0?


No, short of such ad hockery as custom elements or PIs that pass the
character by reference to the application.

--
Jock

11/11/2007, 20:08   #8
Joshua Beall

On Nov 11, 2:24 pm, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > So there's no way to do this in XML 1.0?

>
> No, short of such ad hockery as custom elements or PIs that pass the
> character by reference to the application.


What do you mean by "pass the character by reference"?

I could change the XML prologue before feeding it to the parser,
changing the version to 1.1 -- but could this cause other problems?

11/11/2007, 20:55   #9
John Dunlop

Joshua Beall:

> What do you mean by "pass the character by reference"?


E.g., <char codepoint="U+000B"/>

--
Jock
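
One way to apply the placeholder-element idea with the SAX functions is sketched below. It is only an illustration: it assumes raw vertical tabs occur solely in element text (a blind str_replace would break attribute values or comments), it assumes PHP 5.3 or later for closures, and the sample document and variable names are made up.

```php
<?php
// Made-up sample with a raw vertical tab in element text.
$raw = "<DATA>first\x0Bsecond</DATA>";

// Before parsing: swap each raw vertical tab for the placeholder element.
$safe = str_replace("\x0B", '<char codepoint="U+000B"/>', $raw);

$text = '';
$parser = xml_parser_create();
// Keep element and attribute names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($p, $name, $attrs) use (&$text) {
        // Turn the placeholder back into the character it stands for.
        // chr() is enough here because only code points below 128 are expected.
        if ($name === 'char' && isset($attrs['codepoint'])) {
            $text .= chr(hexdec(substr($attrs['codepoint'], 2)));
        }
    },
    function ($p, $name) {
        // Nothing to do on end tags in this sketch.
    }
);
xml_set_character_data_handler($parser, function ($p, $data) use (&$text) {
    $text .= $data; // may be called several times per text node
});

$ok = xml_parse($parser, $safe, true);

var_dump($ok);
var_dump($text === "first\x0Bsecond");
```

The application, not the parser, ends up responsible for the forbidden character, which is what makes this an ad hoc workaround rather than plain XML 1.0.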

11/11/2007, 21:26   #10
Joshua Beall

On Nov 11, 3:55 pm, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > What do you mean by "pass the character by reference"?

>
> E.g., <char codepoint="U+000B"/>


Ah, I see -- and then I'd have to manually parse out that value later
on.

What do you think of simply changing the XML prologue to specify XML
1.1?

-Josh
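
The thread leaves this question open. Whether switching the declaration to 1.1 helps depends on whether the XML library behind PHP's parser honors XML 1.1 at all, so the sketch below only rewrites the version and reports what the local build does. The helper name tryParse and the sample document are assumptions for illustration.

```php
<?php
// Hypothetical helper: parse a string and report the outcome.
function tryParse(string $xml): array
{
    $parser = xml_parser_create();
    $ok = xml_parse($parser, $xml, true);
    $error = $ok ? null : xml_error_string(xml_get_error_code($parser));
    return ['ok' => $ok, 'error' => $error];
}

// Made-up sample: the original test document with an explicit declaration.
$xml10 = '<?xml version="1.0"?><DATA>&#11;</DATA>';

// Rewrite only the version in the declaration at the very start of the document.
$xml11 = preg_replace(
    '/^(<\?xml\s+version=)(["\'])1\.0\2/',
    '${1}${2}1.1${2}',
    $xml10,
    1
);

var_dump($xml11);
var_dump(tryParse($xml10)); // fails, as shown in the first post
var_dump(tryParse($xml11)); // result depends on the underlying XML library
```

If the second call still reports an error, the parser is applying XML 1.0 character rules regardless of the declared version, and stripping or replacing the characters before parsing remains the practical route.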
