#1
Hi All,

Consider the following test code:

    $xml = "<DATA>&#11;</DATA>";
    $parser = xml_parser_create();
    $result = xml_parse($parser, $xml);
    $errorcode = xml_get_error_code($parser);
    $errormsg = xml_error_string($errorcode);
    $ln = xml_get_current_line_number($parser);
    $cn = xml_get_current_column_number($parser);
    var_dump($result);
    echo "Error parsing XML document, '$errormsg' : Line $ln, Column $cn";

This will output:

    int(0)
    Error parsing XML document, 'Invalid character' : Line 1, Column 12

A return code of int(0) indicates failure. If you replace &#11; with a different character reference, it works with no error. I don't understand why it's failing on &#11; -- isn't it perfectly valid to include that numeric entity in the text content of an XML node?

Is this a bug? Or am I doing something wrong?
#2
Sorry I neglected to say -- I'm running PHP 5.2.4
#3
Joshua Beall:
> $xml = "<DATA>&#11;</DATA>";

The character referred to by that character reference does not match the production Char.

http://www.w3.org/TR/REC-xml/#charsets

--
Jock
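To make the Char rule concrete, here is a minimal sketch of the check an XML 1.0 parser applies to the code point named by a character reference. The ranges are the ones listed in the Char production of the spec linked above; the helper name `is_xml10_char` is made up for this illustration and is not part of PHP's XML extension.

```php
<?php
// Returns true if $codePoint matches the XML 1.0 "Char" production:
//   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
// Hypothetical helper for illustration only.
function is_xml10_char($codePoint)
{
    return $codePoint === 0x9
        || $codePoint === 0xA
        || $codePoint === 0xD
        || ($codePoint >= 0x20 && $codePoint <= 0xD7FF)
        || ($codePoint >= 0xE000 && $codePoint <= 0xFFFD)
        || ($codePoint >= 0x10000 && $codePoint <= 0x10FFFF);
}

var_dump(is_xml10_char(11)); // vertical tab, i.e. &#11;  -> bool(false)
var_dump(is_xml10_char(10)); // line feed, i.e. &#10;     -> bool(true)
```

Since 11 fails this check, &#11; is rejected even though it is written as a reference rather than as a raw character.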
#4
On Nov 11, 2:27 am, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > $xml = "<DATA>&#11;</DATA>";
>
> The character referred to by that character reference does not match
> the production Char.
>
> http://www.w3.org/TR/REC-xml/#charsets
>
> --
> Jock

I'm not sure I completely follow; doesn't the fact that character 0x0B (decimal 11) is outside the range of acceptable characters simply mean that you have to encode it as &#11;? If not, then how should I encode the vertical tab character (character code 0x0B) to put it in an XML document?

-Josh
#5
Joshua Beall:
> I'm not sure I completely follow; doesn't the fact that character 0x0B
> (decimal 11) is outside the range of acceptable characters simply mean
> that you have to encode it as &#11;?

No, the character referred to by a character reference must match the production Char.

> If not, then how should I encode the vertical tab character (character
> code 0x0B) to put it in an XML document?

XML 1.1, although the spec discourages using that character.

--
Jock
#6
John Dunlop wrote:
> > If not, then how should I encode the vertical tab character (character
> > code 0x0B) to put it in an XML document?
>
> XML 1.1, although the spec discourages using that character.

So there's no way to do this in XML 1.0?

Let me give a more complete description of my circumstance, and if you have any suggestions on what I should do, I'd be grateful.

I'm trying to parse an XML export of a FileMaker Pro 8 database. FMP is putting raw vertical tabs in the output at various places. Before passing the XML document to the PHP SAX parser, I'm checking through the document for any characters with a character code below 32 and trying to encode them. I didn't realize this was the wrong way to go about this.

Now I can't change the fact that for some reason FMP is sometimes going to be spitting out these vertical tab characters; apparently it internally uses vertical tabs for something. So is there any way to parse this document? Or is the only thing I can do to strip out any illegal characters that can't be encoded before I parse it?

-Josh
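The stripping approach mentioned above could look roughly like the sketch below. It assumes the export is UTF-8 or a single-byte encoding (the pattern works byte-wise, so it would not be safe for UTF-16), and the function name `strip_xml10_control_chars` is invented for this example. It removes every character below 0x20 except tab, line feed and carriage return, which are the only ones XML 1.0 allows in that range.

```php
<?php
// Hypothetical helper: drop the control characters XML 1.0 forbids
// (0x00-0x08, 0x0B, 0x0C, 0x0E-0x1F). Tab, LF and CR are kept.
// Byte-wise matching is safe for UTF-8 because these byte values never
// occur inside a multi-byte sequence.
function strip_xml10_control_chars($xml)
{
    return preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml);
}

// Sample input with a raw vertical tab, standing in for the FMP export.
$raw = "<DATA>before\x0Bafter</DATA>";
$clean = strip_xml10_control_chars($raw);

$parser = xml_parser_create();
$result = xml_parse($parser, $clean, true); // true = this is the final chunk
var_dump($result);
```

The obvious cost is that the vertical tabs are lost; if they carry meaning in the export, replacing them with a placeholder instead of an empty string may be the better choice.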
#7
Joshua Beall:
> So there's no way to do this in XML 1.0?

No, short of such ad hockery as custom elements or PIs that pass the character by reference to the application.

--
Jock
#8
On Nov 11, 2:24 pm, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > So there's no way to do this in XML 1.0?
>
> No, short of such ad hockery as custom elements or PIs that pass the
> character by reference to the application.

What do you mean by "pass the character by reference"?

I could change the XML prologue before feeding it to the parser, changing the version to 1.1 -- but could this cause other problems?
#9
Joshua Beall:
> What do you mean by "pass the character by reference"?

E.g., <char codepoint="U+000B"/>

--
Jock
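One way this custom-element idea might be wired up with PHP's SAX-style parser is sketched below. Everything about the convention is an assumption made for illustration: the `char` element and `codepoint` attribute follow the example above, the raw vertical tabs are assumed to occur only in text content (never inside tags or attribute values), and only code points below 0x80 are handled.

```php
<?php
// Sample input with a raw vertical tab, standing in for the real export.
$raw = "<DATA>before\x0Bafter</DATA>";

// Swap each raw vertical tab for the (hypothetical) placeholder element.
$xml = str_replace("\x0B", '<char codepoint="U+000B"/>', $raw);

$text = '';
$parser = xml_parser_create();
// Keep element and attribute names as written instead of upper-casing them.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
    $parser,
    function ($parser, $name, $attrs) use (&$text) {
        if ($name === 'char' && isset($attrs['codepoint'])) {
            // "U+000B" -> 0x0B -> the character itself (ASCII range only).
            $text .= chr(hexdec(substr($attrs['codepoint'], 2)));
        }
    },
    function ($parser, $name) {
        // Nothing to do on end tags in this sketch.
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$text) {
        // May be called several times per text node, so append.
        $text .= $data;
    }
);

$result = xml_parse($parser, $xml, true);
var_dump($result);
var_dump($text === "before\x0Bafter");
```

The document handed to the parser is valid XML 1.0, and the application rebuilds the original character itself while collecting the text.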
#10
On Nov 11, 3:55 pm, John Dunlop <j...@dunlop.name> wrote:
> Joshua Beall:
>
> > What do you mean by "pass the character by reference"?
>
> E.g., <char codepoint="U+000B"/>

Ah, I see -- and then I'd have to manually parse out that value later on. What do you think of simply changing the XML prologue to specify XML 1.1?

-Josh