Re: mod_proxy_html and nested <body> tags
Bill wrote:
> I have encountered a situation where some of my company's web pages
> contain nested <body> tags.
>
> <BODY>
> .
> .
> .
> <body>
> .
> .
> .
> </body>
> .
> .
> .
> </BODY>
>
> When I run these pages through mod_proxy_html, the inner tags look like
> they are getting dropped. Has anyone run into a similar situation, and
> how did you resolve it?
>
Erm, and that's a problem exactly how?
mod_proxy_html inherits much of its parsing, including this behaviour,
from libxml2's HTML parser. If you run your pages through
xmllint --html, you'll see the same thing. If your markup happened to
be well-formed XML, you could suppress libxml2's HTML corrections by
parsing it as XML instead of HTML. Or you could run mod_publisher,
which gives you more control over parse modes and the handling of
broken markup.
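To illustrate the XML-versus-HTML point, here is a minimal sketch that uses Python's standard-library xml.etree.ElementTree (an expat-based parser, not libxml2 and not mod_proxy_html itself) as a stand-in for an XML-mode parse. The PAGE content and the body_elements helper are hypothetical, made up for illustration. Because an XML parser applies no HTML corrections, both the outer <BODY> and the nested <body> survive the parse:

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the page from the question.
# XML is case-sensitive, so <BODY>...</BODY> and <body>...</body>
# are two distinct, properly matched elements.
PAGE = """<html>
<BODY>
  <p>outer</p>
  <body>
    <p>inner</p>
  </body>
</BODY>
</html>"""


def body_elements(markup: str) -> list:
    """Parse markup as XML and return every body element, outermost first.

    Hypothetical helper. Raises ET.ParseError if the markup is not
    well-formed XML.
    """
    root = ET.fromstring(markup)
    # root.iter() walks the tree in document order, root included;
    # compare tag names case-insensitively to catch BODY and body.
    return [el for el in root.iter() if el.tag.lower() == "body"]


bodies = body_elements(PAGE)
print(len(bodies))                    # 2
print([el.tag for el in bodies])      # ['BODY', 'body']
# The inner body is still nested inside the outer one.
print(bodies[1] in list(bodies[0].iter()))  # True
```

The catch is the one stated above: this only works if the pages really are well-formed XML. Typical tag-soup HTML (unclosed <p>, bare <br>, and so on) makes ET.fromstring raise ET.ParseError, which is where an HTML parser's corrections come in.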
--
Nick Kew