comp.info.servers.unix Web servers for UNIX platforms.

mod_proxy_html and nested <body> tags

14/03/2005, 20:44   #1
Bill
mod_proxy_html and nested <body> tags

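I have encountered a situation where some of the web pages my company
has contain nested <body> tags.

<BODY>
.
.
.
<body>
.
.
.
</body>
.
.
.
</BODY>

When I run it through mod_proxy_html, the internal tags look like
they are getting dropped. Has anyone run into a similar situation, and
how did you resolve it?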
14/03/2005, 21:14   #2
David Dorward
Re: mod_proxy_html and nested <body> tags

Bill wrote:

> I have encountered a situation where some of the web pages my company
> has contain nested <body> tags.


> When I run it through mod_proxy_html, the internal tags look like
> they are getting dropped. Has anyone run into a similar situation, and
> how did you resolve it?


I haven't run into a situation like that, but I'd resolve it by ensuring all
the pages being served were valid HTML in the first place.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
14/03/2005, 23:37   #3
Nick Kew
Re: mod_proxy_html and nested <body> tags

Bill wrote:
> I have encountered a situation where some of the web pages my company
> has contain nested <body> tags.
>
> <BODY>
> .
> .
> .
> <body>
> .
> .
> .
> </body>
> .
> .
> .
> </BODY>
>
> When I run it through mod_proxy_html, the internal tags look like
> they are getting dropped. Has anyone run into a similar situation, and
> how did you resolve it?
>


Erm, and that's a problem exactly how?

mod_proxy_html inherits much of its parsing, including this, from
libxml2. If you run your pages through xmllint, you'll see the same
thing. If your markup happened to be well-formed XML, you could
suppress libxml2's HTML corrections by parsing as XML instead of HTML.
Or you could run mod_publisher to give you more control over parse
modes and handling of broken markup.
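
Whether the XML route is open depends on whether the pages are well-formed in the first place. The following is a minimal sketch, assuming Python 3 is available: it uses the standard library's ElementTree (an expat-based parser, not libxml2) purely as a well-formedness test, so its verdict only approximates what xmllint reports when run without its --html option. The helper names is_well_formed_xml and count_body_elements and the sample strings are illustrative, not part of mod_proxy_html.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if `markup` parses as well-formed XML.

    Assumption: ElementTree stands in for libxml2's XML mode here; it
    checks well-formedness only, not validity against any DTD.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


def count_body_elements(markup: str) -> int:
    """Count <body> elements (case-sensitive, as in XML) in well-formed markup."""
    root = ET.fromstring(markup)
    # iter() visits the root element itself plus all of its descendants.
    return sum(1 for _ in root.iter("body"))


if __name__ == "__main__":
    # Well-formed: an XML parse keeps the nested <body> instead of dropping it.
    xml_page = "<html><body><p>outer</p><body><p>inner</p></body></body></html>"
    # Not well-formed: unquoted attribute value and unclosed <TR>/<TD>.
    html_page = "<BODY TEXT=000000><TABLE><TR><TD>cell</TABLE></BODY>"

    for name, page in (("xml_page", xml_page), ("html_page", html_page)):
        ok = is_well_formed_xml(page)
        print(name, "well-formed:", ok)
        if ok:
            print("  <body> elements:", count_body_elements(page))
```

Pages with unquoted attribute values or unclosed table tags fail this test, so for them XML parsing is not an option until the markup itself is fixed.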

--
Nick Kew
15/03/2005, 16:01   #4
Bill
Re: mod_proxy_html and nested <body> tags

Here is a snippet from the original HTML file:


<BODY TEXT="000000" BGCOLOR="F8F0D9" BACKGROUND="">

<FORM><SCRIPT LANGUAGE="JavaScript"
SRC="../files/pp.js/$File/pp.js"></script>
<link rel="stylesheet" href="../files/my_style.css/$File/my_style.css"
type="text/css">
<link rel="stylesheet"
href="../files/my_style2.css/$File/my_style2.css" type="text/css">
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN=top><TD WIDTH="360"><div id="specsTitle">SMW</div><br><BODY
onLoad='javascript...


Please note the beginning of a nested <BODY> tag within the <TABLE>
tag. After being run through mod_proxy_html, this is what is being
served to the browser:


<body text="000000" bgcolor="F8F0D9" background="">
<form><script language="JavaScript"
src="../files/pp.js/$File/pp.js"></script>
<link rel="stylesheet" href="../files/my_style.css/$File/my_style.css"
type="text/css">
<link rel="stylesheet"
href="../files/my_style2.css/$File/my_style2.css" type="text/css">
<table border="0" cellspacing="0" cellpadding="0">
<tr valign="top"><td width="360"><div id="specsTitle">SMW</div><br>
onLoad='javascript...


Notice now that the <BODY> tag within the table has been removed. The
onLoad event is being treated as text and is being displayed on the
page as opposed to occurring when the page is loaded. What could cause
this internal <BODY> tag to get dropped?
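
To locate every stray tag like this before the pages reach the proxy (so the markup can be fixed at the source), a small scanner helps. This is a sketch assuming Python 3: html.parser.HTMLParser does not build a tree, so it reports every start tag it sees, including the second <BODY> that libxml2's HTML parser discards. The class and function names are hypothetical, and because the original onLoad value is cut off in the post, the sample uses a made-up handler, init().

```python
from html.parser import HTMLParser

# Hypothetical sample modelled on the snippet above; the real onLoad value
# is truncated in the post, so init() is a placeholder.
SAMPLE = """<BODY TEXT="000000" BGCOLOR="F8F0D9" BACKGROUND="">
<FORM>
<TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
<TR VALIGN=top><TD WIDTH="360"><div id="specsTitle">SMW</div><br><BODY
onLoad='init()'>
</TD></TR></TABLE></FORM>
</BODY>
"""


class NestedBodyFinder(HTMLParser):
    """Record every <body> start tag that appears after the first one."""

    def __init__(self):
        super().__init__()
        self.body_seen = False
        self.extra_bodies = []  # (line, column, attributes) per stray tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names, so <BODY> matches.
        if tag != "body":
            return
        if not self.body_seen:
            self.body_seen = True  # the legitimate outer <body>
            return
        line, column = self.getpos()  # where the stray tag starts
        self.extra_bodies.append((line, column, dict(attrs)))


def find_nested_bodies(markup):
    """Return (line, column, attrs) for each <body> tag after the first."""
    finder = NestedBodyFinder()
    finder.feed(markup)
    finder.close()
    return finder.extra_bodies


if __name__ == "__main__":
    for line, column, attrs in find_nested_bodies(SAMPLE):
        print(f"nested <body> at line {line}, column {column}: {attrs}")
```

Each hit gives the position of the stray tag plus its attributes, so an onload handler like the one in the snippet can be moved onto the outer <BODY> by hand.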

16/03/2005, 02:59   #5
Tim
Re: mod_proxy_html and nested <body> tags

On 15 Mar 2005 08:01:40 -0800,
"Bill" <gardneriv@yahoo.com> posted:

> Notice now that the <BODY> tag within the table has been removed. The
> onLoad event is being treated as text and is being displayed on the
> page as opposed to occurring when the page is loaded. What could cause
> this internal <BODY> tag to get dropped?


I thought that'd already been explained. But hasn't it occurred to you that
if the proxy has problems with MALFORMED HTML, then so will some browsers?
There's one solution, and one solution only: Fix up the broken HTML.

--
If you insist on e-mailing me, use the reply-to address (it's real but
temporary). But please reply to the group, like you're supposed to.

This message was sent without a virus, please delete some files yourself.
16/03/2005, 21:25   #6
Nick Kew
Re: mod_proxy_html and nested <body> tags

Tim wrote:

> I thought that'd already been explained. But hasn't it occurred to you that
> if the proxy has problems with MALFORMED HTML, then so will some browsers?
> There's one solution, and one solution only: Fix up the broken HTML.
>


Entirely right of course, but not the whole story.

A browser expects to work for a single user on a workstation where it
is one of very few active tasks, and can help itself to oodles of
CPU and memory. So it can put a lot of effort into error correction.

The proxy doesn't have that luxury. It needs to be able to process
thousands of concurrent requests, and cares a lot more about efficiency
than a browser. So it's less forgiving than a typical browser.

mod_publisher offers more options, including expending more resources
on error correction.

--
Nick Kew