comp.info.servers.unix: Web servers for UNIX platforms.
#1
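I have encountered a situation where some of the web pages my company
has contain nested <body> tags.

<BODY>
.
.
.
<body>
.
.
.
</body>
.
.
.
</BODY>

When I run these pages through mod_proxy_html, the internal tags look like
they are getting dropped. Has anyone run into a similar situation, and
how did you resolve it?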
#2
Bill wrote:
> I have encountered a situation where some of the web pages my company
> has contain nested <body> tags.
> When I run it through mod_proxy_html, the internal tags look like
> they are getting dropped. Has anyone run into a similar situation, and
> how did you resolve it?

I haven't run into a situation like that, but I'd resolve it by ensuring all the pages being served were valid HTML in the first place.

--
David Dorward <http://blog.dorward.me.uk/> <http://dorward.me.uk/>
Home is where the ~/.bashrc is
#3
Bill wrote:
> I have encountered a situation where some of the web pages my company
> has contain nested <body> tags.
>
> <BODY>
> .
> .
> .
> <body>
> .
> .
> .
> </body>
> .
> .
> .
> </BODY>
>
> When I run it through mod_proxy_html, the internal tags look like
> they are getting dropped. Has anyone run into a similar situation, and
> how did you resolve it?
>

Erm, and that's a problem exactly how?

mod_proxy_html inherits much of its parsing, including this, from libxml2. If you run your pages through xmllint, you'll see the same thing.

If your markup happened to be well-formed XML, you could suppress libxml2's HTML corrections by parsing as XML instead of HTML. Or you could run mod_publisher to give you more control over parse modes and handling of broken markup.

--
Nick Kew
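The point about parsing as XML instead of HTML can be illustrated with a small sketch. It uses Python's standard-library xml.etree.ElementTree as a stand-in, which is an assumption of this example: it is not libxml2 and not mod_proxy_html, and the sample markup and the init() handler are made up. It only shows the general behaviour that an XML parser applies no HTML-specific corrections, so a nested <body> survives when the input is well-formed, while tag soup with unquoted attributes is rejected outright.

```python
import xml.etree.ElementTree as ET

# Hypothetical page: well-formed XML that nests one <body> inside another.
well_formed = (
    "<html><body><p>outer</p>"
    '<body onload="init()"><p>inner</p></body>'
    "</body></html>"
)

root = ET.fromstring(well_formed)
bodies = list(root.iter("body"))   # document order: outer first, then inner
print(len(bodies))                 # number of <body> elements the parser kept
print(bodies[1].get("onload"))     # attribute of the inner <body>

# Markup with unquoted attribute values is not well-formed XML,
# so an XML parser refuses it instead of silently repairing it.
tag_soup = "<BODY><TABLE BORDER=0><TR><TD>text</TD></TR></TABLE></BODY>"
try:
    ET.fromstring(tag_soup)
    parsed_as_xml = True
except ET.ParseError:
    parsed_as_xml = False
print(parsed_as_xml)
```

In other words, the XML route is only open to pages that are already well-formed; markup with unquoted attributes would have to be cleaned up first.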
#4
Here is a snippet from the original HTML file:

    <BODY TEXT="000000" BGCOLOR="F8F0D9" BACKGROUND="">
    <FORM><SCRIPT LANGUAGE="JavaScript" SRC="../files/pp.js/$File/pp.js"></script>
    <link rel="stylesheet" href="../files/my_style.css/$File/my_style.css" type="text/css">
    <link rel="stylesheet" href="../files/my_style2.css/$File/my_style2.css" type="text/css">
    <TABLE BORDER=0 CELLSPACING=0 CELLPADDING=0>
    <TR VALIGN=top><TD WIDTH="360"><div id="specsTitle">SMW</div><br><BODY onLoad='javascript...

Please note the beginning of a nested <BODY> tag within the <TABLE> tag.

After being run through mod_proxy_html, this is what is being served to the browser:

    <body text="000000" bgcolor="F8F0D9" background="">
    <form><script language="JavaScript" src="../files/pp.js/$File/pp.js"></script>
    <link rel="stylesheet" href="../files/my_style.css/$File/my_style.css" type="text/css">
    <link rel="stylesheet" href="../files/my_style2.css/$File/my_style2.css" type="text/css">
    <table border="0" cellspacing="0" cellpadding="0">
    <tr valign="top"><td width="360"><div id="specsTitle">SMW</div><br> onLoad='javascript...

Notice now that the <BODY> tag within the table has been removed. The onLoad event is being treated as text and is being displayed on the page as opposed to occurring when the page is loaded.

What could cause this internal <BODY> tag to get dropped?
#5
On 15 Mar 2005 08:01:40 -0800, "Bill" <gardneriv@yahoo.com> posted:

> Notice now that the <BODY> tag within the table has been removed. The
> onLoad event is being treated as text and is being displayed on the
> page as opposed to occurring when the page is loaded. What could cause
> this internal <BODY> tag to get dropped?

I thought that had already been explained. But hasn't it occurred to you that if the proxy has problems with MALFORMED HTML, then so will some browsers?

There's one solution, and one solution only: fix up the broken HTML.

--
If you insist on e-mailing me, use the reply-to address (it's real but temporary). But please reply to the group, like you're supposed to.
This message was sent without a virus, please delete some files yourself.
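As a starting point for fixing the broken HTML, the offending pages first have to be found. The following is a minimal sketch, assuming Python's standard-library html.parser is an acceptable scanner; the class and function names, the sample markup, and the init() handler are all hypothetical. It does not reproduce libxml2's recovery rules; it only reports where a <body> tag opens inside an already open <body>.

```python
from html.parser import HTMLParser


class BodyTagAuditor(HTMLParser):
    """Record where a <body> start tag appears inside an open <body> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.open_bodies = 0   # <body> elements currently open
        self.nested = []       # (line, column) of each nested <body>

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so <BODY> and <body> both match.
        if tag == "body":
            if self.open_bodies > 0:
                self.nested.append(self.getpos())
            self.open_bodies += 1

    def handle_endtag(self, tag):
        if tag == "body" and self.open_bodies > 0:
            self.open_bodies -= 1


def find_nested_bodies(markup):
    """Return (line, column) positions of <body> tags opened inside another <body>."""
    auditor = BodyTagAuditor()
    auditor.feed(markup)
    auditor.close()
    return auditor.nested


# Made-up sample in the style of the page described in this thread.
sample = """<html>
<BODY TEXT="000000">
<TABLE><TR><TD>
<BODY onLoad="init()">
</TD></TR></TABLE>
</BODY>
</html>"""

for line, column in find_nested_bodies(sample):
    print(f"nested <body> at line {line}, column {column}")
```

Run over each source file, a check like this gives a list of pages to repair before they ever reach the proxy.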
#6
Tim wrote:
> I thought that'd already been explained. But hasn't it occurred to you that
> if the proxy has problems with MALFORMED HTML, then so will some browsers?
> There's one solution, and one solution only: Fix up the broken HTML.
>

Entirely right of course, but not the whole story.

A browser expects to work for a single user on a workstation where it is one of a very few active tasks, and can help itself to oodles of CPU and memory. So it can put a lot of effort into error correction.

The proxy doesn't have that luxury. It needs to be able to process thousands of concurrent requests, and cares a lot more about efficiency than a browser. So it's less forgiving than a typical browser.

mod_publisher offers more options, including expending more resources on error correction.

--
Nick Kew