Newsgroup: alt.www.webmaster
Strange encoding issue

25/10/2007, 14:30   #1
Dylan Parry
Strange encoding issue

Hi folks,

I'm having a bit of a problem with character encoding. For some reason I
am getting things like "Â»" and "Â©" appearing on a new site I am
building. The pages are being served up as UTF-8, and were created/saved
as UTF-8 in MS Expression Web.

Strangely, the problem only manifests on /some/ pages but not others.
All were created and served in the same way.

The problem occurs in all of Firefox/IE7/Opera/Safari, so it's not a
browser issue. All browsers are detecting the documents as UTF-8 and
displaying them as such. Manually overriding the character encoding
doesn't fix the problem, and in some cases makes things worse - for
example, I thought it could have been ISO-8859-1 being incorrectly
served as UTF-8, but changing to ISO-8859-1 causes text such as "Ã‚Â»"
and "Ã‚Â©" to appear instead.

If I open up the pages in Notepad, the code appears exactly how it
should, ie. "»" or "©" with no other characters. If I then save the file
without actually making any changes, then it works fine and the browser
once again shows the document as intended.

Any ideas?

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.

25/10/2007, 15:01   #2
Dylan Parry
Re: Strange encoding issue

Dylan Parry wrote:

[...]
> The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.

[...]

I've narrowed down the problem to where it occurs. I've noticed that I
only get this issue in files that have been affected by a global find
and replace operation, ie. find and replace in all files within a project.

So at least I now know what causes it, but it would be nice to be able
to use find and replace without it screwing up my site.

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.

25/10/2007, 15:05   #3
Christoph Schneegans
Re: Strange encoding issue

Dylan Parry wrote:

> I'm having a bit of a problem with character encoding. For some reason I
> am getting things like "Â»" and "Â©" appearing on a new site I am
> building. The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.


Post the URL, please. Do you use ASP.NET? I think it is possible to
misconfigure the web.config file so that ASP.NET reads your UTF-8 encoded
.aspx files as ISO-8859-1.

My next guess would be include files that use a different encoding than the
including page.

> If I open up the pages in Notepad, the code appears exactly how it
> should, ie. "»" or "©" with no other characters. If I then save the file
> without actually making any changes, then it works fine and the browser
> once again shows the document as intended.


Which encoding does Notepad assume? Just open the file, call "File > Save
as..." and check the value of the "Encoding" combobox.

Notepad and xWeb both store UTF-8 files with a byte-order mark. However, one
notable difference is the handling of invalid UTF-8 sequences when loading
a file: Notepad just throws these bytes away, while xWeb tries to preserve
them. Thus, I think it is possible that when you open a UTF-8 encoded file
with some invalid bytes in xWeb and then save it, the invalid bytes might
still be there. On the other hand, UTF-8 encoded files saved from within
Notepad should never contain invalid sequences.
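
To make the difference between the two loading strategies concrete, here is a minimal Python sketch. It is only an analogy: it does not claim to reproduce how Notepad or xWeb are implemented, and the function names and the sample bytes are made up for illustration. One loader discards invalid bytes, the other keeps them so they survive a save.

```python
import codecs

def has_utf8_bom(raw: bytes) -> bool:
    # A UTF-8 byte-order mark is the three bytes EF BB BF at the start of the file.
    return raw.startswith(codecs.BOM_UTF8)

def load_dropping_invalid(raw: bytes) -> str:
    # Invalid bytes are silently discarded while decoding.
    return raw.decode("utf-8", errors="ignore")

def load_preserving_invalid(raw: bytes) -> str:
    # Invalid bytes are kept as lone surrogates so they can be written back unchanged.
    return raw.decode("utf-8", errors="surrogateescape")

def save_preserved(text: str) -> bytes:
    # Inverse of load_preserving_invalid: the original invalid bytes come back.
    return text.encode("utf-8", errors="surrogateescape")

# Hypothetical sample: BOM, then "caf" followed by a stray Latin-1 byte (E9),
# then a correctly encoded "»" (C2 BB).
sample = codecs.BOM_UTF8 + b"caf\xe9 \xc2\xbb"
cleaned = load_dropping_invalid(sample)
round_tripped = save_preserved(load_preserving_invalid(sample))
```

With the first loader the stray byte is gone after a save; with the second it is still in the file, which matches the behaviour described above.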

Are you 100 percent sure that your files are perfectly valid UTF-8? If your
files are XHTML, you can temporarily rename them to .xml and then open them
in IE.
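
If renaming to .xml is awkward, a strict decode does a similar check. This is a small Python sketch, not something from the thread; the function name is hypothetical and "page.aspx" in the usage comment is just a placeholder path.

```python
from typing import Optional

def first_invalid_utf8_offset(raw: bytes) -> Optional[int]:
    """Return the byte offset of the first invalid UTF-8 sequence, or None if valid."""
    try:
        raw.decode("utf-8")  # strict by default: any invalid sequence raises
    except UnicodeDecodeError as exc:
        return exc.start
    return None

# Usage against a real file (placeholder path):
#   with open("page.aspx", "rb") as fh:
#       print(first_invalid_utf8_offset(fh.read()))
```

A result of None only says the bytes are well-formed UTF-8. It says nothing about whether they were double-encoded on the way in.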

--
<http://schneegans.de/lv/> · RFC 4646 compliant language tag validator

25/10/2007, 15:57   #4
Dylan Parry
Re: Strange encoding issue

Christoph Schneegans wrote:

> Post the URL, please. Do you use ASP.NET? I think it is possible to
> misconfigure the web.config file so that ASP.NET reads your UTF-8 encoded
> .aspx files as ISO-8859-1.


I would normally upload the files, but I can't do so this time as I'm
using ASP.NET and don't currently have access to a suitable server other
than the dev one, which isn't publicly visible. I've checked the
web.config, and nothing in there should be causing this.

> My next guess would be include files that use a different encoding than the
> including page.


It's not that. I've got a couple of included files, but they're
definitely in UTF-8, and aren't the files that contain the affected
characters either.

>> If I open up the pages in Notepad, the code appears exactly how it
>> should, ie. "»" or "©" with no other characters. If I then save the file
>> without actually making any changes, then it works fine and the browser
>> once again shows the document as intended.

>
> Which encoding does Notepad assume? Just open the file, call "File > Save
> as..." and check the value of the "Encoding" combobox.


It shows as UTF-8.

> Notepad and xWeb both store UTF-8 files with a byte-order mark. However, one
> notable difference is the handling of invalid UTF-8 sequences when loading
> a file: Notepad just throws these bytes away, while xWeb tries to preserve
> them. Thus, I think it is possible that when you open a UTF-8 encoded file
> with some invalid bytes in xWeb and then save it, the invalid bytes might
> still be there. On the other hand, UTF-8 encoded files saved from within
> Notepad should never contain invalid sequences.


Ah, that does begin to cast some light on the issue...

> Are you 100 percent sure that your files are perfectly valid UTF-8? If your
> files are XHTML, you can temporarily rename them to .xml and then open them
> in IE.


Now I am not that sure. As I've mentioned in a follow-up post, it's only
occurring in those files affected by a global find and replace - so it
would seem that xWeb is corrupting these files whenever I do one. FWIW,
it works perfectly fine if I manually edit these files in xWeb, so it
would seem that the method xWeb uses to open/edit/save files
automatically in the global find and replace is what is causing this to
happen.
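
One possible workaround is to do the project-wide replace outside the editor, with the encoding stated explicitly. This is a rough Python sketch under that assumption; the function names and the "*.aspx" pattern are illustrative, not anything xWeb provides.

```python
from pathlib import Path

def replace_in_file(path: Path, old: str, new: str) -> bool:
    """Replace text in one file, treating it strictly as UTF-8. Returns True if changed."""
    raw = path.read_bytes()
    text = raw.decode("utf-8")  # raises UnicodeDecodeError rather than guessing an encoding
    updated = text.replace(old, new)
    if updated == text:
        return False  # untouched files are never rewritten
    # A leading byte-order mark decodes to U+FEFF and is written back as-is.
    path.write_bytes(updated.encode("utf-8"))
    return True

def replace_in_tree(root: Path, pattern: str, old: str, new: str) -> list:
    """Apply replace_in_file to every file under root matching pattern."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        if path.is_file() and replace_in_file(path, old, new):
            changed.append(path)
    return changed

# Usage (placeholder values):
#   replace_in_tree(Path("site"), "*.aspx", "old text", "new text")
```

Working on bytes and decoding strictly means a file that is not valid UTF-8 stops the run instead of being silently re-encoded.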

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.

25/10/2007, 17:41   #5
Andy Dingley
Re: Strange encoding issue

On 25 Oct, 14:30, Dylan Parry <use...@dylanparry.com> wrote:
> Hi folks,
>
> I'm having a bit of a problem with character encoding. For some reason I
> am getting things like "Â»" and "Â©" appearing on a new site I am
> building. The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.


Are you _sure_ they're served as UTF-8 ? Checked the HTTP header and
the browser's own metadata, not just a <meat> element in the header?
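
For the HTTP header part of that check, the charset can be read out of the Content-Type value with Python's standard library. This is a minimal sketch; the function name is hypothetical and the header strings in the usage comment are examples, not values from the site in question.

```python
from email.message import Message
from typing import Optional

def charset_from_content_type(header_value: str) -> Optional[str]:
    """Extract the charset parameter from a Content-Type header value, if present."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Returns the charset lower-cased, or None when no charset parameter is given.
    return msg.get_content_charset()

# Usage:
#   charset_from_content_type("text/html; charset=UTF-8")
#   charset_from_content_type("text/html")
```

Against a live server, the response object returned by urllib.request.urlopen exposes the same get_content_charset() method on its headers attribute, so the header actually sent can be checked directly.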

Errors of that sort (Accented-A "Â" as a prefix character) are
indicative of UTF-8 content that has been handled as non-UTF-8. Most
likely this happens right at the last moment, when your browser
receives it by HTTP.

Alternatively, something earlier on in the editing process has loaded
them as non-UTF-8, mangled them, then saved them back again as
something that's clearly and obviously non-UTF-8. This is hard to do!
It's hard to actually label a saved file as "not UTF-8". Even if a
broken old 8-bit ANSI editor were to open up UTF-8 and save it again,
so long as it doesn't change these octets (it doesn't have to
understand them), then the file will still remain as valid UTF-8.
That's why, dollars to doughnuts, it's happening at the very last
moment rather than in the previous edit process.
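
The "Â" prefix can be reproduced in a couple of lines of Python. This is an illustrative sketch with made-up function names: it encodes text as UTF-8, then reads those bytes back as ISO-8859-1, which is the kind of mishandling described above.

```python
def misread_as_latin1(text: str) -> str:
    # "»" is C2 BB in UTF-8; read as ISO-8859-1 those two bytes are "Â" and "»".
    return text.encode("utf-8").decode("iso-8859-1")

def undo_misread(garbled: str) -> str:
    # Reverses exactly one round of the misreading above.
    return garbled.encode("iso-8859-1").decode("utf-8")

once = misread_as_latin1("\u00bb \u00a9")   # 'Â» Â©'
twice = misread_as_latin1(once)             # a second round makes it longer still
```

Each extra round of misreading roughly doubles the junk in front of the original character, which fits the report that forcing ISO-8859-1 in the browser made things worse.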

25/10/2007, 17:44   #6
Andy Dingley
Re: Strange encoding issue

On 25 Oct, 15:01, Dylan Parry <use...@dylanparry.com> wrote:
> I've narrowed down the problem to where it occurs. I've noticed that I
> only get this issue in files that have been affected by a global find
> and replace operation, ie. find and replace in all files within a project.


Is the content preceding the obviously broken characters non-ASCII?
To get this error it's usually necessary to inject some non-ASCII
characters before the well-formed UTF-8, then have them saved with an
ISO-8859-* encoding during storage. On receipt, the final user agent
sees the ISO-8859-* characters first and thus treats the document as
not being well-formed UTF-8.

25/10/2007, 18:01   #7
Dylan Parry
Re: Strange encoding issue

Andy Dingley wrote:

> Are you _sure_ they're served as UTF-8 ? Checked the HTTP header and
> the browser's own metadata, not just a <meat> element in the header?


Yes. Firefox's "page info" shows the document to be UTF-8, and also in
the drop-down menu for character encoding it's shown as UTF-8. There
isn't a meta element in the head in the offending documents, so nowhere
to get confused. (Incidentally, I think the <meat> element is invalid <g>)

> Errors of that sort (Accented-A "Â" as a prefix character) are
> indicative of UTF-8 content that has been handled as non-UTF-8. Most
> likely this happens right at the last moment, when your browser
> receives it by HTTP.
>
> Alternatively, something earlier on in the editing process has loaded
> them as non-UTF-8, mangled them, then saved them back again as
> something that's clearly and obviously non-UTF-8. This is hard to do!


Heh - I'm pretty sure that's what is happening though. I think it's
likely a bug in the find and replace with xWeb. Having avoided using it
since noticing this problem, no further occurrences have been noted.

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.

25/10/2007, 18:05   #8
Chaddy2222
Re: Strange encoding issue


Dylan Parry wrote:
> Christoph Schneegans wrote:
>
> > Post the URL, please. Do you use ASP.NET? I think it is possible to
> > misconfigure the web.config file so that ASP.NET reads your UTF-8 encoded
> > .aspx files as ISO-8859-1.

>
> I would normally upload the files, but I can't do so this time as I'm
> using ASP.NET and don't currently have access to a suitable server other
> than the dev one, which isn't publicly visible. I've checked the
> web.config, and nothing in there should be causing this.
>
> > My next guess would be include files that use a different encoding than the
> > including page.

>
> It's not that. I've got a couple of included files, but they're
> definitely in UTF-8, and aren't the files that contain the affected
> characters either.
>
> >> If I open up the pages in Notepad, the code appears exactly how it
> >> should, ie. "»" or "©" with no other characters. If I then save the file
> >> without actually making any changes, then it works fine and the browser
> >> once again shows the document as intended.

> >
> > Which encoding does Notepad assume? Just open the file, call "File > Save
> > as..." and check the value of the "Encoding" combobox.

>
> It shows as UTF-8.
>
> > Notepad and xWeb both store UTF-8 files with a byte-order mark. However, one
> > notable difference is the handling of invalid UTF-8 sequences when loading
> > a file: Notepad just throws these bytes away, while xWeb tries to preserve
> > them. Thus, I think it is possible that when you open a UTF-8 encoded file
> > with some invalid bytes in xWeb and then save it, the invalid bytes might
> > still be there. On the other hand, UTF-8 encoded files saved from within
> > Notepad should never contain invalid sequences.

>
> Ah, that does begin to cast some light on the issue...
>
> > Are you 100 percent sure that your files are perfectly valid UTF-8? If your
> > files are XHTML, you can temporarily rename them to .xml and then open them
> > in IE.

>
> Now I am not that sure. As I've mentioned in a follow-up post, it's only
> occurring in those files affected by a global find and replace - so it
> would seem that xWeb is corrupting these files whenever I do one. FWIW,
> it works perfectly fine if I manually edit these files in xWeb, so it
> would seem that the method xWeb uses to open/edit/save files
> automatically in the global find and replace is what is causing this to
> happen.
>

Yes, it sounds like an MSEW bug to me.
--
Regards Chad. http://freewebdesign.awardspace.biz

25/10/2007, 18:08   #9
Dylan Parry
Re: Strange encoding issue

Andy Dingley wrote:

> Is the content preceding the obviously broken characters non-ASCII?


No, it's all ASCII text (at least UTF-8 within the ASCII block) up to
the character that goes "wrong".
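
To see what is actually stored at the point where things go wrong, the raw bytes around the first non-ASCII character can be dumped. This is a small Python sketch with a hypothetical function name. It skips a leading UTF-8 byte-order mark, since that would otherwise be the first non-ASCII data in the file.

```python
import codecs

def bytes_around_first_non_ascii(raw: bytes, context: int = 4) -> str:
    """Hex-dump the bytes surrounding the first non-ASCII byte (after any BOM)."""
    if raw.startswith(codecs.BOM_UTF8):
        raw = raw[len(codecs.BOM_UTF8):]
    for offset, byte in enumerate(raw):
        if byte >= 0x80:
            start = max(0, offset - context)
            return raw[start:offset + context].hex(" ")
    return ""  # pure ASCII

# A correctly stored "»" shows up as "c2 bb"; a double-encoded one as "c3 82 c2 bb".
```

Comparing the dump of a broken file with the same file after a Notepad save would show exactly which bytes the find and replace changed.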

--
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk

The opinions stated above are not necessarily representative of
those of my cats. All opinions expressed are entirely your own.