#1
Hi folks,
I'm having a bit of a problem with character encoding. For some reason I am getting things like "»" and "©" appearing on a new site I am building. The pages are being served up as UTF-8, and were created/saved as UTF-8 in MS Expression Web.

Strangely, the problem only manifests on /some/ pages but not others. All were created and served in the same way. The problem occurs in all of Firefox/IE7/Opera/Safari, so it's not a browser issue. All browsers are detecting the documents as UTF-8 and displaying them as such.

Manually overriding the character encoding doesn't fix the problem, and in some cases makes things worse - for example, I thought it could have been ISO-8859-1 being incorrectly served as UTF-8, but changing to ISO-8859-1 causes text such as "Ã‚Â»" and "Ã‚Â©" to appear instead.

If I open up the pages in Notepad, the code appears exactly how it should, i.e. "»" or "©" with no other characters. If I then save the file without actually making any changes, then it works fine and the browser once again shows the document as intended.

Any ideas?

-- 
Dylan Parry
http://electricfreedom.org | http://webpageworkshop.co.uk
The opinions stated above are not necessarily representative of those of my cats. All opinions expressed are entirely your own.

#2
Dylan Parry wrote:
[...]
> The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.
[...]

I've narrowed down the problem to where it occurs. I've noticed that I only get this issue in files that have been affected by a global find and replace operation, i.e. find and replace in all files within a project.

So at least I now know what causes it, but it would be nice to be able to use find and replace without it screwing up my site.

-- 
Dylan Parry
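One way to sidestep an editor's project-wide find and replace is to run the replacement from a small script that handles the encoding explicitly. The Python sketch below is not from the thread: the names `replace_in_bytes` and `replace_in_project` and the `*.aspx` pattern are hypothetical, and it assumes the files are valid UTF-8, with or without a byte-order mark.

```python
import codecs
from pathlib import Path


def replace_in_bytes(data: bytes, old: str, new: str) -> bytes:
    """Replace text in UTF-8 data, keeping a leading BOM if one was present."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    # "utf-8-sig" strips a leading BOM; strict decoding raises on invalid UTF-8
    # instead of silently passing damaged bytes through.
    text = data.decode("utf-8-sig")
    body = text.replace(old, new).encode("utf-8")
    return codecs.BOM_UTF8 + body if has_bom else body


def replace_in_project(root: Path, old: str, new: str, pattern: str = "*.aspx") -> list:
    """Apply the replacement to every matching file under root (hypothetical layout)."""
    changed = []
    for path in sorted(root.rglob(pattern)):
        original = path.read_bytes()
        updated = replace_in_bytes(original, old, new)
        if updated != original:
            path.write_bytes(updated)
            changed.append(path)
    return changed
```

Working on bytes and re-attaching the BOM only when it was already there means untouched characters such as "»" and "©" keep their original byte sequences.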

#3
Dylan Parry wrote:
> I'm having a bit of a problem with character encoding. For some reason I
> am getting things like "»" and "©" appearing on a new site I am
> building. The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.

Post the URL, please. Do you use ASP.NET? I think it is possible to misconfigure the web.config file so that ASP.NET reads your UTF-8 encoded .aspx files as ISO-8859-1.

My next guess would be include files that use a different encoding than the including page.

> If I open up the pages in Notepad, the code appears exactly how it
> should, i.e. "»" or "©" with no other characters. If I then save the file
> without actually making any changes, then it works fine and the browser
> once again shows the document as intended.

Which encoding does Notepad assume? Just open the file, call "File > Save as..." and check the value of the "Encoding" combobox.

Notepad and xWeb both store UTF-8 files with a byte-order mark. However, one notable difference is the handling of invalid UTF-8 sequences when loading a file: Notepad just throws these bytes away, while xWeb tries to preserve them. Thus, I think it is possible that when you open a UTF-8 encoded file with some invalid bytes in xWeb and then save it, the invalid bytes might still be there. On the other hand, UTF-8 encoded files saved from within Notepad should never contain invalid sequences.

Are you 100 percent sure that your files are perfectly valid UTF-8? If your files are XHTML, you can temporarily rename them to .xml and then open them in IE.

-- 
<http://schneegans.de/lv/> · RFC 4646 compliant language tag validator
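The question of whether a file is "perfectly valid UTF-8", and whether it carries a byte-order mark, can also be checked with a few lines of script. This is only a sketch in Python; the function name `inspect_utf8` and the shape of the returned dictionary are made up for illustration.

```python
import codecs


def inspect_utf8(data: bytes) -> dict:
    """Report whether data starts with a UTF-8 BOM and whether it is valid UTF-8."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    offset = len(codecs.BOM_UTF8) if has_bom else 0
    try:
        # Strict decoding raises UnicodeDecodeError at the first invalid sequence.
        data[offset:].decode("utf-8", errors="strict")
        bad_offset = None
    except UnicodeDecodeError as exc:
        # Report the position relative to the start of the file, BOM included.
        bad_offset = offset + exc.start
    return {"bom": has_bom, "valid": bad_offset is None, "bad_offset": bad_offset}
```

Calling it on the result of `pathlib.Path(name).read_bytes()` for a page that misbehaves and for one that does not would show whether the two differ in BOM or in validity.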

#4
Christoph Schneegans wrote:
> Post the URL, please. Do you use ASP.NET? I think it is possible to
> misconfigure the web.config file so that ASP.NET reads your UTF-8 encoded
> .aspx files as ISO-8859-1.

I would normally upload the files, but I can't do so this time as I'm using ASP.NET and don't currently have access to a suitable server other than the dev one, which isn't publicly visible. I've checked the web.config, and nothing in there should be causing this.

> My next guess would be include files that use a different encoding than the
> including page.

It's not that. I've got a couple of included files, but they're definitely in UTF-8, and aren't the files that contain the affected characters either.

>> If I open up the pages in Notepad, the code appears exactly how it
>> should, i.e. "»" or "©" with no other characters. If I then save the file
>> without actually making any changes, then it works fine and the browser
>> once again shows the document as intended.
>
> Which encoding does Notepad assume? Just open the file, call "File > Save
> as..." and check the value of the "Encoding" combobox.

It shows as UTF-8.

> Notepad and xWeb both store UTF-8 files with a byte-order mark. However, one
> notable difference is the handling of invalid UTF-8 sequences when loading
> a file: Notepad just throws these bytes away, while xWeb tries to preserve
> them. Thus, I think it is possible that when you open a UTF-8 encoded file
> with some invalid bytes in xWeb and then save it, the invalid bytes might
> still be there. On the other hand, UTF-8 encoded files saved from within
> Notepad should never contain invalid sequences.

Ah, that does begin to cast some light on the issue...

> Are you 100 percent sure that your files are perfectly valid UTF-8? If your
> files are XHTML, you can temporarily rename them to .xml and then open them
> in IE.

Now I am not that sure. As I've mentioned in a follow-up post, it's only occurring in those files affected by a global find and replace - so it would seem that xWeb is corrupting these files whenever I do one. FWIW, it works perfectly fine if I manually edit these files in xWeb, so it would seem that the method xWeb uses to open/edit/save files automatically in the global find and replace is what is causing this to happen.

-- 
Dylan Parry
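The Notepad workaround mentioned in this thread (open the file, then save it unchanged) can be approximated in a script. The Python sketch below assumes Notepad behaves as Christoph Schneegans describes, i.e. it drops invalid bytes on load and writes a byte-order mark on save; the function name `resave_as_utf8_with_bom` is hypothetical, and this is an approximation rather than a description of what Notepad really does internally.

```python
import codecs


def resave_as_utf8_with_bom(data: bytes) -> bytes:
    """Mimic an 'open and save unchanged' round trip as described in the thread."""
    # "utf-8-sig" removes an existing BOM; errors="ignore" drops invalid bytes,
    # which is the behaviour attributed to Notepad above (an assumption here).
    text = data.decode("utf-8-sig", errors="ignore")
    # Always write exactly one BOM in front of the re-encoded text.
    return codecs.BOM_UTF8 + text.encode("utf-8")
```

Because invalid bytes are discarded silently, it is safer to keep a copy of the original file and compare the two before overwriting anything.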

#5
On 25 Oct, 14:30, Dylan Parry <use...@dylanparry.com> wrote:
> Hi folks,
>
> I'm having a bit of a problem with character encoding. For some reason I
> am getting things like "»" and "©" appearing on a new site I am
> building. The pages are being served up as UTF-8, and were created/saved
> as UTF-8 in MS Expression Web.

Are you _sure_ they're served as UTF-8? Checked the HTTP header and the browser's own metadata, not just a <meat> element in the header?

Errors of that sort (Accented-A "Â" as a prefix character) are indicative of UTF-8 content that has been handled as non-UTF-8. Most likely this happens right at the last moment, when your browser receives it by HTTP.

Alternatively, something earlier on in the editing process has loaded them as non-UTF-8, mangled them, then saved them back again as something that's clearly and obviously non-UTF-8. This is hard to do! It's hard to actually label a saved file as "not UTF-8". Even if a broken old 8-bit ANSI editor were to open up UTF-8 and save it again, so long as it doesn't change these octets (it doesn't have to understand them), then the file will still remain as valid UTF-8. That's why, dollars to doughnuts, it's happening at the very last moment rather than in the previous edit process.
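Both points above can be reproduced in a few lines of Python. This is only an illustration: the helper name `misdecode` is made up, and Windows-1252 (`cp1252`) is assumed as the "wrong" encoding, since that is what browsers commonly apply when a page is labelled ISO-8859-1.

```python
def misdecode(text: str, wrong_encoding: str = "cp1252") -> str:
    """Encode text as UTF-8, then decode those bytes with the wrong encoding."""
    return text.encode("utf-8").decode(wrong_encoding)


# One wrong decode puts "Â" in front of the character.
once = misdecode("»")        # "»"
# A second wrong decode on top of the first garbles it further.
twice = misdecode(once)      # "Ã‚Â»"

# An 8-bit editor that loads and saves the octets untouched does no damage:
# ISO-8859-1 maps every byte value to a character and back again.
original = "» ©".encode("utf-8")
roundtrip = original.decode("latin-1").encode("latin-1")
unchanged = roundtrip == original   # True
```

The damage only becomes permanent when the misread text is encoded to UTF-8 a second time, which is the "mangled, then saved back" case.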

#6
On 25 Oct, 15:01, Dylan Parry <use...@dylanparry.com> wrote:
> I've narrowed down the problem to where it occurs. I've noticed that I
> only get this issue in files that have been affected by a global find
> and replace operation, i.e. find and replace in all files within a project.

Is the content preceding the obviously broken characters non-ASCII?

To get this error it's usually necessary to inject some non-ASCII characters before the well-formed UTF-8, then have them saved with an ISO-8859-* encoding during storage. On receipt, the final user agent sees the ISO-8859-* characters first and thus treats the document as not being well-formed UTF-8.

#7
Andy Dingley wrote:
> Are you _sure_ they're served as UTF-8? Checked the HTTP header and
> the browser's own metadata, not just a <meat> element in the header?

Yes. Firefox's "page info" shows the document to be UTF-8, and also in the drop-down menu for character encoding it's shown as UTF-8. There isn't a meta element in the head in the offending documents, so nowhere to get confused. (Incidentally, I think the <meat> element is invalid <g>)

> Errors of that sort (Accented-A "Â" as a prefix character) are
> indicative of UTF-8 content that has been handled as non-UTF-8. Most
> likely this happens right at the last moment, when your browser
> receives it by HTTP.
>
> Alternatively, something earlier on in the editing process has loaded
> them as non-UTF-8, mangled them, then saved them back again as
> something that's clearly and obviously non-UTF-8. This is hard to do!

Heh - I'm pretty sure that's what is happening though. I think it's likely a bug in the find and replace with xWeb. Having avoided using it since noticing this problem, no further occurrences have been noted.

-- 
Dylan Parry

#8
Dylan Parry wrote:
[...]
> As I've mentioned in a follow-up post, it's only
> occurring in those files affected by a global find and replace - so it
> would seem that xWeb is corrupting these files whenever I do one. FWIW,
> it works perfectly fine if I manually edit these files in xWeb, so it
> would seem that the method xWeb uses to open/edit/save files
> automatically in the global find and replace is what is causing this to
> happen.

Yes, it sounds like an MSEW bug to me.

-- 
Regards Chad.
http://freewebdesign.awardspace.biz

#9
Andy Dingley wrote:
> Is the content preceding the obviously broken characters non-ASCII?

No, it's all ASCII text (at least UTF-8 within the ASCII block) up to the character that goes "wrong".

-- 
Dylan Parry