| linux.debian.user debian-user@lists.debian.org. |
#1 |
|
Hello,

In Debian Etch Mozilla browser (Iceape), I notice that sometimes accented characters are not displayed properly. They are shown as question marks in black diamonds. For example, on this web page (CNN):
http://www.time.com/time/nation/arti...0.html?cnn=yes

I see this "or his prot�g�s". I assume the last word is protege with accents on the e's. How do I find out what I am missing to have these characters shown properly? Maybe a font? My default locale is en_CA.UTF-8 and many of the international languages are shown properly. I even see accents properly on this web page:
http://www.jw-stumpel.nl/stestu.html

thanks,
->HS

--
To UNSUBSCRIBE, email to debian-user-REQUEST@lists.debian.org with a subject of "unsubscribe". Trouble? Contact listmaster@lists.debian.org
|
|
|
#2 |
|
On Thu, 08 Mar 2007 09:59:07 -0500
"H.S." <hs.samix@gmail.com> wrote:

> Hello,
>
> In Debian Etch Mozilla browser (Iceape), I notice that sometimes
> accented characters are not displayed properly. They are shown as
> question marks in black diamonds. For example, on this web page (CNN):
> http://www.time.com/time/nation/arti...0.html?cnn=yes
>
> I see this "or his prot�g�s". I assume the last word is protege with
> accents on the e's. How do I find out what I am missing to have these
> characters shown properly? Maybe a font? My default locale is
> en_CA.UTF-8 and many of the international languages are shown properly.
> I even see accents properly on this web page:
> http://www.jw-stumpel.nl/stestu.html

I get the same thing in Iceweasel (2.0.0.2+dfsg-2) in Sid.

Celejar
|
|
|
#3 |
|
On Thu, 08 Mar 2007 09:59:07 -0500
"H.S." <hs.samix@gmail.com> wrote:

> Hello,
>
> In Debian Etch Mozilla browser (Iceape), I notice that sometimes
> accented characters are not displayed properly. They are shown as
> question marks in black diamonds. For example, on this web page (CNN):
> http://www.time.com/time/nation/arti...0.html?cnn=yes
>
> I see this "or his prot�g�s". I assume the last word is protege with
> accents on the e's. How do I find out what I am missing to have these
> characters shown properly? Maybe a font? My default locale is
> en_CA.UTF-8 and many of the international languages are shown properly.
> I even see accents properly on this web page:
> http://www.jw-stumpel.nl/stestu.html

Incidentally, when I reply to this message, Sylpheed warns: "Can't convert the character encoding of the message body from UTF-8 to ISO-8859-1. Send it as UTF-8 anyway?" (To which I'm replying "yes".) I understand very little of character encodings.

Celejar
|
|
|
#4 |
|
On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
> Hello,
>
> In Debian Etch Mozilla browser (Iceape), I notice that sometimes
> accented characters are not displayed properly. They are shown as
> question marks in black diamonds. For example, on this web page (CNN):
> http://www.time.com/time/nation/arti...0.html?cnn=yes
>
> I see this "or his prot�g�s". I assume the last word is protege with
> accents on the e's. How do I find out what I am missing to have these
> characters shown properly? Maybe a font? My default locale is
> en_CA.UTF-8 and many of the international languages are shown properly.

Try to change to "View > Character Encoding > Western (ISO-8859-1)".

Your en_CA.UTF-8 would be able to display this page correctly if time.com would bother to tell your browser that it uses ISO-8859-1. I would have expected time.com to be more professional.

> I even see accents properly on this web page:
> http://www.jw-stumpel.nl/stestu.html

This page uses utf-8, so it matches your locale setting. It also specifies the encoding in the source, so it should display correctly on other locales as well, as long as they have the "é" character at all. (The browser transcodes transparently if it knows what it is dealing with.)

--
Regards,
Florian
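The symptom described above can be reproduced outside the browser. The following is a minimal Python sketch, not something posted in the thread. It assumes the word on the page was "protégés" stored as ISO-8859-1 bytes and that the browser decoded those bytes as UTF-8, substituting U+FFFD (the question mark in a black diamond) for anything invalid. The variable names are illustrative only.

```python
# Assumption: the page stored the word as ISO-8859-1 (Latin-1) bytes.
word = "protégés"
page_bytes = word.encode("iso-8859-1")

# A decoder that assumes UTF-8 finds the lone byte 0xE9 invalid and
# substitutes U+FFFD, the replacement character, for each occurrence.
wrong = page_bytes.decode("utf-8", errors="replace")

# Selecting "Western (ISO-8859-1)" corresponds to using the matching codec.
right = page_bytes.decode("iso-8859-1")

print(wrong)
print(right)
```

The bytes never change between the two calls; only the rule used to interpret them does, which is why switching the encoding in the View menu is enough to repair the display.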
|
|
|
#5 |
|
Florian Kulzer wrote:
> On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
>> Hello,
>>
>> In Debian Etch Mozilla browser (Iceape), I notice that sometimes
>> accented characters are not displayed properly. They are shown as
>> question marks in black diamonds. For example, on this web page (CNN):
>> http://www.time.com/time/nation/arti...0.html?cnn=yes
>>
>> I see this "or his prot�g�s". I assume the last word is protege with
>> accents on the e's. How do I find out what I am missing to have these
>> characters shown properly? Maybe a font? My default locale is
>> en_CA.UTF-8 and many of the international languages are shown properly.
>
> Try to change to "View > Character Encoding > Western (ISO-8859-1)".

Yes, that worked.

> Your en_CA.UTF-8 would be able to display this page correctly if
> time.com would bother to tell your browser that it uses ISO-8859-1.

I am not sure I understand this comment. I am not very familiar with encoding. I was assuming the web pages which have international characters are better off by using UTF-8 encoding. I was assuming they should have used UTF-8 along with the language tags around that word. I might be mistaken though.

->HS

> I would have expected time.com to be more professional.
>
>> I even see accents properly on this web page:
>> http://www.jw-stumpel.nl/stestu.html
>
> This page uses utf-8, so it matches your locale setting. It also
> specifies the encoding in the source, so it should display correctly on
> other locales as well, as long as they have the "é" character at all.
> (The browser transcodes transparently if it knows what it is dealing
> with.)
|
|
|
#6 |
|
On Thu, Mar 08, 2007 at 10:47:10 -0500, H.S. wrote:
> Florian Kulzer wrote:
> >On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
> >>Hello,
> >>
> >>In Debian Etch Mozilla browser (Iceape), I notice that sometimes
> >>accented characters are not displayed properly. They are shown as
> >>question marks in black diamonds. For example, on this web page (CNN):
> >>http://www.time.com/time/nation/arti...0.html?cnn=yes
> >>
> >>I see this "or his prot�g�s". I assume the last word is protege with
> >>accents on the e's. How do I find out what I am missing to have these
> >>characters shown properly? Maybe a font? My default locale is
> >>en_CA.UTF-8 and many of the international languages are shown properly.
> >
> >Try to change to "View > Character Encoding > Western (ISO-8859-1)".
>
> Yes, that worked.
>
> >Your en_CA.UTF-8 would be able to display this page correctly if
> >time.com would bother to tell your browser that it uses ISO-8859-1.
>
> I am not sure I understand this comment. I am not very familiar with
> encoding. I was assuming the web pages which have international
> characters are better off by using UTF-8 encoding.

What I meant was this: Your utf-8 setup (combined with using the proper fonts) is able to encode and display umlauts, accented characters, characters for Slavic languages, Scandinavian, Russian, Greek, (some) Asian characters, etc. This is in contrast to, say, someone using an iso-8859-1 locale who cannot display many of these "foreign" characters. (Unless s/he uses an application which can work around the limitations of the system's encoding, for example LaTeX.)

The problem is that a webpage has to tell your browser which encoding it uses to transmit the characters. If the browser has to guess, things can go wrong. In your case iceape guessed the page was encoded in utf-8, which goes wrong for many characters outside the standard us-ascii set. Once you told your browser that the page was in iso-8859-1 it could transcode properly. The root of the problem is that the character "é" (the accented e) exists in both utf-8 and iso-8859-1 but it has a different code in the two encodings.

> I was assuming they should have used UTF-8 along with the language tags
> around that word. I might be mistaken though.

This would maybe work if they would encode that word in utf-8. Since they decided to use iso-8859-1 throughout the document they could simply have included

<meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">

in the HTML header.

--
Regards,
Florian
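The "different code in the two encodings" point can be checked directly. This short Python sketch is an editorial illustration rather than part of Florian's message; it prints the bytes that each encoding uses for "é" and shows the reverse mix-up as well, where UTF-8 bytes are read as ISO-8859-1.

```python
e_acute = "é"  # Unicode code point U+00E9

latin1_bytes = e_acute.encode("iso-8859-1")  # a single byte
utf8_bytes = e_acute.encode("utf-8")         # a two-byte sequence

print(latin1_bytes.hex(), utf8_bytes.hex())

# The opposite mistake: UTF-8 bytes interpreted as ISO-8859-1 never fail
# to decode, but each accented letter turns into two unrelated characters.
misread = utf8_bytes.decode("iso-8859-1")
print(misread)
```

Because every byte value is valid in ISO-8859-1, that second kind of mistake produces wrong letters rather than replacement characters, so the two failure modes look different on screen.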
|
|
|
#7 |
|
2007/3/8, H.S. <hs.samix@gmail.com>:
> Florian Kulzer wrote:
> > On Thu, Mar 08, 2007 at 09:59:07 -0500, H.S. wrote:
> >> ...For example, on this web page (CNN):
> >> http://www.time.com/time/nation/arti...0.html?cnn=yes
> >> I see this "or his prot�g�s". I assume the last word is protege with
> >> ...
> >
> > Try to change to "View > Character Encoding > Western (ISO-8859-1)".
>
> Yes, that worked.

<disclaimer>ROUGH EXPLANATIONS</>

When one writes a text in a text editor, the editor must store it on disk as a series of numbers (for example, ABC will become 65,66,67). This is called encoding the text.

When your browser renders that text on the screen, it must convert the series of numbers to glyphs of letters (for example, 65,66,67 will be presented as ABC). This is called decoding.

In order for this to work, the two programs (text editor and browser) must agree to use the same rules of conversion (for example A<->65, B<->66, ...).

This is where everything gets messed up, because there is more than one possible set of encoding rules, and there are a web server, a database server, a lot of programmers and sysadmins, and heaven knows what else in between the two programs. You, the user, must then try a few possible encodings and see what works. Not too difficult, just use the view->encoding menu. Still, it is annoying.

In the case of this page, the text is really encoded as iso8859-1 (as you can find out if you manually select this encoding, at which point everything displays properly), but the html code reports that its text is encoded as UTF-8 (as you can see if you look at the first lines of the html source: content="text/html; charset=utf-8"; you can see the source with menu->view->page source).

So it's a problem that only time.com can solve properly.

> > Your en_CA.UTF-8 would be able to display this page correctly if
> > time.com would bother to tell your browser that it uses ISO-8859-1.
>
> I am not sure I understand this comment. I am not very familiar with
> encoding. I was assuming the web pages which have international
> characters are better off by using UTF-8 encoding.

All these things I told you regarding character encodings don't apply only to the case of a text editor producing text to be displayed in a web browser. In fact, they apply whenever a computer stores and displays text. Text stored in memory/disk/wherever must be encoded. Text retrieved to be displayed must be decoded. And this is where your default locale comes to play its part:

> My default locale is en_CA.UTF-8 and many of the
> international languages are shown properly.

This (UTF-8) is the encoding YOUR pc uses to store/display characters. When not told to use any other encoding, it uses UTF-8. When told that a text is encoded differently, it silently converts it to UTF-8 to handle it internally. That is good, because UTF-8 is a good encoding scheme by measure of how many different languages it can handle (almost all). If, for example, your default encoding were iso-8859-1, you would never be able to see what a Greek or Japanese text looks like [1].

So you did your part right. Your computer IS ABLE to display most texts correctly if they are properly tagged regarding what encoding they use.

[1] Of course, you also need to have fonts with Greek / Japanese letters.
|
|
|
#8 |
|
H.S. wrote:
> Hello,
>
> In Debian Etch Mozilla browser (Iceape), I notice that sometimes
> accented characters are not displayed properly. They are shown as
> question marks in black diamonds. For example, on this web page (CNN):
> http://www.time.com/time/nation/arti...0.html?cnn=yes
>
> I see this "or his prot�g�s". I assume the last word is protege with
> accents on the e's. How do I find out what I am missing to have these
> characters shown properly? Maybe a font? My default locale is
> en_CA.UTF-8 and many of the international languages are shown properly.
> I even see accents properly on this web page:
> http://www.jw-stumpel.nl/stestu.html
>
> thanks,
> ->HS

Same problem in the latest iceweasel. Konqueror doesn't do it right either. It shows "his prot��", which icedove changes. It should be a box instead of a ? in a black diamond. I think it's a bug in the cnn page.

Funny, I hit the site again with iceweasel and I get a blank screen. Perhaps they are fixing it.

--
Registered Linux user #443289 at http://counter.li.org/
|
|
|
#9 |
|
Nick Demou wrote:
> <disclaimer>ROUGH EXPLANATIONS</>
>
> when one writes a text in a text-editor the text-editor must store it
> in the disk as a series of numbers (for example ABC will become
> 65,66,67)
> this is called encoding the text
> when your browser renders that text in the screen it must convert the
> series of numbers to glyphs of letters (for example 65,66,67 will be
> presented as ABC)
> this is called decoding
>
> in order for this to work the two programs (text editor and browser)
> should agree in order to use the same rules of conversion (for example
> A<->65, B<->66,...)

I am familiar with the above.

> this is where everything gets messed up because there are more than
> one possible encoding rules and web server, a database server, a lot
> of programmers and sysadmins and heaven knows what else in between the
> two programs. You the user then, must try a few possible encoding and
> see what works. Not too difficult just use the view->encoding menu.
> Still it is annoying

Right.

> in the case of this page the text is really encoded as iso8859-1 (as
> you can find out if you manually select this encoding when everything
> displays properly) but the html code reports that it's text is encoded
> as UTF-8 (as you can see if you look at the first lines of the html
> source: content="text/html; charset=utf-8" - you can see the source
> with menu->view->page source).
>
> So its a problem that only time.com can solve properly

For a moment pretend that I am the person responsible to do that (HTML programmer or HTML editor or whatever). What would I do to resolve this?

My guess: use an HTML editor which supports UTF-8? Then the tag in the web page, content="text/html; charset=utf-8", would specify the encoding, the editor would input proper encoding of the character and my UTF-8 enabled browser should show the characters exactly as they were typed(?)

->HS
|
|
|
#10 |
|
On Mar 08 2007, Florian Kulzer wrote:
> On Thu, Mar 08, 2007 at 10:47:10 -0500, H.S. wrote:
> > I am not sure I understand this comment. I am not very familiar with
> > encoding. I was assuming the web pages which have international
> > characters are better off by using UTF-8 encoding.
>
> What I meant was this: Your utf-8 setup (combined with using the proper
> fonts) is able to encode and display umlauts, accented characters,
> characters for Slavic languages, Scandinavian, Russian, Greek, (some)
> Asian characters, etc. This is in contrast to, say, someone using an
> iso-8859-1 locale who cannot display many of these "foreign" characters.
> (Unless s/he uses an application which can work around the limitations
> of the system's encoding, for example LaTeX.)
>
> The problem is that a webpage has to tell your browser which encoding it
> uses to transmit the characters. If the browser has to guess things can
> go wrong. In your case iceape guessed the page was encoded in utf-8
> which goes wrong for many characters outside the standard us-ascii set.
> Once you told your browser that the page was in iso-8859-1 it could
> transcode properly. The root of the problem is that the character "é"
> (the accented e) exists in both utf-8 and iso-8859-1 but it has a
> different code in the two encodings.

Ok, dumb question time. I have hell's own mess with emails, basically amounting to inability to read non-US characters in text emails, but I was under the impression that there was a simple solution for web pages. Html includes its _own_ encoding for accented, umlauted and otherwise non-US characters, and conformant web pages are supposed to use it - not rely on the lucky browser switching their browser preferences from UTF-6 to ISO-988956-whatever to some-other-bloody-encoding depending on the whim of the web page author.

People reading this mail in html may have difficulty if I try to give examples, but I type them into web page source all the time, to get the non-US characters I want - and they work. Perhaps things are different if the web page creator uses GUI-based "authoring" tools, and can't tell that the tool is making stupid decisions under the hood ;-)

Anyway, example time - &aacute; gives you an a with an acute accent. (That's an ampersand symbol followed by the letters "aacute" followed by a semi-colon.)

> > I was assuming they should have used UTF-8 along with the language tags
> > around that word. I might be mistaken though.
>
> This would maybe work if they would encode that word in utf-8. Since
> they decided to use iso-8859-1 throughout the document they could simply
> have included
>
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
>
> in the HTML header.

I see. Since I'm lazy - and unsure precisely what query to feed to a search engine - could you possibly point at a list of these tags?

--
Arlie
(Arlie Stephens arlie@worldash.org)
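The character-reference approach mentioned above keeps the page source pure ASCII, so it reads the same under either encoding. The sketch below is an editorial illustration using Python's standard `html` module; the helper `to_ascii_html` is a hypothetical name invented for this example, and it deliberately leaves markup characters such as `&` and `<` alone, so it is only suitable for plain text fragments.

```python
import html
from html.entities import codepoint2name

# Decoding a named character reference back into the character.
a_acute = html.unescape("&aacute;")
print(a_acute)


def to_ascii_html(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with a named
    reference when one exists, otherwise with a numeric reference."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")
        else:
            out.append(f"&#{cp};")
    return "".join(out)


encoded = to_ascii_html("protégés")
print(encoded)
```

A page written this way does not depend on the charset declaration for its accented letters, although the declaration is still needed for anything typed in directly.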
|
|
|
#11 |
|
2007/3/8, H.S. <hs.samix@gmail.com>:
> Nick Demou wrote:
> > ...
> > in the case of this page the text is really encoded as iso8859-1 (as
> > you can find out if you manually select this encoding when everything
> > displays properly) but the html code reports that it's text is encoded
> > as UTF-8 (as you can see if you look at the first lines of the html
> > source: content="text/html; charset=utf-8" - you can see the source
> > with menu->view->page source).
> >
> > So its a problem that only time.com can solve properly
>
> For a moment pretend that I am the person responsible to do that (HTML
> programmer or HTML editor or whatever). What would I do to resolve this?
>
> My guess: use an HTML editor which supports UTF-8? Then the tag in the
> web page, content="text/html; charset=utf-8", would specify the
> encoding, the editor would input proper encoding of the character and my
> UTF-8 enabled browser should show the characters exactly as they were
> typed(?)

Yes, this would do the trick. However, do note that you do not need to have UTF-8 everywhere: you could use an HTML editor that supports iso8859-1 and just make sure that the tag DOES PROPERLY indicate that this is iso-8859-1 text, and you would be equally good.

UTF-8 everywhere does make these issues easier (it's just that it is a rather recent development and a) a few programs can't handle it, b) some programmers and users don't know how to set things up properly for UTF-8 support).
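For whoever maintains such a page, one way to catch this kind of mismatch before publishing is to check that the file's bytes actually decode under the charset it declares. The following Python sketch is an assumption-laden illustration, not a tool mentioned in the thread: the function names are hypothetical, the regular expression only handles simple `charset=...` declarations near the top of the file, and real browsers follow a more elaborate detection procedure.

```python
import re

# Matches charset=utf-8, charset="utf-8" or charset='utf-8' (simple cases only).
META_CHARSET = re.compile(rb'charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE)


def declared_charset(html_bytes: bytes):
    """Return the charset named near the top of the document, or None."""
    match = META_CHARSET.search(html_bytes[:1024])
    return match.group(1).decode("ascii").lower() if match else None


def declaration_is_plausible(html_bytes: bytes) -> bool:
    """True if the document decodes without error under its declared charset."""
    charset = declared_charset(html_bytes)
    if charset is None:
        return False
    try:
        html_bytes.decode(charset)
    except (UnicodeDecodeError, LookupError):
        return False
    return True


head = b'<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
mislabelled = head + "protégés".encode("iso-8859-1")  # the situation in this thread
consistent = head + "protégés".encode("utf-8")

print(declaration_is_plausible(mislabelled))
print(declaration_is_plausible(consistent))
```

The check is one-sided: ISO-8859-1 accepts any byte sequence, so a UTF-8 file wrongly labelled as iso-8859-1 would pass it. It does catch the case discussed here, ISO-8859-1 text labelled as UTF-8.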
|
|
|
#12 |
|
2007/3/8, Arlie Stephens <arlie@worldash.org>:
> On Mar 08 2007, Florian Kulzer wrote:
> > ...
> > <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
> >
> > in the HTML header.
>
> I see. Since I'm lazy - and unsure precisely what query to feed to a
> search engine - could you possibly point at a list of these tags.

You did bury your question under too much text, but you were lucky:

1) http://en.wikipedia.org/wiki/Categor...acter_encoding
2) http://en.wikipedia.org/wiki/Charset
3) my advice: learn - choose your html editor carefully - test
|
|
|
#13 |
|
Florian Kulzer wrote:
>
> What I meant was this: Your utf-8 setup (combined with using the proper
> fonts) is able to encode and display umlauts, accented characters,
> characters for Slavic languages, Scandinavian, Russian, Greek, (some)
> Asian characters, etc. This is in contrast to, say, someone using an
> iso-8859-1 locale who cannot display many of these "foreign" characters.
> (Unless s/he uses an application which can work around the limitations
> of the system's encoding, for example LaTeX.)
>
> The problem is that a webpage has to tell your browser which encoding it
> uses to transmit the characters. If the browser has to guess things can
> go wrong. In your case iceape guessed the page was encoded in utf-8
> which goes wrong for many characters outside the standard us-ascii set.
> Once you told your browser that the page was in iso-8859-1 it could
> transcode properly. The root of the problem is that the character "é"
> (the accented e) exists in both utf-8 and iso-8859-1 but it has a
> different code in the two encodings.

Ah, that makes complete sense!

>> I was assuming they should have used UTF-8 along with the language tags
>> around that word. I might be mistaken though.
>
> This would maybe work if they would encode that word in utf-8. Since
> they decided to use iso-8859-1 throughout the document they could simply
> have included
>
> <meta http-equiv="CONTENT-TYPE" content="text/html; charset=iso-8859-1">
>
> in the HTML header.

Thanks for your excellent explanation.

regards,
->HS