#1
Hello,
I have a website with accented characters. Do I have to convert them into HTML entities when using XHTML 1.0 Strict with charset=iso-8859-1? If so, could you recommend a free tool? Thank you.

#2
On 2007-10-30, Jean-Guy Mouton <user@example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?

No, just make sure your pages are properly saved in ISO-8859-1 and that the server is configured to deliver the correct charset in the Content-Type header.

That's assuming ISO-8859-1 covers all the accented characters you need. What language is it for? If it's French then you should be fine. If it's Vietnamese (say) then you need a different encoding, probably UTF-8.
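One way to see which charset a server actually declares is to read the Content-Type header of a response. The following is only a minimal Python sketch using the standard library; `charset_from_header` and `declared_charset` are hypothetical helper names chosen for illustration, and it assumes the server answers HEAD requests.

```python
import urllib.request
from email.message import Message
from typing import Optional


def charset_from_header(content_type: str) -> Optional[str]:
    """Return the lowercased charset parameter of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def declared_charset(url: str) -> Optional[str]:
    """Request only the headers of a page and return the charset the server declares."""
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request) as response:
        # A missing header is treated as an empty value, which yields None.
        return charset_from_header(response.headers.get("Content-Type", ""))
```

Calling `declared_charset` with the address of one of your own pages should return `iso-8859-1` if the server is configured as described above. A result of `None` means no charset is being sent in the header.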

#3
Ben C wrote:
> No, just make sure your pages are properly saved in ISO-8859-1 and that
> the server is configured to deliver the correct charset in the
> Content-Type header.

How can I check this on the hosting server, please?

> That's assuming ISO-8859-1 covers all the accented characters you need--
> what language is it for? If it's French then you should be fine. If it's

Yes, it's French.

#4
On 30 Oct, 16:38, Jean-Guy Mouton <u...@example.net> wrote:
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?

If you do things correctly, then they'll work equally well in any of three ways (even mixed on the same page).

* Directly entered characters "é"
* HTML entity references &eacute;
* numeric character entities &#233;

Just make sure that the web server sends an encoding that _matches_ how the document itself was encoded. It doesn't matter which encoding you author in (among the encodings that contain the characters you need), so long as you match it with the HTTP content-type header. Ignore <meta> inside the page. It's of no use on the web and is often misleading.

If you can't reliably control the HTTP content-type header, then use either form of the entities. If you can have the HTTP content-type header set once, but only once, then set it to UTF-8 (this is quite common in a corporate environment).

Some (surprisingly little-known) things that you ought to understand:

- Unicode is a character set; UTF-8 is an encoding that represents it as a sequence of bytes. The two are separate functions.
- The Unicode character set is used throughout HTML, whether you like it or not. When you use numeric character entities, even from an ISO-8859-* page, the numbers you use refer to Unicode, not to ISO.

I would suggest avoiding ISO-8859-* in favour of UTF-8. Some of your tools will no longer work, but there are plenty that will replace them, and for free. These days a tool that isn't UTF-8 clean has little place in a web design shop. The great advantage of UTF-8 is obviously when you have to support multiple languages. It's near-essential for doing this on the same page, but it's even worth doing if you only have to support different language clients from the same office.

Watch out for UTF-16 from some Windows tools! That "Save as Unicode" option is often the wrong thing; look further down the list for UTF-8. Don't use a BOM (aka UTF-8Y), as that's incompatible with ASCII (and most ISO-8859-*) encodings.

If your authoring process is only ASCII-clean and you only need Western European characters, then the character entity references (e.g. &eacute; rather than &#233; for "é") are simple and robust against mistakes.

If you need characters from outside Western Europe, then for the most part you can't use character entity references (in any encoding), because named entities exist for only a limited set of characters. If you use ISO-8859-1 encoding then you MUST use numeric character entities for those characters. If you use UTF-8 then you can use either characters entered directly, or numeric character entities. As the numerics are hard to proof-read, this alone is enough reason to favour UTF-8.

I'd also suggest dropping XHTML in favour of HTML 4.01 Strict, but that's for HTML reasons, not character encoding.
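The points above about entities and the BOM can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: `to_latin1_html` and `strip_utf8_bom` are hypothetical helper names, and Python's standard `html` and `codecs` modules stand in for whatever tools the authoring process really uses.

```python
import codecs
import html

# The three spellings of "é" all resolve to the same Unicode character.
DIRECT = "é"
NAMED = html.unescape("&eacute;")
NUMERIC = html.unescape("&#233;")  # 233 is the Unicode code point U+00E9


def to_latin1_html(text: str) -> bytes:
    """Encode text as ISO-8859-1, writing characters outside it as &#NNN;."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 BOM (bytes EF BB BF) if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data
```

For example, `to_latin1_html("cœur élu")` keeps "é" as the single byte E9 but writes "œ" (U+0153, which ISO-8859-1 does not contain) as the numeric form `&#339;`. The number comes from Unicode, not from any ISO-8859-* table.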

#5
On Oct 30, 5:38 pm, Jean-Guy Mouton <u...@example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?
>
> If so, could you recommend a freeware?
>
> Thank you.

Use UTF-8 whenever you can.

UTF-8 is able to represent any character in the Unicode standard, yet the initial encoding of byte codes and character assignments for UTF-8 is backwards compatible with ASCII. For these reasons, it is steadily becoming the preferred encoding for e-mail, web pages, and other places where characters are stored or streamed.

Advantages

* UTF-8 is a superset of ASCII. Since a plain ASCII string is also a valid UTF-8 string, no conversion needs to be done for existing ASCII text. Software designed for traditional non-extended ASCII character sets can generally be used with UTF-8 with few or no changes.
* Sorting of UTF-8 strings using standard byte-oriented sorting routines will produce the same results as sorting them based on Unicode code points. (This has limited usefulness, though, since it is unlikely to represent the culturally acceptable sort order of any particular language or locale.)
* UTF-8 and UTF-16 are the standard encodings for XML documents. All other encodings must be specified explicitly either externally or through a text declaration. [1]
* Any byte-oriented string search algorithm can be used with UTF-8 data (as long as one ensures that the inputs only consist of complete UTF-8 characters). Care must be taken with regular expressions and other constructs that count characters, however.
* UTF-8 strings can be fairly reliably recognized as such by a simple algorithm. That is, the probability that a string of characters in any other encoding appears as valid UTF-8 is low, diminishing with increasing string length. For instance, the octet values C0, C1, F5 to FF never appear. For better reliability, regular expressions can be used to take into account illegal overlong and surrogate values (see the W3 FAQ: Multilingual Forms for a Perl regular expression to validate a UTF-8 string).

http://en.wikipedia.org/wiki/UTF-8#Advantages
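A few of the properties listed above can be checked with a short Python sketch. This is not the Perl regular expression from the W3 FAQ. It swaps in Python's built-in strict UTF-8 decoder, which also rejects overlong forms and surrogate values, and `looks_like_utf8` and `sort_by_bytes` are hypothetical names used only for illustration.

```python
from typing import List


def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8 under Python's strict decoder."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def sort_by_bytes(words: List[str]) -> List[str]:
    """Sort strings by their raw UTF-8 bytes, as a byte-oriented routine would."""
    return sorted(words, key=lambda word: word.encode("utf-8"))
```

Text saved as ISO-8859-1 that contains accented letters, such as `"déjà vu".encode("iso-8859-1")`, fails the check, while the same text encoded as UTF-8 passes. Sorting with `sort_by_bytes` gives the same order as Python's default string sort, which compares code points.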