Converting accented characters to entities

Posted 30/10/2007, 17:38   #1
Jean-Guy Mouton

Hello,

I have a website with accented characters. Do I have to convert them
into html entities in XHTML 1.0 strict and charset=iso-8859-1?

If so, could you recommend a freeware?

Thank you.
Posted 30/10/2007, 17:55   #2
Ben C

On 2007-10-30, Jean-Guy Mouton <user@example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?


No, just make sure your pages are properly saved in ISO-8859-1 and that
the server is configured to deliver the correct charset in the
Content-Type header.

That's assuming ISO-8859-1 covers all the accented characters you need--
what language is it for? If it's French then you should be fine. If it's
Vietnamese (say) then you need a different encoding, probably UTF-8.
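
A quick way to test whether ISO-8859-1 really covers your text, sketched in Python. The function name `uncovered_chars` and the sample strings are made up for illustration. Note that a few characters used in French, such as "œ", are not in ISO-8859-1, so it is worth running a check like this on real content.

```python
def uncovered_chars(text, encoding="iso-8859-1"):
    """Return the distinct characters in text that encoding cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


# Hypothetical sample strings, not taken from any real site.
print(uncovered_chars("Où est l'élève français ?"))  # everything fits
print(uncovered_chars("cœur"))                       # the ligature does not fit
print(uncovered_chars("Tiếng Việt"))                 # Vietnamese needs UTF-8
```

An empty list means every character can be saved in that encoding; anything else has to be written as an entity or the page has to move to UTF-8.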
Posted 30/10/2007, 17:59   #3
Jean-Guy Mouton

Ben C wrote:
> No, just make sure your pages are properly saved in ISO-8859-1 and that
> the server is configured to deliver the correct charset in the
> Content-Type header.

How do I check this for the hosting server, please?
>
> That's assuming ISO-8859-1 covers all the accented characters you need--
> what language is it for? If it's French then you should be fine. If it's

Yes that's French.
Posted 30/10/2007, 19:17   #4
Andy Dingley

On 30 Oct, 16:38, Jean-Guy Mouton <u...@example.net> wrote:

> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?


If you do things correctly, then they'll work equally well in any of
three ways (even mixed on the same page).
* Directly entered characters "é"
* HTML entity references &eacute;
* numeric character entities &#233;
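
To see that the three forms really do come out as the same character, here is a small Python sketch using the standard `html.unescape` function. The hexadecimal form `&#xE9;` is added for comparison; the list itself is only an illustration.

```python
import html

# The same letter written directly, as a named entity,
# and as decimal and hexadecimal numeric references.
forms = ["é", "&eacute;", "&#233;", "&#xE9;"]

decoded = [html.unescape(s) for s in forms]
print(decoded)
print(len(set(decoded)) == 1)  # all forms decode to one and the same character
```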

Just make sure that the web server sends a _matching_ encoding for how
the document was itself encoded. It doesn't matter which encoding you
author in (of encodings that contain the characters you need), so long
as you match it with the HTTP content-type header.
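
One way to check what the server actually sends, sketched in Python with the standard library only. The function names are invented for this example, and `served_charset` needs network access, so try it against your own URL.

```python
import codecs
from email.message import Message
from urllib.request import urlopen


def charset_from_header(content_type):
    """Pull the charset parameter out of a Content-Type value (None if absent)."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def served_charset(url):
    """Fetch url and return the charset declared in its HTTP Content-Type header."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


def encodings_match(file_encoding, content_type):
    """True if the header declares the same encoding the file was saved in."""
    declared = charset_from_header(content_type)
    if declared is None:
        return False
    # codecs.lookup normalises aliases such as "latin-1" and "ISO-8859-1".
    return codecs.lookup(declared).name == codecs.lookup(file_encoding).name


print(charset_from_header("text/html; charset=ISO-8859-1"))
print(encodings_match("latin-1", "text/html; charset=ISO-8859-1"))
print(encodings_match("utf-8", "text/html; charset=ISO-8859-1"))
```

If `charset_from_header` returns `None`, the server is not declaring any charset and the browser is left to guess.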

Ignore <meta> inside the page. A charset in the HTTP header takes
precedence over it, so on the web it is of no use and is often
misleading.

If you can't reliably control the HTTP content-type header, then use
either form of the entities.

If you can have the HTTP content-type header set once, but only once,
then set it to UTF-8 (this is quite common in a corporate
environment).



Some (surprisingly little-known) things that you ought to understand:

- Unicode is a character set, UTF-8 is an encoding that represents
those characters as a sequence of bytes. The two are separate functions.

- That Unicode character set is used throughout HTML, whether you
like it or not. When you use numeric character entities, even from an
ISO-8859-* page, the numbers you use refer to Unicode, not to ISO.
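
Both points can be seen with a small Python sketch. The choice of "œ" is just an example: it has one Unicode code point, different byte representations in different encodings, and its numeric entity uses the Unicode number rather than its ISO-8859-15 byte value.

```python
import html

ch = "œ"

# One character, one Unicode code point...
print(ord(ch))                    # 339 (U+0153)

# ...but different byte sequences depending on the encoding.
print(ch.encode("utf-8"))         # two bytes: C5 93
print(ch.encode("iso-8859-15"))   # one byte: BD (decimal 189)

# Numeric entities always use the Unicode number, not the ISO byte value.
print(html.unescape("&#339;"))    # the ligature
print(html.unescape("&#189;"))    # U+00BD, the "one half" sign, not the ligature
```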


I would suggest avoiding ISO-8859-* in favour of UTF-8. Some of your
tools will no longer work, but there are plenty that will replace
them, and for free. These days a tool that isn't UTF-8 clean has
little place in a web design shop. The great advantage of UTF-8 is
obviously when you have to support multiple languages - it's near-
essential for doing this on the same page, but it's even worth doing
if you only have to support different language clients from the same
office.

Watch out for UTF-16 from some Windows tools! That "Save as Unicode"
option is often the wrong thing - look further down for UTF-8.

Don't use a BOM (aka UTF-8Y): the three extra bytes at the start of
the file make it incompatible with tools that expect plain ASCII (or
ISO-8859-*) input.
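
To find out what an editor really wrote, you can look at the first bytes of the file. A rough Python sketch follows; the function names and labels are invented here, and it only looks for UTF-8 and UTF-16 BOMs, so a file without a BOM tells you nothing either way.

```python
import codecs

# Byte order marks to look for, with an illustrative label for each.
BOMS = [
    (codecs.BOM_UTF8, "utf-8 with BOM"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data):
    """Return a label if data starts with a known BOM, else None."""
    for bom, label in BOMS:
        if data.startswith(bom):
            return label
    return None


def strip_utf8_bom(data):
    """Remove a leading UTF-8 BOM, leaving all other bytes untouched."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


sample = "é".encode("utf-8-sig")  # Python's name for UTF-8 with a BOM
print(sniff_bom(sample))
print(strip_utf8_bom(sample) == "é".encode("utf-8"))
```

Pass it the raw bytes of a page, for example `open(path, "rb").read()`, where `path` is whatever file you want to inspect.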

If your authoring process is only ASCII-clean and you only need
Western European characters, then the character entity references
(e.g. &eacute; rather than &#233; for "é") are simple and robust
against mistakes.
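
For the original question about a free tool: a few lines of Python can do the conversion. This is only a sketch; `to_named_entities` is a made-up name. It uses the standard `html.entities.codepoint2name` table, falls back to a numeric entity where no name exists, and deliberately leaves ASCII (including `<`, `>` and `&`) alone so existing markup survives.

```python
from html.entities import codepoint2name


def to_named_entities(text):
    """Replace non-ASCII characters with named entities where one exists."""
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            out.append(ch)                      # plain ASCII stays as it is
        elif code in codepoint2name:
            out.append("&" + codepoint2name[code] + ";")
        else:
            out.append("&#%d;" % code)          # no name: use the number
    return "".join(out)


print(to_named_entities("<p>élève français</p>"))
```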

If you need characters from outside Western Europe, then you can't
use character entity references (for any encoding). If you use
ISO-8859-1 encoding then you MUST use numeric character entities. If
you use UTF-8 then you can use either characters entered directly, or
numeric character entities. As the numerics are hard to proof-read,
this alone is enough reason to favour UTF-8.
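
As an illustration of the ISO-8859-1 case, Python's built-in `xmlcharrefreplace` error handler writes a numeric entity for every character the target encoding lacks. The function name `encode_for_page` and the sample text are assumptions for this sketch.

```python
def encode_for_page(text, encoding="iso-8859-1"):
    """Encode text, turning characters the encoding lacks into numeric entities."""
    return text.encode(encoding, "xmlcharrefreplace")


sample = "élève, Việt"

# ISO-8859-1 keeps the French letters as single bytes and writes the
# Vietnamese letter as a numeric entity.
print(encode_for_page(sample))

# With UTF-8 nothing needs replacing, so no entities appear.
print(encode_for_page(sample, "utf-8"))

# Targeting ASCII turns every accented letter into a numeric entity.
print(encode_for_page(sample, "ascii"))
```

The ISO-8859-1 output shows why the numerics are hard to proof-read: the Vietnamese letter is no longer recognisable in the source.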


I'd also suggest dropping XHTML in favour of HTML 4.01 Strict, but
that's for HTML reasons, not character encoding.

Posted 30/10/2007, 20:46   #5
1001 Webs

On Oct 30, 5:38 pm, Jean-Guy Mouton <u...@example.net> wrote:
> Hello,
>
> I have a website with accented characters. Do I have to convert them
> into html entities in XHTML 1.0 strict and charset=iso-8859-1?
>
> If so, could you recommend a freeware?
>
> Thank you.


Use UTF-8 whenever you can.
UTF-8 is able to represent any character in the Unicode standard, yet
the initial encoding of byte codes and character assignments for UTF-8
is backwards compatible with ASCII.
For these reasons, it is steadily becoming the preferred encoding for
e-mail, web pages, and other places where characters are stored or
streamed.

Advantages

* UTF-8 is a superset of ASCII. Since a plain ASCII string is also
a valid UTF-8 string, no conversion needs to be done for existing
ASCII text. Software designed for traditional non-extended ASCII
character sets can generally be used with UTF-8 with few or no
changes.
* Sorting of UTF-8 strings using standard byte-oriented sorting
routines will produce the same results as sorting them based on
Unicode code points. (This has limited usefulness, though, since it is
unlikely to represent the culturally acceptable sort order of any
particular language or locale.)
* UTF-8 and UTF-16 are the standard encodings for XML documents.
All other encodings must be specified explicitly either externally or
through a text declaration. [1]
* Any byte oriented string search algorithm can be used with UTF-8
data (as long as one ensures that the inputs only consist of complete
UTF-8 characters). Care must be taken with regular expressions and
other constructs that count characters, however.
* UTF-8 strings can be fairly reliably recognized as such by a
simple algorithm. That is, the probability that a string of characters
in any other encoding appears as valid UTF-8 is low, diminishing with
increasing string length. For instance, the octet values C0, C1, F5 to
FF never appear. For better reliability, regular expressions can be
used to take into account illegal overlong and surrogate values (see
the W3 FAQ: Multilingual Forms for a Perl regular expression to
validate a UTF-8 string).

http://en.wikipedia.org/wiki/UTF-8#Advantages
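
A few of the points above can be tried out in Python. This is only a sketch: `is_valid_utf8` is a name chosen here, and it relies on Python's strict UTF-8 decoder instead of the Perl regular expression mentioned in the quote.

```python
def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# ASCII text is already valid UTF-8, byte for byte.
print("plain ASCII".encode("ascii") == "plain ASCII".encode("utf-8"))

# Sorting the encoded bytes gives the same order as sorting by code point.
words = ["zèbre", "abc", "日本", "école"]
by_code_point = sorted(words)
by_bytes = sorted(words, key=lambda w: w.encode("utf-8"))
print(by_code_point == by_bytes)

# Text saved in another encoding rarely passes as UTF-8.
print(is_valid_utf8("é".encode("utf-8")))
print(is_valid_utf8("é".encode("iso-8859-1")))
```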
