C++0x two Unicode proposals. A correction one and a different one

17/01/2008, 11:23   #1
Ioannis Vranos
C++0x two Unicode proposals. A correction one and a different one

Based on a discussion about Unicode in clc++, inside the thread with the
subject "next ISO C++ standard", on the data provided in
http://en.wikipedia.org/wiki/C%2B%2B0x , and with the following design ideals:

1. To provide Unicode support in C++0x always and explicitly.
2. To provide support for all the Unicode encodings out there.


I think the current implementation of these ideals, namely:

a) char, char16_t and char32_t types.
b) built-in Unicode literals.

should become:

I) Library-provided, implementation-defined types such as utf8_char,
utf16_char and utf32_char, leaving the existing built-in types such as
char alone and unpolluted, now and in the future.

II) Leave b) as it is.


In this way, the built-in types are not polluted by an ever-growing list
of UTFs, and obsolete encodings can later be deprecated or removed from
the library easily. The pollution caused by an ever-growing list of UTF
character types and literals will be minimal.

I also think this change in how UTF support is implemented would require
only minimal changes to the current C++0x draft.

---------------------------------------------------------------------------


My second thought on this is that Unicode support should also become
optional. This would further reduce the pollution of built-in types and
string literals. An implementation should be able to choose whether to
support Unicode and, if so, which encodings.
17/01/2008, 14:07   #2
Phil Endecott
Re: C++0x two Unicode proposals. A correction one and a different one

Ioannis Vranos wrote:
> Based on a discussion about Unicode in clc++ inside a discussion thread
> with subject "next ISO C++ standard", and the data provided in
> http://en.wikipedia.org/wiki/C%2B%2B0x , and with the design ideals:
>
> 1. To provide Unicode support in C++0x always and explicitly.
> 2. To provide support to all Unicode sets out there.
>
>
> I think the implementation of these as:
>
> a) char, char16_t and char32_t types.
> b) built-in Unicode literals.
>
> should become:
>
> I) Library, implementation defined types like utf8_char, utf16_char, and
> utf32_char, leaving alone and not polluting the existing built in types
> like char for now and in the future.


The problem is that if the library does something like this:

typedef uint32_t char32_t;

then when I write

char32_t c = L'a';
cout << c;

It will output c as "97", not "a", because overload resolution for operator<<
can't see the typedef: to it, c is just a uint32_t.

The library could implement a char32_t like

class char32_t {
uint32_t impl;
....
};

but that has its own problems. It all works OK if these are built-in types.

> II) Leave b) as it is.


So if I write a UTF-16 literal using the built-in literal syntax, what
is its type? It has to be a built-in type, not a library type.


Phil.
17/01/2008, 19:13   #3
Ioannis Vranos
Re: C++0x two Unicode proposals. A correction one and a different one

Phil Endecott wrote:
> The problem is that if the library does something like this:
>
> typedef uint32_t char32_t;
>
> then when I write
>
> char32_t c = L'a';
> cout << c;
>
> It will output c as "97", not "a", because overload resolution for operator<<
> can't see the typedef: to it, c is just a uint32_t.



Well, then the library should not use that typedef, and cout's operator<<
should be implemented to work with the provided character type.


> The library could implement a char32_t like
>
> class char32_t {
> uint32_t impl;
> ....
> };
>
> but that has its own problems. It all works OK if these are built-in
> types.



If the class type you suggest cannot be implemented satisfactorily, why not
focus instead on providing language tools that make it possible?



>
>> II) Leave b) as it is.

>
> So if I write a UTF-16 literal using the built-in literal syntax, what
> is its type? It has to be a built-in type, not a library type.



It can be a library type. AFAIK a built-in type can also look like a
library type, if it is only available when the corresponding header is #included.

In any case, the main point of my "correction" proposal is that C++'s
built-in types should not be tied to a specific character encoding system.

Consider the possibility that, some years from now, a new character system
that does not exist today becomes the dominant one, while C++'s built-in
types are tied to Unicode.

If each specific character system is provided as a library extension (an
implementation-defined type), C++ will have the flexibility to adapt to
new character systems as they emerge, without messing with its built-in
types.

Just as math-specific types should be provided as library extensions
rather than built into C++, I think the same should apply to character
systems, regular expressions, etc.

So, as another example (although probably not needed in standard C++),
consider adding explicit EBCDIC support as a library extension.

Something like:

#include <whatever>

// ...
const std::ebcdic_char *p = EB"This is a text";
std::ebcdic_char c = EB'c';


This style can work for any character system: UTF-8, UTF-16, UTF-32,
whatever.

I think tying any specific character system to built-in types is the
approach of Java (and C#/.NET etc.), which is a whole framework rather
than a programming language alone, and can be changed at will.


Apart from this, I also think that wchar_t should be the largest
character type a given compiler provides. For example, if a compiler
provides UTF-32 as its largest character type, then for that compiler
wchar_t should be equivalent to its UTF-32 character type.
All times are GMT +1.