#1
Has anyone actually managed to print non-English text by using wcout or
wprintf and the rest of standard, wide character functions?

#2
Ioannis Vranos wrote:
> Has anyone actually managed to print non-English text by using wcout or
> wprintf and the rest of standard, wide character functions?

For example:

[john@localhost src]$ cat main.cc
#include <iostream>

int main()
{
    using namespace std;

    wcout<< L"Δοκιμαστικό μήνυμα\n";
}

[john@localhost src]$ ./foobar-cpp
??????????? ??????

[john@localhost src]$

#3
Ioannis Vranos wrote:
> Ioannis Vranos wrote:
>> Has anyone actually managed to print non-English text by using wcout or
>> wprintf and the rest of standard, wide character functions?
>
> For example:
>
> [john@localhost src]$ cat main.cc
> #include <iostream>
>
> int main()
> {
>     using namespace std;
>
>     wcout<< L"Δοκιμαστικό μήνυμα\n";

Are you sure that you stored your source file in the same encoding the
compiler expects as source character set?

> }
>
> [john@localhost src]$ ./foobar-cpp
> ??????????? ??????
>
> [john@localhost src]$

#4
Rolf Magnus wrote:
> Ioannis Vranos wrote:
>>     wcout<< L"Δοκιμαστικό μήνυμα\n";
>
> Are you sure that you stored your source file in the same encoding the
> compiler expects as source character set?

Well, I created the file with the Anjuta editor, with the message being a
Greek one. The Greek message also appears the same when I display the
source file in the console.

I suppose it is saved as UTF-8.

Also the code

#include <iostream>
#include <string>

int main()
{
    using namespace std;

    wstring s;

    wcin>> s;

    wcout<< s<< endl;
}

displays nothing when I enter Greek text.

Should I mess with locales?

#5
Ioannis Vranos wrote:
> Also the code
>
> #include <iostream>
> #include <string>
>
> int main()
> {
>     using namespace std;
>
>     wstring s;
>
>     wcin>> s;
>
>     wcout<< s<< endl;
> }
>
> displays nothing when I enter greek text.

This happens both in g++ under Linux and in VC++ 2008 Express under
Windows, with the latter saving the source code file as Unicode after it
detected non-English text.

> Should I mess with locales?

#6
Made more precise:

Ioannis Vranos wrote:
>> Also the code
>>
>> #include <iostream>
>> #include <string>
>>
>> int main()
>> {
>>     using namespace std;
>>
>>     wstring s;
>>     wcin>> s;
>>
>>     wcout<< s<< endl;
>> }

displays the Greek text when I enter it, but outputs nothing. With
English text, the text is displayed both when entered and when output.

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό

[john@localhost src]$ ./foobar-cpp
Test
Test
[john@localhost src]$

> both in g++ under Linux and VC++ 2008 Express under Windows, with the
> latest saving the source code file as Unicode after it detected
> non-english text.
>
>> Should I mess with locales?

#7
Ioannis Vranos wrote:
> For example:
>
> [john@localhost src]$ cat main.cc
> #include <iostream>
>
> int main()
> {
>     using namespace std;
>
>     wcout<< L"Δοκιμαστικό μήνυμα\n";
> }
>
> [john@localhost src]$ ./foobar-cpp
> ??????????? ??????
>
> [john@localhost src]$

Hmmm... I work almost entirely in English, so this error message is new
to me:

$ make
g++ -ansi -pedantic -Wall main.cc -o main
main.cc: In function 'int main()':
main.cc:4: error: converting to execution character set: Invalid or
incomplete multibyte or wide character
make: *** [main] Error 1

#8
On Sat, 23 Feb 2008 13:11:09 +0200, Ioannis Vranos
<ivranos@nospam.no.spamfreemail.gr> wrote:
> [...]
>>> Also the code
> [...] displays the Greek text when I enter it, but outputs nothing. With
> English text, the text is displayed both when entered and outputed.

I don't remember the details anymore, but the problem has something to do
with codecvt: your wide characters are automatically converted to narrow
characters by wcout. This is something you might not want (and even if
you want it, the conversion might not work automatically the way you
expect). Try writing to a wstringstream and converting to UTF-8
explicitly (storing the result e.g. in a string). If your console supports
UTF-8 you can print to cout (otherwise print to a file so you can test
the output in an editor).

HTH,
Boris

#9
Jeff Schwab wrote:
> Hmmm... I work almost entirely in English, so this error message is new
> to me:
>
> $ make
> g++ -ansi -pedantic -Wall main.cc -o main
> main.cc: In function 'int main()':
> main.cc:4: error: converting to execution character set: Invalid or
> incomplete multibyte or wide character
> make: *** [main] Error 1

I tried the same:

[john@localhost src]$ g++ -ansi -pedantic-errors -Wall main.cc -o foobar-cpp

[john@localhost src]$

Perhaps when you copy and paste the Greek text, you copy garbage (that
is, you are not viewing the message in the correct character set in your
newsgroup reader).

So, I repost the code in this message, which is encoded in Unicode (UTF-8):

#include <iostream>

int main()
{
    using namespace std;

    wcout<< L"Δοκιμαστικό μήνυμα\n";
}

#10
Ioannis Vranos wrote:
> So, I repost the code in this message which is encoded to Unicode (UTF-8):
>
> #include <iostream>
>
> int main()
> {
>     using namespace std;
>
>     wcout<< L"Δοκιμαστικό μήνυμα\n";
> }

Thanks, you were correct.

Here's what I thought was "supposed" to be the portable solution:

#include <iostream>
#include <locale>

int main() {
    std::wcout.imbue(std::locale("el_GR.UTF-8"));
    std::wcout << L"Δοκιμαστικό μήνυμα\n";
}

However, my system still shows question marks for this. For whatever
it's worth, here's the (probably incorrect) way that appears to work on
my system:

#include <iostream>
#include <locale>

int main() {
    std::cout.imbue(std::locale(""));
    std::cout << "Δοκιμαστικό μήνυμα\n";
}

#11
Jeff Schwab wrote:
> Here's what I thought was "supposed" to be the portable solution:
>
> #include <iostream>
> #include <locale>
>
> int main() {
>     std::wcout.imbue(std::locale("el_GR.UTF-8"));
>     std::wcout << L"Δοκιμαστικό μήνυμα\n";
> }
>
> However, my system still shows question marks for this. For whatever
> it's worth, here's the (probably incorrect) way that appears to work on
> my system:
>
> #include <iostream>
> #include <locale>
>
> int main() {
>     std::cout.imbue(std::locale(""));
>     std::cout << "Δοκιμαστικό μήνυμα\n";
> }

"Strangely", the same things happen on my Linux box with "gcc version
4.1.2 20070626".

cout prints Greek when the string literal has no L prefix.

The same literal with wcout prints an empty line.

wcout with the L prefix prints question marks.

This made me think of using plain cout, and it also works:

#include <iostream>

int main()
{
    std::cout << "Δοκιμαστικό μήνυμα\n";
}

also prints the Greek message.

Seeing this, I am assuming char is implemented as unsigned char and this
is working because Greek is provided in the extended ASCII character set
(values 128-255) supported by my system (I have set the regional settings
under GNOME etc). However, why does this also work for you?

The code

#include <iostream>
#include <limits>

int main()
{
    using namespace std;

    cout<< static_cast<int>( numeric_limits<char>::max() )<< endl;
}

produces on my system:

[john@localhost src]$ ./foobar-cpp
127
[john@localhost src]$

so I am wrong, char is implemented as signed char, and no extended ASCII
takes place.

Strange.

#12
Based on the MSDN example:

// basic_ios_imbue.cpp
// compile with: /EHsc
#include <iostream>
#include <locale>

int main( )
{
    using namespace std;

    cout.imbue( locale( "french_france" ) );
    double x = 1234567.123456;
    cout << x << endl;
}

that doesn't work in my GCC, this works:

#include <iostream>
#include <limits>

int main()
{
    using namespace std;

    cout.imbue( locale( "greek" ) );

    cout<< "Δοκιμαστικό\n";
}

This also works:

#include <iostream>
#include <limits>

int main()
{
    using namespace std;

    cout.imbue( locale( "en_US" ) );

    cout<< "Δοκιμαστικό\n";
}

Crazy stuff.

#13
It looks like GCC has the opposite stuff, cout, cin, string work as
wcout, wcin, wstring and vice versa! Bug?

#include <iostream>

int main()
{
    using namespace std;

    wstring ws;

    wcin>> ws;

    cout<< ws.size()<< endl;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
0
[john@localhost src]$

#include <iostream>

int main()
{
    using namespace std;

    string s;

    cin>> s;

    cout<< s.size()<< endl;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
22
[john@localhost src]$

#include <iostream>

int main()
{
    using namespace std;

    string s;

    cin>> s;

    cout<< s<< endl;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$

#include <iostream>

int main()
{
    using namespace std;

    wstring ws;

    wcin>> ws;

    wcout<< ws<< endl;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό

[john@localhost src]$

#include <iostream>

int main()
{
    using namespace std;

    cout<< "Δοκιμαστικό-11\n";

    wcout<< "Δοκιμαστικό-22\n";

    cout<< L"Δοκιμαστικό-33\n";

    wcout<< L"Δοκιμαστικό-44\n";
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό-11
-22
0x80488c8�������Ĺ��-44
[john@localhost src]$

Conclusion: It appears GCC has the wide character stuff messed up, or I
am missing important knowledge.

#14
Ioannis Vranos wrote:
> It looks like GCC has the opposite stuff, cout, cin, string work as
> wcout, wcin, wstring and vice versa! Bug?
....
> Conclusion: It appears GCC has the wide character stuff messed up, or I
> am missing important knowledge.

You and me both. I would be very surprised if this were a GCC bug (I'm
using 4.2.4 pre-release), but I'm guessing somebody here knows a lot more
about this than we do, and is willing to enlighten us.

#15
* Jeff Schwab:
> Ioannis Vranos wrote:
>> It looks like GCC has the opposite stuff, cout, cin, string work as
>> wcout, wcin, wstring and vice versa! Bug?
> ...
>> Conclusion: It appears GCC has the wide character stuff messed up, or
>> I am missing important knowledge.
>
> You and me both. I would be very surprised if this were a GCC bug (I'm
> using 4.2.4 pre-release), but I'm guessing somebody here knows a lot
> more about this than we do, and is willing to enlighten us.

As has been remarked else-thread, by Rolf Magnus, one issue, relevant for
literal strings, is the compiler's translation (or lack of translation)
of the source code text's character set to the execution character set.

And as has also been remarked else-thread, by Boris, one issue, relevant
for i/o, is that the wide character streams convert to and from narrow
characters. wcout converts to narrow characters, and wcin converts from
narrow characters. They're not wide character streams, they're wide
character converters.

Assuming no issue with translation from source code character set to
execution character set, if you use only the narrow character streams you
avoid most translation. There's still translation of newlines and
possibly other characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
source code and UTF-8 execution environment character set, and (mostly)
non-translating narrow character streams, everything should work
swimmingly.

Another reason to avoid the wide character streams is that they're not
supported by the MingW Windows port of g++. At least, not in the version
I have.

And as I understand it UTF-8 is the usual in the *nix world.

For an interactive Windows program, you can set the console's narrow
character stream translation (to/from UCS2, which is what a console
window uses internally) temporarily to UTF-8 via Windows' console API
functions.

Disclaimer: I've never tried this for Greek text + UTF-8 encoding,
because I've not had to deal with that particular issue.

Cheers, & hth.,

- Alf

--
A: Because it messes up the order in which people normally read text.
Q: Why is it such a bad thing?
A: Top-posting.
Q: What is the most annoying thing on usenet and in e-mail?

#16
Alf P. Steinbach wrote:
> As has been remarked else-thread, by Rolf Magnus, one issue, relevant
> for literal strings, is the compiler's translation (or lack of
> translation) of the source code text's character set to the execution
> character set.

A good point. I know my source is in UTF-8. I don't know what influences
the execution character set, or how to tweak it.

> Ans as has also been remarked else-thread, by Boris, one issue, relevant
> for i/o, is that the wide character streams convert to and from narrow
> characters. wcout converts to narrow characters, and wcin converts from
> narrow characters. They're not wide character streams, they're wide
> character converters.

Clear as mud.

> Assuming no issue with translation from source code character set to
> execution character set, if you use only the narrow character streams
> you avoid most translation. There's still translation of newlines and
> possibly other characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
> source code and UTF-8 execution environment character set, and (mostly)
> non-translating narrow character streams, everything should work
> swimmingly.

#17
Alf P. Steinbach wrote:
> As has been remarked else-thread, by Rolf Magnus, one issue, relevant
> for literal strings, is the compiler's translation (or lack of
> translation) of the source code text's character set to the execution
> character set.

There is no such issue here: cout prints the Greek literal correctly and
wcout does not. Also, cin and string read and store Greek text correctly,
while wcin and wstring look like they do not work for Greek text input.

> Ans as has also been remarked else-thread, by Boris, one issue, relevant
> for i/o, is that the wide character streams convert to and from narrow
> characters. wcout converts to narrow characters, and wcin converts from
> narrow characters. They're not wide character streams, they're wide
> character converters.

I am not sure I understand this. Isn't L"some text" a wide character
string literal? Don't wcout, wcin and wstring provide operator<< and
operator>> overloads for wide characters and wide character strings?

> Assuming no issue with translation from source code character set to
> execution character set, if you use only the narrow character streams
> you avoid most translation.

What do you mean by "narrow character" streams? char streams, right?

> There's still translation of newlines and
> possibly other characters (e.g. Ctrl Z in Windows). Thus, using UTF-8
> source code and UTF-8 execution environment character set, and (mostly)
> non-translating narrow character streams, everything should work
> swimmingly.
>
> Another reason to avoid the wide character streams is that they're not
> supported by the MingW Windows port of g++.

This is irrelevant. MinGW's problems are MinGW's problems; I am using GCC
under Linux (Scientific Linux 5.1, which is essentially Red Hat
Enterprise Linux 5.1 source code recompiled, like CentOS - give them a
try). I also have MS Visual C++ 2008 Express installed.

> At least, not in the version I have.
>
> And as I understand it UTF-8 is the usual in the *nix world.
>
> For an interactive Windows program, you can set the console's narrow
> character stream translation (to/from UCS2, which is what a console
> window uses internally) temporarily to UTF-8 via Windows' console API
> functions.
>
> Disclaimer: I've never tried this for greek text + UTF-8 encoding,
> because I've not had to deal with that particular issue.

Can you pinpoint where our code is wrong? Essentially the following:

#include <iostream>
#include <string>

int main()
{
    using namespace std;

    wcout<< "Give wide character input: ";

    wstring ws;

    wcin>> ws;

    wcout<< "You gave: "<< ws << endl;
}

It produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό
You gave:
[john@localhost src]$

while the code:

#include <iostream>
#include <string>

int main()
{
    using namespace std;

    cout<< "Give wide character input: ";

    string s;

    cin>> s;

    cout<< "You gave: "<< s << endl;
}

produces:

[john@localhost src]$ ./foobar-cpp
Give wide character input: Δοκιμαστικό
You gave: Δοκιμαστικό
[john@localhost src]$

#18
I posted the following to c.l.c., and I think it is useful to post it
here too:

[The current message encoding is set to Unicode (UTF-8) because it
contains Greek]

The following code does not work as expected:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%s\n", input);

    return 0;
}

Under Linux:

[john@localhost src]$ ./foobar-cpp
Test
T
[john@localhost src]$

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
�
[john@localhost src]$

Under MS Visual C++ 2008 Express:

Test
Test
Press any key to continue . . .

Δοκιμαστικό
??????ε????
Press any key to continue . . .

Am I missing something?

#19
On Feb 23, 11:33 am, Rolf Magnus <ramag...@t-online.de> wrote:
> Ioannis Vranos wrote:
> > Ioannis Vranos wrote:
> >> Has anyone actually managed to print non-English text by
> >> using wcout or wprintf and the rest of standard, wide
> >> character functions?
> > For example:
> > [john@localhost src]$ cat main.cc
> > #include <iostream>
> > int main()
> > {
> >     using namespace std;
> >     wcout<< L"Δοκιμαστικό μήνυμα\n";
> Are you sure that you stored your source file in the same
> encoding the compiler expects as source character set?

Are you sure the compiler even allows anything but US ASCII as input?
The standard makes most of this implementation defined. (Logically, if
you think about it. I wouldn't expect any of my files to compile without
being transcoded on a machine which uses EBCDIC.)

Before going any further, we have to know 1) how the Greek characters are
encoded (probably UTF-8, since that's what my editor is configured for,
and I'm seeing them correctly), and 2) which compiler he's using, which
options, and what the compiler documentation says about input file
encodings. Most likely, he'll have to ask in a group for his compiler
what it accepts, and how to make it accept what he's got.

--
James Kanze (GABI Software) email:james.kanze@gmail.com
Conseils en informatique orientée objet/
Beratung in objektorientierter Datenverarbeitung
9 place Sémard, 78210 St.-Cyr-l'École, France, +33 (0)1 30 23 00 34

#20
On Feb 23, 2:59 pm, Jeff Schwab <j...@schwabcenter.com> wrote:
> Here's what I thought was "supposed" to be the portable solution:
> #include <iostream>
> #include <locale>
> int main() {
>     std::wcout.imbue(std::locale("el_GR.UTF-8"));
>     std::wcout << L"Δοκιμαστικό μήνυμα\n";
> }
> However, my system still shows question marks for this. For
> whatever it's worth, here's the (probably incorrect) way that
> appears to work on my system:
> #include <iostream>
> #include <locale>
> int main() {
>     std::cout.imbue(std::locale(""));
>     std::cout << "Δοκιμαστικό μήνυμα\n";
> }

You're still not telling us a lot of important information. What is the
actual encoding used in the source file, and what are the bytes actually
output? (FWIW: I think g++, and most other compilers, just pass the bytes
through transparently in a narrow character string. Which means that your
second code will output whatever your editor put in the source file. If
you're using the same encoding everywhere, it will seem to work.)

Note that there isn't really any portable solution, because so much
depends on things the C++ compiler has no control over. Run the same code
in two different xterms, and it can output two completely different
things; just specify a different font (option -fn) with a different
encoding for one of the xterms. (And of course, it's pretty much par for
the course to see one thing when you cat to the screen, and something
else when you output the same file to the printer.)

--
James Kanze (GABI Software)

#21
James Kanze wrote:
> You're still not telling us a lot of important information.
> What is the actual encoding used in the source file, and what
> are the bytes actually output. (FWIW: I think g++, and most
> other compilers, just pass the bytes through transparently in a
> narrow character string. Which means that your second code will
> output whatever your editor put in the source file. If you're
> using the same encoding everywhere, it will seem to work.)
>
> Note that there isn't really any portable solution, because so
> much depends on things the C++ compiler has no control over.
> Run the same code in two different xterm, and it can output two
> different things, completely; just specify a different font
> (option -fn) with a different encoding for one of the xterm.
> (And of course, it's pretty much par for the course to see one
> thing when you cat to the screen, and something else when you
> output the same file to the printer.)

I posted a C95 question about this in c.l.c. (C95 is a subset of C++03)
and I got working C95 code. My last message there:

> Ben Bacarisse wrote:
>
> You need "%ls". This is very important with wprintf since without it
> %s denotes a multi-byte character sequence. printf("%ls\n", input)
> should also work. You need the w version if you want the multi-byte
> conversion of %s or if the format has to be a wchar_t pointer.

Perhaps you may help me understand better. We have the usual char
encoding, which is implementation defined (usually ASCII). wchar_t is the
wide character encoding, which is the "largest character set supported by
the system", so I suppose Unicode under Linux and Windows.

What exactly is a multi-byte character?

I have to say that I am talking about C95 here, not C99.

>
>>     return 0;
>> }
>>
>> Under Linux:
>>
>> [john@localhost src]$ ./foobar-cpp
>> Test
>> T
>> [john@localhost src]$
>>
>> [john@localhost src]$ ./foobar-cpp
>> Δοκιμαστικό
>> �
>> [john@localhost src]$
>
> The above may not be the only problem. In cases like this, you need to
> say what encoding your terminal is using.

You are somewhat correct on this. My terminal encoding was UTF-8 and I
added Greek (ISO-8859-7). Under the latter, the following code works OK:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wprintf(L"Δοκιμαστικό\n");

    return 0;
}

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
[john@localhost src]$

Also the original, fixed according to your suggestion:

#include <wchar.h>
#include <locale.h>
#include <stdio.h>
#include <stddef.h>

int main()
{
    char *p= setlocale( LC_ALL, "Greek" );

    wchar_t input[50];

    if (!p)
        printf("NULL returned!\n");

    fgetws(input, 50, stdin);

    wprintf(L"%ls", input);

    return 0;
}

works OK too:

[john@localhost src]$ ./foobar-cpp
Δοκιμαστικό
Δοκιμαστικό
[john@localhost src]$

It works OK under the terminal's default UTF-8 encoding too. So "%ls" is
what was really needed.

BTW, how can we define UTF-8 as the locale?

Thanks a lot.

#22
On Feb 23, 5:07 pm, Ioannis Vranos <ivra...@nospam.no.spamfreemail.gr>
wrote:
> Jeff Schwab wrote:

[...]

> > However, my system still shows question marks for this. For
> > whatever it's worth, here's the (probably incorrect) way
> > that appears to work on my system:
> > #include <iostream>
> > #include <locale>
> > int main() {
> >     std::cout.imbue(std::locale(""));
> >     std::cout << "Δοκιμαστικό μήνυμα\n";
> > }
> "Strangely" these also happen to my Linux box with "gcc
> version 4.1.2 20070626".
> cout prints Greek without the L notation to the string
> literal.
> The same with wcout prints an empty line.

I don't think the problem is so much wcout as the wide character
literal. The compiler is obliged to interpret the contents of the literal
in some way, and I would guess that it's not doing this in a way that
conforms with the input you've given it.

What does the compiler documentation say about how it processes
characters outside of the basic character set? What happens if you
replace your characters with their UCN, e.g.:

    std::wcout << L"\u0394\u03BF..." ;

?

> The same with wcout and L notation prints question marks.
> This made me think to use plain cout, and it also works:
> #include <iostream>
> int main()
> {
>     std::cout << "Δοκιμαστικό μήνυμα\n";
> }
> also prints the Greek message.
> Seeing this I am assuming char is implemented as unsigned char
> and this is working because Greek is provided in the extended
> ASCII character set (values 128-255) supported by my system (I
> have set the regional settings under GNOME etc). However why
> does this also work for you?

Most likely, the compiler is just generating code which copies the
characters' bit patterns, without ever looking at their numeric values.
So the signedness of char is irrelevant (here---in other places, it can
cause problems).

> The code
> #include <iostream>
> #include <limits>
> int main()
> {
>     using namespace std;
>     cout<< static_cast<int>( numeric_limits<char>::max() )<< endl;
> }
> produces in my system:
> [john@localhost src]$ ./foobar-cpp
> 127

In other words, plain char is signed. (It usually is, for some reason.)

> [john@localhost src]$
> so I am wrong, char is implemented as signed char, and no
> extended ASCII takes place.

There's no such thing as "extended ASCII":-). Still, I regularly use ISO
8859-15 in plain chars, on machines where char is signed. If I look at
the numeric value of the char, it's wrong, but the bits are right, and
they get copied through correctly. I just have to be careful when I use
functions which expect an int in the range [0...UCHAR_MAX]. (Those in the
<cctype> header, for example.)

--
James Kanze (GABI Software)

#23 |