comp.unix.shell Using and programming the Unix shell.

Windows 1252 to iso-8859-1 without iconv or recode?

Posted 23/04/2008, 22:29   #1
dutone

I have some text files that were saved in Windows as ASCII which,
unfortunately, causes the text file to contain non-control chars in
the range that iso-8859-1 defines control chars.

iconv and recode do not convert or drop these 1252 codes (145,146, and
147) to the appropriate iso-8859-1 equivalents and instead give me
garbage.

Is there a utility that I can use to convert the chars appropriately?
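
To see whether a file is affected at all, one option is to count the bytes that fall in the range 128-159, which ISO-8859-1 reserves for control characters. The following is only a minimal sketch; `count_c1_bytes` is a made-up helper name, not an existing utility, and it assumes a `tr` that accepts octal escapes and byte ranges under `LC_ALL=C`.

```shell
# count_c1_bytes is a hypothetical helper name, not an existing utility.
# It reads stdin and prints how many bytes fall in 128-159 (octal 200-237),
# the range that ISO-8859-1 reserves for control characters.
count_c1_bytes() {
    # -c -d deletes everything NOT in the range, so only suspect bytes remain;
    # the trailing tr strips the padding some wc implementations add.
    LC_ALL=C tr -d -c '\200-\237' | wc -c | tr -d ' '
}

# Example: octal 221 and 222 are in the range, octal 351 (e-acute) is not.
printf 'a\221b\222c\351\n' | count_c1_bytes
```

A non-zero count suggests the file is really Windows-1252 rather than ISO-8859-1.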

Posted 23/04/2008, 22:43   #2
dutone

On Apr 23, 2:29 pm, dutone <dut...@hotmail.com> wrote:
> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.
>
> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147) to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?


Note that I can do this with Perl or sed, e.g. perl -pe "s/\x92/'/g"

But I was wondering whether there is an existing utility for this, and why
iconv and recode don't convert these characters when a close equivalent exists.

Posted 23/04/2008, 22:47   #3
Lew Pitcher

In comp.unix.shell, dutone wrote:

> I have some text files that were saved in Windows as ASCII which,
> unfortunately, causes the text file to contain non-control chars in
> the range that iso-8859-1 defines control chars.


That would be impossible to do with /ASCII/. I'm sure that you mean that you
saved the text files in the CP1252 character set (/not/ the ASCII
character set), and are having problems converting from CP1252 to ISO-8859-1.

> iconv and recode do not convert or drop these 1252 codes (145,146, and
> 147)


Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
the character value exceeds 127, then you /don't/ have ASCII.

> to the appropriate iso-8859-1 equivalents and instead give me
> garbage.
>
> Is there a utility that I can use to convert the chars appropriately?


In CP1252,
character 145 is LEFT SINGLE QUOTATION MARK,
character 146 is RIGHT SINGLE QUOTATION MARK, and
character 147 is LEFT DOUBLE QUOTATION MARK
(courtesy of the ISO Internationalization working group's characterset map
at http://anubis.dkuug.dk/i18n/charmaps/CP1252 )

In ISO-8859-1 (http://anubis.dkuug.dk/i18n/charmaps/ISO_8859-1) there
doesn't seem to be a corresponding character (codepoint) for any of those
three characters. By rights, they all should map to the 0x1a (SUB)
character.

I know of no utility save iconv that would convert these for you. Perhaps
you can convert in two stages: CP1252 to Unicode, and Unicode to
ISO-8859-1.
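
A two-stage conversion on its own still runs into the same problem in the second stage, because the curly quotes have no ISO-8859-1 codepoint. One way it might be made to work is to substitute ASCII quotes while the text is in UTF-8. This is only a sketch: `cp1252_to_latin1` is a made-up name, and it assumes an iconv that knows the names CP1252, UTF-8 and ISO-8859-1, plus a sed that matches plain bytes under `LC_ALL=C`.

```shell
# cp1252_to_latin1 is a hypothetical helper name, not an existing utility.
# Assumes iconv accepts the names CP1252, UTF-8 and ISO-8859-1.
cp1252_to_latin1() {
    # UTF-8 byte sequences for U+2018, U+2019, U+201C and U+201D, built with
    # printf so that the script itself stays pure ASCII.
    lsq=$(printf '\342\200\230'); rsq=$(printf '\342\200\231')
    ldq=$(printf '\342\200\234'); rdq=$(printf '\342\200\235')

    # Stage 1: CP1252 to UTF-8. Middle: curly quotes to ASCII quotes.
    # Stage 2: UTF-8 to ISO-8859-1.
    iconv -f CP1252 -t UTF-8 |
        LC_ALL=C sed -e "s/$lsq/'/g" -e "s/$rsq/'/g" \
                     -e "s/$ldq/\"/g" -e "s/$rdq/\"/g" |
        iconv -f UTF-8 -t ISO-8859-1
}

# Usage (file names are placeholders):
#   cp1252_to_latin1 < input.txt > output.txt
```

Only the four quote characters are handled here. Any other CP1252-only character, such as the euro sign, would still make the second iconv stage fail.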

Luck be with you
--
Lew Pitcher

Master Codewright & JOAT-in-training | Registered Linux User #112576
http://pitcher.digitalfreehold.ca/ | GPG public key available by request
---------- Slackware - Because I know what I'm doing. ------


Posted 24/04/2008, 00:25   #4
dutone

On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
> In comp.unix.shell, dutone wrote:
> > I have some text files that were saved in Windows as ASCII which,
> > unfortunately, causes the text file to contain non-control chars in
> > the range that iso-8859-1 defines control chars.

>
> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> saved the text files in the CP1252 characterset (/not/ the ASCII
> characterset), and are having problems converting from CP1252 to ISO-8859-1


I'm sorry, I meant ANSI. Notepad's "Save As ANSI" option does not save
the file as iso-8859-1, but rather as Windows-1252.

> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> > 147)

>
> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> the character value exceeds 127, then you /don't/ have ASCII


I would expect a Windows-1252 to iso-8859-1 conversion to replace
145 and 146 with 39 (apostrophe), and 147 with 34 (double quote).
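
That byte-for-byte substitution can also be sketched in plain shell with tr, without Perl. `cp1252_quotes_to_ascii` is a made-up helper name. Byte 148 (the right double quotation mark in CP1252) is added here as an assumption, since the post only lists 145-147.

```shell
# cp1252_quotes_to_ascii is a hypothetical helper name, not an existing utility.
# Octal 221-224 are decimal 145-148; they become 39, 39, 34 and 34.
# Handling 148 (right double quote) is an addition not listed in the post.
cp1252_quotes_to_ascii() {
    LC_ALL=C tr '\221\222\223\224' "''\"\""
}

# Example: prints the line with plain ASCII quotes.
printf 'It\222s \223quoted\224\n' | cp1252_quotes_to_ascii
```

All other bytes pass through untouched, so the result is valid iso-8859-1 only if the input contains no other bytes in the 128-159 range.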

Guess I'm sticking with Perl for the conversion.

Thanks.

Posted 24/04/2008, 00:55   #5
Gary Johnson

dutone <dutone@hotmail.com> wrote:
> On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
>> In comp.unix.shell, dutone wrote:
>> > I have some text files that were saved in Windows as ASCII which,
>> > unfortunately, causes the text file to contain non-control chars in
>> > the range that iso-8859-1 defines control chars.

>>
>> That would be impossible to do with /ASCII/. I'm sure that you mean that you
>> saved the text files in the CP1252 characterset (/not/ the ASCII
>> characterset), and are having problems converting from CP1252 to ISO-8859-1

>
> I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> it as iso-8859-1, rather 1252.
>
>> > iconv and recode do not convert or drop these 1252 codes (145,146, and
>> > 147)

>>
>> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
>> the character value exceeds 127, then you /don't/ have ASCII

>
> I would expect a Windows-1252 to iso-8859-1 conversion to replace
> 145,146 with 39 and ,147 with 34.
>
> Guess I'm sticking with Perl for the conversion.


You can use iconv for this, but you have to add the //TRANSLIT suffix,
like this:

iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT

That tells iconv to choose a symbol from the output character set that
is close to the desired symbol.

--
Gary Johnson

Posted 24/04/2008, 03:07   #6
dutone

On Apr 23, 4:55 pm, Gary Johnson <garyj...@eskimo.com> wrote:
> dutone <dut...@hotmail.com> wrote:
> > On Apr 23, 2:47 pm, Lew Pitcher <lpitc...@teksavvy.com> wrote:
> >> In comp.unix.shell, dutone wrote:
> >> > I have some text files that were saved in Windows as ASCII which,
> >> > unfortunately, causes the text file to contain non-control chars in
> >> > the range that iso-8859-1 defines control chars.

>
> >> That would be impossible to do with /ASCII/. I'm sure that you mean that you
> >> saved the text files in the CP1252 characterset (/not/ the ASCII
> >> characterset), and are having problems converting from CP1252 to ISO-8859-1

>
> > I'm sorry, I meant ANSI. Notepad's Save As ANSI option does not save
> > it as iso-8859-1, rather 1252.

>
> >> > iconv and recode do not convert or drop these 1252 codes (145,146, and
> >> > 147)

>
> >> Yup, that's not ASCII. ASCII characters range from 0 to 127 inclusive. If
> >> the character value exceeds 127, then you /don't/ have ASCII

>
> > I would expect a Windows-1252 to iso-8859-1 conversion to replace
> > 145,146 with 39 and ,147 with 34.

>
> > Guess I'm sticking with Perl for the conversion.

>
> You can use iconv for this, but you have to add the //TRANSLIT suffix,
> like this:
>
> iconv -c -f windows-1252 -t iso-8859-1//TRANSLIT


Oh, cool. They should mention that suffix in GNU's iconv man page.

Thanks