comp.unix.shell Using and programming the Unix shell.

Removing octal characters from a file

Posted 7 Sep 2007, 17:29   #1
paintedjazz@gmail.com

Is there a way to remove octal characters e.g. \302\271 or
\342\204\242 using perl or sed or awk. What I would prefer to do is
remove all globally with one command. I'm not sure how I would enter
them as a range or even if that's even possible.

If it's not asking too much, is there also a way to incorporate one
exception into this to replace (rather than remove) octal chars used
by Microsoft instead of a simple apostrophe. Thanks a bunch for any
help.

Posted 7 Sep 2007, 18:43   #2
Icarus Sparry

On Fri, 07 Sep 2007 16:29:39 +0000, paintedjazz wrote:

> Is there a way to remove octal characters e.g. \302\271 or \342\204\242
> using perl or sed or awk. What I would prefer to do is remove all
> globally with one command. I'm not sure how I would enter them as a
> range or even if that's even possible.
>
> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used by
> Microsoft instead of a simple apostrophe. Thanks a bunch for any help.


With perl it is easy; the only question is which characters to keep.
Here I keep \010 (backspace), \011 (tab) and \012 (linefeed, used as
newline by Unix) from the control characters, plus \040 (space) to \176 (~).

perl -pi.bak -e 's/[\000-\007\013-\037\177-\377]//g;' filename

If you know which characters Microsoft uses, then you can certainly use

perl -pi.bak -e 's/MQC/'\''/g; s/[\000-\007\013-\037\177-\377]//g' filename

where MQC is the Microsoft code for a quote.
Posted 7 Sep 2007, 18:49   #3
Janis Papanagnou

paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters


What do you think "octal characters" are?

> e.g. \302\271 or
> \342\204\242 using perl or sed or awk.


The above are just character strings: escaped representations of 8-bit
values, composed of octal digits.

Do you want to remove the character that may be displayed as "\302" or
do you want to remove the four-character-sequence '\', '3', '0', '2'
from your data?

> What I would prefer to do is
> remove all globally with one command. I'm not sure how I would enter
> them as a range or even if that's even possible.


If you just have character ranges to remove then use tr -d.

But if you have the character string representation as given above it
might be easier to do it in two steps; first transform valid sequences
of "\[0-3][0-7][0-7]" into the respective character, then use tr -d
with a range of characters (possibly also specified in octal) to delete.

If you explain your task more clearly we can help you further in detail.

> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used
> by Microsoft instead of a simple apostrophe. Thanks a bunch for any
> help.


What do you mean by "octal chars used by Microsoft"?

Again, if it's just the characters then use tr (in this case without
option -d) as in, for example, tr \' \" or tr A-Z a-z

If you choose to use awk you may use the same function, gsub(), for both
replacing and removing.
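
A small sketch of gsub() doing both jobs. It deliberately uses plain ASCII stand-ins (a backtick for the unwanted quote, digits for the characters to delete), because how awk treats bytes above \177 depends on the implementation and locale. The name awk_fix is hypothetical.

```shell
# Hypothetical helper: gsub() with a replacement string replaces,
# gsub() with an empty string removes.
awk_fix() {
    awk '{
        gsub(/`/, "\047")    # replace: backtick -> apostrophe (octal 047)
        gsub(/[0-9]/, "")    # remove: delete every digit
        print
    }'
}

printf '%s\n' 'don`t9 panic42' | awk_fix
```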

Janis
Posted 7 Sep 2007, 20:39   #4
Cyrus Kriticos

paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters e.g. \302\271 or
> \342\204\242 using perl or sed or awk. What I would prefer to do is
> remove all globally with one command.


$ printf '%s\n' 'ab\cdd23444\342333\204\242' | sed 's/\\[0-3][0-7][0-7]//g'
ab\cdd23444333

--
Best regards, Cyrus
"The only way to really learn scripting is to write scripts."
-- Advanced Bash-Scripting Guide
All times are GMT +1.

