comp.unix.shell: Using and programming the Unix shell.
#1
Is there a way to remove octal characters, e.g. \302\271 or \342\204\242, using perl, sed, or awk? I would prefer to remove them all globally with one command. I'm not sure how I would enter them as a range, or whether that's even possible.

If it's not asking too much, is there also a way to incorporate one exception into this: to replace (rather than remove) the octal chars used by Microsoft instead of a simple apostrophe? Thanks a bunch for any help.

#2
On Fri, 07 Sep 2007 16:29:39 +0000, paintedjazz wrote:
> Is there a way to remove octal characters e.g. \302\271 or \342\204\242
> using perl or sed or awk. What I would prefer to do is remove all
> globally with one command. I'm not sure how I would enter them as a
> range or even if that's even possible.
>
> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used by
> Microsoft instead of a simple apostrophe. Thanks a bunch for any help.

With perl it is easy; the only question is which characters to keep. Here I keep \010 (backspace), \011 (tab), and \012 (linefeed, used as newline by Unix) from the control characters, plus \040 (space) through \176 (~).

    perl -pi.bak -e 's/[\000-\007\013-\037\177-\377]//g;' filename

If you know which characters Microsoft uses, then you can certainly use

    perl -pi.bak -e 's/MQC/'\''/g; s/[\000-\007\013-\037\177-\377]//g' filename

where MQC is the Microsoft code for a quote.

#3
paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters

What do you think "octal characters" are?

> e.g. \302\271 or
> \342\204\242 using perl or sed or awk.

The above are just character strings: escaped representations of 8-bit values written with octal digits. Do you want to remove the character that may be displayed as "\302", or do you want to remove the four-character sequence '\', '3', '0', '2' from your data?

> What I would prefer to do is
> remove all globally with one command. I'm not sure how I would enter
> them as a range or even if that's even possible.

If you just have character ranges to remove, then use tr -d. But if you have the character string representation as given above, it might be easier to do it in two steps: first transform valid sequences of "\[0-3][0-7][0-7]" into the respective characters, then use tr -d with a range of characters (possibly also specified in octal) to delete them.

If you explain your task more clearly, we can help you further in detail.

> If it's not asking too much, is there also a way to incorporate one
> exception into this to replace (rather than remove) octal chars used
> by Microsoft instead of a simple apostrophe. Thanks a bunch for any
> help.

What do you mean by "octal chars used by Microsoft"? Again, if it's just the characters, then use tr (in this case without option -d) as in, for example,

    tr \' \"

or

    tr A-Z a-z

If you choose to use awk, you may use the same function, gsub(), for both replacing and removing.

Janis

#4
paintedjazz@gmail.com wrote:
> Is there a way to remove octal characters e.g. \302\271 or
> \342\204\242 using perl or sed or awk. What I would prefer to do is
> remove all globally with one command.

    $ echo 'ab\cdd23444\342333\204\242' | sed 's/\\[0-3][0-7][0-7]//g'
    ab\cdd23444333

-- 
Best regards | "The only way to really learn scripting is to write
Cyrus        | scripts." -- Advanced Bash-Scripting Guide
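
If the data really does contain the escapes as literal text, the apostrophe exception from the original question could be added to the sed approach along these lines. This is a sketch under an assumption: that the apostrophe shows up as the literal text \342\200\231 (the UTF-8 bytes of U+2019 written as octal escapes). The function name `strip_escapes` is hypothetical.

```shell
#!/bin/sh
# Sketch only. Assumption: the "Microsoft" apostrophe appears as the literal
# twelve-character text \342\200\231 in the data.
strip_escapes() {
    # 1st expression: replace that specific sequence with a plain apostrophe
    #                 (the '\'' idiom embeds a single quote in the script).
    # 2nd expression: delete every remaining backslash + three octal digits.
    sed -e 's/\\342\\200\\231/'\''/g' -e 's/\\[0-3][0-7][0-7]//g'
}

printf '%s\n' 'it\342\200\231s \342\204\242ok' | strip_escapes
```

The replacement has to come before the general deletion, otherwise the apostrophe sequence would be removed along with everything else.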