comp.unix.shell: Using and programming the Unix shell.

split text file chunks by regex line

16/03/2008, 16:55   #1
nobody@cares
split text file chunks by regex line

Hi,
Being a beginner in Linux, I hit this problem, which I know can be
solved by scripting. I recovered a bunch of files from a disk crash
that contain all my emails, but they were in 2 MB chunks with no
particular demarcation. So I combined them into one huge file.

To break them into smaller messages, I need to search for a line, say
"---Next_Part ...blah bla ----",
and take everything between two such lines and create a new text
file named 00000????.txt. That's all that needs to be done in order for
me to read these emails or resend them to myself. Sorry if I am
missing the obvious, but as I said, my knowledge of scripting is very
limited (it goes about as far as the shebang). These emails are urgent;
otherwise I would have spent some time learning a bit more scripting.

Any help is much appreciated.

Regards.

superdu
=============
Just a noob


16/03/2008, 17:29   #2
Michael Heiming
Re: split text file chunks by regex line

In comp.unix.shell nobody@cares <superdu>:
> Hi,
> Being a beginner in Linux, I hit this problem, which I know can be
> solved by scripting. I recovered a bunch of files from a disk crash
> that contain all my emails, but they were in 2 MB chunks with no
> particular demarcation. So I combined them into one huge file.


No backup and using doze?

> To break them into smaller messages, I need to search for a line, say
> "---Next_Part ...blah bla ----",
> and take everything between two such lines and create a new text
> file named 00000????.txt. That's all that needs to be done in order for


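awk 'BEGIN{RS="---Next_Part"} {out = "mail" NR; print > out; close(out)}' infile

Something along these lines should do the trick. (A multi-character RS
needs an awk that treats it as a regular expression, such as GNU awk;
POSIX leaves the result unspecified.)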

Good luck

--
Michael Heiming (X-PGP-Sig > GPG-Key ID: EDD27B94)
mail: echo zvpunry@urvzvat.qr | perl -pe 'y/a-z/n-za-m/'
#bofh excuse 14: sounds like a Windows problem, try calling
Microsoft support
16/03/2008, 17:48   #3
pk
Re: split text file chunks by regex line

superdu (nobody@cares) wrote:



There are many specialized scripts around (especially Perl scripts) that
deal specifically with mailbox manipulation, if the mail is in a standard
format (e.g., mbox).

In your specific case, I doubt that splitting the file on the NextPart
lines will produce anything useful or immediately usable as email. Such a
line is a MIME multipart boundary: it separates the parts (text body,
attachments) of a single message, not one message from the next. Some
messages might not even have a NextPart line. Furthermore, that marker
is not standard and depends on the MUA used to compose the mail. The
particular string used in each email can usually be found in the
boundary="..." parameter of the Content-Type header. I have some messages
in my inbox that use '===============12394855====.ALT' instead of NextPart.
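To see which boundary strings actually occur in the recovered file, a quick survey like the one below may help. It is only a sketch: list_boundaries is a hypothetical helper name, grep -o is a GNU/BSD extension rather than POSIX, and only boundary parameters written on one line inside double quotes are found (a folded or unquoted boundary is missed).

```shell
# list_boundaries FILE (hypothetical helper name)
# Prints each distinct MIME boundary string in FILE, most frequent first,
# e.g. for a header line such as
#   Content-Type: multipart/mixed; boundary="----=_NextPart_000_0001"
# Assumes GNU or BSD grep (-o is not POSIX) and a quoted, one-line
# boundary parameter.
list_boundaries() {
    # grep pulls out boundary="..."; sed keeps only the text between the
    # quotes; sort | uniq -c | sort -rn counts them, biggest count first.
    grep -o -i 'boundary="[^"]*"' "$1" |
        sed 's/^[^"]*"//; s/"$//' |
        sort | uniq -c | sort -rn
}
```

Running list_boundaries bigfile.txt gives a rough idea of how many different MUAs (and boundary styles) are mixed in the file. Within a message, each part starts at a line consisting of two hyphens followed by that boundary string.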

That said, one command that splits a file using a regular expression as
the delimiter is csplit. To do what you asked (though the result might not
be quite what you expect), you could use, e.g.:

(This needs GNU csplit: -z, -b and the '{*}' repeat count are GNU
extensions, not POSIX.)

$ csplit -z -f 00000 -b '%04d.txt' bigfile.txt '/---Next_Part/' '{*}'

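The above command will create a number of files, 000000000.txt,
000000001.txt, etc. Each piece starts at a '---Next_Part' line and runs up
to, but not including, the next one, so the marker line stays at the top
of each file; any text before the first marker gets a file of its own
(an empty one is suppressed by -z).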

If the above does not suit your needs, provide some sample input and the
expected output (or look for a dedicated mailbox manipulation tool).

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.
16/03/2008, 19:46   #4
Ed Morton
Re: split text file chunks by regex line



On 3/16/2008 10:55 AM, nobody@cares wrote:


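It sounds like all you need is something like:

awk 'BEGIN{out = "foo0.txt"} /---Next_Part/{close(out); out = "foo" (++cnt) ".txt"} {print > out}' infile

Any text before the first ---Next_Part line goes to foo0.txt, and close()
keeps only one output file open at a time.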
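The original post asked for names of the form 00000????.txt. A minimal sketch of the same awk idea with zero-padded names, wrapped in a hypothetical shell function split_parts (the function name and the nine-digit padding are assumptions, not part of the reply above), might look like this:

```shell
# split_parts FILE (hypothetical helper name)
# Starts a new output file at every line matching ---Next_Part and names
# the files 000000001.txt, 000000002.txt, ... (nine digits, so they match
# 00000????.txt). Text before the first marker goes to 000000000.txt.
# The marker line itself is kept as the first line of each file.
split_parts() {
    awk '
        BEGIN { out = sprintf("%09d.txt", 0) }
        # close() the previous piece so only one file is open at a time
        /---Next_Part/ { close(out); out = sprintf("%09d.txt", ++cnt) }
        { print > out }
    ' "$1"
}
```

Run it as split_parts infile from the directory where the pieces should be written; 000000000.txt only appears if something precedes the first marker.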

Ed.

16/03/2008, 20:01   #5
jayrom01@gmail.com
Re: split text file chunks by regex line

On Mar 16, 8:55 am, superdu (nobody@cares) wrote:


If the input file contains:
---Next_Part
line 1
line 2
---Next_Part
line 2.1
line 2.2
---Next_Part

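then the following shell script:

#!/bin/sh
# Usage: sh thisscript infile
# Text before the first ---Next_Part line goes to mail.0, and the text
# after the Nth ---Next_Part line goes to mail.N. The marker lines
# themselves are not copied. Output is appended, so delete any old
# mail.* files before running it again.

i=0
# IFS= and -r keep leading blanks and backslashes intact; the [ -n ]
# test also handles a last line with no trailing newline.
while IFS= read -r r || [ -n "$r" ]; do
    case $r in
    ---Next_Part*) i=$((i + 1)) ;;
    *) printf '%s\n' "$r" >> "mail.$i" ;;
    esac
done < "$1"

will create two files with the following names and contents: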
# cat mail.1
line 1
line 2
# cat mail.2
line 2.1
line 2.2
#

Shawn Ayromloo
16/03/2008, 21:06   #6
Bill Marcum
Re: split text file chunks by regex line

On 2008-03-16, superdu (nobody@cares) <superdu> wrote:

If you have formail (it comes with procmail), you could use its -s option
to split the mail files.
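Where formail is not installed, a crude stand-in for a standard mbox file is to start a new file at each line beginning with "From " (the mbox message separator). This is a hedged sketch, not how formail works internally: split_mbox and the output names msg.0001, msg.0002, ... are hypothetical, lines before the first "From " line are dropped, and a body line starting with an unescaped "From " will wrongly start a new file, so a real mbox tool remains the safer choice.

```shell
# split_mbox FILE (hypothetical helper name)
# Writes each message of an mbox-style FILE to msg.0001, msg.0002, ...
# A new message is assumed to start at every line beginning "From "
# (with the trailing space). Anything before the first such line is skipped.
split_mbox() {
    awk '
        /^From / {
            if (out != "") close(out)   # finish the previous message
            out = sprintf("msg.%04d", ++n)
        }
        out != "" { print > out }
    ' "$1"
}
```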
All times are GMT +1.