16/03/2008, 18:48   #3
pk
Re: split text file chunks by regex line

superdu (nobody@cares) wrote:

> Hi,
> Being a beginner in Linux I hit this problem which I know can be
> solved by scripting. I recovered a bunch of files from a disk crash
> that contains all my emails but they were in 2 mb chunks and in no
> particular demarkation. So i combined them into one huge file.
>
> To break them into smaller messages I need to search for a line say
> "---Next_Part ...blah bla ----
> an take everything between the two lines and create the new text
> file as 00000????.txt. Thats all that need to be done in order for
> me to read these email or resend them to myself. Sorry if I am
> missing the obvious but as I said my knowledge in scripting is very
> limited. Like upto shebang These emails are urgent otherwise I
> would have spent some time to learn a bit more scripting.
>
> Any is much appreciated.


There are many specialized scripts around (especially perl scripts) which
deal specifically with mailbox manipulation, if the file is in a standard
format (e.g., mbox).

In your specific case, I doubt that splitting the file at the NextPart
lines will produce something useful or immediately usable as email. Those
lines separate the parts of a single multipart message, not one message
from the next, and some messages might not have a NextPart line at all.
Furthermore, that demarcation string is not standardized: it depends on the
MUA used to compose the mail. The particular string used in each email can
usually be found in the boundary parameter of the Content-Type header. I
have some messages in my inbox that
use '===============12394855====.ALT' instead of NextPart.
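To see which boundary strings actually occur in the combined file before
deciding where to split, something like the following may help. It is only
a sketch: the sample file and the second boundary string are made up for
illustration, it only catches boundaries written in double quotes, and
grep -o is not in POSIX (GNU and BSD grep both have it).

```shell
# Hypothetical sample standing in for the combined mail file.
sample=$(mktemp)
printf '%s\n' \
  'Content-Type: multipart/alternative;' \
  ' boundary="===============12394855====.ALT"' \
  '' \
  'Content-Type: multipart/mixed; boundary="----=_NextPart_000_0001"' > "$sample"

# Collect each distinct quoted boundary declaration (case-insensitive match).
boundaries=$(grep -io 'boundary="[^"]*"' "$sample" | sort -u)
printf '%s\n' "$boundaries"
```

On the real data, replace "$sample" with the name of the combined file.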

That said, one command that splits a file using a regular expression as
the demarcation is csplit. To do what you asked (though the result might
not be quite what you expect), you could use, e.g.,

(GNU csplit; -z, -b and the '{*}' repeat count are GNU extensions)

$ csplit -z -f 00000 -b '%04d.txt' bigfile.txt '/---Next_Part/' '{*}'

The above command will create a number of files 000000000.txt,
000000001.txt, etc. The first one holds whatever precedes the
first '---Next_Part' line; each later one starts at a '---Next_Part' line
and runs up to, but not including, the next one.
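To check the behaviour on something small before running it on the real
file, here is a sketch with made-up sample data (the file contents and the
temporary directory are assumptions for illustration only). It needs GNU
csplit, and adds -s merely to silence the byte counts that csplit normally
prints.

```shell
# Hypothetical sample data: three chunks separated by '---Next_Part' lines.
workdir=$(mktemp -d)
printf '%s\n' \
  'header text' \
  '---Next_Part_001----' \
  'first body' \
  '---Next_Part_002----' \
  'second body' > "$workdir/bigfile.txt"

# -z drops empty pieces; -f and -b together build the output file names.
csplit -s -z -f "$workdir/00000" -b '%04d.txt' \
    "$workdir/bigfile.txt" '/---Next_Part/' '{*}'

# List the pieces next to the original file.
ls "$workdir"
```

Each piece after the first begins with the matching line itself, so the
separator is kept rather than discarded.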

If the above does not suit your needs, then provide some sample input and
expected output (or look for some specific mailbox manipulation tools).

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard (if
I'm aware of that), but I may miss something. Corrections are welcome.