| comp.unix.shell Using and programming the Unix shell. |
#1 |
|
Posts: n/a
Host: |
Hi all. I know this has been discussed before, but with no solution
that I can find. Here are the tools I'm restricted to:

    linux
    pdksh
    GNU find
    GNU xargs

The directory contains over 5000 files in various levels of
sub-folders.

Problem: I want to select the 50 most recent files to feed into an
rsync process to update a remote server.

    $ find . -type f -print0 | xargs -0 ls -ltd | head -50

Logically this command should give me the 50 most recent files, except
that xargs splits the files between separate ls processes, so the top
50 lines of the output are not necessarily the ones I'm looking for. I
have to use xargs because of the large number of files, and I cannot
use the find -mtime switch because I need the most recent files, not
those modified within a particular timespan: the directory is updated
at random, possibly with long intervals between updates.

Ideas & suggestions?
Thanks in advance.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
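The batching effect described above can be reproduced on a small scale. This is only an illustrative sketch: the numbers stand in for timestamps, `-n 3` forces small batches (real xargs batches are far larger), and `batch_sorted` is a made-up name.

```shell
# Each batch of three "timestamps" is sorted on its own, newest first,
# just as each separate ls process sorts only the files it was given.
batch_sorted() {
    printf '%s\n' 5 3 9 1 8 2 |
        xargs -n 3 sh -c 'printf "%s\n" "$@" | sort -rn' sh
}

# The combined output is sorted within each batch but not overall, so
# taking the first three lines would not yield the three largest values.
batch_sorted
```

Sorting has to happen once, after all names have been collected, which is what the later replies in the thread do.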
|
|
|
#2 |
|
On 2008-04-28, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
> The directory contains over 5000 files in various levels of
> sub-folders.
>
> Problem: I want to select the 50 most recent files to feed into a
> rsync process to update a remote server.
>
> $ find . -type f -print0 | xargs -0 ls -ltd | head -50
>
> Logically this command should give me the 50 most recent files, except
> that through the actions of xargs the files are split between separate
> ls processes, so that the top 50 lines of the output are not
> necessarily the ones I'm looking for. I have to use xargs because of
> the large number of files & cannot use the find -mtime switch because
> I need the most recent files, not necessarily those modified within a
> particular timespan because the directory is updated at random,
> possibly between large intervals in time.

Try this (quick draft, syntax is unreliable):

- make your script 'touch' a time stamp file (i.e. ~/.timestamp)
- use find to only search files changed since the last copy, and list
  those in a date-sortable way:
      find /dir/ -type f -newer ~/.timestamp -exec stat -c '%y %n' {} +
- pipe the output of this search through sort and then tail to get the
  last 50 files
- copy the 50 files in your list
- update the timestamp: touch ~/.timestamp

BTW, this sounds like homework. Who would want to copy a _number_ of
files, rather than update the latest? Sounds like the kind of problem
only a teacher would come up with. :-)

What if 52 files got updated in the last few minutes? You'd only copy
50? Those 2 other files would end up never being copied...

Suggestion: Just let rsync do its job in sync'ing what's new. Or use
the above, but copy all that was changed since you last ran your
script.

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams
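The timestamp-file steps above might be sketched as follows. This assumes GNU find and GNU stat; the function name `changed_since` and its arguments are invented for illustration, and file names containing newlines are not handled.

```shell
# changed_since STAMP DIR [N]: print up to N regular files under DIR that
# were modified after the STAMP file, oldest first (N defaults to 50).
changed_since() {
    stamp=$1 dir=$2 n=${3:-50}
    # %Y is the modification time in seconds since the epoch (GNU stat),
    # which sorts numerically without any date parsing.
    find "$dir" -type f -newer "$stamp" -exec stat -c '%Y %n' {} + |
        sort -n |
        tail -n "$n" |
        cut -d ' ' -f 2-
}
```

A script would typically call `changed_since ~/.timestamp /dir`, copy the listed files, and only then run `touch ~/.timestamp`. Files modified between the find and the touch would be missed on the next run, so this is a starting point rather than a robust scheme.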
|
|
|
#3 |
|
On Apr 28, 11:42am, GROG! <INVALID_EMAIL_CHECK_MY_...@GROG.ORG> wrote:
> Hi all. I know this has been discussed before, but with no solution
> that I can find. Here's the tools I'm restricted to use:
>
> linux
> pdksh
> GNU find
> GNU xargs
>
> The directory contains over 5000 files in various levels of
> sub-folders.
>
> Problem: I want to select the 50 most recent files to feed into a
> rsync process to update a remote server.
>
> $ find . -type f -print0 | xargs -0 ls -ltd | head -50
>
> Logically this command should give me the 50 most recent files, except
> that through the actions of xargs the files are split between separate
> ls processes, so that the top 50 lines of the output are not
> necessarily the ones I'm looking for. I have to use xargs because of
> the large number of files & cannot use the find -mtime switch because
> I need the most recent files, not necessarily those modified within a
> particular timespan because the directory is updated at random,
> possibly between large intervals in time.
>
> Ideas & suggestions?
> Thanks in advance.
>
> --
> GROG!
> EMAIL: uber [dot] grog [at] gmail [dot] com

On one hand you said you have to use xargs; on the other hand you
showed that xargs didn't work for you. Basically, if something doesn't
work, don't stick with it; try something else.

Maybe list the time stamp of each file to a logfile. Then sort the
logfile entries according to the time stamps. And fetch the top 50
entries.
|
|
|
#4 |
|
On 2008-04-28, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
> Hi all. I know this has been discussed before, but with no solution
> that I can find. Here's the tools I'm restricted to use:
>
> linux
> pdksh
> GNU find
> GNU xargs
>
> The directory contains over 5000 files in various levels of
> sub-folders.
>
> Problem: I want to select the 50 most recent files to feed into a
> rsync process to update a remote server.
>
> $ find . -type f -print0 | xargs -0 ls -ltd | head -50
>
> Logically this command should give me the 50 most recent files, except
> that through the actions of xargs the files are split between separate
> ls processes, so that the top 50 lines of the output are not
> necessarily the ones I'm looking for. I have to use xargs because of
> the large number of files & cannot use the find -mtime switch because
> I need the most recent files, not necessarily those modified within a
> particular timespan because the directory is updated at random,
> possibly between large intervals in time.
>

If you can use sort and head (or tail) (they're part of the standard
UNIX binaries and should be on any system), it shouldn't be that tough.
If you can't, I'd look for another job, because your boss is telling
you to pound nails and then confiscating your hammer.

--
Christopher Mattern

NOTICE
Thank you for noticing this new notice
Your noticing it has been noted
And will be reported to the authorities
|
|
|
#5 |
|
On 04-28 13:42 CDT, GROG! wrote:
> Problem: I want to select the 50 most recent files to feed into a
> rsync process to update a remote server.

Thanks for the suggestions all. And no Rikishi, this isn't a homework
assignment. I've been employed as a programmer for over 10 years now.
Not that I wouldn't rather go back to school (it's a lot more fun just
learning for the sake of it rather than working at what you _mostly_
already know), but that's an entirely different topic.

What I need to do is to keep 50 of the most recent files updated on
the remote server. If I were to find files updated since the last
update as Rikishi suggested, then wouldn't rsync on the remote server
delete all the files that weren't on the list, only leaving me with
the new ones? That's what I'm trying to avoid.

However, the idea that Rikishi & Harry both suggested about saving the
file list to a temp file & sorting that led me along a track that's
arrived at the solution (which is dependent on GNU ls as well):

    $ find . -type f -print0 |
        xargs -0 ls -l --time-style=+'%Y%m%d%H%M%S' |
        sort -n -r -k6 | head -50 |
        while IFS= read LINE; do
            set -- $LINE
            [ $# -lt 7 ] && continue
            shift 6
            echo "$@"
        done

Not implemented yet, but should work. Now all I have to do is feed
this output to rsync.

Thanks for helping me think this through.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
|
|
|
#6 |
|
2008-04-28, 18:42(+00), GROG!:
> Hi all. I know this has been discussed before, but with no solution
> that I can find. Here's the tools I'm restricted to use:
>
> linux
> pdksh
> GNU find
> GNU xargs
>
> The directory contains over 5000 files in various levels of
> sub-folders.
>
> Problem: I want to select the 50 most recent files to feed into a
> rsync process to update a remote server.
>
> $ find . -type f -print0 | xargs -0 ls -ltd | head -50

    find . -type f -printf '%T@\t%p\0' |
      tr '\n\0' '\0\n' |
      sort -rg |
      head -n 50 |
      cut -f2- |
      tr '\n\0' '\0\n' |
      xargs -r0 ls -lU

find's -printf, sort's -g, xargs's -r and -0, ls' -U options and
the ability of those commands to accept NUL characters in their
input are non-standard GNU extensions.

With zsh:

    ls -lt -- **/*(D.om[1,50])

--
Stéphane
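For reuse in a script, the pipeline above could be wrapped in a function that emits a NUL-delimited list instead of calling ls. This is a sketch assuming the same GNU tools; the name `newest_files` and its arguments are illustrative, not part of the original post.

```shell
# newest_files DIR N: print the N most recently modified regular files
# under DIR, newest first, as NUL-terminated paths relative to DIR.
newest_files() {
    (
        cd "$1" || exit 1
        find . -type f -printf '%T@\t%p\0' |
            tr '\n\0' '\0\n' |   # swap NUL and newline: one record per line
            sort -rg |           # newest (largest epoch time) first
            head -n "$2" |
            cut -f2- |           # drop the timestamp, keep the path
            tr '\n\0' '\0\n'     # swap back: NUL-terminated paths
    )
}
```

The double `tr` swap is what keeps a file name containing a newline together as a single record while the line-oriented tools in the middle do their work.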
|
|
|
#7 |
|
On 04-29 10:03 CDT, Stephane CHAZELAS wrote:
> 2008-04-28, 18:42(+00), GROG!:
>> Problem: I want to select the 50 most recent files to feed into a
>> rsync process to update a remote server.
>
> find . -type f -printf '%T@\t%p\0' |
>   tr '\n\0' '\0\n' |
>   sort -rg |
>   head -n 50 |
>   cut -f2- |
>   tr '\n\0' '\0\n' |
>   xargs -r0 ls -lU

Excellent Stephane. Much better than my solution. And with rsync also
supporting the print0 option, the job is done.

Thanks very much.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
|
|
|
#8 |
|
On 04-29 10:03 CDT, Stephane CHAZELAS wrote:
> 2008-04-28, 18:42(+00), GROG!:
>> Problem: I want to select the 50 most recent files to feed into a
>> rsync process to update a remote server.
>
> find . -type f -printf '%T@\t%p\0' |
>   tr '\n\0' '\0\n' |
>   sort -rg |
>   head -n 50 |
>   cut -f2- |
>   tr '\n\0' '\0\n' |
>   xargs -r0 ls -lU

Not to quibble, but is there any reason not to simplify a bit more by
removing the first tr?

    find . -type f -printf '%T@\t%p\n' |
      sort -rg |
      head -n 50 |
      cut -f2- |
      tr '\n\0' '\0\n' |
      xargs -r0 ls -lU

Thanks again.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
|
|
|
#9 |
|
On Tuesday 29 April 2008 17:55, GROG! wrote:
> On 04-29 10:03 CDT, Stephane CHAZELAS wrote:
>> find . -type f -printf '%T@\t%p\0' |
>>   tr '\n\0' '\0\n' |
>>   sort -rg |
>>   head -n 50 |
>>   cut -f2- |
>>   tr '\n\0' '\0\n' |
>>   xargs -r0 ls -lU
>
> Not to quibble, but is there any reason not to simplify a bit more by
> removing the first tr?

Stephane's solution handles filenames containing newlines (an unusual
scenario). If you don't have such filenames, you can simplify it to:

    find . -type f -printf '%T@\t%p\n' |
      sort -rg |
      head -n 50 |
      cut -f2- |
      xargs -r ls -lU

--
D.
|
|
|
#10 |
|
On Tuesday 29 April 2008 18:22, Dave B wrote:
> Stephane's solution handles filenames containing newlines (an unusual
> scenario). If you don't have such filenames, you can simplify it to:
>
> find . -type f -printf '%T@\t%p\n' |
>   sort -rg |
>   head -n 50 |
>   cut -f2- |
>   xargs -r ls -lU

I forgot to say that if you have filenames with spaces and tabs, you
have to modify IFS before executing the above commands.

    IFS=$'\n'   # or IFS="\n"
    find ...

and possibly save the old value of IFS and restore it afterwards. My
advice is to just use Stephane's solution and be done with it.

--
D.
|
|
|
#11 |
|
On Tuesday 29 April 2008 18:32, Dave B wrote:
> On Tuesday 29 April 2008 18:22, Dave B wrote:
>
>> Stephane's solution handles filenames containing newlines (an unusual
>> scenario). If you don't have such filenames, you can simplify it to:
>>
>> find . -type f -printf '%T@\t%p\n' |
>>   sort -rg |
>>   head -n 50 |
>>   cut -f2- |
>>   xargs -r ls -lU
>
> I forgot to say that if you have filenames with spaces and tabs, you have
> to modify IFS before executing the above commands.

Ok (note to self) never post while doing something else! :-)
The new IFS must be exported, and -d '\n' must be added to the call to
xargs.

Another reason for using the proposed solution verbatim.

--
D.
|
|
|
#12 |
|
On Tuesday 29 April 2008 18:50, Dave B wrote:
> Ok (note to self) never post while doing something else! :-)
> The new IFS must be exported, and -d '\n' must be added to the call to
> xargs.

It seems to me that IFS is not important here, since all the data flows
into the pipes and the shell never does word splitting.

> Another reason for using the proposed solution verbatim.

Agreed.

--
All the commands are tested with bash and GNU tools, so they may use
nonstandard features. I try to mention when something is nonstandard
(if I'm aware of that), but I may miss something. Corrections are
welcome.
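A small experiment illustrates the point that xargs, not the shell, decides how its input is split. It assumes GNU xargs for the `-d` option; the two function names are made up for the demonstration.

```shell
# Default parsing: xargs itself splits on blanks and newlines (and treats
# quotes and backslashes specially); the shell's IFS plays no part.
split_default() (
    IFS=:   # deliberately odd IFS, confined to this subshell
    printf '%s\n' 'a b' 'c' | xargs -n 1 echo
)

# GNU xargs -d '\n': every input line becomes exactly one argument.
split_lines() (
    printf '%s\n' 'a b' 'c' | xargs -d '\n' -n 1 echo
)

split_default   # prints a, b and c on separate lines
split_lines     # prints "a b" and then c
```

So the simplified pipeline needs only `-d '\n'` on the xargs call to cope with spaces and tabs in names; it still cannot cope with newlines, which is why the NUL-based version remains the safer choice.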
|
|
|
#13 |
|
2008-04-29, 18:32(+02), Dave B:
> On Tuesday 29 April 2008 18:22, Dave B wrote:
>
>> Stephane's solution handles filenames containing newlines (an unusual
>> scenario). If you don't have such filenames, you can simplify it to:
>>
>> find . -type f -printf '%T@\t%p\n' |
>>   sort -rg |
>>   head -n 50 |
>>   cut -f2- |
>>   xargs -r ls -lU
>
> I forgot to say that if you have filenames with spaces and tabs, you have to
> modify IFS before executing the above commands.

Spaces and tabs... and single quotes and double quotes and
backslashes and possibly more are a problem here.

> IFS=$'\n' # or IFS="\n"
[...]

IFS="\n" is for splitting on backslashes and ns, $'...' is a
non-standard ksh93, zsh and bash only feature (the OP mentioned
pdksh) but anyway, as someone else mentioned, IFS is not
involved here.

--
Stéphane
|
|
|
#14 |
|
On Tuesday 29 April 2008 20:01, Stephane CHAZELAS wrote:
>> IFS=$'\n' # or IFS="\n"
> [...]
>
> IFS="\n" is for splitting on backslashes and ns, $'...' is a
> non-standard ksh93, zsh and bash only feature (the OP mentioned
> pdksh) but anyway, as someone else mentioned, IFS is not
> involved here.

Yes, in fact the IFS part was all wrong (again, sorry for that).
Btw, is there a standard equivalent to $'\n'?

--
D.
|
|
|
#15 |
|
2008-04-29, 20:08(+02), Dave B:
> On Tuesday 29 April 2008 20:01, Stephane CHAZELAS wrote:
>
>>> IFS=$'\n' # or IFS="\n"
>> [...]
>>
>> IFS="\n" is for splitting on backslashes and ns, $'...' is a
>> non-standard ksh93, zsh and bash only feature (the OP mentioned
>> pdksh) but anyway, as someone else mentioned, IFS is not
>> involved here.
>
> Yes, in fact the IFS part was all wrong (again, sorry for that).
> Btw, is there a standard equivalent to $'\n'?

For other than \n, there's "$(printf '\x')"

You can do:

IFS='
'

or

eval "$(printf 'IFS="\n"')"

--
Stéphane
|
|
|
#16 |
|
On Tuesday 29 April 2008 20:34, Stephane CHAZELAS wrote:
>> Btw, is there a standard equivalent to $'\n'?
>
> For other than \n, there's "$(printf '\x')"
>
> You can do:
>
> IFS='
> '
>
> or
>
> eval "$(printf 'IFS="\n"')"

Ok, but I truly meant something that can be used like a variable, as
$'\n', not just a way to set a variable to \n (I know it does not
matter in practice, but I was just curious).

--
D.
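One common workaround, offered here as a sketch rather than a true equivalent of `$'\n'`, is to store the newline in an ordinary variable once and expand that variable wherever it is needed. The names `nl` and `nl2` are arbitrary.

```shell
# A literal newline between the quotes works in any POSIX shell.
nl='
'

# Command substitution strips trailing newlines, so append a sentinel
# character after the newline and remove it again afterwards.
nl2=$(printf '\nx')
nl2=${nl2%x}

# Both variables now hold exactly one newline character.
[ "$nl" = "$nl2" ] && echo "nl and nl2 are identical"
```

After that, `"$nl"` can be used anywhere a string is expected, for example `IFS=$nl` or `printf 'a%sb' "$nl"`.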
|
|
|
#17 |
|
On 04-29 10:03 CDT, Stephane CHAZELAS wrote:
> 2008-04-28, 18:42(+00), GROG!:
>> Problem: I want to select the 50 most recent files to feed into a
>> rsync process to update a remote server.
>
> find . -type f -printf '%T@\t%p\0' |
>   tr '\n\0' '\0\n' |
>   sort -rg |
>   head -n 50 |
>   cut -f2- |
>   tr '\n\0' '\0\n' |
>   xargs -r0 ls -lU

For file selection this does work, but, hopefully without getting too
far off topic, it seems that rsync (version 2.6.9) isn't working as
expected. Here's the entire command I'm using (src & dest are of
course previously set in my script):

    cd $src

    find . -type f -printf '%T@\t%p\0' |
      tr '\n\0' '\0\n' |
      sort -rg |
      head -n 50 |
      cut -f2- |
      tr '\n\0' '\0\n' |
      rsync --from0 --verbose --delete --files-from=- . $dest

What's happening is that rsync is transferring all the files each time
it's run, whether they've changed or not. As well, if the file listing
changes, rsync is not deleting the older files from the destination.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
|
|
|
#18 |
|
>>>>> "GROG!" == GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> writes:

GROG!> find . -type f -printf '%T@\t%p\0' |
GROG!>   tr '\n\0' '\0\n' |
GROG!>   sort -rg |
GROG!>   head -n 50 |
GROG!>   cut -f2- |
GROG!>   tr '\n\0' '\0\n' |
GROG!>   rsync --from0 --verbose --delete --files-from=- . $dest

And people think *Perl* looks cryptic? :-)

--
Randal L. Schwartz - Stonehenge Consulting Services, Inc. - +1 503 777 0095
<merlyn@stonehenge.com> <URL:http://www.stonehenge.com/merlyn/>
Perl/Unix/security consulting, Technical writing, Comedy, etc. etc.
See PerlTraining.Stonehenge.com for onsite and open-enrollment Perl training!
|
|
|
#19 |
|
On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
> What I need to do is to keep 50 of the most recent files updated on
> the remote server. If I were to find files updated since the last
> update as Rikishi suggested, then wouldn't rsync on the remote server
> delete all the files that weren't on the list, only leaving me with
> the new ones? That's what I'm trying to avoid.

Only if you told it to do so, with the --delete option. The default
action is to add new files, and update changed files.

Basic set of options:

    rsync -var s/ d/
        (s=source, d=destination)
    rsync -var -e ssh s/ username@remote:/full/path/d/
        (use ssh connection to remote)

Remove 'v' to make less verbose.

> However, the idea that Rikishi & Harry both suggested about saving the
> file list to a temp file & sorting that led me along a track that's
> arrived at the solution (which is dependent on GNU ls as well):
>
> $ find . -type f -print0 |
>     xargs -0 ls -l --time-style=+'%Y%m%d%H%M%S' |
>     sort -n -r -k6 | head -50 |
>     while IFS= read LINE; do
>         set -- $LINE
>         [ $# -lt 7 ] && continue
>         shift 6
>         echo "$@"
>     done
>
> Not implemented yet, but should work. Now all I have to do is feed
> this output to rsync.

Feed it to scp. No point in using rsync if you already know what to
copy. You could even use cp, in some cases (i.e. nfs share).

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams
|
|
|
#20 |
|
On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
> What's happening is that rsync is transferring all the files each time
> it's run, whether they've changed or not. As well, if the file listing
> changes, rsync is not deleting the older files from the destination.

Add the -a flag to preserve dates (amongst others).

Add --delete to remove files from dest, if they don't exist in source
anymore.

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams
|
|
|
#21 |
|
On 2008-04-29, Rikishi 42 <skunkworks@rikishi42.net> wrote:
>
>
> On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
>
>> What's happening is that rsync is transferring all the files each time
>> it's run, whether they've changed or not. As well, if the file listing
>> changes, rsync is not deleting the older files from the destination.
>
> Add the -a flag to store dates (amongst others).
>
> Add --delete to remove files files from dest, if they don't exist in source
> anymore.

Forgot to say: I'm using

    rsync version 2.6.6 protocol version 29

But the behaviour described was there in previous releases, too.

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams
|
|
|
#22 |
|
On 04-29 16:13 CDT, Rikishi 42 wrote:
> On 2008-04-29, Rikishi 42 <skunkworks@rikishi42.net> wrote:
>> On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
>>
>>> What's happening is that rsync is transferring all the files each time
>>> it's run, whether they've changed or not. As well, if the file listing
>>> changes, rsync is not deleting the older files from the destination.
>>
>> Add the -a flag to store dates (amongst others).
>>
>> Add --delete to remove files files from dest, if they don't exist in source
>> anymore.
>
> Forgot to say: I'm using
>
> rsync version 2.6.6 protocol version 29
>
> But the behaviour described was there in previous releases, too.

But I did have --delete specified, if you look back at my code:

> rsync --from0 --verbose --delete --files-from=- . $dest

Which is what I'm concerned about. As I said above, rsync is *not*
deleting the files on dest even with that option included.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
|
|
|
#23 |
|
On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
> But I did have --delete specified, if you look back at my code:
>
>> rsync --from0 --verbose --delete --files-from=- . $dest
>
> Which is what I'm concerned about as I said above, rsync is *not*
> deleting the files on dest even with that option included.

Sorry, missed that one.
But you might wanna look into the -a flag anyway.

Try this (no need to cd into the source):

    rsync -var --delete $src/ $dest/

--
There is an art, it says, or rather, a knack to flying.
The knack lies in learning how to throw yourself at the ground and miss.
Douglas Adams
|
|
|
#24 |
|
Rikishi 42 wrote:
> On 2008-04-29, GROG! <INVALID_EMAIL_CHECK_MY_SIG@GROG.ORG> wrote:
>
>> But I did have --delete specified, if you look back at my code:
>>
>>> rsync --from0 --verbose --delete --files-from=- . $dest
>> Which is what I'm concerned about as I said above, rsync is *not*
>> deleting the files on dest even with that option included.
>
> Sorry, missed that one.
> But you might wanna look into the -a flag anyway.
>
>
>
> Try this (no need to cd into the source) :
>
> rsync -var --delete $src/ $dest/
>
>

Right, do not forget the trailing slashes :-) rsync behaves quite
differently depending on whether they are present.

BTW, you might want to look at rsync's -n/--dry-run option. It does a
dummy run of the command and lets you see what would be
synced/updated/deleted/created before the actual sync is run.

Nikhil
|
|
|
#25 |
|
On 04-29 15:03 CDT, GROG! wrote:
> 2008-04-28, 18:42(+00), GROG!:
>
>> Problem: I want to select the 50 most recent files to feed into a
>> rsync process to update a remote server.
>
> [...snip...] it seems that rsync (version 2.6.9) isn't working as
> expected. Here's the entire command I'm using (src & dest are of
> courses previously set in my script):
>
> cd $src
>
> find . -type f -printf '%T@\t%p\0' |
>   tr '\n\0' '\0\n' |
>   sort -rg |
>   head -n 50 |
>   cut -f2- |
>   tr '\n\0' '\0\n' |
>   rsync --from0 --verbose --delete --files-from=- . $dest
>
> What's happening is that rsync is transferring all the files each time
> it's run, whether they've changed or not. As well, if the file listing
> changes, rsync is not deleting the older files from the destination.

After hacking at it a bit more I've found that rsync seems to fail in
its use of the --files-from switch. It seems that for rsync to
properly update new files & delete old ones, an actual directory has
to be referenced. Instead of actually duplicating the files, I've
found that creating a temporary directory of symlinks seems to work
fine. Here's the relevant part of my script:

    trap "rm -Rf $TMPDIR; exit" 0 1 2 3 15

    cd $SRCDIR || exit 1

    find . -type f -printf '%T@\t%p\n' |
      sort -rg |
      head -n $NUMFILES |
      cut -f2- |
      while read FILE; do
        mkdir -p "$TMPDIR/${FILE%/*}" || exit 1
        ln -s "$SRCDIR/$FILE" "$TMPDIR/$FILE"
      done

    cd $TMPDIR || exit 1

    rsync --verbose --archive --copy-links --delete . $DESTDIR

This works, but if there's an easier/cleaner way to create the tmpdir,
I'm open to suggestions.

Thanks again for the help.

--
GROG!
EMAIL: uber [dot] grog [at] gmail [dot] com
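One possible tidying of the staging step is to put it in a function and let the caller create the temporary directory with mktemp. This is a sketch under the same GNU find/sort assumptions; `stage_newest` and its parameters are hypothetical names, and file names containing newlines are still not handled.

```shell
# stage_newest SRCDIR STAGEDIR N: fill STAGEDIR with symlinks to the N
# newest regular files under SRCDIR, preserving their relative paths.
stage_newest() {
    src=$(cd "$1" && pwd) || return 1   # absolute path so the links resolve
    stage=$2
    ( cd "$src" && find . -type f -printf '%T@\t%p\n' ) |
        sort -rg |
        head -n "$3" |
        cut -f2- |
        while IFS= read -r file; do
            file=${file#./}
            case $file in
                # Only files below a sub-folder need a directory created.
                */*) mkdir -p "$stage/${file%/*}" || return 1 ;;
            esac
            ln -s "$src/$file" "$stage/$file" || return 1
        done
}

# Possible use in a script (not run here; SRCDIR and DESTDIR as in the post):
#   tmpdir=$(mktemp -d) || exit 1
#   trap 'rm -rf "$tmpdir"' 0
#   stage_newest "$SRCDIR" "$tmpdir" 50 &&
#       rsync --verbose --archive --copy-links --delete "$tmpdir"/ "$DESTDIR"/
```

Using `read -r` with an empty IFS keeps backslashes and leading blanks in names intact, and the single-quoted trap defers expansion of `$tmpdir` until the trap actually fires.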
|