comp.unix.shell: Using and programming the Unix shell.

#1
I have a small script that finds some files and then greps for specific keywords in them. There are about 20,000 files, all small (1-2 KB). If I use -exec grep it takes far longer to complete (20-30 s) than xargs grep (2-3 s).

I would love to know why.

#2
On Mar 16, 10:43pm, freightcar <freight...@gmail.com> wrote:
> I have a small script that will find some files and then grep for a
> specific keywords in them. The number of files is ~ 20,000 and they
> are all small 1-2 KB. If I use exec grep it takes way longer (20-30s)
> to complete than xargs grep (2-3s).
>
> I would love to know why.

Forgot to mention that I am using "find".

#3
freightcar wrote:
> I have a small script that will find some files and then grep for a
> specific keywords in them. The number of files is ~ 20,000 and they
> are all small 1-2 KB. If I use exec grep it takes way longer (20-30s)
> to complete than xargs grep (2-3s).
>
> I would love to know why.

It is likely that the time of this command is dominated by the
time it takes to create a new process (for the grep command).
The common way to use find with -exec is:

    find ... -exec command '{}' \;

That will be slower than:

    find ... | xargs command

because the first way runs the command once per file. In
your case, that means starting 20,000 processes.

With xargs (or a more sophisticated "find ... -exec" command)
far fewer processes are started, perhaps only one.

-Wayne
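To make the process-count argument concrete, here is a minimal sketch that counts how often the command is started by each form. The temporary directory, the file names and the "needle" keyword are made up for illustration, and an inline `sh -c` stands in for grep so that each start prints one line. It assumes the temporary path contains no whitespace.

```shell
#!/bin/sh
# Build a throwaway directory with 50 small files (hypothetical names).
dir=$(mktemp -d)
i=1
while [ "$i" -le 50 ]; do
    echo "needle $i" > "$dir/file$i.txt"
    i=$((i + 1))
done

# One process per file: the inline sh prints one line each time it starts.
per_file=$(find "$dir" -type f -exec sh -c 'echo run' sh {} \; | wc -l | tr -d ' ')

# Batched: xargs packs as many file names as fit into each invocation.
batched=$(find "$dir" -type f | xargs sh -c 'echo run' sh | wc -l | tr -d ' ')

printf 'find -exec with \\; started the command %s times\n' "$per_file"
printf 'find piped to xargs started it %s times\n' "$batched"

rm -rf "$dir"
```

With 20,000 files the first count grows to 20,000 while the second typically stays in the single digits, which matches the timing difference reported in the question.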

#4
2008-03-16, 23:16(-04), Wayne:
> freightcar wrote:
>> I have a small script that will find some files and then grep for a
>> specific keywords in them. The number of files is ~ 20,000 and they
>> are all small 1-2 KB. If I use exec grep it takes way longer (20-30s)
>> to complete than xargs grep (2-3s).
>>
>> I would love to know why.
>
> It is likely that the time of this command is dominated by the
> time it takes to create a new process (for the grep command).
> The common way to use find with -exec is:
>     find ... -exec command '{}' \;
> That will be slower than:
>     find ... |xargs command
> because the first way runs the command once per file. In
> your case, that means starting 20,000 processes.
>
> With xargs (or a more sophisticated "find ... -exec" command)
> far fewer processes are started, perhaps only one.
[...]

The output of find -print is not post processable because it
outputs a list of file names separated by NL characters while NL
is as valid as any other character in a file name.

And the default format expected by xargs is not a newline
separated list, it's a space (including NL) separated list where
quotes and backslashes also have their role to play.

xargs also has stupid limitations on the length of the
arguments.

All that makes it very difficult to use xargs reliably unless
you use GNU's -0 option.

Standard implementations of find have

    -exec cmd {} +

which will run fewer commands, so you don't need xargs.

--
Stéphane
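The file-name problem described above can be reproduced with a small sketch. The file names are hypothetical and chosen to be awkward on purpose. `-print0` and `-0` are treated here as widely available extensions (GNU and BSD) rather than something every system has, and the temporary path is assumed to contain no whitespace. Each invocation of the inline `sh` prints how many arguments it received.

```shell
#!/bin/sh
dir=$(mktemp -d)
# Three files: a plain name, one with a space, one with an embedded newline.
echo keyword > "$dir/plain.txt"
echo keyword > "$dir/with space.txt"
echo keyword > "$dir/with
newline.txt"

# find batches the names itself, so nothing has to be parsed from a text stream.
plus=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)

# NUL-delimited hand-off to xargs (extension; assumed available here).
nul=$(find "$dir" -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

# Plain pipe: xargs splits on blanks and newlines, so names get broken apart.
naive=$(find "$dir" -type f | xargs sh -c 'echo "$#"' sh)

printf 'find -exec {} + passed %s names\n' "$plus"
printf 'find -print0 | xargs -0 passed %s names\n' "$nul"
printf 'find | xargs passed %s "names"\n' "$naive"

rm -rf "$dir"
```

The first two forms pass the three real file names; the plain pipe passes five fragments, none of which would be found by grep except the plain one.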

#5
Stephane CHAZELAS <this.address@is.invalid> writes:
> All that makes it very difficult to use xargs reliably unless
> you use GNU's -0 option.

Unless you have control over the filenames...

#6
On Mar 17, 2:38 am, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
> 2008-03-16, 23:16(-04), Wayne:
> xargs also has stupid limitations on the length of the
> arguments.

I do not believe it's xargs, but rather the kernel. BASH has similar
limits for `` or $() on non internal commands. Copying a little from
my xterm:

    ~ $yes | tr -d '\n' | head -c 1000000 | xargs /bin/true
    xargs: argument line too long
    ~ $/bin/true `yes | tr -d '\n' | head -c 1000000`
    bash: /bin/true: Argument list too long
    ~ $true `yes | tr -d '\n' | head -c 1000000`

note that bash has a builtin true and has no arbitrary limits on the
length of command line arguments.

-Ed

--
(You can't go wrong with psycho-rats.)(http://mi.eng.cam.ac.uk/~er258)
/d{def}def/f{/Times s selectfont}d/s{11}d/r{roll}d f 2/m{moveto}d -1 r
230 350 m 0 1 179{ 1 index show 88 rotate 4 mul 0 rmoveto}for/s 12 d f
pop 235 420 translate 0 0 moveto 1 2 scale show showpage

#7
2008-03-17, 08:27(-07), Edward Rosten:
> On Mar 17, 2:38 am, Stephane CHAZELAS <this.addr...@is.invalid> wrote:
>> 2008-03-16, 23:16(-04), Wayne:
>
>> xargs also has stupid limitations on the length of the
>> arguments.
>
> I do not believe it's xargs, but rather the kernel. BASH has similar
> limits for `` or $() on non internal commands. Copying a little from
> my xterm:
>
> ~ $yes | tr -d '\n' | head -c 1000000 | xargs /bin/true
> xargs: argument line too long
> ~ $/bin/true `yes | tr -d '\n' | head -c 1000000`
> bash: /bin/true: Argument list too long
> ~ $true `yes | tr -d '\n' | head -c 1000000`
>
> note that bash has a builtin true and has no arbitrary limits on the
> length of command line arguments.
[...]

You must be referring to the execve(2) system call limitation on
the size of envp+argv (which of course doesn't affect shell
builtins as there's no execve for them).

xargs is the tool to overcome that limitation by breaking the
arg list and running as many commands as necessary so that the
execve(2) limit is not reached.

But that's not the limitation I was thinking of. I know some
xargs implementations have a very low limit (around 255 bytes)
on the size of a single argument, lower than LINE_MAX or the max
length of a path for instance, but now that I'm looking for
supporting information, it may be for the -I option only (where
POSIX limits it to 255 bytes).

--
Stéphane
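The batching behaviour described above can be observed directly. This is only a sketch: the number 500000 is an arbitrary choice meant to produce several megabytes of arguments, which is assumed to be more than one execve(2) call accepts on the system at hand, and awk is used purely as a portable number generator.

```shell
#!/bin/sh
# Generate 500000 short arguments and let xargs decide how to split them.
# Each invocation of the inline sh prints how many arguments it was given.
counts=$(awk 'BEGIN { for (i = 1; i <= 500000; i++) print i }' |
    xargs sh -c 'echo "$#"' sh)

# Number of separate command invocations xargs needed.
runs=$(printf '%s\n' "$counts" | wc -l | tr -d ' ')

# Sum the per-invocation counts to confirm that no argument was dropped.
total=0
for n in $counts; do
    total=$((total + n))
done

printf 'xargs ran the command %s times for %s arguments\n' "$runs" "$total"
```

How many runs are needed depends on the xargs implementation and its default buffer size, so the exact figure will vary between systems; the total should always equal the number of inputs.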

#8
Edward Rosten <Edward.Rosten@gmail.com> writes:
> > xargs also has stupid limitations on the length of the
> > arguments.
>
> I do not believe it's xargs, but rather the kernel.

Yup. Check the value of ARG_MAX in /usr/include/linux/limits.h

On my system it's:

    #define ARG_MAX 131072 /* # bytes of args + environ for exec() */
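As an alternative to reading the header, the running system can be asked for the value with the POSIX `getconf` utility. The sketch below assumes `getconf ARG_MAX` returns a number on the system in question. On newer Linux kernels the effective limit can differ from the constant in the header, so the queried value is likely the more useful one.

```shell
#!/bin/sh
# Query the limit on combined argument and environment size for exec.
arg_max=$(getconf ARG_MAX)
printf 'ARG_MAX on this system: %s bytes\n' "$arg_max"

# The environment is charged against the same budget as the arguments,
# so a rough measure of its size shows how much is already spoken for.
env_bytes=$(env | wc -c | tr -d ' ')
printf 'environment currently uses roughly %s bytes\n' "$env_bytes"
```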