find command, there are a few different ways this can be achieved (some more efficient than others):
-exec command {} \;
This is the traditional way. The command must be terminated by an escaped semicolon (\;). The {} argument is replaced by the current path name found by find. Here is a simple command which echoes file paths:
sharfah@starship:~> find . -type f -exec echo {} \;
./1.txt
./2.txt
This is very inefficient, because whenever find finds a file, it forks a process for your command, waits for this child process to complete and then searches for the next file. In this example, you will get the following child processes: echo ./1.txt; echo ./2.txt. So if there are 1000 files, there are 1000 child processes, and find waits for each one in turn.
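To make the one-process-per-file behaviour visible, here is a minimal sketch. It assumes a throwaway directory created with mktemp and three made-up file names; the child command is a small sh -c wrapper that prints one line each time it is started, so counting lines counts child processes.

```shell
#!/bin/sh
# Hypothetical scratch directory with three files (names are illustrative only).
dir=$(mktemp -d)
touch "$dir/1.txt" "$dir/2.txt" "$dir/3.txt"

# Each child process prints exactly one line, so counting lines counts children.
runs=$(find "$dir" -type f -exec sh -c 'echo "child for $1"' sh {} \; | wc -l | tr -d ' ')
echo "child processes started: $runs"

# Clean up the scratch directory.
rm -rf "$dir"
```

With three files this reports three child processes, one per file found.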
-exec command {} +
If you use a plus (+) instead of the escaped semicolon, the arguments will be grouped together before being passed to the command. The {} must appear at the end of the command, immediately before the +.
sharfah@starship:~> find . -type f -exec echo {} +
./1.txt ./2.txt
In this case, only one child process is created: echo ./1.txt ./2.txt, which is much more efficient, because it avoids a fork/exec for each single argument.
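The grouping can be checked the same way. This is a sketch under the same assumptions (a scratch directory from mktemp and three invented file names): the child prints its argument count once per invocation, so a single line containing 3 means all three paths arrived in one call.

```shell
#!/bin/sh
# Hypothetical scratch directory with three files (names are illustrative only).
dir=$(mktemp -d)
touch "$dir/1.txt" "$dir/2.txt" "$dir/3.txt"

# The child prints "$#" (its number of arguments) once per invocation.
out=$(find "$dir" -type f -exec sh -c 'echo "$#"' sh {} +)
echo "argument count per invocation: $out"

# Clean up the scratch directory.
rm -rf "$dir"
```

For a very large number of files, find may still split the list into several invocations to stay within the system's command-line length limit, so "one child process" holds for small examples like this one.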
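xargs
This is similar to the approach above, in that files found are bundled up and sent to the command as few times as possible. The batch size is bounded by the xargs implementation's command-line length limit, which can mean as few as 20-50 names on some systems and thousands on others. Because find and xargs run concurrently in a pipeline, find doesn't wait for your command to finish.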
sharfah@starship:~> find . -type f | xargs echo
./1.txt ./2.txt
This approach is efficient and works well as long as you do not have funny characters (e.g. spaces) in your filenames: xargs splits its input on whitespace, and the names are not escaped.

Performance Testing
So which one of the above approaches is fastest? I ran a test across a directory with 10,000 files, of which 5,600 matched my find pattern. I ran the test 10 times, changing the order of the finds each time, but the results were always the same. xargs and + were very close, with \; always finishing last. Here is one result:
time find . -name "*20090430*" -exec touch {} +
real 0m31.98s
user 0m0.06s
sys 0m0.49s

time find . -name "*20090430*" | xargs touch
real 1m8.81s
user 0m0.13s
sys 0m1.07s

time find . -name "*20090430*" -exec touch {} \;
real 1m42.53s
user 0m0.17s
sys 0m2.42s

I'm going to be using the -exec command {} + method, because it is faster and can handle my funny filenames.
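As a small illustration of the "funny filenames" point, the sketch below assumes two invented file names containing spaces in a scratch directory. With -exec ... {} +, find hands each path to the command as a single argument, so the brackets printed around each argument show the names arriving intact.

```shell
#!/bin/sh
# Hypothetical scratch directory; both file names contain spaces on purpose.
dir=$(mktemp -d)
touch "$dir/my file.txt" "$dir/two  spaces.txt"

# Print every argument the child receives, one per line, wrapped in <>.
# sort is used only because find does not guarantee an output order.
out=$(cd "$dir" && find . -type f -exec sh -c 'printf "<%s>\n" "$@"' sh {} + | sort)
printf '%s\n' "$out"

# Clean up the scratch directory.
rm -rf "$dir"
```

Each name appears as one bracketed line, spaces included, because find passes the paths directly as arguments and no shell word-splitting happens in between.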
You can use 'find ... -print0 | xargs -0 ...' to resolve the issue with filenames with spaces.
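A quick way to see the difference this suggestion makes is sketched below, assuming a scratch directory and one made-up file name containing a space. Note that -print0 and -0 are extensions found in GNU and BSD find/xargs and may not exist on every system.

```shell
#!/bin/sh
# Hypothetical scratch directory with one file whose name contains a space.
dir=$(mktemp -d)
touch "$dir/my file.txt"

# Plain pipe: xargs splits on whitespace, so the one name becomes two arguments.
plain=$(cd "$dir" && find . -type f | xargs sh -c 'echo "$#"' sh)

# NUL-delimited pipe: the name stays together as a single argument.
safe=$(cd "$dir" && find . -type f -print0 | xargs -0 sh -c 'echo "$#"' sh)

echo "plain xargs: $plain argument(s); -print0 | xargs -0: $safe argument(s)"

# Clean up the scratch directory.
rm -rf "$dir"
```

The plain pipeline reports two arguments for the single file, while the NUL-delimited one reports one.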