More Unix Nerdiness: xargs and tar
I didn’t want to do two Unix posts in a row, but something came up last week that made me dust off the xargs command. Even if you do a lot of logfile parsing, you probably won’t use xargs very much. But when you need it, it can save a ton of time.
Commands used in this post:
xargs - allows you to use a list generated by one command as arguments for another command
mv - moves or renames a file
ls - lists files
tar - an archive utility that lets you store and extract multiple files; although it can be used to write files to tape, one very useful property is that it can also write output to a regular file, which is extremely handy if you need to transfer a bunch of files to your PC (WinZip can handle tarfiles)
Let’s say you have a lot of files in your directory named test1.txt, test2.txt, test3.txt, and so on, and you want to append .old to all of them. You could run a mv (move) command on each individual file, or you could rename all of them at once using xargs as follows:
ls -1 test* | xargs -I ABC mv ABC ABC.old
• ls -1 (that’s dash-“one”, not a lowercase “L”) lists the files so that there is only one file per line.
• The -I ABC option tells xargs to run the command once per input line, substituting the file name wherever the placeholder ABC appears (mv test1.txt test1.txt.old, etc.). ABC is an arbitrary string, not a shell variable.
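To try the rename safely, here’s a minimal sketch that runs it in a scratch directory. The temporary directory and the three sample files are assumptions made for illustration; only the ls | xargs line comes from the example above.

```shell
# Work in a throwaway directory so nothing real gets renamed.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create three sample files (illustrative names matching the example).
touch test1.txt test2.txt test3.txt

# -I ABC makes xargs run mv once per input line,
# replacing every ABC in the command with that line.
ls -1 test* | xargs -I ABC mv ABC ABC.old

# Show the result: each file should now end in .old
ls -1
```

Because -I treats each whole input line as one argument, this form runs one mv per file, which is fine for a few hundred files but slower than a single command for very large lists.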
I ran into a situation last week with a batch job that spits out thousands of reports, each into its own subdirectory. After all the reports had run, an issue was found that affected several hundred of them, which were then rerun. Because the reports had already been archived and moved to their final location (not difficult, but time-consuming due to sheer volume), we only wanted to re-archive the reports that had been rerun that day. This was relatively easy to do thanks to xargs. Here’s the command I used:
ls -l */*.out | grep "Jan 21" | sed 's/^.*Jan 21 ..:.. //' | xargs tar cvf reportreruns.tar
• ls -l (this time it’s a lowercase “L”, not a “one”) does a long listing, which includes permissions, size, owner, date stamp, etc. The first few lines for the listing look something like this:
-rw-r--r-- 1 owner group 1234567 Jan 18 22:07 subdir1/report.out
-rw-r--r-- 1 owner group 1234567 Jan 21 05:46 subdir2/report.out
-rw-r--r-- 1 owner group 1234567 Jan 19 09:31 subdir3/report.out
-rw-r--r-- 1 owner group 1234567 Jan 21 05:57 subdir4/report.out
• The grep command (see prior post for more information) pulls out only those lines that include the string “Jan 21”.
• The sed command (see prior post for more information) strips off everything up to the subdirectory and file. Quite literally, it matches everything from the beginning of the line (^) through “Jan 21”, a space, two characters (. is a wildcard for a single character), a colon, two more characters (that takes care of the time stamp), and another space, and replaces all of it with nothing.
• The xargs command takes all of those file names and passes them to the tar command. Notice I didn’t use the ABC placeholder this time: xargs automatically puts the arguments at the end of the command that follows it, so you only need a placeholder if the arguments appear elsewhere in the command.
• The cvf options to tar tell it to create (“c”) the archive, print verbose (“v”) messages (it’ll output a line as each file is added; this one is optional), and write the archive to a file (“f”) named reportreruns.tar. The tar command needs to be followed by a list of files to put into the archive, but xargs has already taken care of that.
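The whole pipeline can be sketched end to end in a scratch directory. To keep the result independent of today’s date and of the locale’s date format, this sketch feeds grep the sample listing from above as canned text instead of running ls -l; the temporary directory and the dummy report contents are assumptions for illustration.

```shell
# Work in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Recreate the four report files named in the sample listing.
for d in subdir1 subdir2 subdir3 subdir4; do
  mkdir "$d"
  echo "report from $d" > "$d/report.out"
done

# Canned "ls -l" output copied from the sample listing,
# standing in for a real ls -l */*.out run.
listing='-rw-r--r-- 1 owner group 1234567 Jan 18 22:07 subdir1/report.out
-rw-r--r-- 1 owner group 1234567 Jan 21 05:46 subdir2/report.out
-rw-r--r-- 1 owner group 1234567 Jan 19 09:31 subdir3/report.out
-rw-r--r-- 1 owner group 1234567 Jan 21 05:57 subdir4/report.out'

# Keep the Jan 21 lines, strip everything through the time stamp,
# and let xargs append the remaining paths to the tar command.
printf '%s\n' "$listing" \
  | grep "Jan 21" \
  | sed 's/^.*Jan 21 ..:.. //' \
  | xargs tar cvf reportreruns.tar

# List the archive contents to confirm what went in.
tar tf reportreruns.tar
```

One caveat worth knowing: xargs splits its input on whitespace by default, so this approach assumes the report paths contain no spaces.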