<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>showmeanalytics.com &#187; Unix</title>
	<atom:link href="http://showmeanalytics.com/tag/unix/feed/" rel="self" type="application/rss+xml" />
	<link>http://showmeanalytics.com</link>
	<description>Analytics from the Show Me State</description>
	<lastBuildDate>Wed, 05 May 2010 23:56:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>More Unix Nerdiness: xargs and tar</title>
		<link>http://showmeanalytics.com/2009/01/more-unix-nerdiness-xargs-and-tar/</link>
		<comments>http://showmeanalytics.com/2009/01/more-unix-nerdiness-xargs-and-tar/#comments</comments>
		<pubDate>Sat, 24 Jan 2009 16:44:54 +0000</pubDate>
		<dc:creator>angie</dc:creator>
				<category><![CDATA[Unix]]></category>
		<category><![CDATA[ls]]></category>
		<category><![CDATA[mv]]></category>
		<category><![CDATA[tar]]></category>
		<category><![CDATA[xargs]]></category>

		<guid isPermaLink="false">http://showmeanalytics.com/?p=39</guid>
		<description><![CDATA[I didn&#8217;t want to do two Unix posts in a row, but something came up last week that made me dust off the xargs command. Even if you do a lot of logfile parsing, you probably won&#8217;t use xargs very much. But when you need it, it can save a ton of time.
Commands used in [...]]]></description>
			<content:encoded><![CDATA[<p><img class="alignright size-full wp-image-45" title="smallnerdglasses" src="http://showmeanalytics.com/wp-content/uploads/2009/01/smallnerdglasses.jpg" alt="smallnerdglasses" width="200" height="133" />I didn&#8217;t want to do two Unix posts in a row, but something came up last week that made me dust off the xargs command. Even if you do a lot of logfile parsing, you probably won&#8217;t use xargs very much. But when you need it, it can save a ton of time.</p>
<p>Commands used in this post:</p>
<p><strong>xargs </strong>- allows you to use a list generated by one command as arguments for another command</p>
<p><strong>mv </strong>- moves or renames a file</p>
<p><strong>ls </strong>- lists files</p>
<p><strong>tar </strong>- an archive utility that lets you store and extract multiple files; although it can be used to write files to tape, one very useful property is that it can also write output to a regular file, which is extremely handy if you need to transfer a bunch of files to your PC (WinZip can handle tarfiles)</p>
<p>Let&#8217;s say you have a lot of files in your directory named test1.txt, test2.txt, test3.txt, and so on, and you want to append .old to all of them. You could run a mv (move) command on each individual file, or you could rename all of them at once using xargs as follows:</p>
<p><span style="color: #800080;">ls -1 test* | xargs -I ABC mv ABC ABC.old</span></p>
<p>•	ls -1 (that&#8217;s dash-&#8220;one&#8221;, not a lowercase &#8220;L&#8221;) lists the files so that there is only one file per line</p>
<p>•	The -I ABC option tells xargs to treat ABC as a placeholder: each file name is substituted wherever ABC appears, so the commands that actually run are mv test1.txt test1.txt.old, mv test2.txt test2.txt.old, and so on.</p>
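<p>If you want to try the rename without risking real files, here&#8217;s a minimal sketch. The scratch directory (made with mktemp) and the three empty test files are assumptions for illustration only; the -I option in the sketch is what tells xargs to use ABC as the placeholder.</p>

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"
touch test1.txt test2.txt test3.txt

# -I ABC makes xargs run mv once per input line,
# replacing every ABC with the file name from that line.
ls -1 test* | xargs -I ABC mv ABC ABC.old

# List the renamed files, one per line.
renamed=$(ls -1)
echo "$renamed"
```

<p>Because -I runs one mv per line of input, this works as long as each file name sits on its own line, which is exactly what ls -1 provides.</p>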
<p>I ran into a situation last week with a batch job that spits out thousands of reports, each into its own subdirectory. After all the reports had run, an issue was found that affected several hundred of them, which were then rerun. Because the reports had already been archived and moved to their final location (not difficult, but time-consuming due to sheer volume), we only wanted to re-archive the reports that had been rerun that day. This was relatively easy to do thanks to xargs. Here&#8217;s the command I used:</p>
<p><span style="color: #800080;">ls -l */*.out | grep "Jan 21" | sed 's/^.*Jan 21 ..:.. //' | xargs tar cvf reportreruns.tar</span></p>
<p>•	ls -l (this time it&#8217;s a lowercase &#8220;L&#8221;, not a &#8220;one&#8221;) does a long listing, which includes permissions, size, owner, date stamp, etc. The first few lines of the listing look something like this:</p>
<p>-rw-r--r--   1 owner  group   1234567  Jan 18 22:07 subdir1/report.out<br />
-rw-r--r--   1 owner  group   1234567  Jan 21 05:46 subdir2/report.out<br />
-rw-r--r--   1 owner  group   1234567  Jan 19 09:31 subdir3/report.out<br />
-rw-r--r--   1 owner  group   1234567  Jan 21 05:57 subdir4/report.out</p>
<p>•	The grep command (see prior post for more information) pulls out only those lines that include the string &#8220;Jan 21&#8221;.</p>
<p>•	The sed command (see prior post for more information) strips off everything up to the subdirectory and file. Quite literally, it takes everything from the beginning of the line (^) through &#8220;Jan 21&#8221;, a space, two characters (. is a wildcard for a single character), a colon, two more characters (that just took care of the time stamp), and another space, and replaces all of it with nothing.</p>
<p>•	The xargs command takes all of those files and passes them to the tar command. Notice I didn&#8217;t use the variable (ABC) this time: xargs will automatically put the arguments at the end of the following command, so you only need to use a variable if the arguments appear elsewhere in the command.</p>
<p>•	The cvf modifier to tar tells it to create (&#8220;c&#8221;) the archive, print verbose (&#8220;v&#8221;) messages (it&#8217;ll output a message as each file is added; this one is optional), and write the archive to a file (&#8220;f&#8221;) named reportreruns.tar. The tar command needs to be followed by a list of files to put into the archive, but xargs has already taken care of that.</p>
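<p>To see the xargs-to-tar handoff in isolation, here&#8217;s a small sketch. The scratch directory and the four one-line reports are made up for illustration, and a printf stands in for the ls, grep, and sed stages by supplying the two paths that would have survived the date filter. I left off the &#8220;v&#8221; so the only output is the archive listing at the end.</p>

```shell
# Build a scratch tree with one report per subdirectory (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
for n in 1 2 3 4; do
    mkdir "subdir$n"
    echo "report $n" > "subdir$n/report.out"
done

# Stand-in for the filtered file list: pretend only subdir2 and subdir4 were rerun.
# xargs appends both paths to the end of the tar command.
printf 'subdir2/report.out\nsubdir4/report.out\n' | xargs tar cf reportreruns.tar

# tar tf lists what ended up in the archive.
archived=$(tar tf "$workdir/reportreruns.tar")
echo "$archived"
```

<p>One caveat: if the list is ever long enough that xargs has to split it across several tar runs, each &#8220;c&#8221; run would start the archive over. For a few hundred short paths that is unlikely on most systems, but it&#8217;s worth checking the archive listing afterwards.</p>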
<address><em>Photo courtesy of <a title="stock.xchng" href="http://www.sxc.hu">stock.xchng</a>.</em><br />
</address>
]]></content:encoded>
			<wfw:commentRss>http://showmeanalytics.com/2009/01/more-unix-nerdiness-xargs-and-tar/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Unix commands for logfile parsing</title>
		<link>http://showmeanalytics.com/2009/01/unix-commands-for-logfile-parsing/</link>
		<comments>http://showmeanalytics.com/2009/01/unix-commands-for-logfile-parsing/#comments</comments>
		<pubDate>Mon, 12 Jan 2009 04:18:41 +0000</pubDate>
		<dc:creator>angie</dc:creator>
				<category><![CDATA[Unix]]></category>
		<category><![CDATA[awk]]></category>
		<category><![CDATA[cut]]></category>
		<category><![CDATA[grep]]></category>
		<category><![CDATA[Logfiles]]></category>
		<category><![CDATA[server logs]]></category>
		<category><![CDATA[sort]]></category>
		<category><![CDATA[uniq]]></category>

		<guid isPermaLink="false">http://showmeanalytics.com/?p=19</guid>
		<description><![CDATA[Although many web analysts use a JavaScript-tagged solution, some of us still do log analysis on one or more sites. Even when JS data is used, sometimes you have a troubleshooting situation that requires you to go back to your logs. If you have access to a Unix environment, commands like grep, cut, and awk [...]]]></description>
			<content:encoded><![CDATA[<p class="MsoNormal">Although many web analysts use a JavaScript-tagged solution, some of us still do log analysis on one or more sites. Even when JS data is used, sometimes you have a troubleshooting situation that requires you to go back to your logs. If you have access to a Unix environment, commands like grep, cut, and awk are invaluable for prowling through large files. You can also download these commands to use in a PC/DOS environment, although I’ve found the DOS version to be a little more awkward to use.</p>
<p class="MsoNormal">Here is an introduction to some of my favorite commands:</p>
<p class="MsoNormal"><strong>grep</strong> – used to find lines that contain a certain string or regular expression (note that regular expressions are not fully supported in the default grep command for some Unix systems)</p>
<p class="MsoNormal"><strong>cut</strong> – used to pull out specific columns from a file based on a specified delimiter;  most logs are space-delimited</p>
<p class="MsoNormal"><strong>awk</strong> – a programming language that can parse through text files; short pieces of code can be used on the command line</p>
<p class="MsoNormal"><strong>sort</strong> – sort the output; the -n modifier sorts numerically; the -t modifier sets the field delimiter, which lets you sort on a column other than the beginning of the line</p>
<p class="MsoNormal"><strong>uniq</strong> – eliminate adjacent duplicate lines (so sort the input first); the -c modifier shows a count of how many times each line appears</p>
<p class="MsoNormal">In order to make “cut” work, you need to know which fields contain your data of interest. If you use the <a href="http://httpd.apache.org/docs/1.3/logs.html">“combined” log format</a>, the following table lists the fields where data is located. Cutting out cookie data can be a bit more difficult: we’re using cut with a space delimiter, but spaces can be contained in the user agent field so pulling out cookie values takes a little more work.</p>
<table border="1" cellspacing="0" cellpadding="0">
<tbody>
<tr>
<td width="73" valign="top">
<p class="MsoNormal"><strong>Field #</strong></p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal"><strong>Information</strong></p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">1</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">IP address</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">3</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Auth (userid) field; note   it’s not always populated</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">4</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Timestamp</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">7</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Request URL</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">9</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Status code</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">11</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Referrer</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">12-</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">User agent (the dash means   go through the end of the line; UA can contain spaces and thus spans several   columns)</p>
</td>
</tr>
<tr>
<td style="padding: 0in 5.4pt; width: 54.9pt;" width="73" valign="top">
<p class="MsoNormal">(varies)</p>
</td>
<td style="padding: 0in 5.4pt; width: 238.5pt;" width="318" valign="top">
<p class="MsoNormal">Cookies</p>
</td>
</tr>
</tbody>
</table>
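<p class="MsoNormal">To check the field numbers in the table, here is a minimal sketch that runs cut against a single request. The log line is made up for illustration (the IP address, userid, URL, and user agent are all assumptions), but it follows the “combined” layout.</p>

```shell
# One made-up request in combined log format (all values are illustrative).
line='1.2.3.4 - angie [21/Jan/2009:05:46:00 -0600] "GET /index.html HTTP/1.1" 200 1234 "http://example.com/start" "Mozilla/5.0 (X11; Linux)"'

# Fields 1, 7 and 9: IP address, request URL, status code.
summary=$(printf '%s\n' "$line" | cut -d' ' -f1,7,9)
echo "$summary"

# Field 12 through the end of the line: the user agent, spaces included.
agent=$(printf '%s\n' "$line" | cut -d' ' -f12-)
echo "$agent"
```

<p class="MsoNormal">The timestamp is split across fields 4 and 5 because of the space before the time zone offset, which is why the table skips from 4 to 7.</p>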
<p class="MsoNormal">In Unix environments, you can view your results page by page on the screen or save them to a file. To page through the results on screen, pipe the command through “more” as shown:</p>
<p class="MsoNormal"><span style="color: #800080;"><em>command</em> | more</span></p>
<p class="MsoNormal">To save the results to a file, redirect your output to a file of your choice using the greater than symbol:</p>
<p class="MsoNormal"><span style="color: #800080;"><em>command</em> &gt; <em>outputfile</em></span></p>
<p class="MsoNormal">To open a logfile whose name ends in .gz, which indicates it is compressed, use gunzip -c (the -c sends the uncompressed text to the screen instead of uncompressing and replacing your file). Use the “cat” command if the logfile is not compressed. To take a peek at one of your logfiles, you would do the following:</p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | more</span> or <span style="color: #800080;">cat <em>file</em> | more</span></p>
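<p class="MsoNormal">If you want to convince yourself that the -c option leaves the compressed file alone, here is a small sketch. The scratch directory and the two-line access.log are assumptions made up for the demonstration.</p>

```shell
# Create a tiny stand-in logfile and compress it (names are illustrative).
workdir=$(mktemp -d)
printf 'first line\nsecond line\n' > "$workdir/access.log"
gzip "$workdir/access.log"    # replaces access.log with access.log.gz

# -c writes the uncompressed text to standard output only.
contents=$(gunzip -c "$workdir/access.log.gz")
echo "$contents"

# The directory still holds just the compressed file.
ls "$workdir"
```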
<p class="MsoNormal">The remainder of our examples assume we are examining a compressed logfile.</p>
<p class="MsoNormal"><strong>To pull out all records from one IP address (1.2.3.4, for example):</strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep "1\.2\.3\.4" &gt; <em>outputfile</em></span></p>
<p class="MsoNormal"><strong>To pull out all records from any IP address that begins with 12: </strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep "^12\." &gt; <em>outputfile</em></span></p>
<p class="MsoNormal">Notes:</p>
<ul>
<li>A caret (^) is how you specify the beginning of a line in a regular expression</li>
<li>The backslash (\) tells the regular expression you are looking for an actual period instead of a wildcard</li>
</ul>
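<p class="MsoNormal">Here is a quick sketch of why the backslash matters. The three sample lines are made up; only the first comes from an address that begins with 12 followed by a period.</p>

```shell
# Three made-up log lines, one per address.
sample=$(printf '12.0.0.1 first\n120.5.5.5 second\n212.1.1.1 third\n')

# Escaped dot: requires a literal period right after "12" at the start of the line.
strict=$(printf '%s\n' "$sample" | grep '^12\.')
echo "$strict"

# Unescaped dot: matches any character, so 120.5.5.5 slips through as well.
loose=$(printf '%s\n' "$sample" | grep '^12.')
echo "$loose"
```

<p class="MsoNormal">The caret keeps 212.1.1.1 out of both results, since that line contains “12.” but does not start with it.</p>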
<p class="MsoNormal"><strong>To look at the requests made by one IP address (1.2.3.4, for example):</strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep "^1\.2\.3\.4 " | cut -d' ' -f7 &gt; <em>outputfile</em></span></p>
<p class="MsoNormal"><strong>Pull out “page” requests only (status code = 200, and not an image, css, or javascript file):</strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep " 200 " | grep -v "\.jpg " | grep -v "\.gif " | grep -v "\.png " | grep -v "\.css " | grep -v "\.ico " | grep -v "\.js " &gt; <em>outputfile</em></span></p>
<p class="MsoNormal">Notes:</p>
<ul style="margin-top: 0in;" type="disc">
<li class="MsoNormal">You can exclude any other file extensions you wish by piping another grep -v into your command; ending the grep string with a space ensures you will only eliminate lines where those extensions end the request, and not lines where they are embedded in a query string value.</li>
<li class="MsoNormal">If you do a lot of logfile parsing, you may wish to put all the grep -v commands into a script so you don’t have to type all the commands every time you want to limit your output to pages.</li>
</ul>
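<p class="MsoNormal">As one possible way to package the filter, here is a sketch of a shell function. The name pages_only is hypothetical, and instead of a chain of grep -v commands it uses a single grep -Ev with the extensions listed as alternatives, which should exclude the same lines. The four sample requests are made up.</p>

```shell
# Hypothetical helper: keep status-200 lines whose request is not a static asset.
# Reads log lines on standard input and writes the surviving lines to standard output.
pages_only() {
    grep ' 200 ' | grep -Ev '\.(jpg|gif|png|css|ico|js) '
}

# Four made-up requests: a page, an image, a 404, and a page with ".png" in its query string.
sample=$(printf '%s\n' \
  '1.2.3.4 - - [21/Jan/2009:05:46:00 -0600] "GET /index.html HTTP/1.1" 200 512 "-" "UA"' \
  '1.2.3.4 - - [21/Jan/2009:05:46:01 -0600] "GET /logo.png HTTP/1.1" 200 900 "-" "UA"' \
  '1.2.3.4 - - [21/Jan/2009:05:46:02 -0600] "GET /missing.html HTTP/1.1" 404 0 "-" "UA"' \
  '1.2.3.4 - - [21/Jan/2009:05:46:03 -0600] "GET /gallery?img=logo.png;w=200 HTTP/1.1" 200 700 "-" "UA"')

# Show just the request URL (field 7) of each line that passes the filter.
pages=$(printf '%s\n' "$sample" | pages_only | cut -d' ' -f7)
echo "$pages"
```

<p class="MsoNormal">With the function defined in your shell startup file, the long command shrinks to gunzip -c <em>file.gz</em> | pages_only. The gallery request survives because its “.png” is followed by more query string rather than a space.</p>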
<p class="MsoNormal"><strong>Make a list of the most popular referrer fields for the /index.html page:</strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep "GET /index\.html" | cut -d' ' -f11 | sort | uniq -c | sort -nr &gt; <em>outputfile</em></span></p>
<p class="MsoNormal">Notes:</p>
<ul style="margin-top: 0in;" type="disc">
<li class="MsoNormal">The output will be a sorted list of lines with a number and a URL; the number is how many times the referrer occurred, and the URL is the referrer</li>
<li class="MsoNormal">The uniq command must be executed on sorted input, which is why we sort the output first</li>
<li class="MsoNormal">The second sort command lists the output by most to least popular referrer; -n is numeric and -r is reverse order</li>
</ul>
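<p class="MsoNormal">The counting idiom is easier to follow on a tiny input. In this sketch the six referrer values are made up and stand in for what the cut command would emit, one per line.</p>

```shell
# Made-up referrer values, one per line, as cut -d' ' -f11 might emit them.
referrers=$(printf '%s\n' \
  '"http://a.example/"' '"-"' '"http://a.example/"' \
  '"http://b.example/"' '"-"' '"http://a.example/"')

# sort groups identical lines together, uniq -c counts each group,
# and sort -nr ranks the groups from highest count to lowest.
ranked=$(printf '%s\n' "$referrers" | sort | uniq -c | sort -nr)
echo "$ranked"
```

<p class="MsoNormal">Each output line starts with the count (padded with leading spaces by uniq), followed by the referrer. Here the a.example referrer comes first with 3, then “-” with 2, then b.example with 1.</p>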
<p class="MsoNormal"><strong>Pull out all the records from userid “angie”, and sort them by timestamp:</strong></p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | grep " angie " | sort -t' ' -k4 &gt; <em>outputfile</em></span></p>
<p class="MsoNormal">Note:</p>
<p class="MsoNormal">The sort command is modified as follows: -t' ' says the input is space-delimited, while -k4 says to start the sort key at the fourth column, the timestamp (sort defaults to the first column). Older systems also accept +3, an obsolete spelling that means “skip three columns”; current versions of sort may reject it.</p>
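<p class="MsoNormal">Here is a small sketch of the timestamp sort on three out-of-order requests. The log lines are made up, the sketch uses -k4 to select the fourth column, and LC_ALL=C is an assumption added so the comparison is plain byte order regardless of your locale settings.</p>

```shell
# Three made-up requests from userid "angie", deliberately out of time order.
sample=$(printf '%s\n' \
  '1.2.3.4 - angie [21/Jan/2009:05:57:10 -0600] "GET /b.html HTTP/1.1" 200 10 "-" "UA"' \
  '1.2.3.4 - angie [21/Jan/2009:05:46:02 -0600] "GET /a.html HTTP/1.1" 200 10 "-" "UA"' \
  '1.2.3.4 - angie [21/Jan/2009:06:01:45 -0600] "GET /c.html HTTP/1.1" 200 10 "-" "UA"')

# -t' ' splits on spaces; -k4 starts the sort key at the timestamp column.
# cut then shows only the request URL so the resulting order is easy to read.
ordered=$(printf '%s\n' "$sample" | grep ' angie ' | LC_ALL=C sort -t' ' -k4 | cut -d' ' -f7)
echo "$ordered"
```

<p class="MsoNormal">Keep in mind that this is a text comparison, not a date comparison: it orders requests correctly within a single day, but lines from different days or months will not necessarily come out in chronological order.</p>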
<p class="MsoNormal"><strong>Find all requests that are more than 1000 characters long:</strong></p>
<p class="MsoNormal">Very long requests are often a sign that something is wrong: they can indicate a problem with your website’s code or they can be indicative of someone trying to hack into your website (especially if the requests contain any SQL code words).</p>
<p class="MsoNormal"><span style="color: #800080;">gunzip -c <em>file.gz</em> | awk 'length &gt; 1000' &gt; <em>outputfile</em></span></p>
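<p class="MsoNormal">To see the length test at work without a real logfile, here is a sketch with a much smaller threshold. The two sample lines and the cutoff of 30 characters are assumptions chosen only to keep the example readable.</p>

```shell
# Two made-up lines: one short (15 characters), one long (54 characters).
short='GET /index.html'
long='GET /search?q=aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa'

# With no action given, awk prints every line for which the condition is true;
# "length" on its own is the length of the whole current line.
flagged=$(printf '%s\n' "$short" "$long" | awk 'length > 30')
echo "$flagged"
```

<p class="MsoNormal">Note that the comparison applies to the entire log line, not just the request field, so pick a threshold that allows for the other fields.</p>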
<p class="MsoNormal">Stay tuned for more posts with additional commands.</p>
]]></content:encoded>
			<wfw:commentRss>http://showmeanalytics.com/2009/01/unix-commands-for-logfile-parsing/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>
