Take Time to Love Your Log Files


In a conversation Monday night after the Boston SEO Meetup, I was reminded of the criticism I received for a post on Search Engine Journal called 3 Reasons Google Analytics Fails for SEO.

And for some unknown reason, at about ten minutes past 3 AM on a Sunday morning, I find myself sifting through one of my favorite elements of organic optimization: log files. Ugly to the untrained eye, these plain text files are a continual mess of what some would consider useless information.

I find them wonderfully mesmerizing. My server logs are more than some silly ASCII art fascination, too. They're the foundation on which my career in this industry was built.

My post about Google Analytics on SEJ was written more than a full year ago. In that time, Google Analytics has become prettier and more polished. It allows more flexibility in reporting and even runs faster, tracking more goals, sources, conversion points… and whatever else the marketing bigwigs deem useful.

I agree that analytics are critical to any marketing effort, so please don't get me wrong on that. But how many SEOs do you know who make it part of their regular practice to parse and review server logs? Very few, I guarantee it. I'm not trying to climb onto a soapbox here, either. Some SEOs wouldn't know what to do with a log file. Others simply don't care.

But here are a few reasons I not only care, but encourage others to love their logs…

Spidering Behaviors
Before any page of content can be ranked, it needs to be indexed. And in order to be indexed, a spider has to come on through. The very instant a spider requests one of your pages, a server log records it. What the user agent was. What URL was pulled. Whether it came from a referring URL. What HTTP status code the server returned. All of these things are available for every single page requested.
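To see those fields for yourself, here is a minimal Python sketch that splits one log line into its parts. It assumes your server writes the common Apache/NGINX "combined" log format (host, ident, user, timestamp, request, status, bytes, referrer, user agent); your host may log differently, and the sample line, IP and path below are made up for illustration.

```python
import re
from typing import Optional

# Assumed "combined" log format:
# host ident user [time] "METHOD URL PROTOCOL" status bytes "referrer" "user agent"
COMBINED = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line: str) -> Optional[dict]:
    """Return the named fields of one log line, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None

# Hypothetical sample line: a spider fetching one page with no referrer ("-").
sample = ('66.249.66.1 - - [10/Oct/2010:03:10:00 -0400] '
          '"GET /blog/love-your-logs.html HTTP/1.1" 200 5120 "-" '
          '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"')
fields = parse_line(sample)
print(fields["agent"], fields["url"], fields["referrer"], fields["status"])
```

Lines that don't fit the pattern (malformed requests, a different log format) come back as None, so they can be skipped or inspected separately instead of silently polluting the counts.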

Deep crawls are more of a memory now with the advancement of social media, but some large, content-rich sites love to see their site getting hammered by spiders eager to gobble up their content and whisk it off to be run through some algorithm of secrecy.

I like to use logs to answer some funky questions… Things like, how often specific pages are being pulled. What the delay is between initial spidering and inclusion in search indexes. What factors help control the acceleration of that process.
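The first of those questions, how often specific pages are being pulled, boils down to counting spider requests per URL. A minimal Python sketch, assuming the combined log format and matching on a user-agent substring such as "Googlebot" (the sample lines are hypothetical). User agents can be spoofed, so a substring match is only a first pass; Google documents a reverse DNS check for confirming genuine Googlebot traffic.

```python
import re
from collections import Counter

# Assumed combined log format; only the requested URL and the user agent are captured.
REQUEST = re.compile(r'"\S+ (?P<url>\S+) [^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"')

def spider_hits(lines, bot_token="Googlebot"):
    """Count requests per URL whose user agent contains bot_token."""
    hits = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and bot_token in match.group("agent"):
            hits[match.group("url")] += 1
    return hits

# Hypothetical log excerpt: two spider hits on /a.html, one human visit, one spider hit on /b.html.
log = [
    '66.249.66.1 - - [10/Oct/2010:03:10:00 -0400] "GET /a.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '66.249.66.1 - - [10/Oct/2010:04:10:00 -0400] "GET /a.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
    '10.0.0.5 - - [10/Oct/2010:04:11:00 -0400] "GET /a.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
    '66.249.66.1 - - [10/Oct/2010:05:10:00 -0400] "GET /b.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]
print(spider_hits(log).most_common())  # [('/a.html', 2), ('/b.html', 1)]
```

Keeping the timestamp alongside each hit would extend this toward the second question, the delay between first spidering and appearing in the index, though the indexing side of that comparison has to come from somewhere other than your logs.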

There’s a wealth of information in there if you’re hungry enough to learn.

404 Errors
Ever have someone link to your site / blog / domain / content with the wrong destination URL? Well, if you're relying on page-tagging analytics like GA, Omniture or otherwise… you'll never know it.

Seven years ago I wrote an article with a really cheesy title: Never Ignore the 404.

Give that a read, because, not surprisingly, the value of locating and understanding your 404 errors is simple and conventional. In fact… it hasn't changed. I don't think it ever will, either.
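Because the server logs the referrer on every request, a 404 usually tells you both the broken URL and the page that linked to it. A minimal Python sketch, assuming the combined log format; the misspelled path and the example.com referrer are hypothetical. A referrer of "-" means the browser sent none, so those misses have to be tracked down some other way.

```python
import re
from collections import Counter

# Assumed combined log format: capture the requested URL, the status code,
# and the referrer (the page that carried the link).
REQUEST = re.compile(
    r'"\S+ (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"'
)

def broken_inbound_links(lines):
    """Count (missing URL, referring page) pairs for every request that returned 404."""
    misses = Counter()
    for line in lines:
        match = REQUEST.search(line)
        if match and match.group("status") == "404":
            misses[(match.group("url"), match.group("referrer"))] += 1
    return misses

# Hypothetical excerpt: an outside page links to a misspelled URL twice.
log = [
    '203.0.113.7 - - [12/Oct/2010:09:00:00 -0400] "GET /tak-time-to-love-your-logs/ HTTP/1.1" '
    '404 210 "http://example.com/links.html" "Mozilla/5.0"',
    '198.51.100.2 - - [12/Oct/2010:09:05:00 -0400] "GET /tak-time-to-love-your-logs/ HTTP/1.1" '
    '404 210 "http://example.com/links.html" "Mozilla/5.0"',
    '198.51.100.2 - - [12/Oct/2010:09:06:00 -0400] "GET / HTTP/1.1" 200 4096 "-" "Mozilla/5.0"',
]
for (url, referrer), count in broken_inbound_links(log).most_common():
    print(f"{count}x {url} <- {referrer}")
```

Each pair points at either a 301 redirect worth adding or a webmaster worth emailing.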

Historical Value
I've been with my hosting provider for years. So many, in fact, that I have logs archived from the same sites dating back to '99. Every week my provider rolls over the log files, meaning I'll always have a seven-day supply on the server.

I scheduled an FTP app to pull the logs and archive them locally. When the proverbial shit hits the fan and I’m left wondering when the last time X happened… I crack open the log file analyzers and start tracking historical patterns.
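Once the archive is local, tracking historical patterns can start with something as plain as requests per day. A minimal Python sketch, assuming the archived files sit in one folder, are named like `*.log` or `*.log.gz` (gzip being a common choice for rotated logs), and use the combined log format's `[dd/Mon/yyyy:...]` timestamp; the folder name and sample lines are hypothetical.

```python
import gzip
import re
from collections import Counter
from datetime import datetime
from pathlib import Path

# Pull the date part out of a timestamp like [04/Aug/2001:10:00:00 -0400].
TIMESTAMP = re.compile(r'\[(\d{2}/\w{3}/\d{4}):')

def read_archive(folder):
    """Yield lines from every archived log in folder, plain text or gzipped."""
    for path in sorted(Path(folder).glob("*.log*")):
        if not path.is_file():
            continue
        opener = gzip.open if path.suffix == ".gz" else open
        with opener(path, "rt", encoding="utf-8", errors="replace") as handle:
            yield from handle

def requests_per_day(lines):
    """Count requests per calendar day."""
    counts = Counter()
    for line in lines:
        match = TIMESTAMP.search(line)
        if match:
            counts[datetime.strptime(match.group(1), "%d/%b/%Y").date()] += 1
    return counts

# Hypothetical lines; for a real archive use requests_per_day(read_archive("log-archive")).
sample = [
    '1.2.3.4 - - [04/Aug/2001:10:00:00 -0400] "GET / HTTP/1.0" 200 100 "-" "-"',
    '1.2.3.4 - - [04/Aug/2001:11:00:00 -0400] "GET / HTTP/1.0" 200 100 "-" "-"',
    '5.6.7.8 - - [05/Aug/2001:09:00:00 -0400] "GET / HTTP/1.0" 200 100 "-" "-"',
]
for day, total in sorted(requests_per_day(sample).items()):
    print(day, total)
```

A sudden jump on one day is the cue to open that day's file and look at what changed.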

In August of 2001, the Code Red worm was spreading massively, running wild and generating bogus traffic in analytics reports. Top Site Listings, the firm I co-founded with Andrew Gerhart, saw a crazy spike in traffic that we knew was suspect:

Without any major or radical changes in link popularity campaigns, without any new submissions, advertisements, or email distributions, TSL saw this:

• A 400% increase in daily visitor sessions
• Over 25 newly identified TOP 10 traffic referrers – in one week
• An increase of over 2000% in overall site hits
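With raw logs in hand, numbers like these can be checked against the requests themselves. Code Red I and Code Red II both attacked Microsoft IIS by requesting `/default.ida` followed by a long padding string (N's for the first, X's for the second), so those hits are easy to set aside before counting visits. A minimal Python sketch; the sample lines are made up, and on a non-IIS server such probes would typically just return 404.

```python
import re

# Code Red I and II both requested /default.ida?<long padding>, so any such
# request is treated as worm noise here.
WORM_PROBE = re.compile(r'"GET /default\.ida\?')

def split_worm_traffic(lines):
    """Separate worm probes from the rest so they don't inflate visit and hit counts."""
    clean, worm = [], []
    for line in lines:
        (worm if WORM_PROBE.search(line) else clean).append(line)
    return clean, worm

# Hypothetical excerpt: one probe from each Code Red variant plus one real page view.
log = [
    '192.0.2.10 - - [04/Aug/2001:10:00:00 -0400] '
    '"GET /default.ida?NNNNNNNNNNNNNNNN%u9090 HTTP/1.0" 404 280 "-" "-"',
    '192.0.2.11 - - [04/Aug/2001:10:00:05 -0400] '
    '"GET /default.ida?XXXXXXXXXXXXXXXX%u9090 HTTP/1.0" 404 280 "-" "-"',
    '192.0.2.12 - - [04/Aug/2001:10:01:00 -0400] '
    '"GET /index.html HTTP/1.0" 200 5120 "-" "Mozilla/4.0"',
]
clean, worm = split_worm_traffic(log)
print(f"worm probes removed: {len(worm)}, requests kept: {len(clean)}")
```

Filtering at the line level keeps the original files untouched, so the same archive can be re-analyzed later with a different rule.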

And on that note… I'm finally tired. Download some log files and have fun, though. Seriously. There's too much information crammed in there not to have fun. Be warned, though: as Stephan Spencer would say… you may find yourself trying to drink from the fire hose.

