Blog Stats Madness

Blog Stats

Blog stats tell us who comes to our blogs and what they do while they are there. Or, at least, that’s the idea. When I first started this blog, like a lot of people, I thought the built-in stats app would suit my needs, but I quickly realized otherwise. The stats plugin built into Jetpack is neither accurate nor detailed enough. So I set out to find something better. This is how that turned out.

Blog Stats with Google Analytics

Google Analytics was the first and most obvious choice. It’s easy to set up and quite thorough. Google Analytics is especially great if you are using AdSense or AdWords (I don’t), because it integrates with both effortlessly.

There are several plugins that can insert the analytics code into your blog automatically, but I didn’t have luck with them. I ended up just sticking it into my theme header myself, which worked fine. The one plugin I found that I did like is Analytics360 from MailChimp. It doesn’t put the code in your pages, but it can pull your analytics data into your WordPress dashboard. It also has a cool graph that shows your recent blog activity and how it relates to your visitor stats. You have to sign up with MailChimp to use it, but the free account is all you need.

Things were going fine with Google Analytics until I started using CloudFlare, and then I started to question the Analytics data.

CloudFlare Analytics

CloudFlare is not a stats tool, not really. It’s a reverse proxy service: you point your domain at their DNS servers, and your blog traffic is routed through their network. This lets them do a couple of things. First, they can cache your static content, such as images and scripts, and serve it from a cookie-free domain that allows it to load faster. Second, they can monitor the traffic and detect if there is an attack on the site. If they detect an attack, they can then block that traffic.

The fact that all of your traffic goes through CloudFlare means that they are in a position to see everyone and everything that hits your site. Their free service offers some limited statistics in their analytics dashboard, which includes page views, unique visitors, bots, and threats. What surprised me was that CloudFlare was showing significantly higher page views and unique visitors than Google, as in 25x higher. CloudFlare offered the following explanation as to why:

Google Analytics and other web-based analytics programs track visitors that trigger JavaScript. As a result, threats, bots and automated crawlers are not recorded since these visitors typically do not trigger JavaScript. These services also don’t track visitors who leave a page before it is fully loaded or have Javascript disabled. CloudFlare tracks all of your traffic by requests, so your CloudFlare visitor number is most likely higher.

Is this possible? Are this many people really visiting my site with JavaScript disabled or leaving before the page loads? Maybe a few, but this many? On one particular day Google showed one unique visitor (yes, my blog is hopping) and CloudFlare showed 26. On another day, Google showed 14 unique users and CloudFlare showed 216.
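To put those two days side by side, here is a small Python sketch that works out how many times larger the CloudFlare count is than the Google count. The only inputs are the figures quoted above; the function name `inflation_ratio` is made up for this illustration.

```python
# Daily unique-visitor counts quoted in the post: (Google Analytics, CloudFlare).
daily_counts = [(1, 26), (14, 216)]


def inflation_ratio(google_uniques: int, cloudflare_uniques: int) -> float:
    """Return how many times larger the CloudFlare count is than the Google count."""
    if google_uniques <= 0:
        raise ValueError("Google count must be positive to form a ratio")
    return cloudflare_uniques / google_uniques


for google, cloudflare in daily_counts:
    ratio = inflation_ratio(google, cloudflare)
    print(f"Google {google:>3} vs CloudFlare {cloudflare:>3}: {ratio:.1f}x")
# Prints 26.0x for the first day and 15.4x for the second.
```

The gap is not a fixed multiplier, which is a hint that the extra traffic does not scale with real readership.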

While some people may be visiting with JavaScript disabled, using NoScript to block tracking, or leaving before the page loads, I thought it was unlikely that this many were.

After doing some research, I decided to install another analytics program, Piwik.

Blog Statistics with Piwik

Piwik is different from other analytics platforms, because you install it yourself on your own server. Like Google Analytics, it uses JavaScript tracking code to track your statistics, but it also uses a 1px by 1px image to catch some data about people without JavaScript enabled. So, if CloudFlare’s stance is correct and these phantom visitors are users who, for whatever reason, aren’t triggering JavaScript, Piwik should still pick them up.

I installed Piwik and compared it with CloudFlare and Google Analytics for a few days. What it told me was basically the same thing Google Analytics told me: no one is coming here (so if you’re reading this, thanks).

So, what’s the deal? Where are all these extra visitors that CloudFlare sees coming from? To find the answer, I went into the raw visitor logs.

Ah ha! Oh, no…

The Apache visitor logs told the tale. Once I looked at them (something I should have done sooner), it was really clear where the discrepancy was. So, who was right? CloudFlare? Google? Piwik? The answer is: All of the above.

CloudFlare was correct that there were dozens, at times even hundreds, of “visitors” to my blog with unique IP addresses. What it was missing, however, was that most of them, 99% of them, were spammers, hackers, and bots.
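For anyone who wants to run a similar check, here is a minimal Python sketch of the idea, not the exact process I followed. It assumes the server writes Apache's "combined" log format, and it groups unique IP addresses by what the user agent string claims to be. The sample lines, the `BOT_HINTS` keywords, and the function names are all invented for illustration, and user agents can be spoofed, so treat the result as a rough guide only.

```python
import re
from collections import defaultdict

# Matches Apache's "combined" log format (assumed here; adjust for other formats).
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword list; real crawlers do not always identify themselves.
BOT_HINTS = ("bot", "crawler", "spider")


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def classify_agent(agent):
    """Sort a user agent string into a rough bucket."""
    lowered = agent.lower()
    if lowered in ("", "-"):
        return "no user agent"
    if any(hint in lowered for hint in BOT_HINTS):
        return "self-identified bot"
    return "browser-like"


def unique_ips_by_kind(lines):
    """Count distinct IP addresses in each user-agent bucket."""
    ips = defaultdict(set)
    for line in lines:
        entry = parse_line(line)
        if entry is None:
            continue  # skip lines that are not in the expected format
        ips[classify_agent(entry["agent"])].add(entry["ip"])
    return {kind: len(addresses) for kind, addresses in ips.items()}


# Invented sample data using documentation-only IP ranges.
SAMPLE_LOG = [
    '192.0.2.10 - - [10/Oct/2013:13:55:36 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows NT 6.1) Firefox/24.0"',
    '198.51.100.7 - - [10/Oct/2013:13:56:01 -0700] "GET /feed/ HTTP/1.1" 200 2048 "-" "ExampleCrawler/1.0 (+http://crawler.example/bot)"',
    '198.51.100.8 - - [10/Oct/2013:13:56:09 -0700] "GET /about/ HTTP/1.1" 200 3000 "-" "ExampleCrawler/1.0 (+http://crawler.example/bot)"',
    '203.0.113.5 - - [10/Oct/2013:13:57:12 -0700] "POST /wp-login.php HTTP/1.1" 200 1500 "-" "-"',
]

summary = unique_ips_by_kind(SAMPLE_LOG)
for kind, count in sorted(summary.items()):
    print(f"{kind}: {count} unique IP(s)")
```

On a real log you would pass in the open file instead of `SAMPLE_LOG`. A request-level counter would see four "visitors" in this sample, while only one of them looks like a person with a browser.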

Spammers, Hackers, and Bots! Oh my!

CloudFlare recognizes bots and doesn’t include them in their visitor data, but only certain bots, the nice bots, bots like Googlebot and Bing. Most of the mysterious site traffic came from other bots. Some bots were bad bots and some were just bots that CloudFlare doesn’t sort out, like the Wayback Machine. Since these crawlers come in with various IP addresses, they were registering as unique users.

Along with the bots, the Apache logs revealed numerous hacking attempts. There were several brute force attempts to break my WordPress password, all unsuccessful. There were also a lot of 404 errors from hackers looking for known vulnerabilities. In theory, CloudFlare should be blocking these, since that is sort of its purpose, but instead of blocking them, it was registering them as unique visitors.
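The two patterns described above, repeated login attempts and runs of 404s, can be pulled out of a log with a short script. The following Python sketch is one way to do it under stated assumptions: the log is in Apache's "combined" (or "common") format, the login page lives at `/wp-login.php`, and the sample lines are invented. The function name `suspicious_activity` is hypothetical.

```python
import re
from collections import Counter

# Captures client IP, HTTP method, request path, and status code from an
# Apache common/combined log line (assumed format).
REQUEST_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def suspicious_activity(lines, login_path="/wp-login.php"):
    """Count login POSTs and 404 responses per client IP."""
    login_posts = Counter()
    not_found = Counter()
    for line in lines:
        match = REQUEST_PATTERN.match(line)
        if match is None:
            continue  # ignore lines that do not look like requests
        ip, method, path, status = match.group("ip", "method", "path", "status")
        # Strip any query string before comparing against the login path.
        if method == "POST" and path.split("?", 1)[0] == login_path:
            login_posts[ip] += 1
        if status == "404":
            not_found[ip] += 1
    return login_posts, not_found


# Invented sample data using documentation-only IP ranges.
SAMPLE_LOG = [
    '203.0.113.5 - - [10/Oct/2013:14:00:01 -0700] "POST /wp-login.php HTTP/1.1" 200 1500 "-" "-"',
    '203.0.113.5 - - [10/Oct/2013:14:00:02 -0700] "POST /wp-login.php HTTP/1.1" 200 1500 "-" "-"',
    '203.0.113.5 - - [10/Oct/2013:14:00:03 -0700] "POST /wp-login.php HTTP/1.1" 200 1500 "-" "-"',
    '198.51.100.20 - - [10/Oct/2013:14:05:10 -0700] "GET /phpmyadmin/index.php HTTP/1.1" 404 300 "-" "-"',
    '198.51.100.20 - - [10/Oct/2013:14:05:11 -0700] "GET /old-plugin/readme.txt HTTP/1.1" 404 300 "-" "-"',
    '192.0.2.10 - - [10/Oct/2013:14:06:00 -0700] "GET / HTTP/1.1" 200 5120 "-" "Mozilla/5.0"',
]

login_posts, not_found = suspicious_activity(SAMPLE_LOG)
print("Login POSTs per IP:", login_posts.most_common(5))
print("404s per IP:", not_found.most_common(5))
```

A high login count from one address does not prove an attack on its own, since a legitimate user can mistype a password, but a few addresses with dozens of attempts or probes for files that never existed are worth a closer look.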

There were also comment spammers and pingback spammers in the mix.

The Verdict

Seeing all those hackers and spammers was a bit disconcerting, but I have taken some measures to protect the site in response. That, however, is a topic for another post.

Google Analytics and Piwik were both accurate in telling me how often actual humans were visiting my site. CloudFlare was accurate in telling me how many hits my site got from various sources, many nefarious.

I like Piwik. I like that I control it. I like that it honors Do Not Track. I like the way it lays out the information. It takes a little more effort to set up, since you need to install it on your server, but it is worth the effort. I plan to continue using it.

Google Analytics is fine, but I am planning on removing it from my site. There’s just no point in having both Piwik and Google Analytics. If I used AdWords or AdSense I might feel differently, but I don’t.

CloudFlare Analytics is not necessarily useless, but it doesn’t really give you useful blog stats. It’s fine for seeing how much traffic you are getting from all sources, but it doesn’t help you figure out how many human people are reading your posts.

So, when it comes to blog stats, my recommendation is Piwik. And I’ll have more to say about blog security at a later time.
