GoAccess

Cloudflare recently informed me that this website was getting thousands of hits every day, which is unusual. I pulled up Google Analytics to figure out where all that traffic was coming from, only to find that I was getting three to four hundred daily visitors at most.

This felt a bit suspicious, so I dug into my Nginx logs to see if something was up. I pulled up the logs in Vim, but it was too hard to make sense of the raw data by reading it line by line. I needed something that could help me visualize my logs. I asked around and found a little tool called GoAccess.

GoAccess is an open-source log analyzer that can help you visualize your server logs in the terminal or use them to produce an HTML report. I installed it from the Ubuntu repositories and ran it:

$ goaccess /var/log/nginx/access.log

I was greeted with a dialog listing a bunch of popular log formats, asking me which one my file conformed to. After searching the Web for a bit, I figured out that Nginx's default access log format matches something called the NCSA Combined Log Format. I picked that in the dialog and was on my way.

After a few minutes of looking at the aggregated data, I felt that analyzing just one file wasn’t telling me the entire story. I wondered if I could analyze all the logs produced by Nginx in the last month at once. After searching the Web a bit more, I found that I could use zcat to decompress the older, gzip-compressed logs and print them to stdout (the -f flag makes it pass the uncompressed current log through unchanged), and then pipe that into GoAccess. So I did this:

$ zcat -f /var/log/nginx/access.log* | goaccess --log-format=COMBINED
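
The same pipe can feed ordinary Unix tools for a quick cross-check of what GoAccess reports. Here is a minimal sketch, not something from my original session: top_paths is a name I made up for illustration, and it assumes the combined log format, where the request path is the 7th whitespace-separated field of each line.

```shell
# Hypothetical helper: print the ten most requested paths, busiest first.
# Assumes the combined log format, in which the request path is the 7th
# whitespace-separated field (after the IP, two dashes or user fields,
# the two-part timestamp, and the HTTP method).
top_paths() {
    awk '{ count[$7]++ } END { for (p in count) print count[p], p }' \
        | sort -rn \
        | head -n 10
}
```

Feeding it the output of the zcat command above prints one line per path, with the hit count first. A path that dwarfs everything else in that list is a good place to start looking.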

Turns out someone was trying to exploit my website by sending malicious inputs to WordPress’s xmlrpc.php. From the access patterns, it looked more like a drive-by automated attack than a human trying to break in. Since I didn’t need any of the features enabled by the WordPress XML-RPC API, I blocked access to xmlrpc.php entirely:

location = /xmlrpc.php {
    deny all;
    access_log off;
    log_not_found off;
}

The downside to this is that I can’t post to my website from the WordPress app on my iPhone. But that’s not something I do often, so losing that feature is not a big deal.

After this incident, I also removed Google Analytics from my website. I’m fundamentally opposed to business models built on surveillance-based advertising, which is why I try to stay away from Google products as much as possible. My server logs give me enough data to judge how well my posts are doing, and with GoAccess I now have a reasonable way of querying and visualizing that data. That’s pretty much all I need.