Did you know that around 100 million websites (and counting) now use Google Analytics? If yours is one of them, there’s a chance you pay the tool a visit every working day.

It might surprise you, then, to learn that there may be things you are missing. Ever noticed that your analytics don’t match your business sales? Or that you have two Analytics implementations that don’t quite line up? This is usually the result of Google Analytics skipping over certain types of content on your site.

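Google’s crawlers process assets like Word documents and PDFs in the same way as HTML pages. If our search overlords deem the content of your assets relevant, those assets can gain organic ranking and drive traffic to the tune of an extra 15%. That traffic is invisible because a PDF or Word document can’t run the Analytics tracking snippet, so the visits never show up in your reports.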

Yet surely this is good news? Well, not always. What if you don’t necessarily want Google giving your asset away for free?

Consider the ‘if a tree falls in the forest’ problem. People may be accessing your assets, people may be looking at them – they may even be getting value from them. But if you can’t measure that value, then what’s the point?

There’s a reason why we measure. We want to know if people read our content or bounced right off our page; we want to know how they got there, where they went next, and why. We can then gather this information and use what we’ve learned to make our next material even better. If our content is doing well, we want to know why…in glorious, technicolour detail.

What in Sergey Brin’s name is going on?

So how do we get to the root of the problem? No-one said it was going to be easy. If you want to squeeze all the juice you can out of your Analytics, Medium suggests going about your mission in the following ways:

  1. Map out the content on your domain that has been indexed by Google by using the following Search Operator: “site:mydomain.com inurl:pdf”. This should list every indexed URL that contains “pdf”; use “site:mydomain.com filetype:pdf” if you would rather match on file type than on URL text. Make it more specific by adjusting the domain, e.g. to a subdomain such as “blog.example.com”. Identify any results that you don’t want showing.
  2. Filter the Pages in Search Console (Search Analytics report) to show pages that feature the string “pdf”. This will show you which of your assets are driving traffic and allow you to estimate just how big your PDF ‘blind spot’ is.
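
To put a rough number on that blind spot, you could export the Pages data and work out what share of clicks went to PDF URLs. The snippet below is only a sketch: the row format (page URL, clicks) and the sample figures are made up for illustration, and a real export would need loading from your own file first.

```python
from typing import Iterable, Tuple


def pdf_click_share(rows: Iterable[Tuple[str, int]]) -> float:
    """Return the fraction of clicks that landed on URLs containing 'pdf'."""
    total_clicks = 0
    pdf_clicks = 0
    for page, clicks in rows:
        total_clicks += clicks
        # Same loose match as the Search Console filter: the string "pdf"
        if "pdf" in page.lower():
            pdf_clicks += clicks
    return pdf_clicks / total_clicks if total_clicks else 0.0


# Hypothetical export rows: (page URL, clicks)
sample_rows = [
    ("https://mydomain.com/", 700),
    ("https://mydomain.com/blog/post", 150),
    ("https://mydomain.com/assets/guide.pdf", 150),
]

share = pdf_click_share(sample_rows)
print(f"{share:.0%} of clicks landed on PDFs")
```

The match is deliberately as loose as the filter in step 2, so a page URL that merely contains “pdf” in its name would be counted too.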

OK, Google: how do I fix the problem?

This is the bit that makes a difference. In terms of using the new information bestowed upon you, you have a number of options…

  1. Performing well? Leave it well enough alone

Found some old content ranking highly for a specific keyword? So long as the content on the page is relevant, uses your up-to-date branding, and doesn’t contain sensitive information, why create extra work for yourself? If the document needs minor tweaks, upload the revised version to the same link so the URL that ranks stays the same.

  2. Got some poorly ranking PDFs? Freak out

Le freak, c’est chic. Freaking out isn’t actually necessary, but if you have a piece of PDF content ranking poorly for a specific keyword, don’t be surprised. HTML pages tend to rank more highly than documents; that’s just the way it’s always been.

  3. Don’t want to share your content for free? Redirect your traffic

If there’s content out there you’d prefer to keep gated, use a 301 (permanent) redirect to send requests for the existing document’s path to a new page, or (better still) a 401 (Unauthorized) response, which lets bots know that the content is still available but requires a login. If you take this path, Medium recommends leaving an excerpt of the content on the page to keep its ‘organicness’ ticking over.
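
Here is a minimal sketch of the two options as plain Python, not tied to any particular web server. The paths, the `GATED_REDIRECTS` mapping, the `strategy` argument and the Basic auth realm are all hypothetical names chosen for illustration; your own server or CMS will have its own way of configuring this.

```python
from typing import Dict, Tuple

# Hypothetical mapping: gated PDF path -> landing page that carries an excerpt
GATED_REDIRECTS: Dict[str, str] = {
    "/assets/guide.pdf": "/resources/guide",
}


def gate_response(
    path: str, strategy: str = "redirect", authenticated: bool = False
) -> Tuple[int, Dict[str, str]]:
    """Return (status code, headers) for a request to `path`."""
    if path not in GATED_REDIRECTS:
        return 200, {}  # not gated: serve as normal
    if strategy == "redirect":
        # 301: permanently point the old document path at the landing page
        return 301, {"Location": GATED_REDIRECTS[path]}
    if authenticated:
        return 200, {}  # logged-in visitors get the document
    # 401: the content exists but needs credentials; a 401 response
    # carries a WWW-Authenticate header (Basic is an assumed scheme)
    return 401, {"WWW-Authenticate": 'Basic realm="gated-content"'}


print(gate_response("/assets/guide.pdf"))
print(gate_response("/assets/guide.pdf", strategy="auth"))
```

With the redirect strategy, everyone ends up on the landing page; with the auth strategy, only visitors who have logged in get the document itself.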

  4. Update your Analytics to include PDFs

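You can do this in a number of different ways (as Google mentions here). If you use Tag Manager, you can use tag triggers to track clicks on links to your PDFs. Or if you’re a true Analytics aficionado, you can also create a custom dimension to flag PDF views (so long as the pages linking to them have your Analytics code on). Bear in mind that both approaches only capture clicks made on your own pages; visitors who land on a PDF straight from a search result still won’t be counted.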
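
Whichever route you take, the underlying rule is the same: decide whether a clicked link points at a PDF. The sketch below shows one way to express that check in Python. The function name `is_pdf_link` is hypothetical, and this is not Tag Manager’s own matching logic, just an illustration of the condition a click trigger would need to express.

```python
from urllib.parse import urlparse


def is_pdf_link(href: str) -> bool:
    """Return True if the link's path ends in .pdf (case-insensitive)."""
    # Look only at the path, so query strings and #fragments are ignored
    path = urlparse(href).path
    return path.lower().endswith(".pdf")


print(is_pdf_link("https://mydomain.com/assets/guide.PDF?utm_source=x"))
print(is_pdf_link("https://mydomain.com/pdf-guide"))
```

Checking the path rather than the whole URL avoids false positives from pages that only mention “pdf” somewhere in their address.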

Extra bits and bobs

Once you’ve fixed the PDF problem, there are other things you’ll want to be aware of, such as Google Web Light referral traffic. Google Web Light is a ‘speed over style’ service that Google offers to its users in markets where internet and/or mobile signal infrastructure is poor. It essentially serves your content in an AMP-style page that strips back anything ‘fancy’ to help with load speed. This often appears as referral traffic in Google Analytics, but there’s a perfectly reasonable argument that it is really organic traffic.

Dark traffic is a loose term we use to describe direct traffic that arrived in a distinctly non-direct way, e.g. from untagged email campaigns or from botched tracking implementations. Having taken the road less travelled, visits of this type lose their true source: Google lumps them in with direct traffic, which may leave a black hole in your Analytics.
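
For the untagged email campaign case, the usual remedy is to add UTM parameters to every link before it goes out. The helper below is a sketch of how that could be automated; the function name and the example values (“newsletter”, “spring_sale”) are made up for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse


def tag_campaign_url(url: str, source: str, medium: str, campaign: str) -> str:
    """Return `url` with utm_source, utm_medium and utm_campaign set."""
    parts = urlparse(url)
    # Keep any query parameters the link already has
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(
        {"utm_source": source, "utm_medium": medium, "utm_campaign": campaign}
    )
    return urlunparse(parts._replace(query=urlencode(query)))


tagged = tag_campaign_url(
    "https://mydomain.com/offer?ref=a", "newsletter", "email", "spring_sale"
)
print(tagged)
```

Running every outbound email link through something like this means the visit arrives with its source attached rather than being filed under direct.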

Then there are ad blockers. Having grown in popularity for some time – in line with consumer demand for increased privacy – some ad blockers block web analytics platforms by default, while others can be configured to do so. This emphasis on privacy means there’s not a lot you can do to find out what you’re missing here.

Are you targeting markets where Google doesn’t dominate? Something else to be mindful of is organic traffic mislabelled as referral traffic in Analytics – visits from search engines like Baidu, Bing, Yandex, Naver and Yahoo. This is often a problem for the ‘other’ search engines and is anecdotally something that happens with Yahoo’s localised search engines.
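
If you export your referral data, you can re-label these visits yourself. The sketch below assumes you have the full referrer URL for each visit; the domain list is illustrative rather than exhaustive, and the function name is hypothetical.

```python
from urllib.parse import urlparse

# Illustrative, non-exhaustive list of search engine domains (an assumption)
SEARCH_ENGINE_DOMAINS = (
    "baidu.com",
    "bing.com",
    "yandex.ru",
    "yandex.com",
    "naver.com",
    "yahoo.com",
    "yahoo.co.jp",
)


def classify_referrer(referrer: str) -> str:
    """Return 'direct', 'organic' or 'referral' for a referrer URL."""
    host = (urlparse(referrer).hostname or "").lower()
    if not host:
        return "direct"  # no referrer information at all
    for domain in SEARCH_ENGINE_DOMAINS:
        # Match the domain itself and any subdomain, e.g. uk.search.yahoo.com
        if host == domain or host.endswith("." + domain):
            return "organic"
    return "referral"


print(classify_referrer("https://uk.search.yahoo.com/search?p=analytics"))
print(classify_referrer("https://news.example.org/article"))
```

Matching on subdomains is what catches the localised Yahoo engines, which tend to sit on their own hostnames.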

Want to talk through your content strategy? Our SEO wizards are waiting to hear from you. Get in touch for a friendly chat here.