Removing referrer spam in Google Analytics

In this article we are going to cover how to remove referrer spam from your Google Analytics data, allowing you to view clean and accurate statistics that have not been affected by spam.

One of the universal problems that website owners and analytics managers face is that spammers regularly change their referral names, keywords, methods, etc.

This means that central solutions which rely on specific spam names quickly become inefficient or completely redundant. It’s also worth noting that server-side spam handling scripts or plugins won’t help you, because ghost spam never arrives on the website; it is sent via the Measurement Protocol directly to your Google Analytics tracking ID.

For this reason, we are going to focus on a combination of methods that are efficient and also safe. Following these 4 steps will protect your real data and leave it intact.

A Specific Solution!

You should prepare 4 specific filters to keep ghost spam, crawler spam, internal traffic, and bots out of your data.

Ghost spam sent to your Google Analytics tracking ID via the Measurement Protocol leaves a telltale sign: it uses a fake or undefined hostname, which is recorded in your reports as a “(not set)” hostname. Once you know this, the solution is to create a filter which only allows traffic with a valid hostname to pass. The method works for all referral types, because it does not depend on the names the spammers use.

Let’s now explain the steps required for creating this filter:

A) Identify your hostnames and compile them into a list. These can be found in the Network report of your Analytics account (Audience > Technology > Network, with Hostname selected as the primary dimension).

B) Build a regular expression that contains all of your hostnames.

C) Create an include filter of your valid hostnames.
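Steps A to C can be sketched in a few lines of Python. This is only an illustration: the hostnames are hypothetical placeholders, and `build_hostname_pattern` and `passes_include_filter` are helper names made up for this example, not part of Google Analytics. The check uses a partial match, on the assumption that the include filter matches the expression anywhere in the hostname.

```python
import re

def build_hostname_pattern(hostnames):
    """Join hostnames into one alternation, escaping dots so they match literally."""
    unique = sorted({h.strip().lower() for h in hostnames if h.strip()})
    return "|".join(h.replace(".", r"\.") for h in unique)

def passes_include_filter(pattern, hostname):
    """Approximate an include filter: keep the hit only if the pattern matches the hostname."""
    return re.search(pattern, hostname.lower()) is not None

# Hypothetical hostnames; replace them with the list compiled in step A.
valid_hostnames = ["example.com", "shop.example.com", "translate.googleusercontent.com"]
pattern = build_hostname_pattern(valid_hostnames)
print(pattern)

for hostname in ["shop.example.com", "(not set)", "spam-site.example"]:
    print(hostname, "->", "kept" if passes_include_filter(pattern, hostname) else "filtered out")
```

The printed pattern is what you would paste into the filter pattern field of the include filter. Remember to add any new hostname (a payment gateway or a translation service, for example) to the list before it starts sending traffic, otherwise the filter will drop those real visits too.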

Crawler Spam Exclusion

Next you will need to create another filter just for crawler referral spam. It won’t be as easy to detect as ghost spam, because it actually uses a valid hostname and so passes the hostname filter.

From the Admin section in Analytics, add a new filter in the View column and give it the name ‘Crawler Spam’. Set the filter type to Custom, choose Exclude, select ‘Campaign Source’ as the filter field, and enter your expression as the filter pattern. You will need to create a separate filter for each of the offending expressions.

NOTE: The expression you need depends on the spam domains that appear in your own referral reports. Do a Google Search for those domains to find an expression that will assist you in blocking this traffic.

Copy and paste each expression into the filter pattern field of its own filter, then save your filters.
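If you would rather build the expressions from your own referral list, the sketch below shows one way to do it. The domains are hypothetical placeholders, `build_exclude_patterns` is a helper invented for this example, and the 255-character cap is an assumption about the filter pattern field that you should check against your own account. When the list grows past the cap, the function starts a new expression, which matches the one-filter-per-expression approach described above.

```python
def build_exclude_patterns(domains, max_len=255):
    """Group spam domains into '|'-separated expressions no longer than max_len."""
    patterns = []
    current = ""
    for domain in domains:
        escaped = domain.strip().lower().replace(".", r"\.")
        candidate = escaped if not current else current + "|" + escaped
        if current and len(candidate) > max_len:
            # The current expression is full: store it and start a new one.
            patterns.append(current)
            current = escaped
        else:
            current = candidate
    if current:
        patterns.append(current)
    return patterns

# Hypothetical spam referrers taken from a referral report.
spam_domains = ["free-buttons.example", "best-seo-offer.example", "traffic-bot.example"]
for number, pattern in enumerate(build_exclude_patterns(spam_domains), start=1):
    print("Crawler Spam", number, ":", pattern)
```

Each printed line corresponds to one exclude filter on ‘Campaign Source’.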

Language Filters

Next we will create an exclude filter for language spam, with the filter field set to ‘Language Settings’ and the filter pattern set to this expression:

\s[^\s]*\s|.{15,}|\.|,|^c$

Then verify your language filter to check that it works.
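You can also sanity-check the expression outside Analytics before you rely on it. The snippet below is a rough sketch: it runs the expression through Python’s `re` module, which we assume behaves the same as the Analytics filter engine for a pattern this simple, and the sample values are made up for illustration. The expression flags values that contain two or more whitespace characters, are 15 or more characters long, contain a dot or a comma, or consist of the single letter “c”.

```python
import re

# The language filter expression from this article, copied unchanged.
LANGUAGE_SPAM_PATTERN = re.compile(r"\s[^\s]*\s|.{15,}|\.|,|^c$")

def is_spam_language(value):
    """Return True when the filter expression matches the language value."""
    return LANGUAGE_SPAM_PATTERN.search(value) is not None

# Illustrative values only: two ordinary language codes and three spam-style strings.
samples = [
    "en-us",
    "zh-cn",
    "c",
    "Vote for us! Visit spam-site.example now",
    "free,traffic",
]
for value in samples:
    print(value, "->", "excluded" if is_spam_language(value) else "kept")
```

Ordinary language codes such as “en-us” are short and contain no spaces, dots, or commas, so they are left alone.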

Once that is completed, set up an additional layer of protection with a filter using a filter field of ‘Page Title’, and then paste your expression into the pattern field.

NOTE: As with the crawler filter, the ‘Page Title’ expression depends on your own spam referring domains. Do a Google Search for those domains to find an expression that will assist you in blocking this traffic.

Exclude Internal Traffic

Now we want to exclude internal traffic from our data, even though it is not referral spam. Visits made by you and others in the business can skew your results, so be sure to exclude each of your IP addresses, including any IPv6 addresses.
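Before you create the IP filters, it can help to confirm which addresses actually count as internal. The sketch below uses Python’s standard `ipaddress` module; the networks are documentation ranges standing in for your real office addresses, and `is_internal` is a helper name made up for this example.

```python
import ipaddress

# Placeholder ranges (reserved for documentation); replace with your own office networks.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),      # hypothetical office IPv4 range
    ipaddress.ip_network("2001:db8:abcd::/48"),  # hypothetical office IPv6 range
]

def is_internal(ip_string):
    """Return True if the address falls inside any of the internal networks."""
    address = ipaddress.ip_address(ip_string)
    return any(address in network for network in INTERNAL_NETWORKS)

for ip in ["203.0.113.42", "198.51.100.7", "2001:db8:abcd::1"]:
    print(ip, "->", "internal" if is_internal(ip) else "external")
```

Every address or range that comes back as internal needs its own exclusion in Analytics, for both IPv4 and IPv6.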

Filter Bots Out

Finally, filter out bot traffic from your reports using the view settings within your Analytics Admin section. You will find a check box for ‘Bot Filtering’, so be sure to enable that and then save the view.

Confused Yet?

If all of this information went a bit over your head, or you just don’t have the time to figure this out, give the team at PN Digital a call – we can help tidy your Google Analytics account up and get it running optimally.