If you’ve read anything about SEO in the last year, you know duplicate content is a big no-no. You may even understand why. It’s a detailed subject, and Dr. Peter Meyers, resident Marketing Scientist at Moz, already did a great job explaining duplicate content, so I won’t beat a dead horse.
Instead, I’ll share a real-life scenario in which innocent duplicate content was holding a client’s site back. Our fix made a significant, positive impact on how often Google indexed the site’s content and how frequently the site appeared in SERPs. That meant more visitors, and more visitors often lead to more conversions and, ultimately, more money in your pocket.
This story starts with duplicate URLs.
URL-based duplicate content is the most common form of duplicate content – and one that many inexperienced webmasters or marketers might overlook entirely. For example, two users might type in www.example.com and www.Example.com when arriving at your site. One page to the user, two pages to Google.
Most of the time, the content on these URLs is exactly the same. The user may not even notice a difference, but because Google sees the URL with the capitalized “E” and the URL with the lower-case “e” as two separate pages with identical content, it is considered duplicate content.
And there is an infinite number of URL combinations that a visitor might – for whatever reason – be tempted to try. For example, all of these URLs would take a visitor to the same content on your homepage:

www.example.com
www.Example.com
www.EXAMPLE.com
WWW.example.com
www.Example.Com
And that’s just a handful of common variations on capitalization. If your site serves the same page both with and without a trailing slash, the problem gets even bigger. Even a stray “&” from a leftover query string can cause unexpected duplicate content issues.
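To make the problem concrete, here’s a quick Python sketch (mine, not from any tool mentioned in this post) showing how a handful of capitalization and trailing-slash variants look like four different pages to a crawler, until the host is lowercased and the slash trimmed:

```python
from urllib.parse import urlsplit

# Four spellings a visitor might type -- one page to a human,
# four distinct URL strings to a crawler
variants = [
    "http://www.example.com",
    "http://www.Example.com/",
    "http://WWW.EXAMPLE.COM",
    "http://www.example.com/",
]

def normalize(url):
    # Host names are case-insensitive, so lowercase the host
    # and drop a lone trailing slash
    parts = urlsplit(url)
    return parts.netloc.lower() + parts.path.rstrip("/")

distinct_raw = len(set(variants))                              # 4 "pages" to Google
distinct_normalized = len({normalize(u) for u in variants})    # 1 real page
```

The gap between those two numbers is exactly the duplicate content Google sees.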
The bottom line: If two pages with different URLs of any form have the same content, you may be in trouble. Here are the steps we took to fix the issue as quickly as possible and what came out of the change.
Identify the duplicate URLs
First, we manually checked to determine which variations of URLs existed, then pulled the duplicate URLs using a crawler called Screaming Frog. From there we identified patterns in the URL issues that we’d need to report to our software developer. For example, we found that www.example.com, www.example.com/home.aspx, www.example.com/default.aspx and all variations thereof loaded the same content but did not use a redirect or a canonical tag to establish which page should receive the credit for the content.
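Screaming Frog did this work for us, but the idea behind spotting these groups can be sketched in a few lines of Python. The URLs and page bodies below are hypothetical stand-ins for a real crawl: pages whose content hashes to the same value are duplicates of one another.

```python
import hashlib
from collections import defaultdict

# Hypothetical crawl output: URL -> fetched HTML body
pages = {
    "http://www.example.com": "<html>home</html>",
    "http://www.example.com/home.aspx": "<html>home</html>",
    "http://www.example.com/default.aspx": "<html>home</html>",
    "http://www.example.com/about.aspx": "<html>about</html>",
}

# Group URLs by a hash of their content; any group with more than
# one URL is a duplicate-content cluster to investigate
groups = defaultdict(list)
for url, body in pages.items():
    groups[hashlib.sha256(body.encode()).hexdigest()].append(url)

duplicates = [urls for urls in groups.values() if len(urls) > 1]
```

In this toy crawl, the homepage and its two .aspx aliases fall into one cluster, while the about page stands alone.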
So, what’s a canonical?
Create the canonicals
There are a handful of ways you can fix duplicate content problems. In this instance, we used the canonical tag – a simple coding signal that tells Google to look at the preferred URL of a page, and only that URL. It looks like this:
<link rel="canonical" href="http://www.domain.com" />
With that code on the page, you’re telling Google that the preferred version of this content lives at the URL in the quotation marks, and that indexing credit should be consolidated there instead of being split across duplicate URLs.
On our list of duplicate URLs, we specified where each was duplicated and the correct URL that should be indexed. We handed this list to our developer along with the appropriate canonicalization (we prefer to avoid capital letters, .aspx and other extensions, as well as the trailing slash).
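As a rough illustration of the canonicalization rules we handed over (lowercase, no .aspx extension, no trailing slash), here’s a hypothetical Python helper. The function name and the default-document list are my own assumptions for the sketch, not production code:

```python
from urllib.parse import urlsplit

def preferred_canonical(url):
    """Derive a preferred canonical URL: lowercase,
    no default document, no .aspx extension, no trailing slash."""
    parts = urlsplit(url)
    path = parts.path.lower().rstrip("/")
    # Assumed default documents that should resolve to the bare homepage
    if path in ("/default.aspx", "/home.aspx"):
        path = ""
    elif path.endswith(".aspx"):
        path = path[: -len(".aspx")]
    return "http://" + parts.netloc.lower() + path
```

For example, this maps http://www.Example.com/Home.aspx down to the bare http://www.example.com, which is the address we wanted every canonical tag to point at.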
Once we got this information to our developer, the code change was rather simple, and soon Google knew to ignore the duplicate URLs and give sole credit to the proper address.
Let Google Know What’s Up
When fixing an issue like duplicate content, you’ll also want to make sure that you are linking to the canonical URLs internally throughout your site. You can use a tool like the Moz crawler or Xenu to identify these links. Once the canonicals were created and the internal links were checked, we resubmitted the sitemap to Google Webmaster Tools, making sure that the preferred canonical URLs were referenced in it. Your sitemap has to match the canonicals, and the URLs your site links to internally should match the sitemap wherever possible (which should be almost everywhere).
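For illustration, here’s a minimal sketch of building a sitemap that references only the preferred canonical URLs. The URL list is hypothetical; a real sitemap would cover every indexable page on the site.

```python
from xml.sax.saxutils import escape

# Hypothetical list of preferred canonical URLs
canonical_urls = [
    "http://www.example.com",
    "http://www.example.com/products",
]

# One <url> entry per canonical address, nothing else --
# the sitemap should never list duplicate variants
entries = "\n".join(
    f"  <url><loc>{escape(url)}</loc></url>" for url in canonical_urls
)
sitemap = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
    + entries
    + "\n</urlset>"
)
```

Because the sitemap only ever contains canonical addresses, it can’t contradict the canonical tags on the pages themselves.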
Within weeks of making the canonical tag updates and resubmitting the sitemap to Webmaster Tools, we were seeing results. The site saw more than a 50 percent increase in the number of Google search queries for which it was being displayed. This led to a 25 percent increase in impressions, which resulted in an increase in clicks and a steady increase in visits over the next several months. Because we were able to consolidate the credit for the content, Google rewarded us.
[Screenshot: We saw more than a 50 percent increase in Google search queries after the canonical change was implemented.]
[Screenshot: Huge increases in new visits and goal completions in the 3 months following our changes.]
The query screenshot from Google Webmaster Tools was taken before Google began stripping most, or all, of client keyword data, but the increase in traffic speaks volumes for the efficacy of this simple fix.
We’ll be the first people to tell you that impressions and traffic aren’t usually the best barometers for determining the success of a website, but this example goes to show the impact that a small tweak can have on your website’s SEO.
So, be on the lookout for duplicate content issues. The duplicate content described in this post went under the radar for much longer than it should have. Even if you think a site is set up properly, it’s smart to do a check at least every few months to be sure everything is up to snuff. Few if any SEO tactics are a one-and-done fix. Regular evaluation of your websites and your strategies will allow you to work proactively, instead of reactively, as the digital industry continues to evolve.