What actually is duplicate content?

William Bachmann

Duplicate content on a website can cause Google to rank the affected pages lower. Here is how to recognize and avoid unintentional duplicate content.

How is duplicate content created?

Duplicate content is when large blocks of content are repeated, or almost repeated, on the same domain or across different domains. It can therefore occur both internally and externally.

Internal duplicate content can occur for the following reasons, for example:

  • A page on your own website can be reached via several URLs: http://domain.de, http://www.domain.de, http://domain.de/
  • There is a print-optimized version of one or more pages
  • In addition to a regular website, a shortened page aimed at mobile devices is also generated
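
The first case can be checked mechanically. The following is a minimal sketch, assuming that URL variants differing only in scheme, a leading "www." and a trailing slash serve the same page; the function name normalize_url is made up for this illustration, and whether these variants really return identical content depends on the server configuration.

```python
from urllib.parse import urlsplit


def normalize_url(url: str) -> str:
    """Reduce a URL to a comparison key (illustrative helper, not a standard API).

    Assumption: scheme, a leading "www." and a trailing slash do not change
    which page is served.
    """
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # An empty path and "/" both refer to the start page.
    path = parts.path.rstrip("/") or "/"
    key = host + path
    if parts.query:
        # Keep the query string, since it may select different content.
        key += "?" + parts.query
    return key


variants = ["http://domain.de", "http://www.domain.de", "http://domain.de/"]
keys = {normalize_url(u) for u in variants}
print(keys)
```

If several URLs from a crawl or a sitemap collapse to the same key, they are candidates for a redirect or a canonical tag.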

The following reasons, for example, can lead to duplicate content on multiple domains:

  • As part of a content partnership, other sites legitimately publish your content
  • Other sites use your content illegally
  • You use the same content for the different country versions of your website
  • You move your site to a new domain

Loss of ranking or penalty due to duplicate content

Google has no problem with duplicate content on one or more domains as long as it is identified as such. What Google wants to avoid is several search results leading to the same content, because unidentified duplicates make for a poor experience for search engine users.

If Google considers the duplicate content deceptive, the website in question may be removed from the search results. “Such unfair behavior can lead to a negative user experience, as visitors are shown basically the same content in a series of search results,” says Google. It is therefore important to find duplicate content and either flag it for Google or avoid it.

However, even duplicate content that does not trigger a Google penalty can harm a website, for example by causing indexing problems. It should be clear to Google at all times which page contains the original or most relevant content for a search query.

Find duplicate content

There are different ways to find duplicate content. One is to search Google directly for distinctive sentences or text excerpts: enter the text in quotation marks in the search box. If you get several hits, duplicate content exists.
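
If you want to check several sentences, the quoted search can be assembled programmatically. This is only a sketch: the function name exact_phrase_search_url is invented here, and the optional site: restriction is an addition that limits the hits to one domain. The resulting URL is meant to be opened in a browser.

```python
from typing import Optional
from urllib.parse import urlencode


def exact_phrase_search_url(phrase: str, site: Optional[str] = None) -> str:
    """Build a Google search URL for an exact phrase (illustrative helper)."""
    # Quotation marks around the phrase request an exact match,
    # so quotes inside the phrase are dropped first.
    query = '"' + phrase.replace('"', "") + '"'
    if site:
        # Optional: only show hits from one domain.
        query += " site:" + site
    return "https://www.google.com/search?" + urlencode({"q": query})


print(exact_phrase_search_url("duplicate content"))
```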

To avoid displaying a lot of identical content, Google hides most of the duplicate search results and displays the following note: “To ensure that you only get the most relevant results, some entries that are very similar to the 2 hits displayed have been left out. If necessary, you can repeat the search taking into account the results you skipped.”


If you repeat the search with the omitted results included, you can check whether the duplicate content is on your own domain or on another one. There are also numerous free tools, known as duplicate content checkers, for tracking down duplicate content. Note that these tools sometimes also flag very small text excerpts, such as teasers, which usually do not cause any problems.

There are also free tools for identifying duplicate content on your own site, for example Siteliner. The tool produces a comprehensive report that you can use to review the duplicate content.
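
How the individual tools measure similarity is not covered here. One common textbook approach, shown below purely as an illustration and not as the method of any particular checker, compares overlapping word sequences (shingles) of two texts using the Jaccard index. The function names and the shingle size of three words are assumptions.

```python
import re


def shingles(text: str, size: int = 3) -> set:
    """Return the set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard_similarity(a: str, b: str, size: int = 3) -> float:
    """Shared shingles divided by all shingles: 0.0 means disjoint, 1.0 identical."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


original = "a b c d e"
near_copy = "a b c d f"
print(jaccard_similarity(original, near_copy))
```

A high value for two long texts points to a near duplicate. Short teasers produce only a few shingles, which is one reason why small excerpts are easily flagged without being a real problem.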


Dealing with duplicate content

There are various ways of informing Google what the original content or the preferred content is, or of avoiding duplicate content from the outset:

Domain redirection in the .htaccess

With the following entry in the .htaccess file, the domain without www can be redirected to the domain with www or vice versa:


RewriteEngine On
# Redirect the domain without www to the domain with www
RewriteCond %{HTTP_HOST} ^beispiel\.de$ [NC]
RewriteRule ^(.*)$ https://www.beispiel.de/$1 [R=301,L]

or


RewriteEngine On
# Redirect the domain with www to the domain without www
RewriteCond %{HTTP_HOST} ^www\.beispiel\.de$ [NC]
RewriteRule ^(.*)$ https://beispiel.de/$1 [R=301,L]

Permanent 301 redirect via .htaccess

If you want to redirect an old domain or file that no longer exists to a new one, a server-side 301 redirect via .htaccess makes sense. Both visitors and link equity are passed on to the new target.

This setting is made in the .htaccess file in the root directory:


RedirectPermanent / https://domain-neu.de/

noindex directive in the meta tags

Another way to prevent duplicate content is a noindex directive in the robots meta tag of the URLs that do not contain the most relevant content. This prevents these pages from being indexed.
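
To illustrate, the sketch below embeds a robots meta tag with noindex in a sample page and checks for it using Python's standard HTML parser. The sample markup, the class name and the function name are assumptions for this example, for instance for the print version of a page.

```python
from html.parser import HTMLParser

# Hypothetical print version of a page that should stay out of the index.
PRINT_VERSION_HTML = """
<html>
  <head>
    <meta name="robots" content="noindex, follow">
    <title>Print version</title>
  </head>
  <body>...</body>
</html>
"""


class RobotsMetaParser(HTMLParser):
    """Collects the directives of all <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for directive in attr.get("content", "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex(PRINT_VERSION_HTML))
```

A check like this can be run over the pages you consider duplicates to confirm that the tag was actually delivered.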

Canonical URL

Canonical tags can be used to tell Google which version of the same content is relevant. Google writes: “Mark the canonical page and all associated variants with a link element rel="canonical". Add a <link> element with the rel="canonical" attribute to the <head> section of these pages: <link rel="canonical" href="https://blog.example.com/dresses/green-dresses-are-awesome" />”. It is important that you specify absolute instead of relative paths for the link element rel="canonical".
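
Whether a page carries a canonical tag with an absolute URL can be verified with a few lines of code. The following is a minimal sketch using only the Python standard library; the class and function names are made up for this example, and the sample page reuses the URL from the Google quote.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

SAMPLE_HTML = """
<html>
  <head>
    <link rel="canonical" href="https://blog.example.com/dresses/green-dresses-are-awesome" />
  </head>
  <body>...</body>
</html>
"""


class CanonicalParser(HTMLParser):
    """Remembers the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values and self.canonical is None:
            self.canonical = attr.get("href")


def find_canonical(html: str):
    """Return the canonical URL of a page, or None if there is none."""
    parser = CanonicalParser()
    parser.feed(html)
    return parser.canonical


def is_absolute(url: str) -> bool:
    """An absolute URL has both a scheme and a host."""
    parts = urlsplit(url)
    return bool(parts.scheme and parts.netloc)


canonical = find_canonical(SAMPLE_HTML)
print(canonical, is_absolute(canonical))
```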

You can find further ways to use canonical URLs on the Google support page.

Avoid re-using content

As far as possible, avoid using large blocks of text twice. This cannot always be avoided, but wherever you can, you should create unique content. If another site operator is using your content legitimately, they should make sure that the content carries a canonical tag pointing to the original.

Conclusion

SEO experts disagree on exactly how dangerous duplicate content is for rankings, especially when only small snippets and teasers are involved. After all, these can hardly be avoided entirely. What is certain, however, is that the constantly changing search engine algorithm prefers unique content and aims to keep the search results varied. So it cannot hurt to keep an eye on duplicate content and avoid unnecessary duplication.
