What are crawl errors?
Crawl errors occur when a search engine tries to reach a page on your website but fails. Let’s shed some more light on crawling first. Crawling is the process in which a search engine bot tries to visit every page of your website. The bot finds a link to your website and, from there, starts discovering all your public pages. It crawls those pages, indexes their content for use in Google, and adds all the links on these pages to the list of pages it still has to crawl. Your main goal as a website owner is to ensure the search engine bot can reach every page on the site. When that process fails, the result is what we call a crawl error.
Your goal is to ensure that every link on your website leads to an actual page. That might be via a 301 redirect, but the page at the end of that link should always return a 200 OK server response.
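To make that concrete, here is a minimal sketch of how you could follow a link by hand and see where it ends up. The function names (head, final_status), the list of redirect codes, and the five-hop limit are our own assumptions for illustration; this is not how any search engine bot is implemented.

```python
from http.client import HTTPConnection, HTTPSConnection
from urllib.parse import urljoin, urlsplit

# Status codes treated as redirects in this sketch.
REDIRECTS = {301, 302, 303, 307, 308}


def head(url, timeout=10):
    """Send one HEAD request without following redirects.

    Returns (status, location), where location is the Location header or None.
    """
    parts = urlsplit(url)
    conn_cls = HTTPSConnection if parts.scheme == "https" else HTTPConnection
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def final_status(url, fetch=head, max_hops=5):
    """Follow redirects manually and return (final_url, status, hops)."""
    hops = 0
    while True:
        status, location = fetch(url)
        if status in REDIRECTS and location and hops < max_hops:
            # Location may be relative, so resolve it against the current URL.
            url = urljoin(url, location)
            hops += 1
            continue
        return url, status, hops
```

A link is healthy in the sense described above when the status returned by final_status is 200. The fetch parameter exists so you can swap in a fake fetcher and try the logic without touching a live site.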
Google divides crawl errors into two groups:
- Site errors. You don’t want these, as they mean your entire site can’t be crawled.
- URL errors. You don’t want these, but since they only relate to one specific URL per error, they are easier to maintain and fix.
Let’s elaborate on that.
Site errors
Site errors are all the crawl errors that prevent the search engine bot from accessing your website. There can be many reasons for this; these are the most common:
- DNS errors. This means a search engine isn’t able to communicate with your server. It might be down, for instance, meaning your website can’t be visited. This is usually a temporary issue. Google will come back to your website later and crawl your site anyway. If you see notices of this under crawl errors in your Google Search Console, that probably means Google has tried a couple of times and still wasn’t able to reach your server.
- Server errors. If your Search Console shows server errors, the bot couldn’t access your website. The request might have timed out: the search engine tried to visit your site, but it took so long to load that the server returned an error message. Server errors also occur when there are flaws in your code that prevent a page from loading. It can also mean that your site has so many visitors that the server just couldn’t handle all the requests. Many of these errors are returned as 5xx status codes, like the 500 and 503 status codes.
- Robots failure. Before crawling, a bot such as Googlebot first tries to fetch your robots.txt file to see if there are any areas of your website you’d rather not have crawled. If the bot can’t reach the robots.txt file, Google will postpone the crawl until it can. So always make sure it’s available.
That explains a bit about crawl errors related to your entire site. Now let’s see what crawl errors might occur for specific pages.
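Once your robots.txt file is reachable, it is worth confirming that it only blocks what you intend. The sketch below uses Python’s standard urllib.robotparser module; the helper name blocked_urls and the example rules are assumptions made up for illustration, and Python’s parser may not match Googlebot’s interpretation in every edge case.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, user_agent, urls):
    """Return the URLs from `urls` that the given robots.txt text disallows."""
    parser = RobotFileParser()
    # parse() takes the file contents as a list of lines.
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt contents and URLs.
example_rules = "User-agent: *\nDisallow: /private/\n"
example_urls = [
    "https://example.com/",
    "https://example.com/private/report/",
]
print(blocked_urls(example_rules, "Googlebot", example_urls))
```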
URL errors
As mentioned, URL errors refer to crawl errors that occur when a search engine bot tries to crawl a specific page of your website. When we discuss URL errors, we tend to discuss crawl errors like (soft) 404 Not Found errors first. You should frequently check for these errors (use Google Search Console or Bing Webmaster Tools) and fix them. If the page, or the subject of that page, is gone and will never return to your website, serve a 410 (Gone) response. If you have similar content on another page, use a 301 redirect instead. Make sure your sitemap and internal links are up to date as well.
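The 410-versus-301 decision can be written down as a tiny lookup. This is only a sketch of the rule of thumb above; the function name response_for_removed and the example paths are hypothetical, and in practice you would configure this in your server or a redirect plugin rather than in a script.

```python
def response_for_removed(path, replacements):
    """Pick a response for a URL whose page was removed.

    `replacements` maps removed paths to the path of similar content.
    Returns (status, location): a 301 to the replacement when one exists,
    otherwise a 410 with no location.
    """
    target = replacements.get(path)
    if target:
        return 301, target
    return 410, None


# Hypothetical paths: one removed page has a successor, the other does not.
replacements = {"/old-guide/": "/new-guide/"}
print(response_for_removed("/old-guide/", replacements))
print(response_for_removed("/discontinued-product/", replacements))
```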
We found that a lot of these URL errors are caused by internal links, by the way. So a lot of these errors are your own fault. If you remove a page from your site at some point, adjust or remove any internal links pointing to it as well. These links have no use anymore. If such a link stays in place, a bot will find and follow it, only to hit a dead end (a 404 Not Found error) on your website. You need to do some maintenance on your internal links now and then!
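If you keep a list of the pages you removed, you can scan your remaining pages for internal links that still point to them. Below is a minimal sketch using Python’s built-in html.parser; the names LinkCollector and links_to_removed are our own, and it assumes you already have each page’s HTML as a string.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def links_to_removed(page_url, html, removed_paths):
    """Return hrefs on this page that point to removed paths on the same host."""
    collector = LinkCollector()
    collector.feed(html)
    site = urlsplit(page_url).netloc
    hits = []
    for href in collector.hrefs:
        # Resolve relative links against the page they appear on.
        target = urlsplit(urljoin(page_url, href))
        if target.netloc == site and target.path in removed_paths:
            hits.append(href)
    return hits
```

Links to other domains are skipped on purpose, since this check is only about your own internal links.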
Another common URL error is the one with the words ‘submitted URL’ in the title. These errors appear as soon as Google detects inconsistent behavior. On the one hand, you submitted the URL for indexing, so you’re telling Google: “Yes, I want you to index this page.” On the other hand, something else is telling Google: “No, don’t index this page.” A possible reason could be that your robots.txt file blocks your page. Or that the page is marked ‘noindex’ by a meta tag or HTTP header. If you don’t fix the inconsistent message, Google will not index your URL.
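You can look for these mixed signals yourself before submitting a URL. The sketch below checks the three sources named above: robots.txt, a robots meta tag, and the X-Robots-Tag HTTP header. The function name index_blockers and its inputs are assumptions for illustration; it expects you to have already fetched the page HTML, the response headers, and the robots.txt text, and it does not reproduce everything Google checks.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Set `noindex` to True if a robots meta tag contains 'noindex'."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = (attrs.get("name") or "").lower()
        content = (attrs.get("content") or "").lower()
        if tag == "meta" and name in ("robots", "googlebot") and "noindex" in content:
            self.noindex = True


def index_blockers(url, html, headers, robots_txt, user_agent="Googlebot"):
    """List the signals that tell a bot not to crawl or index this URL."""
    reasons = []

    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(user_agent, url):
        reasons.append("blocked by robots.txt")

    finder = RobotsMetaFinder()
    finder.feed(html)
    if finder.noindex:
        reasons.append("noindex meta tag")

    # Header names are case-insensitive, so compare them in lowercase.
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            reasons.append("noindex X-Robots-Tag header")
            break

    return reasons
```

An empty list means none of these three signals contradicts your request to index the page.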
Among these common errors, you might also see an occasional DNS error or server error for a specific URL. Re-check that URL later and see if the error has vanished. Be sure to use Fetch as Google and mark the error as fixed in Google Search Console if that is your primary monitoring tool.
Very specific URL errors
Some URL errors apply to certain sites only. That’s why I’d like to list these separately:
- Mobile-specific URL errors. This refers to page-specific crawl errors that occur on a modern smartphone. If you have a responsive website, these are unlikely to surface. You might run into more errors if you maintain a separate mobile subdomain like m.example.com, such as faulty redirects from your desktop site to that mobile site. You might even have blocked some of that mobile site with a line in your robots.txt.
- Malware errors. If you encounter malware errors in your webmaster tools, this means Bing or Google has found malicious software on that URL. That might mean software has been found that is used, for instance, “to gather guarded information, or to disrupt their operation in general” (Wikipedia). You need to investigate that page and remove the malware.
- Google News errors. There are some specific Google News errors. There’s quite a list of these possible errors in Google’s documentation, so if your website is in Google News, you might get these crawl errors. They vary from the lack of a title to errors that tell you that your page doesn’t seem to contain a news article. Be sure to check for yourself if this applies to your site.
Fix your crawl errors
The bottom line in this article is definitely: if you encounter crawl errors, fix them. It should be part of your site’s maintenance schedule to check for crawl errors now and then.
Great information! I will check now for errors in my website thanks to your post :)
Do these types of errors affect SEO?
Thanks.
Yes, crawl errors can negatively affect your SEO!
True!
Great post! Thank you so much for this important information. I will share it with my friends for sure :)
Thank you very much for an interesting and useful review!
Collecting and fixing crawl errors falls under technical SEO. If we do it correctly, our pages can see improved site ranking.
A complete article about crawl errors. I was looking for this information.
Please help: my site always gets crawled late and most of my posts don’t make it to the front page.
Hi Cletus, perhaps this post https://yoast.com/how-to-get-google-to-crawl-your-site-faster/ can help you fix the problem?
Thanks for your page! The information you shared helped me a lot!
I’ve been wondering lately whether I should remove old articles or revamp them by applying some SEO basics. This made me rethink removing them (404 errors, internal linking, etc.). Thanks a lot for the useful information!
Hi Yuen Mi, Perhaps this is also an interesting read for you: https://yoast.com/republish-old-content/
Yes, I had to work hard because of those errors and repair them myself on my website. They can be generated by changes to the URL, and by links to other articles that are sometimes included in the code on my website.
Hi Firnandus, It can be a lot of work indeed! But worth the effort. If you have the Premium version of Yoast SEO, we can help you prevent crawl errors when changing URLs: https://yoast.com/wordpress/plugins/seo/redirects-manager/
I get a lot of crawl errors linked from spam sites. Hundreds a month. This wasn’t mentioned in the article. I don’t know whether to ignore them and let them pile up, keep deleting them or maybe a disavow file? The sites often format the URL incorrectly thus creating the 404 error in GSC. Contacting a website from China or Russia and asking them to stop is a scary thought as well.
Hi Mark,
That’s a bummer. I think reaching out to a specialized company like Linkdetox would be a good idea? You can keep on disavowing these links, but a more thorough approach might be better.
I was looking for this information. Thanks
Thank you so much for telling me importance of crawl error.
Hi,
I recommend adding a notification badge icon in Yoast whenever a 404 URL is found.
That way we can find and fix these errors faster.
I also opened an issue about this on github:
https://github.com/Yoast/wordpress-seo/issues/9396
That would be very nice if you add this feature.
Thanks
Thanks Hossein, we’ll have a look into it!
Is there a simple way to check for internal 404 errors before the Google crawl robot arrives?
Hi Bill, You can use Screaming Frog https://www.screamingfrog.co.uk/seo-spider/ to find your 404s if you don’t want to use Google Search Console. Not sure if you will be faster than Google though, that probably depends on how new your site is and how often it gets crawled.
Thank you for the well summarized article on crawl errors