Google’s John Mueller answered a question about the curious situation of Search Console reporting tens of thousands of URLs as indexed despite being blocked by robots.txt. Mueller clarified how this happens and what to do about it.
Content Indexed Despite Being Blocked By Robots.txt
A Redditor asked for advice because Google Search Console was reporting more than 51,000 pages under the status “Indexed, though blocked by robots.txt.” The affected URLs were mainly WooCommerce product URLs containing add-to-cart URL parameters like “?add-to-cart=”.
Because the issue appeared suddenly, the site owner wondered whether the robots.txt rules themselves were responsible for creating the problem. They also wanted to know whether removing the rules would help Google process the canonical signals and eliminate the reported URLs from Search Console.
The person asked:
“I have WooCommerce website and suddenly since past month we are facing this issue: “Indexed, though blocked by robots.txt”
there are total “Affected pages 51K pages”
in the end of url I see mostly ?page&post_type=product&product=slug&add-to-cart=98063,
After inspecting these urls I found they have index tag setup and robots.txt had
* Disallow: /*?add-to-cart=
* Disallow: /*?*add-to-cart=
I removed these two rules from robots.txt and hoping these pages fixed cause they have canonical set to correct product, will that fix issue?
or should I also setup noindex rules? will that cost us our crawl budget? it’s pretty big woocommerce website, let me know guys your thoughts if someone has experience fixing such issue? and what will be the right method without stopping our SEO or functionality loss.”
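To see what the two Disallow rules in the question actually cover, here is a minimal Python sketch of wildcard matching in the style Google documents for robots.txt, where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The helper names `rule_to_regex` and `is_blocked` are made up for illustration, and the sketch ignores Allow rules and rule precedence, so treat it as an approximation rather than a full robots.txt parser.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one Disallow pattern into a regex (hypothetical helper)."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape everything literally, then let each "*" match any characters.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile("^" + body + ("$" if anchored else ""))

def is_blocked(path_and_query: str, disallow_rules: list[str]) -> bool:
    """True if any Disallow pattern matches from the start of the path."""
    return any(rule_to_regex(r).match(path_and_query) for r in disallow_rules)

# The two rules quoted in the Reddit post.
RULES = ["/*?add-to-cart=", "/*?*add-to-cart="]

# The parameter is first in the query string: the first rule matches.
first_param = "/product/slug/?add-to-cart=98063"
# The parameter comes later in the query string: only the second rule matches.
later_param = "/?page&post_type=product&product=slug&add-to-cart=98063"
plain_product = "/product/slug/"
```

Under these assumptions, the first rule only catches URLs where add-to-cart is the first parameter, while the second rule also catches URLs like the one in the question, where it appears after other parameters. The plain product URL is not blocked by either rule.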
Google Says Add-To-Cart URLs Don’t Need To Be Indexed
Mueller responded that the add-to-cart URLs don’t need to be indexed and that blocking them with robots.txt is an acceptable approach.
He explained that even if Google reports these URLs as indexed, they’re unlikely to appear in normal search results because they’re blocked by robots.txt. According to Mueller, users generally don’t search for these URLs directly, making them poor candidates for search visibility.
John Mueller responded:
“You don’t need the add-to-cart URLs indexed. Blocking them with robots.txt is fine. Even if they get “indexed” since they’re blocked by robots.txt, it’s unlikely that they’ll be shown in search (unless you do specific queries for these URLs, which users don’t do).”
I’m kind of on the fence about what Mueller said about robots.txt making it “unlikely” that the URLs will be shown in Search. The reason is that robots.txt doesn’t prevent a web page from showing in Google Search. It only prevents Googlebot from crawling those pages. So technically, that’s not quite correct and I’m a little surprised Mueller would say that.
Noindex Is Probably Not A Solution
One of the Redditors who responded to that question suggested adding a noindex robots tag to the parameterized URLs. But that may not be a viable solution because the pages with and without the URL parameters are essentially the same thing. They’re rendered using the same template for a given page. So unless WooCommerce treats them differently and can render the parameterized URLs with a noindex and the regular page without the noindex, that’s not a real solution.
Why Google Reports Indexed URLs That It Can’t Crawl
Another Redditor offered a possible explanation for why so many URLs appeared in Search Console. They suggested that Google likely discovered links containing the add-to-cart parameters somewhere on the site and added those URLs to its systems.
My suggestion for the person who originally asked that question is to crawl the website with Screaming Frog, review the internal linking to identify where these pages are being linked from, and then take some action, like removing those links or adding a rel="nofollow" link attribute to them.
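A dedicated crawler does this across a whole site, but the per-page check it performs can be sketched in a few lines of Python using only the standard library. The class and function names below are invented for this example, and the sketch assumes the page HTML has already been fetched. It lists every link whose href carries an add-to-cart parameter and whether that link already has rel="nofollow".

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs

class AddToCartLinkFinder(HTMLParser):
    """Collect <a> links whose href contains an add-to-cart parameter."""

    def __init__(self):
        super().__init__()
        self.found = []  # list of (href, has_nofollow) tuples

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)
        href = attr_map.get("href") or ""
        query = parse_qs(urlsplit(href).query, keep_blank_values=True)
        if "add-to-cart" not in query:
            return
        # rel can hold several space-separated tokens, e.g. "nofollow noopener".
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        self.found.append((href, "nofollow" in rel_tokens))

def find_add_to_cart_links(html: str):
    """Return (href, has_nofollow) for each add-to-cart link in the HTML."""
    finder = AddToCartLinkFinder()
    finder.feed(html)
    return finder.found

# Made-up sample markup for the example.
sample_html = (
    '<a href="/shop/?add-to-cart=98063">Add to cart</a>'
    '<a rel="nofollow" href="/?post_type=product&amp;add-to-cart=5">Add</a>'
    '<a href="/product/slug/">View product</a>'
)
links = find_add_to_cart_links(sample_html)
```

In this sample, the first link is flagged as an add-to-cart link without nofollow, the second is flagged as already carrying nofollow, and the plain product link is ignored.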
Likely, the best solution is to use the robots.txt block to prevent crawling, as long as it’s understood that this is all it does. If the person wants to be extra sure, they can also identify where these links exist and then add the nofollow link attribute as an extra layer, a hint to Google. Nofollow is not a directive, but it is a strong hint.
Search Console Warnings Don’t Always Indicate A Search Problem
One of the recurring challenges with Search Console reports is that they can expose technical conditions that look alarming but actually have little to no effect on search performance. For example, the 404 error reports are useful for a variety of reasons, but many times a 404 server response is the appropriate response, and it’s not really an “error” that needs fixing.
Takeaway
Mueller’s response reinforces the takeaway that not every Search Console warning requires taking action to fix something, although in this specific case there may be something to fix in the form of internal links to webpages that use the shopping cart URL parameters. If those links with the shopping cart URL parameters are absolutely necessary, then using a rel="nofollow" link attribute will give Google a strong hint not to follow that link. The joy of technical SEO!
Featured Image by Shutterstock/Orange Line Media