Why Are My Pages Discovered But Not Indexed?


Today’s Ask An SEO question comes from Mandeep, who is having trouble with indexing on their site.

Mandeep asks:

“We have redesigned a website and we had added a few new pages. Some pages were indexed successfully and some were not.

I tried multiple times on Google but that is not working. Now, while I submit the URL to index, it is showing this error via Google Search Console: Discovered – currently not indexed […]

I have tried everything but nothing is working. Please help me resolve this issue.”

This warning is coming from the “Pages” section of the “Indexing” report in Google Search Console. This report gives users insight into what pages Google has crawled and indexed and the problems it may have encountered doing so.

The report will give details of pages that have been successfully crawled and indexed. It also lists reasons why the pages on the site have not been indexed.

Is It A Problem If A Page Isn’t Indexed?

Most sites have pages that are not indexed. These are often excluded deliberately by the website owner.

For example, a page might be deliberately excluded from the search engine indexes by way of an HTML “noindex” tag on the page, or perhaps it is being blocked from crawling in the robots.txt file.
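For illustration, here is what each of those exclusions might look like (the paths and values are hypothetical examples, not recommendations for any particular site).

A meta robots tag in the page’s HTML head, telling search engines not to index the page:

  <meta name="robots" content="noindex">

A robots.txt rule, telling crawlers not to request anything under a given path:

  User-agent: *
  Disallow: /private/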

URLs that have been purposely excluded from indexing will appear within this report, as well as pages with problematic indexing issues.

In general, it can take some time for a new page on a website to be crawled and indexed. A new page taking time to show up among the “indexed” pages on the report is not always a sign of an issue.

Not every reason within the “Why pages aren’t indexed” report needs to be addressed.

Indexing Issues

Google will not crawl and index every URL it finds. Your main concern as a website manager is that the pages that you wish to be available as a search result are indexed.

Essentially, if they are not indexed, they will not be eligible to be a search result.

There are several reasons within the “Why pages aren’t indexed” report that do suggest an issue on the site that should be investigated. For example, “Server error (500)” and “Soft 404.”

These flags may not necessarily be a problem for the individual URLs if they aren’t ones you want to have indexed, but they can indicate a wider issue with the site.

What Is “Discovered – Currently Not Indexed”?

“Discovered – currently not indexed” is a status that Google reports for URLs it knows about but has not yet crawled or indexed.

What is important to remember is that URLs will not appear in this bucket if they fit within another bucket in the report.

For example, a page with a noindex tag may technically have been discovered by Google and not indexed, but it would appear in the “Excluded by ‘noindex’ tag” bucket, so pages within the “Discovered – currently not indexed” bucket are there for another reason.

The explanation Google gives for a URL appearing as “Discovered – currently not indexed” is:

“The page was found by Google, but not crawled yet. Typically, Google wanted to crawl the URL but this was expected to overload the site; therefore Google rescheduled the crawl. This is why the last crawl date is empty on the report.”

Google tries to make its bots crawl conscientiously.

That is, Googlebot is not the only visitor to a site and may be just one of many bots crawling it, so Google doesn’t want to crash the site by sending too many requests to the server.

What Might Be Causing A URL To Be “Discovered – Currently Not Indexed”?

There are two main reasons a page is known to Google but not indexed. John Mueller gave details about these in 2023.

Essentially, alongside the concerns around the server’s capacity to withstand crawling, page quality is also considered.

Now, if a page has not been crawled, how can Google know its quality? Well, it can’t. What it can do is make assumptions based on the quality of the pages elsewhere on the site.

That’s right – thin, duplicate, low-value pages elsewhere on your website can affect the indexation of your core pages.

How To Fix The Issue

There is no quick fix to move a page from “Discovered – currently not indexed” to “Indexed,” but there are several solutions you can try.

Check If The Page Is Actually Indexed

The first port of call is to determine if the Google Search Console report is accurate and up to date.

In the top right-hand corner of the report, you will see the “Last updated” date. This gives you an idea of whether the report might be outdated.

Next, go to Google and perform a site:[yourwebsitedomain] inurl:[the URL slug of the page you want to index] search.
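For example, to check a hypothetical contact page on exampledomain.com (the example domain used later in this article), the search would be:

  site:exampledomain.com inurl:contact-us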

If the page is returned as a search result, then you know it is actually indexed.

If so, give the report some time to update; the page should eventually appear under the “Indexed” section rather than in the “Discovered – currently not indexed” bucket.

Check Your Site’s Page Quality

Next, you may want to consider the overall quality of your website, as this could be the reason why Google is not indexing your page.

Remember, quality is not just a measure of the words on your site, their relevance to search queries, and the overall “E-E-A-T” displayed. Google’s John Mueller described it as:

“When it comes to the quality of the content, we don’t mean like just the text of your articles.

It’s really the quality of your overall website.

And that includes everything from the layout to the design.

Like, how you have things presented on your pages, how you integrate images, how you work with speed, all of those factors they kind of come into play there.”

So, review your website with these criteria in mind. How does the quality of your website compare to that of your competitors?

A thorough website audit is a good place to start.

Check For Duplicate Pages

Sometimes, a website might have low-quality or duplicate pages that the website manager has no knowledge of.

For example, a page might be reached via multiple URLs. You might have a “Contact Us” page that exists on both exampledomain.com/contact-us and exampledomain.com/contact-us/.

Googlebot treats the URL with the “trailing slash” and the URL without it as separate pages if it can reach both and the server returns a 200 status code for each; that is, both are live pages.

There is a possibility that all of your pages may be duplicated in this same way.
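One common way to consolidate trailing-slash duplicates (shown here as a sketch using the hypothetical contact page) is to pick one version as the preferred URL and either 301-redirect the other version to it or place a canonical tag on both versions pointing to it:

  <link rel="canonical" href="https://exampledomain.com/contact-us/">

Whichever version you choose, use it consistently in your internal links and your XML sitemap.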

You might also have a lot of URL parameters on your website that you are unaware of. These are URLs that contain “query strings,” such as exampledomain.com/dress?colour=red.

They are usually caused by filtering and sorting options on your website. In an ecommerce website, this might look like a product category page that is filtered down by criteria such as color, and able to be sorted by price.

The main features of the page do not change with this filtering and sorting; only the products listed do. These are technically separate, crawlable pages and may be creating a lot of duplicates on your site.
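A canonical tag is also the usual way to handle these parameterized URLs. As a sketch, a filtered page such as exampledomain.com/dress?colour=red could point back to the main category page:

  <link rel="canonical" href="https://exampledomain.com/dress">

This tells Google which version of the page you would prefer to have crawled and indexed, rather than the near-duplicates created by filtering and sorting.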

You may think your website only has 100 high-quality pages on it. However, Googlebot may see hundreds of thousands of near-duplicate pages as a result of these technical issues.

Ways To Fix “Discovered – Currently Not Indexed”

Once you have identified the likely causes of your URL not being indexed, you can attempt to fix it.

If your website has duplicate pages, low-quality or scraped content, or other quality issues, that is where to begin.

As a side benefit, you are likely to see your rankings improve across your pages as you work to fix these issues.

Signify The Page’s Importance

In the example of our opening question, there is a specific page that Mandeep is struggling to get indexed.

In this scenario, I would suggest trying to bolster the page’s importance in the eyes of the search engines. Give them a reason to crawl it.

Add The Page To The Website’s XML Sitemap

One way of showing Google that it is an important page that deserves to be crawled and indexed is by adding it to your website’s XML sitemap.

This is essentially a signpost to all of the URLs that you believe search bots should crawl.
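As a sketch, a minimal sitemap entry for the hypothetical contact page would look like this:

  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url>
      <loc>https://exampledomain.com/contact-us/</loc>
    </url>
  </urlset>

Make sure the sitemap itself is referenced in your robots.txt file or submitted in Google Search Console so Google knows where to find it.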

Remember, Googlebot already knows that the page exists; it just doesn’t believe it is beneficial to crawl and index it.

If it is already in the XML sitemap, do not stop there. Consider these next steps.

Add Internal Links To The Page

Another way to show a page’s importance is by linking to it from internal pages on the site.

For example, you could add the page to your primary navigation, such as the main menu, or add contextual links to it from within the copy on other pages on your website.

Both signal to Googlebot that it is an important page on your website.
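As a sketch, a contextual link is simply a normal anchor within the body copy of a related page, with descriptive anchor text (again using the hypothetical contact page):

  <p>Have questions about sizing? <a href="/contact-us/">Contact our team</a> and we will help.</p>

Descriptive anchor text also helps Google understand what the linked page is about.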

Add External Links To The Page

Backlinks are a fundamental part of SEO. We’ve known for a while that Google uses links from other websites to help determine a page’s relevance and authority on a subject.

If you struggle to show Google that your page is of enough quality to index, then having external links from reputable, relevant websites pointing to it can give additional reassurance of the page’s value.

For example, if the page you are struggling to get indexed is a specific red dress’s product detail page, then having that dress’s page featured in some fashion blogs may give Google the signal that it is a high-quality page.

Submit It To Be Crawled

Once you have made changes to your website, try resubmitting the page to be crawled via Google Search Console.

If you notice in the Google Search Console “Indexing” report that the URL is still within the “Discovered – currently not indexed” bucket after some time (it can take anywhere from a few days to a few weeks for Google to crawl a submitted page), then you know that you potentially still have some issues with the page.

In Summary

Optimize your website for crawling and indexing. If you do this, you are likely to see those pages move from “Discovered – currently not indexed” to “Indexed.”

Optimizing your particular website will require an in-depth analysis of the overall quality of the site and identifying how to convey the importance of the “Discovered – currently not indexed” pages to Googlebot.
