There is a persistent misunderstanding about sitemaps: many people create one, submit it to Search Console, and assume Google will index everything. The reality is more nuanced. Google treats the sitemap as a hint, not an order. And there are specific conditions under which it ignores the sitemap entirely.
The XML sitemap is the map you hand Googlebot when it arrives at your site. If the map is well drawn, includes the correct routes and marks the most important destinations, Googlebot can do its job efficiently. If the map has errors, lists dead ends or points to places that no longer exist, Googlebot discards it and finds its own way — which may or may not lead to what matters most to you.
This guide covers what you actually need to know: sitemap types, which fields Google reads and which it ignores, how multimedia sitemaps work, how to generate them on the most common platforms, and how to submit and monitor them in Google Search Console.
Sitemap types: which one does your site need
There is no single type of sitemap. The standard sitemaps.org protocol, jointly adopted by Google, Yahoo and Microsoft in 2006, defines the base format, and Google has developed extensions for specific content types.
The standard XML sitemap lists web page URLs with optional metadata. It is the most common type and the starting point for any site. If you have a blog, a corporate website or an ecommerce site without specialised multimedia content, this is the only one you need.
Image sitemaps allow Google to discover images that ordinary crawling might miss, especially those loaded via JavaScript. They do not replace the main sitemap: they are added as an extension within the same file or in a separate file. Each URL can include up to 1,000 image references.
Video sitemaps work similarly but for audiovisual content. They include specific fields such as duration (in seconds, between 1 and 28,800), publication date, rating and geographic restrictions. They are particularly important if you want your videos to appear in Google Videos.
News sitemaps are for Google News publications. They have an important time restriction: they should only contain articles published in the last two days. Past that window, articles must be removed from the news sitemap even if they remain on the site. The limit is 1,000 URLs per file — considerably lower than the standard.
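The two-day window is easy to enforce when the news sitemap is generated. The following is a minimal sketch, assuming each article is available as a (url, published) pair with a timezone-aware datetime; the function name and the input shape are hypothetical, not part of any publishing platform.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(days=2)  # articles older than this leave the news sitemap
NEWS_MAX_URLS = 1000             # per-file limit for news sitemaps


def select_news_urls(articles, now=None):
    """Return URLs of articles published within the last two days, newest first.

    `articles` is an iterable of (url, published) pairs, where `published`
    is a timezone-aware datetime. This input shape is an assumption.
    """
    now = now or datetime.now(timezone.utc)
    recent = [
        (url, published)
        for url, published in articles
        if timedelta(0) <= now - published <= NEWS_WINDOW
    ]
    recent.sort(key=lambda pair: pair[1], reverse=True)
    return [url for url, _ in recent[:NEWS_MAX_URLS]]
```

Running this on every regeneration means expired articles drop out of the news sitemap automatically while staying in the standard one.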
Which do you need? The answer depends on your content. A standard blog or corporate website: standard XML sitemap. A site with photographs or image galleries: image sitemap. A video channel or online course: video sitemap. A news publisher: news sitemap in addition to the standard one.
Technical structure of an XML sitemap: the fields Google reads and the ones it ignores
The sitemaps.org protocol defines four possible fields within each <url> entry. Knowing which ones Google uses and which it discards saves time and avoids confusion.
<loc> is the only mandatory field. It must contain the full URL, including protocol, domain and trailing slash if your server requires it. The URL cannot exceed 2,048 characters. All special characters (ampersands, quotation marks, angle brackets) must be escaped with XML entities.
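As an illustration of the escaping rule, here is a small Python sketch using the standard library. The helper name escape_loc and the example URL are hypothetical.

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Escape a URL for use inside <loc>.

    escape() handles &, < and > by default; the extra mapping adds the
    two quote characters.
    """
    return escape(url, {'"': "&quot;", "'": "&apos;"})


print(escape_loc("https://example.com/search?q=shirts&colour=blue"))
# prints: https://example.com/search?q=shirts&amp;colour=blue
```

XML libraries that build the document for you normally apply this escaping on output, so a helper like this is only needed when the file is assembled from strings.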
<lastmod> is the only optional field Google uses, and only conditionally. If the date you provide genuinely reflects when the page content changed, Google uses it to prioritise crawling of updated pages. If the date is inaccurate, always the same or generated automatically without real changes, Google stops trusting that field for your site. Accuracy matters more than presence.
<changefreq> and <priority> are fields Google explicitly ignores. The official documentation confirms this without ambiguity. Whether you include them or not in your sitemap changes nothing from Google’s perspective. Some generators include them for compatibility with other search engines or out of historical inertia, but for Google crawling they have no effect.
A valid, minimal XML sitemap has this structure:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/page/</loc>
    <lastmod>2026-03-15</lastmod>
  </url>
  <url>
    <loc>https://example.com/another-page/</loc>
    <lastmod>2026-04-01</lastmod>
  </url>
</urlset>
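A file with this structure can be produced with a few lines of Python. This is a minimal sketch using only the standard library, assuming the pages are available as (url, lastmod) pairs; build_sitemap is a hypothetical helper, not the output of any particular generator.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a sitemap from (url, lastmod) pairs; lastmod is a date or None."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, lastmod in pages:
        entry = ET.SubElement(urlset, "url")
        # Special characters in the URL are escaped when the tree is serialised.
        ET.SubElement(entry, "loc").text = url
        if lastmod is not None:
            # Only emit <lastmod> when a real modification date is known.
            ET.SubElement(entry, "lastmod").text = lastmod.isoformat()
    ET.indent(urlset)  # pretty-printing; requires Python 3.9+
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


pages = [
    ("https://example.com/page/", date(2026, 3, 15)),
    ("https://example.com/another-page/", date(2026, 4, 1)),
]
print(build_sitemap(pages).decode("utf-8"))
```

Leaving lastmod out when the date is unknown is a deliberate choice here: a missing field is harmless, while an invented date undermines trust in the field.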
For sites with more than 50,000 URLs or files exceeding 50MB uncompressed, a sitemap index is required. This file acts as an index referencing multiple individual sitemap files:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-04-01</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-04-01</lastmod>
  </sitemap>
</sitemapindex>
A sitemap index can reference up to 50,000 individual sitemaps. If you have an ecommerce site with millions of products, you can distribute them across multiple files that the index references.
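The splitting logic can be sketched in Python as follows. The sitemap-N.xml naming scheme and both function names are assumptions made for illustration; any naming works as long as the index points at the real files.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_FILE = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_FILE):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_index(base_url, chunk_count, lastmod):
    """Build a sitemap index that references sitemap-1.xml ... sitemap-N.xml.

    The file naming scheme is hypothetical; `lastmod` is an ISO date string.
    """
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for n in range(1, chunk_count + 1):
        entry = ET.SubElement(index, "sitemap")
        ET.SubElement(entry, "loc").text = f"{base_url}/sitemap-{n}.xml"
        ET.SubElement(entry, "lastmod").text = lastmod
    return ET.tostring(index, encoding="utf-8", xml_declaration=True)


urls = [f"https://example.com/product/{i}/" for i in range(120_001)]
chunks = chunk_urls(urls)
print([len(chunk) for chunk in chunks])  # prints: [50000, 50000, 20001]
```

Each chunk would then be written to its own sitemap file. In practice it is often more useful to split by section (blog, products) than by raw count, as in the example index above, because Search Console then reports each section separately.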
Errors that cause Google to ignore your sitemap
Gary Illyes, Trends Analyst at Google, has been direct in Search Off the Record: sitemaps with too many error URLs or inaccurate lastmod dates end up being ignored, because Google learns to distrust the signal. Accurate is more useful than comprehensive — and this principle shapes every decision about what to include.
The most common errors that degrade sitemap usefulness are concrete and avoidable.
Including URLs that return 3xx or 4xx is the most frequent error. If a URL in your sitemap responds with 301, 302 or 404, Google spends resources on a visit that does not end in indexable content. Every technical audit run with Screaming Frog on sites with indexation problems shows this pattern: between 5% and 30% of sitemap URLs point to redirected or deleted pages. The sitemap should include only URLs returning a 200 response.
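A check of this kind does not need a dedicated crawler. The sketch below uses only the Python standard library and assumes the list of sitemap URLs has already been extracted; fetch_status and non_200_urls are hypothetical helper names. Redirects are deliberately not followed, so a 301 is reported as 301 rather than as the status of its destination.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def fetch_status(url, timeout=10):
    """Return the HTTP status code of a HEAD request without following redirects.

    Some servers reject HEAD; switching to GET is a reasonable fallback.
    """
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # 3xx, 4xx and 5xx responses all arrive here


def non_200_urls(urls, get_status=fetch_status):
    """Map each URL whose status is not 200 to the status it returned."""
    statuses = {url: get_status(url) for url in urls}
    return {url: code for url, code in statuses.items() if code != 200}
```

Everything returned by non_200_urls is a candidate for removal from the sitemap, or for replacement by the final URL in the case of a redirect. The get_status parameter exists so the logic can be exercised without network access.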
Including pages with noindex is another common mistake. If a page has the noindex directive in the HTML, including it in the sitemap sends contradictory signals: the sitemap says “index this” and the HTML says “do not index this”. Google usually respects the noindex, but the contradiction wastes crawl budget unnecessarily.
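Detecting the contradiction can be automated. This is a minimal sketch, assuming the page HTML has already been downloaded; it only inspects robots meta tags and does not cover a noindex sent through the X-Robots-Tag HTTP header. The names are hypothetical.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots"> and <meta name="googlebot">."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in ("robots", "googlebot"):
            content = attributes.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",")]


def has_noindex(html):
    """Return True if the HTML contains a robots meta tag with noindex."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


print(has_noindex('<meta name="robots" content="noindex, follow">'))  # prints: True
```

Any sitemap URL for which this returns True should either lose the noindex or leave the sitemap.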
Static or inaccurate <lastmod> dates. Some sitemap generators set the sitemap generation date as the lastmod for all URLs, even if the pages have not changed. Google detects this pattern and stops using the lastmod field from that domain as a freshness signal.
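A quick heuristic can reveal whether a generator is stamping every URL with the same date. The sketch below assumes the lastmod values have been read from the sitemap as strings; the 90% threshold is an arbitrary assumption for illustration, not a rule documented by Google.

```python
from collections import Counter


def lastmod_looks_static(lastmods, threshold=0.9):
    """Return True if one date accounts for at least `threshold` of all entries.

    `lastmods` is a list of date strings as they appear in <lastmod>.
    """
    if len(lastmods) < 2:
        return False
    _, top_count = Counter(lastmods).most_common(1)[0]
    return top_count / len(lastmods) >= threshold


print(lastmod_looks_static(["2026-04-01"] * 50))  # prints: True
```

A positive result is a prompt to inspect the generator, not proof of a problem: a site that really did update every page on the same day would trigger it too.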
Duplicate content URLs or non-canonical page variants. If you have /en/ and /es/ versions of the same content, each language version counts as its own page, so the sitemap should list the one canonical URL for each version and none of its duplicates. Similarly, URLs with parameters that are variants of a main page should be excluded unless they are themselves the canonical.
Sitemaps never updated. A sitemap that does not change for months, even though the site does change, loses relevance. Google reduces the frequency with which it consults it because it learns it provides no new information.
Sitemaps for different platforms: Astro, WordPress, Next.js
Sitemap generation varies by platform. Knowing how each one works avoids unnecessary manual configuration and implementation errors.
In WordPress, the Yoast SEO and Rank Math plugins generate sitemaps automatically and keep them updated. Yoast generates a sitemap index at /sitemap_index.xml that divides content by type (posts, pages, categories, authors). Rank Math does the same, also at /sitemap_index.xml. Both automatically exclude pages with noindex and allow configuration of which content types appear. The most frequent problem in WordPress is including author pages or tag archive pages that have little unique content; it is worth reviewing what types the plugin includes and disabling those that provide no SEO value.
In Astro, the official @astrojs/sitemap integration generates the sitemap automatically during the build. It is configured in astro.config.mjs, where the site option must be set so the integration can produce absolute URLs, and it offers options to filter URLs, customise lastmod and split the output into multiple files for large sites. For static sites with output: "static", the sitemap is generated during the build process and published with the rest of the site. Hreflang annotations for multilingual sites are configured through the integration’s i18n option; image sitemaps likewise require passing the corresponding options to the integration.
In Next.js, the next-sitemap library is the most common way to generate sitemaps after the build. It is configured with a next-sitemap.config.js file that controls which routes to include, which to exclude, and whether to generate separate sitemaps per section. Since Next.js 13.3, the App Router has native sitemap support via a sitemap.ts file in the app/ directory, whose default export is a function returning an array of URL entries.
Hugo ships with a built-in sitemap template and generates sitemap.xml on every build without extra setup. For Eleventy, or for static sites without a framework, the sitemap must be generated by a template or script as part of the build process, or maintained manually. The manual option is only viable for sites with fewer than 100 URLs that rarely change.
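For a site whose build produces a plain folder of HTML files, the URL list can be derived from the output directory. This is a minimal sketch under that assumption; collect_urls, the directory layout and the convention that index.html maps to its folder URL are all illustrative choices, not the behaviour of any specific generator.

```python
from pathlib import Path


def collect_urls(build_dir, base_url):
    """Yield the public URL of every HTML file under `build_dir`.

    index.html files map to their directory URL (with trailing slash);
    other HTML files keep their file name.
    """
    root = Path(build_dir)
    base = base_url.rstrip("/")
    for path in sorted(root.rglob("*.html")):
        relative = path.relative_to(root).as_posix()
        if relative == "index.html" or relative.endswith("/index.html"):
            relative = relative[: -len("index.html")]
        yield f"{base}/{relative}"
```

The sketch deliberately leaves lastmod out. File modification times in a build folder usually reflect when the build ran, not when the content changed, so a trustworthy date has to come from the content itself, for example front matter or version control history.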
Image and video sitemaps: indexation of multimedia content
Multimedia sitemaps deserve specific attention because they have a disproportionate impact on the visibility of certain content types.
The image sitemap is especially useful for product photographs, galleries, portfolios and any image loaded with JavaScript or aggressive lazy loading. Googlebot can have difficulty discovering images that only load after user interactions or that are inside client-side rendered components. The image sitemap solves this problem directly.
The structure of an image extension within the standard sitemap is:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/product/blue-shirt/</loc>
    <image:image>
      <image:loc>https://example.com/images/blue-shirt-front.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/images/blue-shirt-detail.jpg</image:loc>
    </image:image>
  </url>
</urlset>
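Generating this structure programmatically mostly comes down to handling the two namespaces correctly. The following Python sketch assumes the images are available as a mapping from page URL to image URLs; build_image_sitemap is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Register prefixes so the output uses xmlns and xmlns:image as in the example above.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """Build an image sitemap from a {page_url: [image_url, ...]} mapping."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        if len(image_urls) > 1000:
            raise ValueError(f"{page_url}: more than 1,000 images")
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(entry, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


pages = {
    "https://example.com/product/blue-shirt/": [
        "https://example.com/images/blue-shirt-front.jpg",
        "https://example.com/images/blue-shirt-detail.jpg",
    ],
}
print(build_image_sitemap(pages).decode("utf-8"))
```

The 1,000-image check mirrors the per-URL limit mentioned earlier, so an oversized gallery fails at build time instead of producing an invalid file.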
The fields <image:caption>, <image:title>, <image:geo_location> and <image:license> have been removed from Google’s official specification. Only <image:loc> is required.
For the video sitemap, the mandatory fields are thumbnail, title, description and at least one of the two content locations (the video file or the player URL). Optional fields like duration and publication date improve how Google displays the result in video search results.
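These requirements translate naturally into validation at generation time. The sketch below builds a single video entry in Python and rejects input that lacks a content location or has an out-of-range duration; the function name and parameter names are hypothetical, while the element names follow Google's video sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
VIDEO_NS = "http://www.google.com/schemas/sitemap-video/1.1"

ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("video", VIDEO_NS)


def build_video_entry(page_url, thumbnail, title, description,
                      content_url=None, player_url=None, duration=None):
    """Build one <url> element carrying a <video:video> block."""
    if content_url is None and player_url is None:
        raise ValueError("provide content_url, player_url or both")
    if duration is not None and not 1 <= duration <= 28_800:
        raise ValueError("duration must be between 1 and 28,800 seconds")

    entry = ET.Element(f"{{{SITEMAP_NS}}}url")
    ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = page_url
    video = ET.SubElement(entry, f"{{{VIDEO_NS}}}video")
    fields = [
        ("thumbnail_loc", thumbnail),
        ("title", title),
        ("description", description),
        ("content_loc", content_url),
        ("player_loc", player_url),
        ("duration", None if duration is None else str(duration)),
    ]
    for name, value in fields:
        if value is not None:  # optional fields are simply omitted
            ET.SubElement(video, f"{{{VIDEO_NS}}}{name}").text = value
    return entry
```

The returned element can be appended to a urlset root that declares the same two namespaces.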
One point that is often overlooked: the URLs in image and video sitemaps do not need to be on the same domain as the main site. If your images are on a different CDN, they can appear in the sitemap. The condition is that both domains are verified in Search Console.
How to submit and monitor your sitemap in Google Search Console
The most direct way to submit a sitemap to Google is through Search Console. The process is straightforward, but there are details worth knowing.
In Search Console, go to Indexing > Sitemaps in the left-hand menu. Enter the sitemap URL — for example https://yourdomain.com/sitemap.xml — and click Submit. Search Console will show the sitemap status: whether it has been processed correctly, how many URLs it detected and how many it has indexed.
The simplest alternative for Google to find the sitemap without manual submission is to declare it in robots.txt:
User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
This line in robots.txt ensures that any crawler respecting the standard will find the sitemap automatically. It does not replace submitting via Search Console if you want to monitor the status, but it is a good complementary practice.
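To confirm that the declaration is readable the way a crawler would read it, Python's standard robots.txt parser can be pointed at the file contents. This is a small sketch; declared_sitemaps is a hypothetical helper and the robots.txt body is the example above.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Sitemap: https://yourdomain.com/sitemap.xml
"""


def declared_sitemaps(robots_txt):
    """Return the sitemap URLs declared in a robots.txt body (empty list if none)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present (Python 3.8+).
    return parser.site_maps() or []


print(declared_sitemaps(ROBOTS_TXT))
# prints: ['https://yourdomain.com/sitemap.xml']
```

An empty result on a live site usually means the Sitemap line is missing or misspelled.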
Once submitted, Search Console shows the sitemaps report with the number of submitted and discovered URLs. There is one figure that confuses many people: the difference between “submitted URLs” and “indexed URLs”. Google may discover more URLs than are in the sitemap (because it finds them by other means) or may index fewer than submitted (because it decides some do not merit indexation). Neither situation is necessarily an error.
According to John Mueller, Search Advocate at Google, the difference between the number of URLs in the sitemap and the number of indexed URLs is completely normal. Mueller has repeated across multiple Google Search Central sessions that Google does not index everything in the sitemap, and not everything in the sitemap deserves to be indexed. The metric that matters is not a 1:1 ratio, but whether the pages you actually want in the index are indexed.
When a sitemap is not enough: limits and complements
A sitemap is a discovery tool, not a guarantee of indexation. There are situations where the sitemap is necessary but not sufficient.
If your content has little PageRank or few external links, Google may discover it via the sitemap but decide not to index it due to low perceived relevance. The sitemap does not substitute for solid internal linking or authority building.
If your server returns intermittent errors (occasional 5xx), Google may attempt to crawl the sitemap URLs and find errors, which reduces confidence in the site. A correct sitemap with an unstable server does not solve the indexation problem.
If you have low-quality pages in the sitemap, the overall signal of the sitemap degrades. Google has indicated that sitemap quality matters: a sitemap full of thin or low-value pages harms the site’s perceived quality.
The sitemap works best in combination with a well-optimised crawl budget, a correct robots.txt and an internal linking structure that distributes authority towards priority pages. It also complements Google Search Console well, which lets you see exactly what Google has crawled and indexed and detect discrepancies between what you submit and what it actually indexes.
Sitemaps are especially valuable for new sites with few backlinks, sites with orphan pages (no internal links pointing to them), multimedia content that Googlebot may have difficulty discovering, and sites that publish content frequently where freshness matters.
If you want to verify that your sitemap is well-built before submitting it, tools like Screaming Frog allow you to crawl the sitemap directly and validate that all URLs return 200, that there are no contradictions with canonicals, and that the XML is syntactically valid.
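The structural part of that check can also be scripted. The following Python sketch validates an uncompressed standard sitemap against the limits discussed in this guide (well-formed XML, 50,000 URLs, 50MB, 2,048 characters per URL) and flags duplicates; validate_sitemap is a hypothetical helper and it does not check HTTP status codes or canonicals.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024
MAX_LOC_LENGTH = 2_048


def validate_sitemap(xml_bytes):
    """Return a list of problems found in an uncompressed sitemap file."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50MB uncompressed")
    try:
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as error:
        return problems + [f"XML is not well formed: {error}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return problems + [f"unexpected root element: {root.tag}"]

    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries)} URLs exceed the 50,000 limit")

    seen = set()
    for entry in entries:
        loc = (entry.findtext(f"{{{SITEMAP_NS}}}loc") or "").strip()
        if not loc:
            problems.append("<url> entry without <loc>")
        elif len(loc) > MAX_LOC_LENGTH:
            problems.append(f"URL longer than 2,048 characters: {loc[:60]}")
        elif loc in seen:
            problems.append(f"duplicate URL: {loc}")
        seen.add(loc)
    return problems
```

An empty list means the file passed these structural checks. Sitemap index files use a different root element and would be reported as unexpected by this sketch.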
Review your site’s sitemap this week: download the file, filter URLs with a response other than 200, remove pages with noindex, and verify that <lastmod> reflects real changes. Four concrete actions that improve the quality of the signal you send to Google.