WebToolsHub Online Tool Suite

Sitemap Validator

The fastest free XML sitemap validator and sitemap checker online — paste your sitemap.xml or enter a URL and get a full validation report in seconds. Broken links, duplicate URLs, invalid <lastmod> dates, and out-of-range <priority> values all flagged instantly. I built this because Google Search Console's sitemap errors are maddeningly vague; you deserve a tool that tells you exactly which line is broken and why. Everything runs in your browser — your sitemap content never touches a server.

What Is an XML Sitemap Validator and Sitemap Checker?

An XML sitemap validator, also called a sitemap checker, scans your sitemap.xml file against the sitemaps.org protocol and Google Search Console requirements to make sure search engines can actually read and use it. A broken or malformed sitemap is one of those silent SEO killers: the errors Google reports tend to be too vague to act on, and in the meantime your new pages may simply not get crawled.

At its core, a sitemap is an XML file that lists every important URL on your site along with optional metadata: when it was last modified (<lastmod>), how often it changes (<changefreq>), and its relative importance (<priority>). When that file has syntax errors, duplicate entries, or dead links pointing to 404 pages, crawlers either skip the sitemap entirely or waste crawl budget on pages that no longer exist.

This free sitemap validator catches all of that — XML structure errors, broken URLs, duplicate entries, invalid date formats, out-of-range priority values, and Google's hard limits (50,000 URLs, 50MB uncompressed) — before you submit to Search Console and wonder why nothing is getting indexed.
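The two hard limits are simple enough to check yourself. Here's a minimal sketch in Python, not the code this tool runs; the function name check_limits is made up for illustration, and "50MB" is read as 52,428,800 bytes, the figure the sitemaps.org protocol gives.

```python
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 52,428,800 bytes, uncompressed


def check_limits(url_count: int, size_bytes: int) -> list[str]:
    """Return a list of limit violations (empty when the file is within limits)."""
    problems = []
    if url_count > MAX_URLS:
        problems.append(f"{url_count} URLs exceeds the {MAX_URLS} per-file limit")
    if size_bytes > MAX_BYTES:
        problems.append(f"{size_bytes} bytes exceeds the {MAX_BYTES} byte limit")
    return problems


print(check_limits(url_count=62_000, size_bytes=4_000_000))
```

If either check fails, the fix is to split the file and reference the pieces from a sitemap index.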

How to Use This Free Sitemap Checker

You can validate your sitemap in two ways: paste the raw XML directly, or enter your sitemap URL and let the tool fetch it. Here's the full workflow:

  1. Input your sitemap: Either paste the full XML content into the text area, or enter the URL (e.g. https://yoursite.com/sitemap.xml) and click "Fetch & Validate."
  2. Run validation: Click "Validate Sitemap." The tool parses the XML, checks every URL entry, and runs all rule checks within a couple of seconds.
  3. Review the summary score: You'll see an overall health percentage (e.g. "94% valid") with a breakdown of Passed / Warnings / Errors, color-coded green, amber, and red.
  4. Drill into the per-URL table: Every URL gets its own row showing HTTP status, <lastmod> validity, <changefreq> value, and <priority> range check. Click any row for details.
  5. Fix issues and re-validate: Fix the errors your CMS or sitemap generator introduced, paste the corrected XML, and run again until you hit 100%.
  6. Download the report: Export results as CSV or JSON to share with your team or keep as a baseline audit record.

Key Features

  • Full XML Syntax Check: Validates well-formed XML — catches unclosed tags, unescaped characters (&, <, >), wrong namespace declarations, and broken encoding before anything else.
  • URL Count & Google Limit Check: Counts total URLs and immediately flags if you've exceeded Google's hard limits — 50,000 URLs per sitemap file or 50MB uncompressed. Large sites need a sitemap index, not a single bloated file.
  • Broken & Dead Link Detection: Checks each URL for HTTP status codes — 404s, 500s, and redirect chains (301/302) are flagged separately so you know which to remove and which to fix.
  • Duplicate URL Detection: Finds exact-match duplicates and near-duplicates (HTTP vs HTTPS, trailing slash vs no trailing slash) that bloat your sitemap and confuse crawlers.
  • <lastmod> Date Validation: Checks that every <lastmod> value uses W3C datetime format (YYYY-MM-DD or full ISO 8601). Wrong formats like 01/15/2024 are silently ignored by Google — this tool catches them.
  • <changefreq> Validation: Verifies that <changefreq> values are one of the seven valid options: always, hourly, daily, weekly, monthly, yearly, never. Typos here cause silent parse failures.
  • <priority> Range Check: Ensures priority values are between 0.0 and 1.0. Values outside this range (e.g. 1.5 or -0.1) are protocol violations.
  • HTTPS vs HTTP Check: Flags HTTP URLs in your sitemap — Google expects canonical HTTPS URLs, and mixing protocols creates duplicate content issues.
  • URL Length Check: Flags excessively long URLs (over 2,048 characters) that some crawlers truncate or reject.
  • Sitemap Index Support: Handles sitemap index files that point to multiple child sitemaps — validates the index structure and optionally crawls each child sitemap too.
  • Image & Video Sitemap Support: Validates Google's image and video sitemap extensions, checking required tags like <image:loc> and <video:thumbnail_loc>.
  • robots.txt Auto-Detection: If you enter your domain root instead of a sitemap URL, the tool reads your robots.txt to find the declared Sitemap: directive and validates that file automatically.
  • Downloadable Report: Export your full validation results as CSV or JSON. Useful for client audits, team handoffs, or tracking sitemap health over time.
  • Color-Coded Results: Errors (red), Warnings (amber), and Passed checks (green) with an overall health score percentage — easy to scan at a glance.
  • 100% Client-Side Processing: All XML parsing and validation logic runs entirely in your browser using JavaScript. Your sitemap content is never sent to any server, stored, or logged anywhere.

When Should You Use an XML Sitemap Checker?

The obvious time is before submitting to Google Search Console — but there are several other moments in a real workflow where running a quick validation saves hours of debugging later.

After every major site restructure. Moved pages, changed URL patterns, deleted old blog posts — any of these can leave dozens of dead URLs in a sitemap that was generated months ago. A 404 in your sitemap wastes crawl budget and can suppress indexing of surrounding pages in the same file.

When new content isn't getting indexed. You published 10 articles last week and none of them are showing up in Search Console's Coverage report. More often than not, either the sitemap wasn't updated, or it was updated with a malformed <lastmod> date that Google ignored. Run the validator first, before you spend time digging through Search Console.

Before a site migration. HTTPS migrations, domain changes, platform switches (WordPress to Next.js, anyone?) — all of these generate a completely new sitemap. Validate it against the old one. You may also want to use our robots.txt generator to update your robots file and declare the new sitemap URL at the same time.

As part of a regular SEO audit. Running a sitemap check monthly takes 30 seconds and catches drift before it compounds. Large e-commerce or content sites where products and posts are added and removed constantly should treat sitemap validation like a recurring task, not a one-time setup.

Understanding XML Sitemap Structure — What Google Actually Reads

Most developers treat sitemaps as a "generate and forget" step in deployment. That's a mistake. A sitemap that isn't well-formed XML can't be read at all, and entries that violate the protocol can be skipped without much explanation. Here's what a valid <url> entry actually looks like:

<url>
  <loc>https://example.com/blog/my-post</loc>
  <lastmod>2026-05-20</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>

The only required element is <loc>. Everything else is optional — but when you include optional fields, they must be valid. Let's break down what this validator checks for each field:

  • <loc> — The URL itself: Must be a fully absolute URL including the scheme (https://). Relative paths like /blog/my-post are invalid and will cause the entry to be skipped. Must also be properly URL-encoded — spaces and special characters need percent-encoding.
  • <lastmod> — Last modified date: Google officially supports W3C datetime format: YYYY-MM-DD (date only) or the full ISO 8601 datetime 2026-05-20T14:30:00+05:00. Common mistakes include American date format (05/20/2026), Unix timestamps, and missing timezone offsets on datetime strings. Google ignores malformed lastmod values entirely — it won't tell you; it'll just stop using them for crawl prioritization.
  • <changefreq> — Change frequency: One of seven valid string literals (always, hourly, daily, weekly, monthly, yearly, never). This field is largely advisory, and Google has said publicly they don't rely on it heavily for crawl scheduling, but invalid values like bi-weekly or fortnightly are protocol violations that can affect how compliant parsers handle the file.
  • <priority> — Relative importance: A float between 0.0 and 1.0. Default is 0.5. This is relative to other pages on your own site, so setting everything to 1.0 is the same as setting nothing, and some SEO tools flag it as a problem. Use higher priority for cornerstone content and lower for tag/category pages.

One thing that trips up a lot of people: the XML namespace declaration. Your root element must include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" exactly — no trailing slash, no HTTP vs HTTPS mismatch. Without the correct namespace, some parsers reject the entire file.

Sitemap Index Files — When One Sitemap Isn't Enough

If your site has more than 50,000 pages — or even if it has fewer but you want to organize sitemaps by content type — you need a sitemap index file. This is a separate XML file that lists multiple sitemap files rather than individual URLs.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-18</lastmod>
  </sitemap>
</sitemapindex>

This tool detects sitemap index files automatically. When you validate an index, it validates the index structure itself (correct root element, valid child <loc> values) and can optionally fetch and validate each child sitemap — so you get a complete picture of your entire sitemap ecosystem in one run.

Common mistakes with sitemap indexes: referencing child sitemaps that return 404s, setting <lastmod> on the index entry to a date older than the actual child file's last-modified header (confuses crawl freshness signals), and exceeding the 50,000-sitemap limit on the index itself (yes, that's also capped at 50,000 entries).
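Telling an index apart from a regular sitemap comes down to the root element. Here's a minimal Python sketch of that detection step, assuming the file is already well-formed and uses the standard namespace; classify_sitemap is an illustrative name, not part of this tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_ENTRIES = 50_000  # the cap applies to both file types


def classify_sitemap(xml_text: str) -> tuple[str, list[str]]:
    """Return ("index" | "urlset" | "unknown", list of <loc> values)."""
    root = ET.fromstring(xml_text)
    prefix = "{" + SITEMAP_NS + "}"
    if root.tag == prefix + "sitemapindex":
        kind, path = "index", f"{prefix}sitemap/{prefix}loc"
    elif root.tag == prefix + "urlset":
        kind, path = "urlset", f"{prefix}url/{prefix}loc"
    else:
        return "unknown", []
    locs = [el.text.strip() for el in root.findall(path) if el.text]
    return kind, locs


index_xml = (
    '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>"
    "<sitemap><loc>https://example.com/sitemap-products.xml</loc></sitemap>"
    "</sitemapindex>"
)
kind, children = classify_sitemap(index_xml)
print(kind, len(children), len(children) <= MAX_ENTRIES)
```

For an index, each returned <loc> is a child sitemap that can be fetched and run through the same function again.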

Common Sitemap Mistakes That Kill Your SEO

I've audited dozens of sites where the developer did everything right in their Next.js or WordPress config, but the generated sitemap still had one of these quiet problems lurking:

  • Including noindex pages: If a page has <meta name="robots" content="noindex">, it should not be in your sitemap. Including it sends Google a contradictory signal — you're saying "crawl this" and "don't index this" simultaneously. Google will usually obey the noindex directive, but it wastes crawl budget getting there.
  • Including redirect URLs: Your sitemap should only contain canonical, final-destination URLs. If you have a 301 redirect from an old URL to a new one, only the new URL belongs in the sitemap. Redirects in sitemaps burn crawl budget and dilute signals.
  • HTTP URLs in an HTTPS sitemap: Mixed protocol URLs (some http://, some https://) create implicit duplicate content issues. Every URL should match your canonical protocol — almost certainly https:// in 2026.
  • Stale sitemap not updated after deletions: You deleted 50 old blog posts but your static sitemap generator still references them. Now your sitemap has 50 dead URLs, and Google is wasting crawl budget checking them every recrawl cycle. Run this validator after any content deletion sprint.
  • Setting all priorities to 1.0: This is the sitemap equivalent of bolding every word in a document. If everything is highest priority, nothing is. Use a gradient: homepage at 1.0, top-level pages at 0.9, blog posts at 0.7–0.8, tag pages at 0.5 or below.
  • Wrong <lastmod> format: This is the most common mistake I see. Developers set lastmod in their database timestamp format — 2026-05-20 14:30:00 (no T, no timezone) — and Google silently drops the field. Use YYYY-MM-DD at minimum, or full ISO 8601 with timezone offset if you want precision.
  • Forgetting to declare the sitemap in robots.txt: Google can find your sitemap through Search Console submission, but declaring it in robots.txt with Sitemap: https://yoursite.com/sitemap.xml ensures all crawlers (not just Google) discover it automatically. Our robots.txt & llms.txt generator handles this correctly — it auto-includes the sitemap declaration.
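On that last point, reading the Sitemap: directive back out of a robots.txt file is a quick sanity check. Here's a small Python sketch under simple assumptions: the directive name is matched case-insensitively, anything after a # is treated as a comment, and sitemap_urls_from_robots is a made-up name.

```python
def sitemap_urls_from_robots(robots_txt: str) -> list[str]:
    """Collect every URL declared with a Sitemap: directive."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls


robots = """User-agent: *
Disallow: /admin/
Sitemap: https://yoursite.com/sitemap.xml
"""
print(sitemap_urls_from_robots(robots))
```

Splitting on the first colon only matters here, because the URL itself contains one after the scheme.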

Sitemap Checker vs Google Search Console — What's the Difference?

Sound familiar? You submit your sitemap to Search Console, wait 48 hours, and get back a vague "Sitemap could not be read" error with no line number, no context, and no hint of what's wrong. Search Console's built-in sitemap check is a black box.

This validator is the opposite. It tells you exactly which URL has a 404, exactly which <lastmod> value has the wrong format, and exactly how many duplicates you have, line by line, with color-coded severity. Use this tool to get to 100% first, then submit to Search Console with confidence. Think of Search Console as the gate; think of this tool as the prep work you do before approaching the gate.

The other thing Search Console won't tell you: whether your sitemap is being used efficiently. You might have a "valid" sitemap per Google's minimal spec, but still have 40% of your URLs returning redirects or 15% with missing lastmod — both of which hurt crawl efficiency without triggering a hard "error" status. This tool surfaces those soft problems too.

There's also a timing difference worth understanding. Search Console's sitemap validator only runs when you explicitly trigger it, either on submission or manual recheck. Your sitemap can degrade silently between those checks: pages get deleted, URLs get renamed, an HTTPS migration introduces mixed protocol entries. A monthly run through this validator catches that drift between submissions, not just at submission time.

How Sitemaps Affect Crawl Budget — The Part Most Guides Skip

Crawl budget is the number of pages Googlebot is willing to fetch from your site within a given timeframe. It's determined by two factors: crawl rate limit (how fast Google crawls without overloading your server) and crawl demand (how popular and fresh your content appears to be). Your sitemap directly affects the second factor.

Here's the part that trips people up: Googlebot doesn't just use your sitemap to discover URLs — it uses <lastmod> dates to decide which pages are worth recrawling. If your lastmod dates are accurate and recent, Google crawls those pages more frequently. If your lastmod is missing, wrong format, or frozen at the same old date for everything, Google treats your site as largely static and reduces recrawl frequency. That new blog post you published this morning? It might not get indexed for 3–4 days instead of a few hours — all because of a bad lastmod date.

The second crawl budget killer is dead URLs in your sitemap. Every 404 in your sitemap consumes crawl budget without producing an indexed page. On a large site with hundreds of dead URLs in an outdated sitemap, Googlebot can waste 20–30% of your crawl allocation checking pages that no longer exist — leaving less budget for your new, valuable content. This is why the broken link detection in this validator matters beyond just "fixing errors." It's about protecting the budget Google allocates to crawling your live content.
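To put a number on your own sitemap, you can bucket the status codes and compute the share of entries that don't resolve to a live page. A rough Python sketch follows; it assumes you already have a mapping of URL to HTTP status from whatever link checker you use, and the bucket names are my own labels, not the ones this tool displays.

```python
def classify_status(status: int) -> str:
    """Bucket an HTTP status code the way a sitemap audit cares about it."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "redirect"      # list the final destination instead
    if 400 <= status < 500:
        return "client_error"  # 404, 410: remove from the sitemap
    return "server_error"      # 5xx: fix the page, then recheck


def wasted_share(statuses: dict[str, int]) -> float:
    """Fraction of sitemap URLs that do not return a 2xx response."""
    if not statuses:
        return 0.0
    wasted = sum(1 for code in statuses.values() if classify_status(code) != "ok")
    return wasted / len(statuses)


statuses = {
    "https://example.com/": 200,
    "https://example.com/old-post": 404,
    "https://example.com/moved": 301,
    "https://example.com/blog": 200,
}
print(f"{wasted_share(statuses):.0%} of sitemap URLs are not live pages")
```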

Finally, consider crawl demand amplification. When your sitemap is clean — accurate lastmod, no dead links, no redirects, no duplicates — Google sees a site that is well-maintained and freshly updated. That signal compounds: Googlebot starts visiting more frequently, which means new content gets indexed faster, which means it appears in search results sooner. A validated sitemap isn't just about "not having errors." It's an active SEO signal that tells Google your site is worth prioritizing.

For Next.js developers specifically: if you're generating sitemaps with the App Router's built-in sitemap.ts route handler, make sure your lastModified field is actually reading from your database or CMS — not hardcoded to the build date. A sitemap where every URL has the same <lastmod> is worse than having no lastmod at all, because it actively misleads Googlebot about which pages have changed.

Image and Video Sitemaps — The SEO Edge Most Sites Miss

Standard sitemaps only list page URLs. But if your site relies heavily on images or videos for traffic — photography portfolios, recipe blogs, tutorial sites, e-commerce product pages — you should be using Google's sitemap extensions for rich media. These extensions add extra metadata inside your <url> entries that helps Google index images and videos for Google Images and Google Video search.

An image sitemap extension looks like this inside a standard <url> block:

<url>
  <loc>https://example.com/blog/my-recipe</loc>
  <image:image>
    <image:loc>https://example.com/images/recipe-hero.webp</image:loc>
    <image:title>Chocolate Lava Cake Recipe</image:title>
    <image:caption>A rich chocolate lava cake served warm</image:caption>
  </image:image>
</url>

The validator checks image sitemaps for the required <image:loc> tag, validates that image URLs are absolute and reachable, and flags cases where the image namespace declaration is missing from the root element. Missing the namespace declaration (xmlns:image="http://www.google.com/schemas/sitemap-image/1.1") means Google's parser silently ignores all image tags — a very common mistake when manually editing sitemaps.

If your site uses Next.js and you're serving WebP images, pairing a clean image sitemap with properly converted assets gives Google Images the signals it needs. Our image to WebP converter makes that conversion fast — then validate the resulting sitemap entries here to make sure the image URLs are accessible and correctly declared.

Related Tools You Might Find Useful

If you're doing a full technical SEO audit, the sitemap is usually the starting point. Once you've validated it, you'll likely want to check that your robots.txt and llms.txt are configured correctly and that the sitemap URL is properly declared there.

For content-heavy sites, make sure your page metadata is solid too — our word and character counter is useful when you want to check that meta descriptions across your pages are within Google's ~155-character display limit before they end up in the sitemap. If you're generating dynamic sitemaps in Next.js 14+, the blog post on fixing "Discovered — Currently Not Indexed" in Next.js 14 covers the exact sitemap + metadata combination that Google needs to start crawling your pages.

Why Use WebToolsHub?

All tools on WebToolsHub are completely free — no account required, no signup, no usage limits. Everything runs client-side in your browser using JavaScript, which means your sitemap data never leaves your machine and is never transmitted to, stored in, or logged by any server. No ads interrupting your workflow, no paywalls after three uses, and no "upgrade to Pro" popups. Just a fast, focused tool that does exactly what it says.

Frequently Asked Questions

Is the Sitemap Validator free to use?

Yes, completely free — no account, no signup, no usage limits, and no hidden paywalls. You can validate as many sitemaps as you want. All processing runs in your browser using client-side JavaScript.

Does this tool store or upload my sitemap data to a server?

No. All XML parsing and validation logic runs entirely in your browser. Your sitemap content is never transmitted to any server, stored in a database, or logged anywhere. This is especially important if your sitemap contains internal URL structures you'd rather not expose — everything stays on your device.

What is an XML sitemap and why does it need to be validated?

An XML sitemap is a file (usually at /sitemap.xml) that lists all the important URLs on your website, along with optional metadata like last-modified dates and change frequency. Search engines like Google use it as a crawl roadmap — especially for pages that aren't well-linked internally. Validation matters because a malformed sitemap (wrong XML syntax, broken URLs, invalid date formats) is silently ignored by Google. It won't throw an error; it just stops using your sitemap, which means new content may not get indexed for days or weeks longer than necessary.

What is the correct <lastmod> date format in an XML sitemap?

Google requires W3C datetime format. The simplest valid format is YYYY-MM-DD (e.g. 2026-05-20). You can also use full ISO 8601 with timezone: 2026-05-20T14:30:00+05:00. Common mistakes that cause Google to silently ignore the value: American date format (05/20/2026), datetime without timezone (2026-05-20 14:30:00 — note the space instead of T), and Unix timestamps. This validator catches all of these and flags the exact entries with wrong formats.
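For reference, here's roughly what that check looks like as a Python sketch. It deliberately covers only the two forms named above (date only, and full datetime with a timezone designator), even though the W3C format also allows coarser forms such as year only. The name is_valid_lastmod is hypothetical.

```python
import re
from datetime import date

_W3C = re.compile(
    r"^(\d{4})-(\d{2})-(\d{2})"                 # YYYY-MM-DD
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.\d+)?)?"  # Thh:mm, optional :ss and fraction
    r"(Z|[+-]\d{2}:\d{2}))?$"                   # timezone is required with a time
)


def is_valid_lastmod(value: str) -> bool:
    """Check a lastmod value against the date and datetime forms above."""
    match = _W3C.match(value.strip())
    if not match:
        return False
    try:
        date(int(match[1]), int(match[2]), int(match[3]))  # rejects e.g. Feb 30
    except ValueError:
        return False
    if match[4] is not None:
        hour, minute, second = int(match[4]), int(match[5]), int(match[6] or 0)
        if hour > 23 or minute > 59 or second > 59:
            return False
    return True


for sample in ["2026-05-20", "2026-05-20T14:30:00+05:00", "05/20/2026",
               "2026-05-20 14:30:00", "1747751400"]:
    print(sample, is_valid_lastmod(sample))
```

The regular expression handles the shape, and the date constructor catches calendar-impossible values that a pattern alone would let through.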

What is a sitemap index file and does this tool support it?

A sitemap index is an XML file that lists multiple child sitemaps instead of individual URLs — used when a single sitemap would exceed 50,000 URLs or 50MB. The root element is <sitemapindex> instead of <urlset>. Yes, this validator supports sitemap index files: it validates the index structure itself and can optionally fetch and validate each child sitemap it references. This gives you a complete validation of your entire sitemap ecosystem in a single run.

What is the difference between a sitemap warning and a sitemap error?

Errors are protocol violations that will likely cause Google to reject or skip the affected entries: malformed XML, relative URLs where absolute ones are required, URLs returning HTTP 404, or <priority> values outside the 0.0–1.0 range. Warnings are non-blocking issues that reduce sitemap effectiveness without causing hard failures, such as missing <lastmod> tags, HTTP URLs on an HTTPS site, or every priority set to 1.0. Fix errors first, then work through warnings for maximum crawl efficiency.

Why are there duplicate URLs in my sitemap?

Duplicate URLs in sitemaps usually come from CMS or static site generator misconfigurations. The most common causes: HTTP and HTTPS versions of the same URL both appearing, trailing-slash vs no-trailing-slash variants (e.g. /blog/post and /blog/post/), pagination pages being included multiple times, and sitemap files from different generators being merged without deduplication. This validator detects exact duplicates and near-duplicates (protocol and trailing-slash variants) and flags them for removal.
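Near-duplicate detection boils down to normalizing each URL before comparing. A minimal Python sketch, assuming that ignoring the scheme, the host's letter case, and a trailing slash is the right notion of "same page" for your site; normalize and find_duplicates are illustrative names, not this tool's internals.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Build a comparison key that ignores scheme, host case, trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = parts.netloc.lower() + path
    if parts.query:
        key += "?" + parts.query
    return key


def find_duplicates(urls: list[str]) -> dict[str, list[str]]:
    """Group URLs by normalized key and keep groups with more than one entry."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: group for key, group in groups.items() if len(group) > 1}


urls = [
    "http://example.com/blog/post",
    "https://example.com/blog/post/",
    "https://example.com/about",
]
print(find_duplicates(urls))
```

Exact duplicates fall out of the same grouping, since identical URLs share a key by definition.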

Can I validate a sitemap that's behind authentication or on localhost?

For URL-based fetching, no — the tool can only fetch publicly accessible URLs (the same URLs Google would be able to reach). However, you can always copy-paste your raw XML directly into the text input, which works for local sitemaps, staging environments, and password-protected sites. Just generate the sitemap locally, copy the XML, paste it in, and validate.

Is there a difference between a sitemap validator and a sitemap checker?

Not in practice; different people just search for different terms. If you want to split hairs, a sitemap validator emphasizes protocol compliance (checking XML syntax, namespace declarations, required fields), while a sitemap checker refers more broadly to any tool that inspects sitemap health, including broken links, redirect URLs, and missing pages. This tool does both: it validates your XML sitemap against the sitemaps.org protocol and checks every URL for real-world reachability issues.

What browsers does the Sitemap Validator support?

All modern browsers: Chrome, Firefox, Safari, and Edge. No plugins, extensions, or installations required. The tool uses standard browser APIs — no Node.js, no backend calls. If you're on a very old browser (pre-2020), some XML parsing features may not work correctly, but any browser updated in the last 3 years will work fine.