What Is an XML Sitemap Validator and Sitemap Checker?
An XML sitemap validator, also called a sitemap checker, scans your sitemap.xml file against the sitemaps.org protocol and Google Search Console requirements to make sure search engines can actually read and use it.
A broken or malformed sitemap is one of those silent SEO killers: Google gives you little detail about what failed, and your new pages just quietly stop being discovered through it.
At its core, a sitemap is an XML file that lists every important URL on your site along with
optional metadata: when it was last modified (<lastmod>), how often it changes
(<changefreq>), and its relative importance (<priority>).
When that file has syntax errors, duplicate entries, or dead links pointing to 404 pages, crawlers
either skip the sitemap entirely or waste crawl budget on pages that no longer exist.
This free sitemap validator catches all of that — XML structure errors, broken URLs, duplicate entries, invalid date formats, out-of-range priority values, and Google's hard limits (50,000 URLs, 50MB uncompressed) — before you submit to Search Console and wonder why nothing is getting indexed.
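If you want a quick pre-flight check for the two hard limits before pasting anything into the tool, a few lines of Python will do. This is a minimal sketch, not this tool's actual implementation: the name check_limits and its parameters are made up for illustration, and it reads "50MB" as 52,428,800 bytes, the figure the sitemaps.org protocol gives.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000            # per-file URL limit
MAX_BYTES = 52_428_800       # 50MB uncompressed, as stated by sitemaps.org


def check_limits(xml_bytes, max_urls=MAX_URLS, max_bytes=MAX_BYTES):
    """Return a list of human-readable problems; an empty list means both limits pass."""
    problems = []

    # Size is measured on the raw, uncompressed file content.
    if len(xml_bytes) > max_bytes:
        problems.append(f"file is {len(xml_bytes)} bytes, limit is {max_bytes}")

    # Count <url> entries directly under the root element.
    root = ET.fromstring(xml_bytes)
    url_count = len(root.findall(f"{{{SITEMAP_NS}}}url"))
    if url_count > max_urls:
        problems.append(f"file lists {url_count} URLs, limit is {max_urls}")

    return problems
```

Pass the raw bytes of your file, for example the result of open("sitemap.xml", "rb").read(). If either limit is exceeded, split the file and point to the pieces from a sitemap index.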
How to Use This Free Sitemap Checker
You can validate your sitemap in two ways: paste the raw XML directly, or enter your sitemap URL and let the tool fetch it. Here's the full workflow:
- Input your sitemap: Either paste the full XML content into the text area, or enter the URL (e.g. https://yoursite.com/sitemap.xml) and click "Fetch & Validate."
- Run validation: Click "Validate Sitemap." The tool parses the XML, checks every URL entry, and runs all rule checks within a couple of seconds.
- Review the summary score: You'll see an overall health percentage (e.g. "94% valid") with a breakdown of Passed / Warnings / Errors, color-coded green, amber, and red.
- Drill into the per-URL table: Every URL gets its own row showing HTTP status, <lastmod> validity, <changefreq> value, and <priority> range check. Click any row for details.
- Fix issues and re-validate: Fix the errors your CMS or sitemap generator introduced, paste the corrected XML, and run again until you hit 100%.
- Download the report: Export results as CSV or JSON to share with your team or keep as a baseline audit record.
Key Features
- Full XML Syntax Check: Validates well-formed XML and catches unclosed tags, unescaped characters (&, <, >), wrong namespace declarations, and broken encoding before anything else.
- URL Count & Google Limit Check: Counts total URLs and immediately flags if you've exceeded Google's hard limits: 50,000 URLs per sitemap file or 50MB uncompressed. Large sites need a sitemap index, not a single bloated file.
- Broken & Dead Link Detection: Checks each URL for HTTP status codes — 404s, 500s, and redirect chains (301/302) are flagged separately so you know which to remove and which to fix.
- Duplicate URL Detection: Finds exact-match duplicates and near-duplicates (HTTP vs HTTPS, trailing slash vs no trailing slash) that bloat your sitemap and confuse crawlers.
- <lastmod> Date Validation: Checks that every <lastmod> value uses W3C datetime format (YYYY-MM-DD or full ISO 8601). Wrong formats like 01/15/2024 are silently ignored by Google; this tool catches them.
- <changefreq> Validation: Verifies that <changefreq> values are one of the seven valid options: always, hourly, daily, weekly, monthly, yearly, never. Typos here cause silent parse failures.
- <priority> Range Check: Ensures priority values are between 0.0 and 1.0. Values outside this range (e.g. 1.5 or -0.1) are protocol violations.
- HTTPS vs HTTP Check: Flags HTTP URLs in your sitemap. Google expects canonical HTTPS URLs, and mixing protocols creates duplicate content issues.
- URL Length Check: Flags excessively long URLs (over 2,048 characters) that some crawlers truncate or reject.
- Sitemap Index Support: Handles sitemap index files that point to multiple child sitemaps — validates the index structure and optionally crawls each child sitemap too.
- Image & Video Sitemap Support: Validates Google's image and video sitemap extensions, checking required tags like <image:loc> and <video:thumbnail_loc>.
- robots.txt Auto-Detection: If you enter your domain root instead of a sitemap URL, the tool reads your robots.txt to find the declared Sitemap: directive and validates that file automatically.
- Downloadable Report: Export your full validation results as CSV or JSON. Useful for client audits, team handoffs, or tracking sitemap health over time.
- Color-Coded Results: Errors (red), Warnings (amber), and Passed checks (green) with an overall health score percentage — easy to scan at a glance.
- 100% Client-Side Processing: All XML parsing and validation logic runs entirely in your browser using JavaScript. Your sitemap content is never sent to any server, stored, or logged anywhere.
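The near-duplicate check is easier to reason about once you see the normalization step. The sketch below is one way to do it in Python, not necessarily how this tool does it: normalize and find_duplicates are illustrative names, and treating HTTP/HTTPS and trailing-slash variants as the same page is an assumption you should adjust to your own canonical rules.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def normalize(url):
    """Build a comparison key that ignores scheme, host case, and a trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    key = parts.netloc.lower() + path
    if parts.query:
        key += "?" + parts.query
    return key


def find_duplicates(urls):
    """Group URLs by normalized key and keep only the groups with more than one entry."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return {key: found for key, found in groups.items() if len(found) > 1}
```

Each group in the result contains both exact repeats and scheme or slash variants, so you can decide which single canonical form should stay in the sitemap.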
When Should You Use an XML Sitemap Checker?
The obvious time is before submitting to Google Search Console — but there are several other moments in a real workflow where running a quick validation saves hours of debugging later.
After every major site restructure. Moved pages, changed URL patterns, deleted old blog posts — any of these can leave dozens of dead URLs in a sitemap that was generated months ago. A 404 in your sitemap wastes crawl budget and can suppress indexing of surrounding pages in the same file.
When new content isn't getting indexed. You published 10 articles last week and none of them are showing up in Search Console's Page indexing (formerly Coverage) report. Often, either the sitemap wasn't updated, or it was updated with a malformed <lastmod> date that Google ignored. Run the validator first, before you start digging through Search Console.
Before a site migration. HTTPS migrations, domain changes, platform switches (WordPress to Next.js, anyone?) — all of these generate a completely new sitemap. Validate it against the old one. You may also want to use our robots.txt generator to update your robots file and declare the new sitemap URL at the same time.
As part of a regular SEO audit. Running a sitemap check monthly takes 30 seconds and catches drift before it compounds. Large e-commerce or content sites where products and posts are added and removed constantly should treat sitemap validation like a recurring task, not a one-time setup.
Understanding XML Sitemap Structure — What Google Actually Reads
Most developers treat sitemaps as a "generate and forget" step in deployment. That's a mistake.
A sitemap that violates the protocol may be rejected outright, or have individual entries and fields skipped, with little feedback about what went wrong. Here's what a valid <url> entry actually looks like:
<url>
  <loc>https://example.com/blog/my-post</loc>
  <lastmod>2026-05-20</lastmod>
  <changefreq>monthly</changefreq>
  <priority>0.8</priority>
</url>
The only required element is <loc>. Everything else is optional — but when
you include optional fields, they must be valid. Let's break down what this validator checks
for each field:
- <loc> (the URL itself): Must be a fully absolute URL including the scheme (https://). Relative paths like /blog/my-post are invalid and will cause the entry to be skipped. Must also be properly URL-encoded: spaces and special characters need percent-encoding.
- <lastmod> (last modified date): Google officially supports W3C datetime format: YYYY-MM-DD (date only) or the full ISO 8601 datetime, e.g. 2026-05-20T14:30:00+05:00. Common mistakes include American date format (05/20/2026), Unix timestamps, and missing timezone offsets on datetime strings. Google ignores malformed lastmod values entirely. It won't tell you; it'll just stop using them for crawl prioritization.
- <changefreq> (change frequency): One of seven valid string literals: always, hourly, daily, weekly, monthly, yearly, never. This field is largely advisory (Google has said publicly they don't rely on it heavily for crawl scheduling), but invalid values like bi-weekly or fortnightly are protocol violations that can affect how compliant parsers handle the file.
- <priority> (relative importance): A float between 0.0 and 1.0. Default is 0.5. This is relative to other pages on your own site, so setting everything to 1.0 is the same as setting nothing, and some SEO tools flag it as a warning. Use higher priority for cornerstone content and lower for tag/category pages.
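The per-field rules above translate almost directly into code. Here's a minimal Python sketch of them, assuming you've already pulled the raw text out of each element. The function names are illustrative, not part of any library or of this tool. Note that valid_lastmod accepts only the two forms named above (the W3C note also permits year-only and year-month values) and doesn't range-check hours or minutes.

```python
import re
from datetime import date
from urllib.parse import urlsplit

VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}

# Date only, or date + time where the timezone designator (Z or +hh:mm) is required.
_LASTMOD = re.compile(
    r"\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2}))?"
)


def valid_loc(value):
    """Absolute http(s) URL with a host; relative paths fail."""
    parts = urlsplit(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def valid_lastmod(value):
    if not _LASTMOD.fullmatch(value):
        return False
    try:
        date.fromisoformat(value[:10])  # rejects impossible dates such as 2026-02-30
    except ValueError:
        return False
    return True


def valid_changefreq(value):
    return value in VALID_CHANGEFREQ


def valid_priority(value):
    try:
        number = float(value)
    except ValueError:
        return False
    return 0.0 <= number <= 1.0
```

Keeping each rule in its own small function makes it easy to report which field failed for which URL, rather than a single pass/fail for the whole entry.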
One thing that trips up a lot of people: the XML namespace declaration. Your root element must
include xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" exactly — no trailing
slash, no HTTP vs HTTPS mismatch. Without the correct namespace, some parsers reject the entire
file.
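You can check the namespace yourself with Python's standard library. This is a rough sketch under one assumption: root_namespace_ok is a made-up helper name. It relies on the fact that xml.etree.ElementTree expands every tag to "{namespace}localname", so any deviation in the declared namespace shows up in the root tag.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def root_namespace_ok(xml_text):
    """True if the root is <urlset> or <sitemapindex> in the exact sitemap namespace."""
    root = ET.fromstring(xml_text)
    expected = {
        f"{{{SITEMAP_NS}}}urlset",
        f"{{{SITEMAP_NS}}}sitemapindex",
    }
    return root.tag in expected
```

A trailing slash, an https:// variant, or a missing xmlns attribute all produce a different root tag, so each of them returns False here.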
Sitemap Index Files — When One Sitemap Isn't Enough
If your site has more than 50,000 pages — or even if it has fewer but you want to organize sitemaps by content type — you need a sitemap index file. This is a separate XML file that lists multiple sitemap files rather than individual URLs.
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-blog.xml</loc>
    <lastmod>2026-05-20</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-05-18</lastmod>
  </sitemap>
</sitemapindex>
This tool detects sitemap index files automatically. When you validate an index, it validates the
index structure itself (correct root element, valid child <loc> values) and
can optionally fetch and validate each child sitemap — so you get a complete picture of your
entire sitemap ecosystem in one run.
Common mistakes with sitemap indexes: referencing child sitemaps that return 404s, setting
<lastmod> on the index entry to a date older than the actual child file's
last-modified header (confuses crawl freshness signals), and exceeding the 50,000-sitemap limit
on the index itself (yes, that's also capped at 50,000 entries).
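To see what "validating the index structure" means in practice, here's a small Python sketch that reads an index file and lists its children. It's an illustration only: read_sitemap_index is a hypothetical name, and fetching each child sitemap is left out so the example stays offline.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
MAX_INDEX_ENTRIES = 50_000


def read_sitemap_index(xml_text, max_entries=MAX_INDEX_ENTRIES):
    """Return (loc, lastmod) pairs for each child sitemap, or raise ValueError."""
    root = ET.fromstring(xml_text)
    if root.tag != "{%s}sitemapindex" % NS["sm"]:
        raise ValueError("root element is not <sitemapindex> in the sitemap namespace")

    children = []
    for entry in root.findall("sm:sitemap", NS):
        loc = entry.findtext("sm:loc", default="", namespaces=NS).strip()
        if not loc:
            raise ValueError("a <sitemap> entry is missing its <loc>")
        lastmod = entry.findtext("sm:lastmod", default=None, namespaces=NS)
        children.append((loc, lastmod))

    if len(children) > max_entries:
        raise ValueError(f"index lists {len(children)} sitemaps, limit is {max_entries}")
    return children
```

From there, each returned loc can be fetched and run through the same checks as a regular sitemap.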
Common Sitemap Mistakes That Kill Your SEO
I've audited dozens of sites where the developer did everything right in their Next.js or WordPress config, but the generated sitemap still had one of these quiet problems lurking:
- Including noindex pages: If a page has <meta name="robots" content="noindex">, it should not be in your sitemap. Including it sends Google a contradictory signal: you're saying "crawl this" and "don't index this" simultaneously. Google will usually obey the noindex directive, but it wastes crawl budget getting there.
- Including redirect URLs: Your sitemap should only contain canonical, final-destination URLs. If you have a 301 redirect from an old URL to a new one, only the new URL belongs in the sitemap. Redirects in sitemaps burn crawl budget and dilute signals.
- HTTP URLs in an HTTPS sitemap: Mixed protocol URLs (some http://, some https://) create implicit duplicate content issues. Every URL should match your canonical protocol, almost certainly https:// in 2026.
- Stale sitemap not updated after deletions: You deleted 50 old blog posts but your static sitemap generator still references them. Now your sitemap has 50 dead URLs, and Google is wasting crawl budget checking them every recrawl cycle. Run this validator after any content deletion sprint.
- Setting all priorities to 1.0: This is the sitemap equivalent of bolding every word in a document. If everything is highest priority, nothing is. Use a gradient: homepage at 1.0, top-level pages at 0.9, blog posts at 0.7–0.8, tag pages at 0.5 or below.
- Wrong <lastmod> format: This is the most common mistake I see. Developers set lastmod in their database timestamp format, 2026-05-20 14:30:00 (no T, no timezone), and Google silently drops the field. Use YYYY-MM-DD at minimum, or full ISO 8601 with timezone offset if you want precision.
- Forgetting to declare the sitemap in robots.txt: Google can find your sitemap through Search Console submission, but declaring it in robots.txt with Sitemap: https://yoursite.com/sitemap.xml ensures all crawlers (not just Google) discover it automatically. Our robots.txt & llms.txt generator handles this correctly: it auto-includes the sitemap declaration.
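Checking that last point takes only a few lines. The Python sketch below pulls every Sitemap: declaration out of a robots.txt body. The helper name sitemap_urls_from_robots is made up for this example, and it assumes you've already downloaded the file contents as text.

```python
def sitemap_urls_from_robots(robots_txt):
    """Return the URLs declared with 'Sitemap:' lines, in file order."""
    urls = []
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()       # drop trailing comments
        key, sep, value = line.partition(":")      # split on the first colon only
        if sep and key.strip().lower() == "sitemap" and value.strip():
            urls.append(value.strip())
    return urls
```

Splitting on the first colon only matters here, because the URL itself contains one after the scheme. An empty result means no crawler can discover your sitemap from robots.txt alone.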
Sitemap Checker vs Google Search Console — What's the Difference?
You submit your sitemap to Search Console, wait 48 hours, and get back a vague "Sitemap could not be read" error with no line number, no context, and no hint of what's wrong. Sound familiar? Search Console's built-in sitemap checker is a black box.
This validator is the opposite. It tells you exactly which URL has a 404, exactly which <lastmod> value has the wrong format, and exactly how many duplicates you have, line by line, with color-coded severity. Use this tool to get to 100% first, then submit to Search Console with confidence. Think of Search Console as the gate; think of this tool as the prep work you do before approaching the gate.
The other thing Search Console won't tell you: whether your sitemap is being used efficiently. You might have a "valid" sitemap per Google's minimal spec, but still have 40% of your URLs returning redirects or 15% with missing lastmod — both of which hurt crawl efficiency without triggering a hard "error" status. This tool surfaces those soft problems too.
There's also a timing difference worth understanding. Search Console only reports on your sitemap as of the last time Google read it, and you don't control when that happens beyond submitting or resubmitting. Your sitemap can degrade silently between those reads: pages get deleted, URLs get renamed, an HTTPS migration introduces mixed protocol entries. A monthly run through this validator catches that drift on your own schedule, not just at submission time.
How Sitemaps Affect Crawl Budget — The Part Most Guides Skip
Crawl budget is the number of pages Googlebot is willing to fetch from your site within a given timeframe. It's determined by two factors: crawl rate limit (how fast Google crawls without overloading your server) and crawl demand (how popular and fresh your content appears to be). Your sitemap directly affects the second factor.
Here's the part that trips people up: Googlebot doesn't just use your sitemap to discover URLs. It can also use <lastmod> dates to decide which pages are worth recrawling. If your lastmod dates are accurate and recent, Google has a reason to recrawl those pages sooner. If your lastmod is missing, in the wrong format, or frozen at the same old date for everything, Google gets no freshness signal from the sitemap and may recrawl less often. That new blog post you published this morning? It might take days to get indexed instead of hours, all because of a bad lastmod date.
The second crawl budget killer is dead URLs in your sitemap. Every 404 in your sitemap consumes crawl budget without producing an indexed page. On a large site with hundreds of dead URLs in an outdated sitemap, Googlebot can waste 20–30% of your crawl allocation checking pages that no longer exist — leaving less budget for your new, valuable content. This is why the broken link detection in this validator matters beyond just "fixing errors." It's about protecting the budget Google allocates to crawling your live content.
Finally, consider crawl demand amplification. When your sitemap is clean — accurate lastmod, no dead links, no redirects, no duplicates — Google sees a site that is well-maintained and freshly updated. That signal compounds: Googlebot starts visiting more frequently, which means new content gets indexed faster, which means it appears in search results sooner. A validated sitemap isn't just about "not having errors." It's an active SEO signal that tells Google your site is worth prioritizing.
For Next.js developers specifically: if you're generating sitemaps with the App Router's built-in sitemap.ts file convention, make sure your lastModified field is actually reading from your database or CMS, not hardcoded to the build date. A sitemap where every URL has the same <lastmod> is worse than having no lastmod at all, because it actively misleads Googlebot about which pages have changed.
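Whatever generates your sitemap, you can spot the "everything shares one lastmod" problem from the output side. The Python sketch below is a rough heuristic, not a rule from the protocol: the function name lastmod_looks_frozen and the 90% threshold are assumptions chosen for illustration.

```python
from collections import Counter


def lastmod_looks_frozen(lastmods, threshold=0.9):
    """True if one lastmod value accounts for at least `threshold` of all entries.

    `lastmods` is the list of raw <lastmod> strings from the sitemap;
    missing values (None or empty) are ignored.
    """
    values = [value for value in lastmods if value]
    if len(values) < 2:
        return False  # too few entries to say anything
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values) >= threshold
```

A True result on a site with hundreds of pages usually points at a build-date or "now" timestamp being stamped onto every entry, which is the case described above.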
Image and Video Sitemaps — The SEO Edge Most Sites Miss
Standard sitemaps only list page URLs. But if your site relies heavily on images or videos for
traffic — photography portfolios, recipe blogs, tutorial sites, e-commerce product pages —
you should be using Google's sitemap extensions for rich media. These extensions add extra
metadata inside your <url> entries that helps Google index images and videos
for Google Images and Google Video search.
An image sitemap extension looks like this inside a standard <url> block:
<url>
  <loc>https://example.com/blog/my-recipe</loc>
  <image:image>
    <image:loc>https://example.com/images/recipe-hero.webp</image:loc>
    <image:title>Chocolate Lava Cake Recipe</image:title>
    <image:caption>A rich chocolate lava cake served warm</image:caption>
  </image:image>
</url>
The validator checks image sitemaps for the required <image:loc> tag, validates
that image URLs are absolute and reachable, and flags cases where the image namespace declaration
is missing from the root element. Missing the namespace declaration
(xmlns:image="http://www.google.com/schemas/sitemap-image/1.1") means Google's
parser silently ignores all image tags — a very common mistake when manually editing sitemaps.
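Here's a short Python sketch of reading image entries with the standard library. The function name image_locs is hypothetical. One thing worth knowing: a strict namespace-aware parser like xml.etree.ElementTree doesn't ignore an undeclared image: prefix, it refuses to parse the file at all, so the sketch turns that parse failure into a readable error.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def image_locs(xml_text):
    """Map each page URL to the list of its <image:loc> values."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        # An undeclared "image:" prefix surfaces here as an unbound-prefix error.
        raise ValueError(f"XML is not well-formed: {exc}") from exc

    result = {}
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="", namespaces=NS).strip()
        result[page] = [
            el.text.strip()
            for el in url.findall("image:image/image:loc", NS)
            if el.text
        ]
    return result
```

A page that maps to an empty list has no usable image entries, which is worth flagging if you expected some.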
If your site uses Next.js and you're serving WebP images, pairing a clean image sitemap with properly converted assets gives Google Images the signals it needs. Our image to WebP converter makes that conversion fast — then validate the resulting sitemap entries here to make sure the image URLs are accessible and correctly declared.
Related Tools You Might Find Useful
If you're doing a full technical SEO audit, the sitemap is usually the starting point. Once you've validated it, you'll likely want to check that your robots.txt and llms.txt are configured correctly and that the sitemap URL is properly declared there.
For content-heavy sites, make sure your page metadata is solid too. Our word and character counter is useful when you want to check that meta descriptions across your pages are within Google's ~155-character display limit before you list those pages in the sitemap. If you're generating dynamic sitemaps in Next.js 14+, the blog post on fixing "Discovered — Currently Not Indexed" in Next.js 14 covers the exact sitemap + metadata combination that Google needs to start crawling your pages.
Why Use WebToolsHub?
All tools on WebToolsHub are completely free — no account required, no signup, no usage limits. Everything runs client-side in your browser using JavaScript, which means your sitemap data never leaves your machine and is never transmitted to, stored in, or logged by any server. No ads interrupting your workflow, no paywalls after three uses, and no "upgrade to Pro" popups. Just a fast, focused tool that does exactly what it says.



