XML sitemap: the secret code to successful indexing
Greetings to all SEO professionals and enthusiasts! Today we will talk about an important tool, the XML sitemap, which, when used correctly, can greatly improve how search engines discover and index your site. This means we cannot ignore it. So, let’s begin!
A sitemap is a simple tool for webmasters that helps search engines better understand the structure and importance of pages on a website. The basis of a sitemap is an XML file that lists URLs, along with additional data such as the date of the last update, the frequency of changes, and the priority level relative to other pages on the site. This information helps search engines crawl all pages on the site efficiently and index them more accurately. XML sitemaps are not intended for humans, but solely for search engine robots.
History of creation
In the early days of the Internet, search engines indexed websites solely through robots that traversed all pages via links. The constantly growing number of websites on the Internet made finding new pages for search engines a very resource-intensive task, with the possibility that some important pages would not be discovered by the bot. At this point, the idea arose that webmasters could inform search engines in a machine-readable format about the list of pages on their site. Thus, the Sitemaps project was born.
On June 2, 2005, Google’s Engineering Director, Shiva Shivakumar, announced in a blog post:
“We are conducting an experiment called Google Sitemaps, which will either fail or exceed our wildest dreams and make the Internet better for both webmasters and users. It is a beta version of an “ecosystem” that can help webmasters address two current tasks: informing Google about all of your new web pages or updates, and increasing the coverage of your web pages in the Google index.”
On November 16, 2006, Google, Yahoo!, and Microsoft announced their joint support for Sitemaps 0.90. As emphasized by Tim Mayer, Senior Director of Product Management for Yahoo Search:
“The launch of Sitemaps is important as it allows websites to use a unified and simple way to provide content and metadata to search engines.”
“Sitemaps help webmasters display content that is typically difficult for scanners to discover, providing a more comprehensive search experience for users.”
This version of Sitemaps is still relevant today.
On April 11, 2007, Ask.com, Google, Microsoft Live Search, and Yahoo! announced their support for auto-discovery of Sitemap files using the robots.txt file.
On May 13, 2022, Bing announced that it had removed the ability to submit a sitemap anonymously via an HTTP request. On June 26, 2023, Google announced that it was deprecating the same feature. In both cases, it was stated that spammers had been abusing this feature.
In which cases is a Sitemap file necessary?
According to Google’s official documentation, you may need a Sitemap in three cases:
● If you have a large website (Google treats sites of about 500 pages or fewer as small);
● If the website is new and has few external links pointing to it, or has issues with internal linking;
● If the website is featured in Google News or contains a lot of multimedia content.
However, practice shows that there is no guarantee that a robot will find all pages of a website without a Sitemap file. Additionally, a Sitemap file makes it easier for the robot to learn about changes made to the website. Therefore, we recommend adding an XML sitemap to any website: it won’t hurt in any case.
The Sitemap file should be created in UTF-8 encoding. The characters &, ', ", >, and < must be replaced with their entity escape codes (&amp;, &apos;, &quot;, &gt;, and &lt;). The order of URLs in the Sitemap file does not matter for search engines.
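As a minimal sketch of the escaping rule above, the Python standard library can do the replacement for us. The helper name escape_loc and the sample URL are illustrative assumptions, not part of the Sitemaps protocol.

```python
from xml.sax.saxutils import escape

# escape() handles &, < and > on its own, so the two quote
# characters are passed in explicitly as extra entities.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_loc(url: str) -> str:
    """Return the URL with the five special characters entity-escaped."""
    return escape(url, QUOTE_ENTITIES)
```

If you build the file with an XML library instead of plain string concatenation, the library normally performs this escaping during serialization, so the helper is only needed when you write the XML text by hand.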
The sitemap may contain the following XML tags: urlset, url, loc, lastmod, changefreq, and priority.
Let’s look at them in more detail.
● urlset is the root tag; it encloses all entries and declares the namespace (protocol standard).
● url is the parent tag for each entry.
● loc is a required child tag for url and contains the full URL of the page (including the protocol prefix). It identifies the page that the optional tags describe. The content of this tag must be shorter than 2,048 characters.
● lastmod is a child tag for url and contains the date of the last modification of the current page in W3C Datetime format. The date is often used in the abbreviated format YYYY-MM-DD.
● changefreq is a child tag for url and describes the expected frequency of changes to the current page. It can take the following values: always, hourly, daily, weekly, monthly, yearly, never.
● priority is a child tag for url and describes the relative priority of the current page on the website, ranging from 0.0 to 1.0. The priority is set in relation to other pages on the same site. The default priority of a page is 0.5.
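The rules for the individual tags can be checked before a file is published. The following is only a sketch: the function name validate_entry and its messages are assumptions made for illustration, and the lastmod check looks only at the YYYY-MM-DD part of the value.

```python
from datetime import date

# Values listed by the Sitemaps 0.90 protocol for <changefreq>.
CHANGEFREQ_VALUES = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def validate_entry(loc: str, lastmod: str = "", changefreq: str = "",
                   priority: float = 0.5) -> list[str]:
    """Return a list of problems found in one <url> entry (empty if none)."""
    problems = []
    if not loc.startswith(("http://", "https://")):
        problems.append("loc must include the protocol prefix")
    if len(loc) >= 2048:  # the protocol asks for fewer than 2,048 characters
        problems.append("loc must be shorter than 2048 characters")
    if lastmod:
        try:
            date.fromisoformat(lastmod[:10])  # checks only the date part
        except ValueError:
            problems.append("lastmod must start with a valid YYYY-MM-DD date")
    if changefreq and changefreq not in CHANGEFREQ_VALUES:
        problems.append("unknown changefreq value")
    if not 0.0 <= priority <= 1.0:
        problems.append("priority must be between 0.0 and 1.0")
    return problems
```

An empty list means the entry passed all checks; anything else can be logged or used to reject the entry before it reaches the sitemap.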
Example of a sitemap:
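A small illustrative sitemap is embedded in the Python sketch below, which also parses it back with the standard library to confirm that it is well-formed. The URLs, dates, and the helper name list_urls are placeholders chosen for this example.

```python
import xml.etree.ElementTree as ET

# A small sitemap document; the URLs and dates are placeholders.
SITEMAP_XML = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2024-01-15</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.8</priority>
  </url>
  <url>
    <loc>https://www.example.com/catalog?item=12&amp;desc=vacation</loc>
    <lastmod>2024-02-01</lastmod>
  </url>
</urlset>
"""

# Prefix-to-namespace mapping used in the search expressions below.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def list_urls(xml_text: str) -> list[tuple[str, str]]:
    """Parse a sitemap and return (loc, lastmod) pairs."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return [
        (url.findtext("sm:loc", default="", namespaces=NS),
         url.findtext("sm:lastmod", default="", namespaces=NS))
        for url in root.findall("sm:url", NS)
    ]
```

Note that only loc is present in every entry; the second entry omits changefreq and priority, and its ampersand is written as &amp; inside the file.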
Sitemap Index File
There are two limitations for sitemap files. Each file must:
1. Contain no more than 50,000 URLs.
2. Have a size no larger than 52,428,800 bytes (50 MB) uncompressed.
If the sitemap exceeds these limitations, it is necessary to create multiple sitemap files. Each file must be listed in the index file. Sitemap index files are subject to the same limits: they cannot reference more than 50,000 sitemap files, and their size must not exceed 52,428,800 bytes.
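Splitting a long URL list to respect the 50,000-URL limit can be sketched in a few lines. The function name split_urls is an assumption for illustration, and the sketch checks only the number of URLs, not the size of the resulting files in bytes.

```python
MAX_URLS_PER_FILE = 50_000  # URL limit per sitemap file


def split_urls(urls: list[str], limit: int = MAX_URLS_PER_FILE) -> list[list[str]]:
    """Split a URL list into chunks that each fit into one sitemap file."""
    return [urls[i:i + limit] for i in range(0, len(urls), limit)]
```

Each chunk then becomes its own sitemap file, and the index file lists all of them.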
In Google Search Console, there is a limitation: you can provide no more than 500 Sitemap index files for each of your websites.
The Sitemap index file can contain the following XML tags:
● sitemapindex is the parent tag for the entire XML tree.
● sitemap is the parent tag for each entry.
● loc is a child tag for sitemap and contains the full URL of the Sitemap file (including the protocol prefix). The Sitemap index file can contain no more than 50,000 loc tags.
● lastmod is an optional child tag for sitemap. It contains the last modification date of the corresponding Sitemap file in W3C Datetime format. The date is most often used in the shortened format YYYY-MM-DD.
Example of an Index File
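A minimal sketch of how such an index file could be generated with Python’s standard library is shown below. The function name build_index and the file names used in the usage note are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Serialize the protocol namespace as the default one (no prefix).
ET.register_namespace("", SITEMAP_NS)


def build_index(entries: list[tuple[str, str]]) -> bytes:
    """Build a sitemap index from (sitemap_url, lastmod) pairs."""
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc, lastmod in entries:
        node = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:  # lastmod is optional
            ET.SubElement(node, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

Calling build_index([("https://www.example.com/sitemap-1.xml", "2024-01-15")]) returns the bytes of an index with a sitemapindex root, one sitemap entry, and its loc and lastmod children.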
Sitemaps Protocol Extension
The Sitemaps protocol can be extended with a custom namespace, which is declared in the root element.
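As an illustration of a namespace extension, the sketch below adds Google’s image namespace next to the standard one. The function name build_image_sitemap and the URLs in the usage note are assumptions; the point is only to show where the extra namespace is declared and how its tags sit inside a url entry.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Bind prefixes so the output uses a default namespace plus "image:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(page_url: str, image_urls: list[str]) -> bytes:
    """Build a one-page sitemap that lists the images found on that page."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
    ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
    for image_url in image_urls:
        image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
        ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

In the serialized result, the root urlset element carries both the default xmlns declaration and an xmlns:image declaration, and each image appears as an image:image element with an image:loc child.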
How to inform search engines about the sitemap location
There are three ways to inform search engines about the location of your sitemap file:
1. Through the webmaster tools panel;
2. Through the robots.txt file;
3. By using an HTTP request (this method is deprecated and no longer works in Google or Bing).
Submitting the Sitemap file through the Google webmaster tools panel (Google Search Console) is only possible if you have ownership rights to the resource. To add the file, open the Sitemaps report page, select the resource, and enter the Sitemap file URL. It is recommended to check beforehand that Google can access the Sitemap file.
Here is an overview video from Google: “Sitemaps in Search Console – Google Search Console Training”.
To submit the Sitemap file to Bing, use the Bing webmaster tools panel. Note that you can only submit a sitemap for a site selected from the list of verified sites.
To send the Sitemap file to Baidu, use the Baidu Ziyuan Webmaster Tools. Only high-quality websites that comply with Chinese legislation, have at least one page in simplified Chinese, and have passed verification can submit a sitemap to Baidu.
To inform search engines about the location of the sitemap, add the following directive to the robots.txt file:
Sitemap: https://www.example.com/sitemap.xml
Here, instead of www.example.com, use the domain of your site, and instead of sitemap.xml, the location of the sitemap.
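To check that the directive is picked up, the robots.txt body can be parsed with Python’s standard urllib.robotparser module (its site_maps() method requires Python 3.8 or later). The robots.txt content and the helper name find_sitemaps below are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; replace the domain and path with your own.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/

Sitemap: https://www.example.com/sitemap.xml
"""


def find_sitemaps(robots_txt: str) -> list[str]:
    """Return the sitemap URLs declared in a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap directive is present.
    return parser.site_maps() or []
```

The Sitemap directive is independent of the User-agent groups, so it can be placed anywhere in the file, and several Sitemap lines may be listed.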
The method of adding a sitemap through an HTTP request is outdated. Bing stopped supporting it as of May 13, 2022, and Google announced its deprecation on June 26, 2023. The main reason given for retiring this method is spam. Setting up a script that would automatically send your sitemap over HTTP daily (or after each change) once seemed like a good idea. For Google, the request looked like this:
https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml
As the number of websites increased, the load on search engine servers noticeably grew, and a decision was made to abandon this form of informing about the location of the sitemap.
Sitemap and Google
In Google, the changefreq and priority elements specified in the Sitemap file are not taken into account when indexing the site. The changefreq element, which shows how often the content of a page changes, is not relevant when there is a lastmod element that indicates the date of the last page modification. And the priority element, which assumes the determination of the page’s priority, is subjective and does not accurately reflect the real significance of the page compared to other pages on the site.
It is important to remember that Googlebot takes the lastmod element into account when crawling pages, but only while the values are accurate: if you set incorrect values, Google will stop relying on them. The number of Sitemap files does not play a role in indexing: on the Google side, all Sitemap files are merged, and URLs from them are processed together, regardless of the number of files you provided.
Google supports four ways to provide additional information in the Sitemap file:
1. Extend URL entries with information about images;
2. Extend URL entries with information about videos;
3. Include information about different languages or localized versions of pages using the hreflang attribute;
4. For news sites, use a special type of Sitemap to provide information about the latest updates.
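As an example of the third option, language versions are described with xhtml:link elements inside each url entry. The sketch below assumes a single page available in several languages; the function name build_hreflang_sitemap and the URLs in the usage note are illustrative.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Bind prefixes so the output uses a default namespace plus "xhtml:".
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_hreflang_sitemap(versions: dict[str, str]) -> bytes:
    """Build a sitemap from a {language code: URL} mapping of one page."""
    root = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in versions.values():
        url = ET.SubElement(root, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every entry lists all language versions, including itself.
        for lang, href in versions.items():
            ET.SubElement(url, f"{{{XHTML_NS}}}link",
                          {"rel": "alternate", "hreflang": lang, "href": href})
    return ET.tostring(root, encoding="UTF-8", xml_declaration=True)
```

For a mapping such as {"en": "https://www.example.com/en/", "de": "https://www.example.com/de/"}, the result contains two url entries, each with two xhtml:link elements, so every language version points to all the others and to itself.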
Sitemap and Bing
The head manager of the Bing index generation program, Fabrice Canel, says:
“One of the key strategies for search engine optimization (SEO) of websites is having high-quality sitemaps that help search engines discover and access all relevant content hosted on this website. Sitemaps offer website owners a truly simple way to share with each search engine information about the content they have on their site, rather than relying solely on scanning algorithms (i.e. scanners, robots) to find it.”
He recommends avoiding too many XML sitemap files on the site. Ideally, he suggests using only one index sitemap file that lists all relevant sitemap files and index sitemap files.
To minimize the number of duplicate and broken links in sitemaps, Bing recommends regenerating sitemaps at least once a day.
For very large sites with millions or even billions of URLs, the total size of sitemap XML files can exceed 100 gigabytes. In these cases, Bing recommends creating two sets of sitemaps and alternating them daily.
Sitemap and Baidu
Baidu requires the following for sitemaps:
● Should not exceed 10 MB;
● Should not contain Chinese characters;
● Should be located in the root folder of your website.
How to create an XML sitemap?
Google offers a slightly outdated list of third-party Sitemap file generators that webmasters can use or create their own tool based on them. Currently, not all links are active, but many still work.
Here are some popular services for creating XML sitemaps:
Sitemap Writer Pro is a program that offers seven types of sitemaps (standard sitemap, Google images, Google news, Google video, Google code search, Google Geo, Google Mobile). It includes a built-in crawler for generating sitemaps and an FTP manager for uploading the sitemap to the website.
The XML sitemap generator includes a free generator (up to 500 pages), a professional generator, and a server script written in PHP.
My Sitemap Generator is a free sitemap generator that scans the website and creates an XML sitemap.
CMS plugins that create sitemaps:
XML Sitemap module – creates a sitemap. Includes several submodules that can add sitemap links for content, menu items, and user profiles.
Google (XML) Sitemaps Generator – automatically creates an XML sitemap. Additionally generates an HTML sitemap.
XML Sitemap Generator
IKSWEB XML Sitemap Generator – a lightweight and compact plugin that creates an XML sitemap.
Google XML News Sitemap plugin
The Google News plugin is open-source software that allows publishers to update Sitemap files for Google News. In addition to the usual functionality, it allows publishers to describe the types of content being published, as well as specify access levels for individual articles.
An up-to-date list of Joomla plugins that create XML sitemaps is available on the CMS website.
Other Sitemap Formats
In addition to the XML sitemap, there are other, less common sitemap formats intended for search engines.
RSS, mRSS and Atom 1.0
For a blog with an RSS or Atom feed, you can use the URL of this feed as a source to create a Sitemap file. Almost all blogging platforms provide the ability to create a feed. Google supports RSS 2.0 and Atom 1.0 feeds, as well as mRSS (media RSS) feeds.
Text Sitemap File
In a text Sitemap file, you can only add URLs, each on a separate line. The extension of this file should be .txt (for example, sitemap.txt).
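Producing such a file takes only a few lines. The function names below are assumptions for illustration; the sketch writes one URL per line in UTF-8 and skips empty items.

```python
from pathlib import Path


def build_text_sitemap(urls: list[str]) -> str:
    """Return the body of a text sitemap: one URL per line, nothing else."""
    cleaned = [url.strip() for url in urls if url.strip()]
    return "\n".join(cleaned) + "\n"


def write_text_sitemap(path: str, urls: list[str]) -> None:
    """Write the text sitemap to disk in UTF-8 encoding."""
    Path(path).write_text(build_text_sitemap(urls), encoding="utf-8")
```

For example, write_text_sitemap("sitemap.txt", urls) saves the list under the file name mentioned above.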
In conclusion, an XML sitemap is an essential tool in the arsenal of a webmaster or website owner. It is a structured file that helps search engines more effectively scan and index the content of your site. In this article, we have discussed key aspects of XML sitemaps and their important role in optimizing search visibility.
An XML sitemap lets you tell search engines which pages of your site you want indexed and provides additional information about each page, such as the date of the last update and priority. This helps search engines decide more accurately which pages to crawl and how often to revisit them.
Using XML sitemaps is a key element of a successful SEO strategy. They contribute to improving the indexing of your site, thereby increasing its visibility in search results.
It is important to remember that an XML sitemap requires constant updating and maintenance to remain current and effective. Regularly updating your sitemap lets search engine bots pick up changes on your site quickly.
Dear friends! Thank you for reading and taking the time!
We hope the article was interesting and, most importantly, useful!