How to generate and update a sitemap in MediaWiki

A sitemap is a file, or a set of files, that lists the pages on your site. Search engine bots use it to crawl and index those pages. The general purpose of a sitemap is to make it easier for search engines to index your site and so improve the visibility of its pages on the Internet.

In MediaWiki you can generate a sitemap and update it regularly with a script that comes prepackaged with the application. In this article we'll go over the steps for creating and updating a sitemap.

Create a Sitemap Directory

This step is optional. You can generate the sitemap files in any folder of your MediaWiki application, but keeping them in a dedicated folder makes things more organized. For instance, in the root MediaWiki directory on your MediaWiki hosting account you can create a folder named sitemap and later generate the sitemap in it. One way for HostKnox customers to create directories is from the File manager of the Pixie control panel.
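If you prefer the command line to the File manager, the folder can also be created with a single command. The following is a minimal sketch that assumes MediaWiki is installed directly in public_html under your home directory; WIKI_ROOT is a variable name chosen here only for illustration.

```shell
# Assumption: MediaWiki lives directly in public_html under your home directory.
# Set WIKI_ROOT beforehand if your installation is somewhere else.
WIKI_ROOT="${WIKI_ROOT:-$HOME/public_html}"

# Create the sitemap folder; -p avoids an error if it already exists.
mkdir -p "$WIKI_ROOT/sitemap"
```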

Generate a Sitemap

MediaWiki comes with a PHP script for generating (and updating) a sitemap. The file is called generateSitemap.php and it's located in the maintenance folder, which sits in the root MediaWiki directory. So if, for example, the application is installed directly in the public_html directory, the path to the maintenance folder is public_html/maintenance and the path to the script is public_html/maintenance/generateSitemap.php.

To create the sitemap you have to execute the script file with a command. This is done via SSH. All HostKnox customers have free SSH access as part of their hosting package. We won't go into detail here on how to connect to your account via SSH; for information on that check out our tutorial on how to connect to your hosting account via SSH (or the shorter article version). In the SSH section of our knowledge base you'll also find useful articles on how to manage various aspects of your account via SSH.

In its most basic form the command for generating a sitemap is:

php generateSitemap.php

For the above command to work, the current working directory has to be the folder that contains the script, that is, the maintenance folder. You change the working directory with the cd command. When HostKnox customers log into their account, they start in a directory that is the parent of the public_html folder (the root web-accessible folder). Assuming MediaWiki is installed directly in public_html, you can change the current working directory to the maintenance folder with the command cd public_html/maintenance. If you don't change the current working directory to the maintenance folder, you have to include the path to the file in the command that executes the script (e.g. php public_html/maintenance/generateSitemap.php).

This simplest form of the command generates the sitemap in the current working directory. If you want the sitemap to be in a specific folder that you have created for it, you can either change the current working directory to that folder and execute the script from there (including the path to the script in the command), or you can add to the command an option with the path to the desired sitemap folder. In either case it's also a good idea to add an option for the URL of the folder with the sitemap and an option for the URL of the site. This is a precaution against some possible problems. So it's recommended to execute a command that has the form:

php generateSitemap.php --fspath [path to folder for sitemap files] --urlpath [full URL to the folder with the sitemap files] --server [URL of the site]

The option --fspath is for the path on the hosting account to the folder in which you want the sitemap to be generated. As we mentioned, you can skip this option if you first change the current working directory to the folder in which you want to put the sitemap. The option --urlpath specifies the full URL address of the folder in which the sitemap is to be stored (include the http:// part in the URL). The option --server is for the URL of the site (again with the http:// part). In the actual command the bracketed placeholders, brackets included, have to be replaced with the actual values. So, for instance, if your MediaWiki is installed directly in public_html, you have created a folder there for the sitemap called sitemap, and the domain of your site is yourdomain.com, then the command will look like:

php generateSitemap.php --fspath /home/username/public_html/sitemap --urlpath http://yourdomain.com/sitemap --server http://yourdomain.com

Note that in our example the path to the public_html folder works for the way HostKnox accounts are set up. In the path you have to replace username with the actual folder name of your account.
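Because the command is long and every value in it is site-specific, it can help to assemble it from a few variables and review it before running it. The following is only a sketch: the variable names (ACCOUNT_HOME, SITE_URL, RUN and so on) are made up for this illustration, and the default values are the placeholders from the example above, not real paths.

```shell
#!/bin/sh
# Placeholders from the example above; replace them with your own values.
ACCOUNT_HOME="${ACCOUNT_HOME:-/home/username}"
SITE_URL="${SITE_URL:-http://yourdomain.com}"

# Assumption: MediaWiki is installed directly in public_html.
WIKI_ROOT="$ACCOUNT_HOME/public_html"
SITEMAP_DIR="$WIKI_ROOT/sitemap"

# The full path to the script lets the command run from any directory.
SITEMAP_CMD="php $WIKI_ROOT/maintenance/generateSitemap.php --fspath $SITEMAP_DIR --urlpath $SITE_URL/sitemap --server $SITE_URL"

# Print the command for review; it is executed only when RUN=1 is set.
echo "$SITEMAP_CMD"
if [ "${RUN:-0}" = "1" ]; then
    # Left unquoted on purpose so the string splits into separate arguments;
    # this works only as long as the paths contain no spaces.
    $SITEMAP_CMD
fi
```

Saved as a file and run without RUN=1, the script only prints the command, which makes it easy to check the paths and URLs before generating anything.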

The sitemap that the script generates actually consists of a number of files. One file serves as an index to the others. The other files contain the URLs of the pages on your site; one such file is generated for each namespace on the site that contains pages. For example, there's a separate file that lists the pages in the Main namespace with all the articles, another for the discussion pages, and so on. If a namespace doesn't contain pages, no sitemap file is generated for it.

Update the Sitemap

When the output of the script generateSitemap.php is put in the same folder where the previously generated sitemap files are located (e.g. the sitemap folder), then the old files are replaced and updated with the new ones. So in order to update the sitemap regularly you need to periodically execute the script. Instead of doing this manually every time, you can set up a cron job to execute the script at certain intervals, whatever you decide works best for your site (e.g. once a day, once a week).

An easy way for HostKnox customers to add and configure a cron job is from the Cron Jobs section of the Pixie control panel. From there you can add (and edit) the cron job for generating a sitemap and select how often you want it to be executed. For the command to be performed by the cron job, use a command as described in the previous section of this article, particularly the last example. Because a cron job is not run from the maintenance folder or the sitemap folder, include the path to the script file in the command (e.g. php /home/username/public_html/maintenance/generateSitemap.php) and specify the path to the folder with the sitemap with the --fspath option.
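For readers who edit a crontab directly instead of using a control panel, the entry might look like the sketch below. The schedule (every day at 03:00) is an arbitrary choice for illustration, and username and yourdomain.com are the same placeholders as in the earlier example. On some servers cron also needs the full path to the php binary rather than just php.

```
# Fields: minute hour day-of-month month day-of-week command
0 3 * * * php /home/username/public_html/maintenance/generateSitemap.php --fspath /home/username/public_html/sitemap --urlpath http://yourdomain.com/sitemap --server http://yourdomain.com
```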

For more information on how to set up cron jobs check out the tutorial on how to manage cron jobs with the Pixie control panel.
