If you own a WordPress website, you have probably come across the robots.txt file and wondered what it is. In this article, I will explain the term clearly and show you how to create a robots.txt file in WordPress very simply. Please read the full article!
What is robots.txt file?
The robots.txt file is a plain text file used by many webmasters. It is part of the Robots Exclusion Protocol (REP), a set of web standards that tell search engine robots which pages on your website they can and cannot crawl, access, index, and serve to users.
The REP also includes directives such as meta robots tags, along with page-, subdirectory-, and site-wide instructions that tell search engines how to treat links (such as "follow" or "nofollow").
The robots.txt file is commonly used to keep robots out of parts of a website. It is usually the first place a crawler visits when it arrives at a site. Even if you want Googlebot to access your entire website, you should still add a robots.txt file.
Syntax of the robots.txt file
A robots.txt file is built from the following basic directives:
- User-agent: the name of the crawler the rules apply to, such as Googlebot or Bingbot.
- Disallow: tells the user-agent not to crawl a particular URL. Only one URL path may be listed per Disallow line.
- Allow (supported by Googlebot): tells Googlebot that it may access a page or subdirectory, even if its parent directory is disallowed.
- Crawl-delay: tells crawlers how many seconds to wait between successive requests. Googlebot ignores this directive, so you should set the crawl rate in Google Search Console instead.
- Sitemap: provides the location of any XML sitemap associated with the site. Note that this directive is only supported by Google, Ask, Bing, and Yahoo.
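Putting these directives together, a simple robots.txt file might look like the following sketch (the paths and sitemap URL here are illustrative placeholders, not values from any real site):

```
User-agent: Googlebot
Disallow: /nogooglebot/

User-agent: *
Allow: /
Crawl-delay: 10

Sitemap: https://www.example.com/sitemap.xml
```

Here Googlebot is told to stay out of /nogooglebot/, every other bot may crawl everything (with a 10-second delay between requests, for crawlers that honor it), and the sitemap location is declared at the end.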
Within this basic format, the Crawl-delay and Sitemap lines are optional. In practice, a robots.txt file usually contains several User-agent lines, each followed by its own group of directives (Disallow, Allow, Crawl-delay, and so on); each directive sits on its own line, and the groups for different bots are separated by a blank line.
Within a group, the directives for a bot are written on consecutive lines with no blank lines between them. If a robots.txt file contains several groups that could apply to the same bot, the bot follows the group that matches it most specifically.
Is the robots.txt file necessary?
Creating a robots.txt file gives you more control over which areas of your website crawlers can reach. That power cuts both ways: a few wrong rules can prevent Googlebot from indexing your website at all.
However, creating a robots.txt file is really useful for websites. The benefits can include:
- Prevent the collection of duplicate content.
- Keep parts of a website private.
- Prevent crawling of internal search results pages.
- Protect the server from overload.
- Avoid wasting Google’s “crawl budget” on unimportant pages.
- Prevent images, videos, and resource files from appearing in Google search results.
Although Google does not index pages blocked in robots.txt, blocking does not guarantee exclusion from search results. As Google notes, if the content is linked to from elsewhere, the URL can still appear in Google’s search results.
If there is no area on your website that you want to control access to, the robots.txt file may not be needed.
How does the robots.txt file work?
Search engines have two main jobs:
- Crawl the web to discover content.
- Index that content to provide users looking for information.
To crawl the web, search engines follow links from one page to the next, eventually crawling billions of links and pages. This information-gathering process is called “spidering”.
After arriving at a website, the crawler looks for a robots.txt file. If it finds one, the crawler reads that file before continuing through the site.
Because the robots.txt file describes how search engines should crawl the site, the directives found there tell the crawler which URLs it may and may not fetch.
If the robots.txt file contains no directives that disallow a user-agent’s activity, or if the site has no robots.txt file at all, crawlers proceed to crawl the entire site.
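You can reproduce this crawler logic with Python's standard `urllib.robotparser` module. The rules and URLs below are made-up examples, parsed from a string rather than fetched over HTTP:

```python
from urllib.robotparser import RobotFileParser

# A hypothetical robots.txt, parsed from a string instead of fetched from a site.
rules = """\
User-agent: *
Allow: /wp-admin/admin-ajax.php
Disallow: /wp-admin/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# A polite crawler checks each URL against the rules before fetching it.
print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))  # True: no rule matches
print(parser.can_fetch("Googlebot", "https://example.com/wp-admin/"))  # False: disallowed
```

One detail worth knowing: Python's parser applies the first matching rule, which is why the narrower Allow line is listed before the broader Disallow in this sketch.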
Where is the robots.txt file located on the website?
When you create a website with WordPress, it automatically generates a virtual robots.txt file served from the root directory of your domain.
For example, to control crawling behavior on the domain “domain.com”, you can access the robots.txt file at “domain.com/robots.txt”. To control crawling on a subdomain such as “blog.domain.com”, the robots.txt file can be accessed at “blog.domain.com/robots.txt”.
If you are using WordPress, the robots.txt file can be found in the public_html directory of your website.
WordPress includes a robots.txt file by default with a new installation that will include the following:
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
These rules tell all bots that they may crawl every part of the site except the contents of the /wp-admin/ and /wp-includes/ directories.
How to check website has robots.txt file or not?
To check if your website already has a robots.txt file, simply enter the root domain name, then add /robots.txt to the end of the URL.
For example, domain.com/robots.txt.
If no .txt page appears, your site does not have a robots.txt file.
How to create robots.txt file in WordPress
Once you’ve decided what should appear in your robots.txt file, all that remains is to create it. You can create and edit robots.txt in WordPress with a plugin or manually. In this section, I will show both approaches to creating a simple robots.txt file in WordPress.
Create robots.txt File with Yoast SEO plugin
To optimize your WordPress site, you can use an SEO plugin; most of them include their own robots.txt file generator, which makes creating the file much easier. In this section, I will create a robots.txt file with the Yoast SEO plugin.
Step 1. Install Plugin
Click Plugins > Add New. Then search for, install, and activate the Yoast SEO plugin if you haven’t already.
Step 2. Create the robots.txt file
Once the plugin is activated, click SEO > Tools > File editor.
Since this is the first time the file is being created, click Create robots.txt file.
You will see that Yoast creates the file with some default directives already in place.
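The exact defaults depend on the Yoast version you have installed, but the generated file typically resembles the WordPress virtual defaults, something like:

```
User-agent: *
Disallow: /wp-admin/
Allow: /wp-admin/admin-ajax.php
```

This keeps bots out of the admin area while still allowing the admin-ajax.php endpoint that many front-end features rely on.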
You can add more commands to robots.txt if you want. Once done, click Save changes to robots.txt.
Go ahead and enter your domain name followed by “/robots.txt” in the browser. If you see the default directives, you have successfully created the robots.txt file.
A nice addition is to reference your sitemap from the robots.txt file.
For example, if your sitemap URL is https://yourdomain.com/sitemap.xml, consider adding the line Sitemap: https://yourdomain.com/sitemap.xml to your robots.txt file.
Another example: suppose you wanted a directive that blocks bots from crawling all the images on your site.
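One way to do that, sketched here with Googlebot-Image (Google's image crawler), is to disallow that user-agent entirely:

```
User-agent: Googlebot-Image
Disallow: /
```

Other bots are unaffected, because the rule group applies only to the user-agent it names.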
>>See more: Instructions for installing and using the Yoast SEO Plugin
Create robots.txt file through All in One SEO Plugin
All in One SEO Pack is another popular name in WordPress SEO. It includes most of the features Yoast SEO has, but some people prefer it because it is a lightweight plugin. Creating a robots.txt file with it is just as simple.
Once you have the plugin set up, navigate to All in One SEO > Feature Manager. There you’ll find a Robots.txt option with an Activate button right below it. Go ahead and click that.
A new Robots.txt tab will now appear in the All in One SEO menu. Click it and you’ll see options to add new rules to your file, save the changes you make, or delete the file entirely.
Note that you cannot edit the robots.txt file directly with this plugin. The file itself is grayed out, unlike in Yoast SEO, which lets you type in anything you want.
On the positive side, however, this restriction helps limit the damage that malware bots could otherwise cause to your website’s robots.txt file.
Create robots.txt file and upload via FTP
To create a robots.txt file by hand, open a plain text editor such as Notepad, TextEdit, vi, or emacs, type in your directives, and save the file as robots.txt. Avoid word processors, as they often save files in a proprietary format and can add invalid characters that cause problems for crawlers. If the save dialog offers an encoding option, choose UTF-8. The whole process takes only a few seconds; here is how to create the robots.txt file in WordPress without using a plugin.
Once you have created and saved the aforementioned file, next you need to connect to the website via FTP and navigate to the public_html folder. Then upload the robots.txt file from your computer to the server.
The upload takes only a few seconds, and this method is arguably even simpler than using a plugin.
Mistakes to avoid when creating robots.txt
Don’t block good content
It’s important not to block any good content that you want shown publicly, whether with a robots.txt rule or a noindex tag. Doing so will hurt your SEO results, so double-check your pages for stray noindex tags and invalid robots.txt rules.
Avoid excessive crawl delay
You should avoid aggressive crawl delays, as they restrict how many pages bots can crawl. A crawl delay may be fine for a small site, but if you own a large site, it can interfere with your ability to rank well and attract traffic.
Mind case sensitivity
The robots.txt file is case sensitive, so you must create it with exactly the right file name and correctly formatted directives. If it is not formatted correctly, the robots.txt file may not work.
Reference articles:
>> What is SEO? The benefits of SEO in getting your website to the top
>> Top 21 latest on-page SEO optimization standards you should not miss
>> Learn off-page SEO techniques to help rank thousands of keywords
Frequently asked questions about the file robots.txt
What is the maximum limit of robots.txt file?
The robots.txt file size limit is 500 kibibytes (KiB). Any content beyond that limit is ignored.
How to edit robots.txt file in WordPress?
You can edit robots.txt manually, or use a WordPress SEO plugin such as Yoast to edit it from the WordPress backend.
Why is robots.txt file important for SEO?
The robots.txt file plays an important role in SEO because it lets you tell Googlebot which pages of your website should be crawled and which should not.
Does my site need a robots.txt file?
When Googlebot visits a website, it first asks for permission to crawl by retrieving the robots.txt file. A site without a robots.txt file, robots meta tags, or X-Robots-Tag HTTP headers will generally still be crawled and indexed normally.
I use the same robots.txt file for multiple sites. Can I use an alternate URL for a relative path?
You cannot use an alternate URL for a relative path, because the directives in a robots.txt file (except Sitemap:) are only valid as relative paths for the host the file is served from.
Can robots.txt file be placed in a subdirectory?
The robots.txt file should be placed in the top-level directory of the site.
Is it possible to block users from viewing the robots.txt file?
The robots.txt file is publicly viewable. If you do not want users to see private information, do not list it in the robots.txt file.
Is the allow directive required to allow Google to crawl it?
You do not need an allow directive for Google to crawl your pages; all URLs are implicitly allowed. The allow directive is only used to override a disallow directive in the same robots.txt file.
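For example (with placeholder paths), an allow directive is useful for carving an exception out of a broader disallow rule:

```
User-agent: *
Disallow: /private/
Allow: /private/public-page.html
```

Everything under /private/ is blocked except the single page named by the Allow line.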
What program should I use to create the robots.txt file?
You can use any text editor capable of creating a valid plain text file. Common choices for creating robots.txt files are Notepad, TextEdit, vi, and emacs.
If I use a disallow directive in the robots.txt file to block Google from crawling a page, will the page disappear from the search results?
Not necessarily. Blocking Google from crawling a page is likely to lead to the page being dropped from Google’s index over time, but it is not a guarantee: Google may still list the URL in search results if other pages link to it.
How can I suspend all data collection on my site?
You can suspend all crawling temporarily by returning an HTTP 503 (Service Unavailable) status code for every URL, including the robots.txt file. Google will periodically retry the robots.txt file until it can be reached again. You should not change the robots.txt file itself to block Google’s crawling temporarily.
I return a 403 Forbidden status code for every URL, including the robots.txt file. Why is my site still being crawled?
Google treats an HTTP 403 Forbidden code, like other 4xx HTTP codes, as meaning that the robots.txt file does not exist. Crawlers therefore assume they may crawl every URL on the website by default. To block crawling, the robots.txt file must be returned with an HTTP 200 OK status code and must contain the appropriate disallow rules.
The robots.txt file may not matter much when you first build a website, but as your site grows and its page count increases, you will need one. Hopefully this article has given you some insight into what the robots.txt file is and how to create one for your website. Wishing you early success!