Web optimization for search engines


Published on 23 February 2023



Tips to help search engines crawl, index and better understand your website.

These tips are covered only at a high level, but they will help search engines crawl, index and understand your website better.
First of all, find out whether the main URL of your website is already in the search engine's index. In the case of Google, you can enter the search term: site:yoursite.com.

If your site is not in Google, it may be for several reasons:

  • Your website is not well linked from other websites, or the few sites that do link to it are not well linked themselves.
  • Your website is new and has not been crawled yet.
  • Your website's design prevents search engines from crawling its content, or the site returns some kind of error, for example because it is down.
  • Your website has a policy that prevents crawling, such as a restrictive robots.txt file (see the example below).
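
As an illustration of the last point, a robots.txt file (covered in more detail later in this article) containing the following two lines tells every crawler to stay away from the entire site; if your website serves something like this, search engines will not crawl it:

User-agent: *
Disallow: /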

Create a sitemap

The best way for your website to be found by any web crawler is by adding a sitemap to it. A sitemap is a file, conventionally named 'sitemap.xml' (or a sitemap index that points to several sitemap files), that sits on your website and informs the crawler of all the pages that make up your site. At lookkle we have a sitemap generator that you can use to create one and upload it to your website.

A website's sitemap is an HTML or XML file that contains links to all of your website's content. Search engines use it to find that content, which makes a sitemap particularly helpful while you are developing a website and before it launches.

Keeping the sitemap current matters as much as creating it. Whenever you add or change pages, update the sitemap with links to the new and updated content so that search engines can easily find fresh information about your topic. In short, a well-maintained sitemap helps people find relevant information on your site.

It's also beneficial to create a sitemap before building the website itself. Doing so helps you understand how the site works from the inside out: you know what information search engines are most likely to find and where that information lives, which makes it easier to create new pages and add links to them in the sitemap. Additionally, you can use the same sitemap both during development and when hosting the site on a web server, giving you two uses for one file without extra work.

It's easy to create a sitemap in HTML or XML format using any plain-text editor, such as Notepad or Notepad++. Create a new file, save it with the .xml extension instead of .txt (for example, sitemap.xml), and add an entry with a link for each page of your site. Finally, upload the file to the root of your web server with an FTP program such as FileZilla.
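
For reference, a minimal XML sitemap following the sitemaps.org protocol looks like the sketch below; the URLs and dates are only placeholders to be replaced with your own pages:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://www.example.com/</loc>
    <lastmod>2023-02-23</lastmod>
  </url>
  <url>
    <loc>https://www.example.com/blog/</loc>
    <lastmod>2023-02-20</lastmod>
  </url>
</urlset>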

Either way, a well-maintained sitemap helps search engines find relevant content on your website, and creating it before starting work on the site makes it easier and quicker both to create new pages and to keep the sitemap updated with new links.

The last step is to submit the sitemap.xml to Google:

  1. Sign in to Google Search Console (or sign in to your Google account and open Google Search Console).
  2. In the upper area of the left sidebar, select your website.
  3. Click on 'Sitemaps'.
  4. Enter the address of your sitemap: your URL followed by /sitemap.xml.
  5. Click Submit.

Create a robots.txt file

A robots file is a file named "robots.txt" that should be located in the main directory of your website. It tells search engines which parts of your website a web crawler may access.

For example, a good robots.txt file prevents a web crawler from accessing sensitive areas of your website, just as it prevents access to pages that only redirect to other sites.

The robots.txt file must be placed at the root of the site. For example, www.google.com/robots.txt.

The file must follow a number of formatting rules so that it can be read by tools such as Google Search Console (formerly Google Webmaster Tools); a valid file makes your website more secure and gives the major search engines the confidence to add your website safely.

The main function of a robots.txt file is to grant or prohibit access to specific parts of your website. There will always be private folders that you do not want search engines to visit, and the best way to handle them is to tell the search engine that a particular directory path or web page is not interesting for users; for example, because it is only a configuration file or an internal file whose purpose is to serve other, valid pages within your website.

In the robots.txt file, a rule is applied per line, blocking or allowing access for external crawlers such as search engines, and each line specifies the exact path to the folder or web page concerned. If no rule is specified for a folder or file, the crawler treats it as freely accessible.

Example:

User-agent: Googlebot
Disallow: /admin_folder/
User-agent: MSNBot
Disallow: /admin_folder/
User-agent: *
Allow: /

Sitemap: https://www.example.com/sitemap.xml

Explanation:

  • The Googlebot user-agent cannot crawl any URL beginning with https://example.com/admin_folder/.
  • The MSNBot user-agent cannot crawl any URL beginning with https://example.com/admin_folder/.
  • The rest of the user-agents can crawl the entire site.
  • The sitemap file for the site is at https://www.example.com/sitemap.xml.

You can test your robots.txt at the following Google link: https://support.google.com/webmasters/answer/6062598

Do not block internal content of web pages

Crawlers must be allowed to access the JavaScript, CSS and image files used by your website; in other words, the web crawler must be able to render the page as a normal user sees it. Your site's robots.txt file must therefore allow crawling of these files (see the example below).
Google offers the URL Inspection tool in Google Search Console, which lets you see exactly how Google renders any web page on your site and whether it is indexed in the search engine.
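
Returning to robots.txt, a sketch like the following keeps design assets crawlable while still protecting a private folder; the folder names are only examples, assuming your styles, scripts and images live in /css/, /js/ and /images/:

User-agent: *
Allow: /css/
Allow: /js/
Allow: /images/
Disallow: /admin_folder/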

