Search engines are computer programs that let us quickly and easily reach the information we want without getting lost in the vastness of the Internet. To answer the queries sent to it, a search engine must have "seen" the information on the Internet beforehand. In other words, a search engine can only serve you pages it has previously "seen" and "remembered". Pages it has never seen cannot be presented to users. But how does a computer see and remember pages?
Pages on the Internet are connected by links, which make it possible to move from one page to another. Moreover, pages normally link to other pages that are relevant to them. For example, a Turkish page about heart surgery is unlikely to link to a French page about cat food, even within the same site. Search engines take advantage of this: starting from one site, they begin to wander across the Internet. When they encounter a page, they look at it much as a user would in a browser like Internet Explorer or Firefox and try to understand its content. They then write that content to their memory (hard disks), follow the links on the page to other pages, and do the same there. In this way they visit, and try to remember, as many sites as possible while browsing the Internet.
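To make this crawling process concrete, here is a minimal sketch in Python. The start URL, the page limit, and the reliance on the standard library alone are assumptions made for brevity; a real crawler would also obey robots.txt, throttle its requests, and handle far more edge cases.

```python
# A minimal breadth-first crawler sketch (illustrative, not production code).
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen

class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags while parsing a page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, max_pages=10):
    """Visit pages starting from start_url, following links breadth-first."""
    seen = {start_url}            # pages already discovered
    queue = deque([start_url])
    pages = {}                    # url -> raw HTML: the crawler's "memory"
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        try:
            html = urlopen(url, timeout=5).read().decode("utf-8", "replace")
        except OSError:
            continue              # skip unreachable pages
        pages[url] = html
        parser = LinkExtractor()
        parser.feed(html)
        for link in parser.links:
            absolute = urljoin(url, link)   # resolve relative links
            if absolute.startswith("http") and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return pages
```

The `seen` set is what keeps the crawler from revisiting the same page forever, which matters because pages on the web link back to each other in cycles.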
Remembering pages works basically through the words on the page. In lists called an "index", search engines record which words appear on which page, much like the table of contents at the beginning of a book or the index at its end. In other words, they try to keep in their memory which words occur on the pages they have seen. More advanced search engines also observe and store in their indexes the frequency (count) of each word on the page, the locations of the words, their positions relative to one another, the words used in the page's outgoing links, the page title, the headings on the page, the use of uppercase versus lowercase letters, the size of the words, the color of the text, the overall topic of the site, the content of other pages that link to the page, and the content of the external pages the page links to.
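To show what such an index might look like, here is a minimal sketch of an inverted index in Python. The word-splitting rule and the toy pages are assumptions for illustration; this version records only which words occur on which page and how often, whereas real engines also store positions, titles, link text, and the other signals listed above.

```python
# A minimal inverted-index sketch: word -> {url: occurrence count}.
import re
from collections import defaultdict

def build_index(pages):
    """pages: dict mapping url -> plain text of the page."""
    index = defaultdict(lambda: defaultdict(int))
    for url, text in pages.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word][url] += 1
    return index

index = build_index({
    "example.com/a": "heart surgery and heart health",
    "example.com/b": "cat food reviews",
})
print(dict(index["heart"]))   # {'example.com/a': 2}
```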
The purpose of indexing, rather than storing the page as-is, is to make it easy to reach the information on a page when needed. By analogy, even when we have the entire book, we still need its table of contents. Before searching a book for a topic, we look at its chapter titles, page headings, and so on. Or, when a book that matters to us refers to another book, we pick that one up too, if it is at hand, and try to review it. Search engines apply a similar idea to Internet pages and sites.
When a user submits a query, the search engine immediately looks at its index and tries to find the pages in which the query's words appear. It then ranks those pages by various criteria and shows the results to the user.
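Continuing the sketches above, a toy query module might look like this. It intersects the index entries for the query words and ranks the matching pages by total word frequency, a deliberately crude stand-in for the "various criteria" real engines use.

```python
def search(index, query):
    """index: word -> {url: count}, as built by build_index above."""
    words = query.lower().split()
    if not words:
        return []
    # keep only the pages that contain every query word
    results = set(index.get(words[0], {}))
    for word in words[1:]:
        results &= set(index.get(word, {}))
    # rank by summed word frequency, highest first (crude ranking)
    return sorted(results,
                  key=lambda url: sum(index[w][url] for w in words),
                  reverse=True)

print(search(index, "heart surgery"))   # ['example.com/a']
```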
To summarize once more: search engines basically consist of three parts. The first part, called a crawler or spider in English, collects the content of pages. The second module examines the content collected from the Internet and stores it in indexes. The third and final part, the query module, answers user queries by looking up the indexes created by the second module, ranking the matching pages, and displaying the results to the user.