In Intro to Search, we looked at why people need to search. But how do search engines work?
For example, did you know that Google estimates it has found only about 40% of the web pages out there? And how did it learn about the pages it does know?
Well, let’s look at what makes up a search engine. This is going to be a high-level view: not so high that it seems like “magic,” but not deep in the details either. You won’t need to know anything about coding, IT, or related topics to understand this.
We’re going to break search engines into three main sections, and every search engine has all three. Each section may have multiple subsections, but we’re not going to worry about that right now.
Bots and Spiders
Now, I’m not talking about the eight-legged type! I’m talking about automated programs whose job is to go out and search for content. In fact, they search for two types of content: new and updated.
A spider starts at a web page, reads its content, and then sends that information off to storage to be indexed later on. (More on this in a minute.)
The spider then looks for links in the web page and starts following them, finding more content to index and more links to follow. Some of the linked pages will be ones it already knows about; others will be new.
If you view the content of pages linking to one another, it starts to look like a huge spider web, and thus these automated bots are called spiders.
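You don’t need code to follow along, but if you’re curious, here is a minimal sketch of that "start at a page, store it, follow its links" loop. It makes some loud assumptions: the four-page TINY_WEB dictionary, the example URLs, and the names LinkCollector and crawl are all made up for illustration, and nothing is fetched over the network. A real spider is far more involved.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical miniature "web": URL -> page HTML (invented for this example).
TINY_WEB = {
    "a.example/home": '<h1>Home</h1><a href="a.example/about">About</a>'
                      '<a href="b.example/blog">Blog</a>',
    "a.example/about": '<p>About us</p><a href="a.example/home">Home</a>',
    "b.example/blog": '<p>Blog post</p><a href="a.example/home">Home</a>'
                      '<a href="b.example/new">New</a>',
    "b.example/new": "<p>Brand new page</p>",
}


class LinkCollector(HTMLParser):
    """Collects the href of every <a> tag found in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, web):
    """Visit pages breadth-first, storing each one for later indexing."""
    seen = {start_url}           # pages the spider already knows about
    queue = deque([start_url])   # pages still waiting to be visited
    stored = []                  # (url, html) pairs handed off for indexing
    while queue:
        url = queue.popleft()
        html = web.get(url)
        if html is None:         # a link that leads nowhere
            continue
        stored.append((url, html))
        parser = LinkCollector()
        parser.feed(html)
        for link in parser.links:
            if link not in seen:  # only brand-new pages join the queue
                seen.add(link)
                queue.append(link)
    return stored


crawled = [url for url, _ in crawl("a.example/home", TINY_WEB)]
print(crawled)
```

Notice that the spider only ever finds pages that something else links to, which is one reason a search engine can’t know about every page out there.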
Indexers
Once the spiders have found the content, the search engine needs to know how good the content is and what it relates to. A series of applications attempts to find and “correct” misspellings, identify key concepts, and even perform translations.
The idea is for these programs to help determine what a web page, and the corresponding website, is about, and determine how good it is at describing those key term(s).
The indexers use a variety of factors (in some search engines, over 400) to determine which words or phrases a page should appear in the search results for. These include everything from the name of the file to how many other web pages link to it. When was it last updated? Are there suspicious files on the website? Do other web pages on the site rank well?
Where on the page did the key term(s) appear (at the top or bottom)? Did they appear more than once, or within a heading tag? On and on the factors go, each one feeding into what the search engine thinks people will want as a search result.
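For the curious, here is a rough sketch of how just two of those factors (how often a word appears, and whether it appears in a heading) could be turned into a score and stored in an index. Everything here is an assumption for illustration: the two sample pages, the weights of 3 and 1, and the names words and build_index are invented, and real search engines weigh far more factors in ways they don’t publish.

```python
from collections import defaultdict

# Hypothetical pages, already split into a heading and body text.
PAGES = {
    "a.example/spiders": {
        "heading": "How spiders crawl",
        "body": "Spiders follow links. Spiders find new pages.",
    },
    "b.example/pets": {
        "heading": "Pet care",
        "body": "Some people keep spiders as pets.",
    },
}

HEADING_WEIGHT = 3  # made-up weight: a word in a heading counts more
BODY_WEIGHT = 1     # made-up weight: each appearance in the body text


def words(text):
    """Lowercase the text and strip simple punctuation from each word."""
    cleaned = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in cleaned if w]


def build_index(pages):
    """Map each word to the pages containing it, with a score per page."""
    index = defaultdict(dict)
    for url, page in pages.items():
        for w in words(page["heading"]):
            index[w][url] = index[w].get(url, 0) + HEADING_WEIGHT
        for w in words(page["body"]):
            index[w][url] = index[w].get(url, 0) + BODY_WEIGHT
    return index


index = build_index(PAGES)
print(index["spiders"])
```

The page that mentions "spiders" in its heading and twice in its body ends up with a higher score for that word than the page that mentions it once in passing.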
Bots search quickly and return lots of information to the indexers, so there is often a backlog of pages waiting to be indexed. Sometimes a page will get a quick index, to see if it should be considered more deeply, but then not be thoroughly indexed for some time.
The better a site already ranks, the faster the search engines seem to get to its new pages. This can make it very hard for small and/or new websites to show up in the search results.
Retrieval Program
This is the part we’re most familiar with. We type in a search word or phrase, and out spit ten million results for us to sort through... well, sort of.
We put in our search term, and the search engine goes into the index, which holds the results of everything the indexers have looked at. The retrieval program is a very fast tool that returns the top results to you. However, while it might say there are 10,000,000 results, in reality it will only fetch the first few hundred, because it knows that by then you’ll either have found your answer or have tried a different search.
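If you’d like to see the idea in code, here is a small sketch of a retrieval step that looks up each search word in an index and keeps only the top few results. The INDEX contents, the scores, and the retrieve function are hypothetical, and adding up per-word scores is just one simple way to rank; it is not a description of how any particular search engine does it.

```python
import heapq

# Hypothetical index: word -> {page: score}, as an indexer might produce.
INDEX = {
    "search": {"a.example/intro": 5, "b.example/tips": 2, "c.example/faq": 1},
    "engines": {"a.example/intro": 4, "c.example/faq": 3},
}


def retrieve(query, index, top_k=2):
    """Return the top_k pages for a query, looking only at the index."""
    totals = {}
    for term in query.lower().split():
        # Words that are not in the index simply contribute nothing.
        for url, score in index.get(term, {}).items():
            totals[url] = totals.get(url, 0) + score
    # Keep only the best few instead of sorting every match.
    best = heapq.nlargest(top_k, totals.items(), key=lambda item: item[1])
    return [url for url, _ in best]


results = retrieve("search engines", INDEX)
print(results)
```

Three pages match here, but only two come back, which is the same trick on a tiny scale as reporting millions of matches while fetching only the first few hundred.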
Because it only searches its index, it is not searching the entire Internet. It’s not even searching all of the Internet that it knows about, because not every page it has found is in the index yet.
The retrieval program will also take information about you, such as recently searched items, which sites you typically pick answers from, etc., and, based upon the index and your personal preferences, return a series of results.
If you don’t believe me, perform a search for something you might normally search for, and then go into incognito mode, and search for it again. You’ll potentially get different suggestions because it views you as a clean slate.
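As a purely illustrative sketch of why that happens, imagine the retrieval program adding a bonus to pages from sites you often pick. The base scores, the boost of 10, and the personalize function are all assumptions made up for this example; actual personalization is not public and is certainly more subtle.

```python
# Hypothetical scores straight from the index, before personalization.
BASE_SCORES = {"a.example/intro": 9, "c.example/faq": 4, "b.example/tips": 2}


def personalize(base_scores, favorite_sites, boost=10):
    """Re-rank pages, adding a made-up bonus for sites the user often picks."""
    adjusted = {}
    for url, score in base_scores.items():
        site = url.split("/")[0]  # "a.example/intro" -> "a.example"
        bonus = boost if site in favorite_sites else 0
        adjusted[url] = score + bonus
    # Highest adjusted score first.
    return sorted(adjusted, key=adjusted.get, reverse=True)


signed_in = personalize(BASE_SCORES, favorite_sites={"b.example"})
incognito = personalize(BASE_SCORES, favorite_sites=set())
print(signed_in)
print(incognito)
```

With a search history, the favorite site jumps to the top; as a clean slate, the order falls back to the plain index scores.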
Intro to How Search Engines Work was originally found on Access 2 Learn