Intro to How Search Engines Work

In Intro to Search, we looked at why people need to search. But how do search engines actually work?

For example, did you know that Google estimates it only knows about 40% of the web pages out there? And how did it learn about the pages it does know?

Well, let’s look at what makes up a search engine. This is going to be a high-level view: not so high that it all looks like “magic,” but not deep in the details either. You won’t need to know anything about coding or IT to understand this.

We’re going to break search engines into three main sections, and every search engine has all three. Each section may have multiple subsections, but we’re not going to worry about those right now.

Bots and Spiders

Now, I’m not talking about the eight-legged type! I’m talking about automated programs whose job is to go out and look for content. In fact, they look for two types of content: new and updated.

A spider starts at a web page, reads its content, and then sends that information off for storage so it can be indexed later on. (More on this in a minute.)

The spider then looks for links in the web page and starts following them, finding more content to index and more links to follow. Some of the linked pages will be ones it already knows about; others will be new.

If you picture all of these pages linking to one another, it starts to look like a huge spider web, which is why these automated bots are called spiders.

[Image: spider-web-style view of interconnected websites]
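For readers who do like to see code (entirely optional), the link-following behaviour described above can be sketched in a few lines of Python. This is only a toy illustration, not how any real search engine is built: the `TOY_WEB` dictionary and the `crawl` function are made-up names, and the "web" here is six pretend pages held in memory rather than real sites.

```python
from collections import deque

# Hypothetical miniature "web": each page name maps to the pages it links to.
TOY_WEB = {
    "home": ["about", "blog"],
    "about": ["home"],
    "blog": ["post-1", "post-2", "home"],
    "post-1": ["blog"],
    "post-2": ["post-1", "contact"],
    "contact": [],
}

def crawl(start, web):
    """Visit pages breadth-first, following links and skipping known pages."""
    seen = {start}
    queue = deque([start])
    visit_order = []
    while queue:
        page = queue.popleft()
        visit_order.append(page)      # stand-in for "send content off to be indexed"
        for link in web.get(page, []):
            if link not in seen:      # pages we already know about are not re-queued
                seen.add(link)
                queue.append(link)
    return visit_order

print(crawl("home", TOY_WEB))
# → ['home', 'about', 'blog', 'post-1', 'post-2', 'contact']
```

Notice that "home" is linked from several pages but only visited once. That is the "page it already knows about" case from above.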

Indexers

Once the spiders have found the content, the search engine needs to know how good the content is and what it relates to. A series of applications attempts to find and “correct” misspellings, identify key concepts, and even perform translations.

The idea is for these programs to work out what a web page, and the website it belongs to, is about, and how well it covers those key term(s).

The indexers use a variety of factors (in some search engines, over 400) to determine which words or phrases a page should appear in the search results for. These cover everything from the name of the file to how many other web pages link to it. When was the page last updated? Are there suspicious files on the website? Do other pages on the site rank well?

Where on the page did the key term(s) appear (at the top or the bottom)? Did they appear more than once, or within a heading tag? On and on the factors go, each one feeding into what the search engine thinks people will want as a search result.
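To make the idea of "factors" a little more concrete, here is a small optional Python sketch that combines four of the signals mentioned above (heading, repetition, position on the page, and inbound links) into a single score. The function name `score_page` and every weight in it are invented for illustration. Real search engines do not publish their formulas, and they use far more factors than this.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def score_page(term, heading, body, inbound_links):
    """Combine a few made-up signals into one relevance score."""
    term = term.lower()
    body_words = tokenize(body)
    score = 0.0
    if term in tokenize(heading):
        score += 3.0                               # term appears in a heading
    score += min(body_words.count(term), 5) * 1.0  # repeated use, capped at 5
    if term in body_words[:10]:
        score += 1.0                               # term appears near the top
    score += 0.5 * inbound_links                   # other pages link to this one
    return score

page_a = score_page("spiders", "How Spiders Crawl",
                    "Spiders follow links. Spiders find new pages.", 4)
page_b = score_page("spiders", "Garden Tips",
                    "Water your plants daily and watch out for spiders.", 1)
print(page_a, page_b)  # → 8.0 2.5
```

The page that is actually about spiders scores higher than the gardening page that merely mentions them once, which is the kind of judgment the indexers are trying to make.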

Bots search quickly and return lots of information to the indexers, so there is often a backlog of pages waiting to be indexed. Sometimes a page will get a quick index to see whether it deserves a deeper look, but it may not be thoroughly indexed for some time.

It seems that the better a site already ranks, the faster the search engines get to its new pages. This can make it very hard for small and/or new websites to show up in the search results.

Retrieval Program

This is the part we’re most familiar with. We type in a search word or phrase, and out spit ten million results for us to sort through... well, sort of.

We put in our search term, and the search engine goes to the index, which holds the results of everything the indexers have looked at. This is a very fast tool that returns the top results to you. However, while it might say there are 10,000,000 results, in reality it will only retrieve the first few hundred, because the search engine knows that by then you will either have found your answer or tried a different search.

Because it only searches its index, it is not searching the entire Internet. It’s not even searching all of the Internet that it knows about, because not every known page is in the index yet.
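If you are curious what "searching the index" might look like, here is a minimal, optional Python sketch. It assumes a very simple kind of index (each word maps to the pages that contain it) and a made-up table of page scores. The names `build_index`, `search`, `pages`, and `scores` are all hypothetical, and the `limit` parameter stands in for "only the first few hundred".

```python
from collections import defaultdict
import heapq

def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = defaultdict(set)
    for name, text in pages.items():
        for word in text.lower().split():
            index[word].add(name)
    return index

def search(index, scores, term, limit=2):
    """Look the term up in the index and return only the top `limit` pages."""
    matches = index.get(term.lower(), set())
    return heapq.nlargest(limit, sorted(matches), key=lambda name: scores[name])

# Pretend pages that have already been indexed, with made-up quality scores.
pages = {
    "a": "spiders crawl the web",
    "b": "indexers sort pages",
    "c": "spiders and indexers work together",
    "d": "spiders have eight legs",
}
scores = {"a": 8.0, "b": 5.0, "c": 6.5, "d": 1.0}

index = build_index(pages)
print(search(index, scores, "spiders"))    # → ['a', 'c']
print(search(index, scores, "unindexed"))  # → []
```

Three pages mention "spiders", but only the top two come back. A word that never made it into the index returns nothing at all, no matter how many pages on the wider Internet might contain it.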

The retrieval program will also take information about you, such as your recent searches and the sites you typically pick answers from, and return a series of results based on both the index and your personal preferences.

If you don’t believe me, perform a search for something you might normally search for, and then go into incognito mode, and search for it again. You’ll potentially get different suggestions because it views you as a clean slate.
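The incognito experiment can also be imitated in a short, optional Python sketch. This is an assumption about the general shape of personalization, not a description of any real search engine: the `personalize` function, the `boost` value, and the example site names are all made up.

```python
def personalize(results, favorite_sites, boost=2.0):
    """Re-rank (site, score) results, boosting sites the user often picks."""
    def adjusted(item):
        site, score = item
        return score + (boost if site in favorite_sites else 0.0)
    return [site for site, _ in sorted(results, key=adjusted, reverse=True)]

# Hypothetical results from the index, as (site, score) pairs.
results = [("news.example", 5.0), ("blog.example", 4.0), ("shop.example", 3.5)]

# A user who often clicks on blog.example sees it move to the top.
print(personalize(results, {"blog.example"}))
# → ['blog.example', 'news.example', 'shop.example']

# A "clean slate" user (like incognito mode) gets the unadjusted order.
print(personalize(results, set()))
# → ['news.example', 'blog.example', 'shop.example']
```

Same index, same search term, different order: the only thing that changed is what the search engine knows about the person asking.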

Intro to How Search Engines Work was originally found on Access 2 Learn

Walter Wimberly

Assistant Professor

