
How Search Engines Work, and Things You Shouldn’t Do, a post by R. Clint Peters

  • R. Clint Peters, Author
  • Jan 31, 2013

Several months ago, while working as a customer service agent, I had a great idea. I wanted to create a program that would keep track of my time, warn me of certain events (such as going over the established time limits), and generally keep notes on what I was doing.

I had done some programming many years ago, and felt up to the task. I began looking for a programming language to learn, and eventually settled on a free online programming course.

The goal of the course I selected was to write a web-browsing program. I managed to pass the final test, but I never did write my time-tracking program.

However, I learned a lot about browsers.

When you type your query into the search box, your browser sends it to a search engine, which rushes off to a huge database and compares what you wrote to what is in the database. If you wrote “black horse”, all of the references in the database to “black horse” are retrieved.
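Search engines generally do not scan every stored page for each query. A common technique is an inverted index: a table that maps each word to the set of pages containing it, so that finding every page that mentions both “black” and “horse” becomes a quick intersection of two sets. The Python below is a toy sketch under that assumption; the page URLs and text are made up, and real engines also deal with punctuation, phrases, spelling, and much more.

```python
from collections import defaultdict

def build_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Map each lowercase word to the set of page URLs that contain it."""
    index: dict[str, set[str]] = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index: dict[str, set[str]], query: str) -> set[str]:
    """Return the URLs that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    # Start with the pages for the first word, then keep only pages
    # that also contain each remaining word.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical pages, standing in for what a crawler collected.
pages = {
    "example.com/stable": "the black horse ran past the barn",
    "example.com/paint": "black paint for a white fence",
    "example.com/farm": "a brown horse and a black cat",
}
index = build_index(pages)
print(sorted(search(index, "black horse")))
# ['example.com/farm', 'example.com/stable']
```

Note that example.com/farm matches even though the two words are not next to each other; matching the exact phrase would require storing word positions as well.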

However, how are all the references sorted?  By the most recent?  By the oldest?  By the blackest horse?  Or maybe by the biggest horse?

Before we answer that question, let us look at how the information got into the huge database.

The Book Reviewers Club URL (Uniform Resource Locator) is: http://theauthorsclub.wordpress.com. It can also be written as http://www.theauthorsclub.wordpress.com. The key letters are www, or World Wide Web.

The database the search engine consulted for your search term did not magically appear and magically fill itself up. It was filled by a spider: a program that cruises the Web, following links from page to page and checking the URLs it finds.

To quote Wikipedia:  A Web crawler is a computer program that browses the World Wide Web in a methodical, automated manner or in an orderly fashion.

Other terms for Web crawlers are ants, automatic indexers, bots,[1] Web spiders,[2] Web robots,[2] or—especially in the FOAF community—Web scutters.[3]

This process is called Web crawling or spidering. Many sites, in particular search engines, use spidering as a means of providing up-to-date data. Web crawlers are mainly used to create a copy of all the visited pages for later processing by a search engine that will index the downloaded pages to provide fast searches.
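The crawling process can be sketched as a breadth-first walk over links. To keep the example runnable without a network connection, the Python below uses a small, made-up FAKE_WEB dictionary in place of real downloads; the names FAKE_WEB and crawl are hypothetical, and a real crawler would also fetch pages over HTTP, respect robots.txt, and pace its requests.

```python
from collections import deque

# A made-up, in-memory "web": each URL maps to the URLs it links to.
# A real crawler would download each page and extract its links instead.
FAKE_WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}

def crawl(start: str, max_pages: int = 100) -> list[str]:
    """Visit pages breadth-first from `start`, returning them in visit order."""
    seen = {start}
    queue = deque([start])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)  # a real crawler would store a copy of the page here
        for link in FAKE_WEB.get(url, []):
            if link not in seen:  # never queue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited

print(crawl("a.example"))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set is the key design choice: because pages link back to each other (c.example links to a.example), a crawler that did not remember where it had been would loop forever.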

If the search engine started searching the whole Internet for your term only when you submitted it, the search might take forever, or even longer. That is not something I would sit still for.

However, once the data is in the database, how does the search engine decide what to show you first? Now we get into a popularity contest. The results are not ordered by when the data was compiled, by what country it came from, or by whether it won the Super Bowl. They are ordered by relevance, which in large part comes down to popularity: how often a page is accessed, and how many other pages link to it.

Now we have a problem. If I search for “black horse” and click on a website, the visit is registered as a hit. If you search for “black horse” and go to the same website, that is registered as a hit too. Those hits make the site popular, so it appears higher in the search results, perhaps even on the first page.
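The popularity model described here can be sketched as a simple click counter. This is a deliberately naive illustration, assuming ranking depends only on how many times each result was clicked for a given search term; real search engines combine many other signals. The click log and URLs below are hypothetical.

```python
from collections import Counter

# Hypothetical click log: (search term, URL that was clicked).
clicks = [
    ("black horse", "horses.example/black-beauty"),
    ("black horse", "pub.example/black-horse-inn"),
    ("black horse", "horses.example/black-beauty"),
]

def rank_by_hits(clicks: list[tuple[str, str]], query: str) -> list[str]:
    """Order the URLs clicked for `query` from most to fewest hits."""
    hits = Counter(url for term, url in clicks if term == query)
    return [url for url, _count in hits.most_common()]

print(rank_by_hits(clicks, "black horse"))
# ['horses.example/black-beauty', 'pub.example/black-horse-inn']
```

With a scoring rule this simple, one more click on the inn's link would tie it with the other page, and two more would put it on top.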

So, it’s easy.  Get all your friends to search for “black horse” and click the same link.  That link will soon be on the first page.

What if I put a link to “black horse” on The Book Reviewers Club blog, and then you click the link I placed on the blog? The search engine may suspect “spamming” and shuffle the popularity numbers around. The results get watered down a little: my link to the “black horse” page moves down the results, and so do the original search results.

The best recommendation is this: do not copy the wording of anyone’s link. If you want to refer to this blog post, do not copy its title and paste it into the title of the post on your own blog. And if you look at the top of the blog, you will see a copyright notice: this post is © 2013 by The Book Reviewers Club. If you would like to refer to this blog, paraphrase the contents, use a new title in your own words, and then link to my blog from within the body of your post.

The advent of blogging, tweeting, and posting on social media has created an extensive grey area.  There are many good things floating around the Internet.  Once in a while, however, something or someone you thought was friendly might jump up and bite you in a place you would rather not have teeth marks.

©2019 by R. Clint Peters, Author. Proudly created with Wix.com
