Have you ever seen real-life baby spiders burst out and crawl about after breaking out of their mother's egg sac all at once? Because myriads of these little creatures try to get out at the same time, the scene looks as if grains of sand are rolling off a pane.
But hey, this post is not really about that air-breathing, eight-legged arthropod you might find in your dusty unused rooms or neglected corners. And it's not about a man called Peter Parker either. I'll introduce you to the other kind of spider, one that's been crawling the pages of the net ever since the World Wide Web began.
Medabots. Autobots. Spiderbots?
The kind of spider I'm talking about is actually a simple but clever computer program: software that automatically browses the web to keep track of pages and the data stored on them. Spiders, spiderbots, or web crawlers are the lifeblood of search engines' data collection. Google has Googlebot. Yahoo has Yahoo! Slurp. MSN has MSNbot. All of these are man-made crawlers that harvest information from around the net.
Crawling the Web
Still wondering how these spiderbots work?
This type of software continually delves into the web looking for updates to existing pages and for newly created ones. It follows links from a starting page and accumulates data for every page it comes across, storing that site information in the search engine's index, or databank.
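The link-following step above can be sketched in a few lines. This is a minimal, illustrative sketch, not how Googlebot actually works: it extracts the links from one page's HTML (hard-coded here instead of fetched live, and `example.com` is just a placeholder), which a real crawler would then queue up to visit next.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag -- the step a crawler
    repeats for each page it visits."""
    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the page's own URL
                    self.links.append(urljoin(self.base_url, value))

# A stand-in page; a real spider would download this over HTTP.
page = """<html><body>
  <a href="/about">About</a>
  <a href="https://example.org/news">News</a>
</body></html>"""

extractor = LinkExtractor("https://example.com/")
extractor.feed(page)
print(extractor.links)
# ['https://example.com/about', 'https://example.org/news']
```

Repeat this for every link found, remember which URLs you have already visited, and you have the skeleton of a crawler.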
Generally, spiders automatically search the web for pages, but there is also a way to stop spiders from crawling your site.
How do spiders work during a search engine query?
Imagine again the scenario I mentioned earlier: the spider egg rupturing and the little spiders breaking free all at once like pouring grains of sand. The same image applies to spiderbots, but in this case, the egg that holds those crawlers is the search engine itself.
Everything starts with you, the user. Every time you visit a search engine site like Google, Yahoo, or MSN, you type a query, keywords, or a phrase into the search box. Hitting the search button or symbol is the moment the metaphorical spiders burst out, though in reality the crawling has already been done: the spiderbots roam the web continuously, and what your click triggers is a scan of the data they have already saved in the search engine's index.
The engine then sorts out the pages that are relevant to your search through a ranking algorithm unique to each search engine.
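The lookup-and-rank step can be pictured with a toy inverted index, the data structure search engines build from crawled pages. Everything here is invented for illustration (the page names, the words, and the crude count-the-matches scoring, which stands in for each engine's far more elaborate algorithm):

```python
# A toy index mapping each word to the set of pages that contain it,
# built ahead of time by the crawlers (page names are made up).
index = {
    "spider": {"pageA", "pageB"},
    "crawler": {"pageB", "pageC"},
    "search": {"pageA", "pageC"},
}

def query(words):
    """Rank pages by how many of the query words they contain."""
    scores = {}
    for word in words:
        for page in index.get(word, set()):
            scores[page] = scores.get(page, 0) + 1
    # Highest score first; ties broken alphabetically for stable output
    return sorted(scores, key=lambda p: (-scores[p], p))

print(query(["spider", "crawler"]))
# ['pageB', 'pageA', 'pageC'] -- pageB matches both words, so it ranks first
```

Because the index is prepared in advance, answering a query is just a fast lookup, which is why results come back almost instantly.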
This "crawling" or "spidering" takes only a fraction of a second to happen and after that, you now have your search results ranked on the results page of the search engine. ^_^