How do search engines find my site?

Crawlers or Spiders:

A Web crawler is a program that visits a site and reads its ‘robots.txt’ file to learn which pages it is allowed to crawl. Its usual purpose is Web indexing. A Web crawler may also be called a Web spider, an ant, an automatic indexer, or a bot. Search engines use crawlers to keep their indexes of other sites’ content up to date. A crawler can copy every page it visits so that the search engine can process and index the downloaded pages later, which lets users search them much more quickly.
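To make the idea concrete, here is a minimal sketch of a crawler in Python. It is not how Googlebot works internally. The pages, the robots.txt rules, the `ExampleBot` name and the `crawl` function are all made up for illustration, and the "site" lives in memory so the sketch runs without network access.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "site" (assumption: stands in for real HTTP requests).
PAGES = {
    "http://example.com/": '<a href="/about">About</a> <a href="/private/notes">Notes</a>',
    "http://example.com/about": '<a href="/">Home</a>',
    "http://example.com/private/notes": "secret",
}
# Hypothetical robots.txt for that site.
ROBOTS_TXT = "User-agent: *\nDisallow: /private/\n"


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, user_agent="ExampleBot"):
    """Breadth-first crawl that skips URLs disallowed by robots.txt."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    index = {}                      # url -> copied page content
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        if not rules.can_fetch(user_agent, url):
            continue                # blocked by robots.txt
        html = PAGES.get(url)
        if html is None:
            continue                # page does not exist
        index[url] = html           # keep a copy for later indexing
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return index


indexed = crawl("http://example.com/")
print(sorted(indexed))
```

The crawler copies the home page and the about page, but never stores the page under /private/ because the rules disallow it.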

What is Robots.txt?

Each server can have a file called robots.txt that contains rules for crawling that server, which bots such as Googlebot are expected to obey. The rules tell crawlers which files and directories they may or may not crawl. Almost all major search engines look for the robots.txt file and follow its rules. Some crawlers, however, may ignore robots.txt and crawl blocked areas anyway, effectively treating the entire site as open for indexing. So if your site contains sensitive or confidential data, it is better to password-protect it than to rely on robots.txt.

Here are the rules for creating a robots.txt file. Webmaster Tools includes a robots.txt Generator, designed to give you an easy, interactive way to build the file. Using it can be as simple as entering the files and directories you don’t want crawled by any robots. You can also create a file named ‘robots.txt’ on your server yourself and write your site’s indexing rules in it. Place the robots.txt file in the top-level directory of your site, for example jasleenkaur.com/robots.txt. If you place it in a subdirectory, crawlers will not look for it there, so it is useless.
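As a rough illustration, the sketch below shows what a small robots.txt might look like and how a well-behaved bot could read it, using Python's standard `urllib.robotparser` module. The rules, the `BadBot` name, the post path and the helper names `robots_url` and `is_allowed` are assumptions made up for this example; only the jasleenkaur.com/robots.txt location comes from the text above.

```python
from urllib.parse import urljoin
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only (not this blog's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /drafts/

User-agent: BadBot
Disallow: /
"""


def robots_url(page_url):
    """Return the robots.txt URL at the top level of the page's host."""
    return urljoin(page_url, "/robots.txt")


def is_allowed(user_agent, url, robots_txt=SAMPLE_ROBOTS_TXT):
    """Check whether the given rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler always looks at the top of the site, never in a subdirectory.
print(robots_url("http://jasleenkaur.com/2014/01/some-post/"))

print(is_allowed("Googlebot", "http://jasleenkaur.com/about/"))
print(is_allowed("Googlebot", "http://jasleenkaur.com/drafts/new-post"))
print(is_allowed("BadBot", "http://jasleenkaur.com/about/"))
```

Keep in mind that this check only happens if the bot chooses to run it, which is why robots.txt is no substitute for password protection.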

Robots.txt file in WordPress:

A WordPress blog has an option that lets you allow search engines to index the site:
Dashboard > Settings > Reading > Site Visibility > (select the option: Allow search engines to index this site). If you instead select the option "Discourage search engines from indexing this site", search engines are asked not to index any part of your site.
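One common way a page asks search engines not to index it is a robots meta tag that contains `noindex`. The sketch below checks a page's HTML for such a tag. It assumes that a site with the "Discourage" option selected emits a tag of this kind; the exact markup WordPress produces depends on the version and is not confirmed here, and the sample HTML and the `discourages_indexing` helper are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def discourages_indexing(html):
    """Return True if the page carries a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical page sources for illustration.
hidden_page = "<head><meta name='robots' content='noindex,nofollow' /></head>"
visible_page = "<head><title>My blog</title></head>"

print(discourages_indexing(hidden_page))
print(discourages_indexing(visible_page))
```

You could paste your own page source (View Source in the browser) into a string and run the same check on it.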

OMG! Google can’t find my blog

Go to http://ismyblogworking.com/ to check whether your site is working. It reads the rules in your robots.txt file and tells you whether Googlebot and other search engine bots are allowed to crawl and index your site.

If your site contains blocked pages, it warns you about the effect on your site’s ranking. By default, Googlebot indexes your entire site.

You can use Google Webmaster Tools to verify your site. Webmaster Tools helps you understand exactly how your site appears to Googlebot.

Get your site verified with Webmaster Tools

WordPress.com provides you with built-in stats that give you lots of information about your traffic, but if you’re a stats junky and you can’t get enough info about how people are finding your site, some search engines and social sites offer additional “webmaster tools.”

Read more at: http://en.support.wordpress.com/webmaster-tools/

 

References:

[1] http://googlewebmastercentral.blogspot.in/2008/03/speaking-language-of-robots.html
[2] https://support.google.com/webmasters/answer/70897?hl=en
[3] http://onecoolsitebloggingtips.com/2010/01/21/omg-i-cant-find-my-blog-on-google/
About Jasleen Kaur

Hi, I am Jasleen Kaur, a Computer Science student.