Saturday, July 20, 2013

Blog gets indexed and then unindexed | Blogger robots.txt weakness | Blogger SEO


How my site got indexed at first.

I started a large blog, usmleqb.com, adding 500+ posts every day.
Since the site was new, I was eager to see it in Google, and within three days 47 pages got indexed (checked with the query site:usmleqb.com in Google), while I had published about 1500 posts in those 3 days. Hoping to get it indexed faster, I signed up for Google Webmaster Tools and submitted my Blogger blog's sitemaps using the atom.xml method:


atom.xml?redirect=false&start-index=1&max-results=500 
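If your blog has more than 500 posts, you simply repeat this pattern with a shifted start-index (Blogger feeds return at most 500 entries per request). Here is a minimal sketch in Python, my own illustration rather than anything from the original setup, that prints the sitemap URLs for a given post count; the blog URL and post total are just example values.

    # Print paginated atom.xml sitemap URLs for a Blogger blog.
    # BLOG_URL and TOTAL_POSTS are example values, not part of the original setup.
    BLOG_URL = "http://www.usmleqb.com"
    TOTAL_POSTS = 1500   # rough post count at the time
    PAGE_SIZE = 500      # Blogger feeds return at most 500 entries per request

    for start in range(1, TOTAL_POSTS + 1, PAGE_SIZE):
        print(f"{BLOG_URL}/atom.xml?redirect=false"
              f"&start-index={start}&max-results={PAGE_SIZE}")

Each printed URL can then be submitted to Webmaster Tools as a separate sitemap.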

How Webmaster Tools worked.

I submitted about 4 atom.xml sitemaps to Webmaster Tools, each containing 500 posts (1867 URLs in total). Webmaster Tools:
  • the next day reported 34 of those 1867 pages as indexed, although a site:usmleqb.com query showed 107 pages actually indexed.

How my site got unindexed.

  • the day after that, Webmaster Tools showed 210 pages indexed while Google itself was showing only 11 posts. In reality, almost the whole site had been unindexed.

WHAT I THOUGHT ABOUT THE UNINDEXING OF THE SITE.

  1. maybe the large number of posts on a brand-new site made it look like spam in Googlebot's eyes,
  2. or submitting the sitemaps caused the unindexing,
  3. or maybe there was a problem with the site content, template, etc.
But the actual problem was something else.
I also reported this problem to Google here.

BLOGGER ROBOTS.TXT DEFAULT SITEMAP PROBLEM.

Today I found the problem and tried to fix it.
This was actually Blogger's fault: my site gets 300+ new posts daily, but the default Blogger robots.txt sitemap only lists about 26 of the latest posts for crawling.


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    Allow: /
    Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED

As you can see, the sitemap here is the default feed ordered by update time, which only lists the 26 latest posts.
Meanwhile I had submitted the other sitemaps in Webmaster Tools, which led to a conflict: first Google quickly indexed about 130 pages and traffic jumped; after that, because of the tiny robots.txt sitemap, those pages quickly got unindexed, and from that day only about 20 pages were getting indexed daily.
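You can check for yourself how few posts that default feed actually exposes. Below is a minimal sketch in Python, my own illustration (it assumes the feed URL is still publicly reachable), that downloads the feed and counts its entries:

    # Count how many entries the default Blogger feed exposes.
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED"

    with urllib.request.urlopen(FEED_URL) as resp:
        tree = ET.parse(resp)

    ns = {"atom": "http://www.w3.org/2005/Atom"}
    entries = tree.getroot().findall("atom:entry", ns)
    print("entries in default feed:", len(entries))  # usually only a couple dozen latest posts

Whatever the exact count, it is nowhere near the hundreds of posts being published here every day.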

Fix for the default Blogger robots.txt sitemap.

I updated the robots.txt as follows:

    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    
    Allow: /
    
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000

Let's see the result tomorrow.

Result of the atom.xml sitemaps in the robots.txt file.

Oops. The next morning I found just 67 pages indexed out of 3104.
Still, there was some improvement in indexing compared to the previous day, when I had been pinging Google after every 20 posts. But this rate was not sufficient for a fast and furious, impatient blogger.

How I fixed the slow crawl rate of this large blog.

Here is what I finally did to get my large Blogger blog into the Google index quickly.
I found the solution to this problem by coincidence, while trying to use Fetch as Google to fetch one of my Blogger label pages:

    http://www.usmleqb.com/search/label/Anatomy

The fetch was denied with the error "Denied by robots.txt".
I immediately understood the problem behind the whole story:
every page whose address contains /search was getting this error, and the cause was the robots.txt file.
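If you want to see that rule in action, here is a minimal sketch in Python, my own illustration rather than part of the original troubleshooting, that feeds the default rules to the standard library's robots.txt parser and checks a label URL against them:

    # Reproduce the "Denied by robots.txt" result with Python's bundled parser.
    import urllib.robotparser

    default_rules = [
        "User-agent: Mediapartners-Google",
        "Disallow:",
        "",
        "User-agent: *",
        "Disallow: /search",
        "Allow: /",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(default_rules)

    print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/search/label/Anatomy"))
    # -> False, blocked by "Disallow: /search"
    print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/2013/07/some-post.html"))
    # -> True (a hypothetical post URL, used only for illustration)

Googlebot applies the same Disallow rule, which is why every page under /search was being refused.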

THE ROBOTS.TXT /SEARCH PROBLEM IN BLOGGER - large blogs.

On my blog, Googlebot needs to use the /search pages to crawl through the rapidly grown, big blog (to follow the older-posts / newer-posts links in the navigation),
but in the robots.txt those pages were not allowed.
See the default Blogger robots.txt:


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    Allow: /
    Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED

This /search rule was the problem, together with the small default sitemap.

How I finally fixed the Blogger indexing problem caused by /search.

Of course I had to get rid of this /search rule. Being frustrated, I simply deleted everything in my robots.txt (you don't need to go that far). Now my blog was completely open for any search engine to crawl, with no restrictions.
You can instead just update your robots.txt to something like this:

    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow:
    Allow: /
    


However, if you want, you can also add the atom.xml sitemaps, like this:


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow:
    
    Allow: /
    
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000
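
Once a custom robots.txt like the one above is saved, it is worth confirming that the new version is actually being served before expecting any change in crawling. A minimal sketch, my own addition, that simply downloads and prints the live file:

    # Fetch and print the robots.txt that crawlers will actually see.
    import urllib.request

    with urllib.request.urlopen("http://www.usmleqb.com/robots.txt") as resp:
        print(resp.read().decode("utf-8"))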


The next time Google tries to crawl your site it will download this robots.txt, and the open rules together with the complete sitemaps will help you get crawled quickly. However, I also forced Google to crawl my content quickly.


How I forced Google to quickly crawl my large blog.

My blog was too large to rely on slow, automatic crawling, and I wanted it indexed quickly. So I used Fetch as Google to fetch my sitemaps (all 6 of them) and my next/previous post-list pages, using &max-results=276&start=479 in the post-list query, for example:
    search?updated-max=2013-07-16T12:08:00-07:00&max-results=276&start=479&by-date=false
and then submitted the URLs and their linked pages to the Google index.
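Such post-list URLs can be assembled from their query parameters. Here is a minimal sketch, my own illustration reusing the values quoted above, that rebuilds one of these pagination URLs:

    # Build a Blogger post-list pagination URL from its query parameters.
    blog = "http://www.usmleqb.com"
    updated_max = "2013-07-16T12:08:00-07:00"   # timestamp boundary for this page of results
    max_results = 276
    start = 479

    url = (f"{blog}/search?updated-max={updated_max}"
           f"&max-results={max_results}&start={start}&by-date=false")
    print(url)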
But there was still one small problem.

The "Denied by robots.txt" error kept coming even after the robots.txt update.

Google had not yet downloaded the new robots.txt, so the error kept appearing, and I could not wait for Google to fetch it, so I tried another trick.
That way the "denied by robots.txt" error will not come again. (You can also use this trick if you cannot alter the robots.txt.)

How 1000+ pages got crawled in 12 hours.

After all this I was sure that Google was going to crawl my site.
And after 24 hours, when I ran the site:usmleqb.com query in Google,
I was very happy to see my site rocking again (at least the pages got indexed).
All this was because of the Fetch as Google submissions.
Now let me report this solution to the Webmaster product forum.

YOUR ADVICE WILL BE RESPECTED.

    By - Pramvir Rtahee