Saturday, July 20, 2013

Blog gets indexed and then unindexed | Blogger robots.txt weakness | Blogger SEO


How my site got indexed at first.

I started a large blog, usmleqb.com, adding 500+ posts every day.
Since the site was new, I was eager to see it in Google, and within three days 47 pages got indexed (checked with the query site:usmleqb.com in Google), while I had published about 1500 posts in those 3 days. Hoping to get it indexed faster, I signed up for Google Webmaster Tools and submitted my Blogger blog's sitemaps using the atom.xml method:


atom.xml?redirect=false&start-index=1&max-results=500 
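If your blog has more than 500 posts, you simply repeat this pattern with a shifted start-index (Blogger feeds return at most 500 entries per request). Here is a minimal sketch in Python, my own illustration rather than anything from the original setup, that prints the sitemap URLs for a given post count; the blog URL and post total are just example values.

    # Print paginated atom.xml sitemap URLs for a Blogger blog.
    # BLOG_URL and TOTAL_POSTS are example values, not part of the original setup.
    BLOG_URL = "http://www.usmleqb.com"
    TOTAL_POSTS = 1500   # rough post count at the time
    PAGE_SIZE = 500      # Blogger feeds return at most 500 entries per request

    for start in range(1, TOTAL_POSTS + 1, PAGE_SIZE):
        print(f"{BLOG_URL}/atom.xml?redirect=false"
              f"&start-index={start}&max-results={PAGE_SIZE}")

Each printed URL can then be submitted to Webmaster Tools as a separate sitemap.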

How Webmaster Tools worked.

I submitted about 4 atom.xml sitemaps to Webmaster Tools, each containing 500 posts (1867 URLs in total). Webmaster Tools:
  • the next day reported 34 of those 1867 pages as indexed, although a site:usmleqb.com query showed 107 pages actually indexed.

How my site got unindexed.

  • the day after that, Webmaster Tools showed 210 pages indexed while Google itself was showing only 11 posts. In reality, almost the whole site had been unindexed.

WHAT I THOUGHT ABOUT THE UNINDEXING OF THE SITE.

  1. maybe the large number of posts on a brand-new site made it look like spam in Googlebot's eyes,
  2. or submitting the sitemaps caused the unindexing,
  3. or maybe there was a problem with the site content, template, etc.
But the actual problem was something else.
I also reported this problem to Google here.

BLOGGER ROBOTS.TXT DEFAULT SITEMAP PROBLEM.

Today I found the problem and tried to fix it.
This was actually Blogger's fault: my site gets 300+ new posts daily, but the default Blogger robots.txt sitemap only lists about 26 of the latest posts for crawling.


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    Allow: /
    Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED

As you can see, the sitemap here is the default feed ordered by update time, which only lists the 26 latest posts.
Meanwhile I had submitted the other sitemaps in Webmaster Tools, which led to a conflict: first Google quickly indexed about 130 pages and traffic jumped; after that, because of the tiny robots.txt sitemap, those pages quickly got unindexed, and from that day only about 20 pages were getting indexed daily.
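You can check for yourself how few posts that default feed actually exposes. Below is a minimal sketch in Python, my own illustration (it assumes the feed URL is still publicly reachable), that downloads the feed and counts its entries:

    # Count how many entries the default Blogger feed exposes.
    import urllib.request
    import xml.etree.ElementTree as ET

    FEED_URL = "http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED"

    with urllib.request.urlopen(FEED_URL) as resp:
        tree = ET.parse(resp)

    ns = {"atom": "http://www.w3.org/2005/Atom"}
    entries = tree.getroot().findall("atom:entry", ns)
    print("entries in default feed:", len(entries))  # usually only a couple dozen latest posts

Whatever the exact count, it is nowhere near the hundreds of posts being published here every day.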

Fix for the default Blogger robots.txt sitemap.

I updated the robots.txt as follows:

    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    
    Allow: /
    
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000

Let's see the result tomorrow.

Result of the atom.xml sitemaps in the robots.txt file.

Oops. The next morning I found just 67 pages indexed out of 3104.
Still, there was some improvement in indexing compared to the previous day, when I had been pinging Google after every 20 posts. But this rate was not sufficient for a fast and furious, impatient blogger.

How I fixed the slow crawl rate of this large blog.

Here is what I finally did to get my large Blogger blog into the Google index quickly.
I found the solution to this problem by coincidence, while trying to use Fetch as Google to fetch one of my Blogger label pages:

    http://www.usmleqb.com/search/label/Anatomy

The fetch was denied with the error "Denied by robots.txt".
I immediately understood the problem behind the whole story:
every page whose address contains /search was getting this error, and the cause was the robots.txt file.
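If you want to see that rule in action, here is a minimal sketch in Python, my own illustration rather than part of the original troubleshooting, that feeds the default rules to the standard library's robots.txt parser and checks a label URL against them:

    # Reproduce the "Denied by robots.txt" result with Python's bundled parser.
    import urllib.robotparser

    default_rules = [
        "User-agent: Mediapartners-Google",
        "Disallow:",
        "",
        "User-agent: *",
        "Disallow: /search",
        "Allow: /",
    ]

    rp = urllib.robotparser.RobotFileParser()
    rp.parse(default_rules)

    print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/search/label/Anatomy"))
    # -> False, blocked by "Disallow: /search"
    print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/2013/07/some-post.html"))
    # -> True (a hypothetical post URL, used only for illustration)

Googlebot applies the same Disallow rule, which is why every page under /search was being refused.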

THE ROBOTS.TXT /SEARCH PROBLEM IN BLOGGER - large blogs.

On my blog, Googlebot needs to use the /search pages to crawl through the rapidly grown, big blog (to follow the older-posts / newer-posts links in the navigation),
but in the robots.txt those pages were not allowed.
See the default Blogger robots.txt:


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow: /search
    Allow: /
    Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED

This /search rule was the problem, together with the small default sitemap.

How I finally fixed the Blogger indexing problem caused by /search.

Of course I had to get rid of this /search rule. Being frustrated, I simply deleted everything in my robots.txt (you don't need to go that far). Now my blog was completely open for any search engine to crawl, with no restrictions.
You can instead just update your robots.txt to something like this:

    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow:
    Allow: /
    


However, if you want, you can also add the atom.xml sitemaps, like this:


    User-agent: Mediapartners-Google
    Disallow: 
    User-agent: *
    Disallow:
    
    Allow: /
    
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
    Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000
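
Once a custom robots.txt like the one above is saved, it is worth confirming that the new version is actually being served before expecting any change in crawling. A minimal sketch, my own addition, that simply downloads and prints the live file:

    # Fetch and print the robots.txt that crawlers will actually see.
    import urllib.request

    with urllib.request.urlopen("http://www.usmleqb.com/robots.txt") as resp:
        print(resp.read().decode("utf-8"))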


The next time Google tries to crawl your site it will download this robots.txt, and the open rules together with the complete sitemaps will help you get crawled quickly. However, I also forced Google to crawl my content quickly.


How I forced Google to quickly crawl my large blog.

My blog was too large to rely on slow, automatic crawling, and I wanted it indexed quickly. So I used Fetch as Google to fetch my sitemaps (all 6 of them) and my next/previous post-list pages, using &max-results=276&start=479 in the post-list query, for example:
    search?updated-max=2013-07-16T12:08:00-07:00&max-results=276&start=479&by-date=false
and then submitted the URLs and their linked pages to the Google index.
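Such post-list URLs can be assembled from their query parameters. Here is a minimal sketch, my own illustration reusing the values quoted above, that rebuilds one of these pagination URLs:

    # Build a Blogger post-list pagination URL from its query parameters.
    blog = "http://www.usmleqb.com"
    updated_max = "2013-07-16T12:08:00-07:00"   # timestamp boundary for this page of results
    max_results = 276
    start = 479

    url = (f"{blog}/search?updated-max={updated_max}"
           f"&max-results={max_results}&start={start}&by-date=false")
    print(url)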
But there was still one small problem.

The "Denied by robots.txt" error kept coming even after the robots.txt update.

Google had not yet downloaded the new robots.txt, so the error kept appearing, and I could not wait for Google to fetch it, so I tried another trick.
That way the "denied by robots.txt" error will not come again. (You can also use this trick if you cannot alter the robots.txt.)

How 1000+ pages got crawled in 12 hours.

After all this I was sure that Google was going to crawl my site.
And after 24 hours, when I ran the site:usmleqb.com query in Google,
I was very happy to see my site rocking again (at least the pages got indexed).
All this was because of the Fetch as Google submissions.
Now let me report this solution to the Webmaster product forum.

YOUR ADVICE WILL BE RESPECTED.

    By - Pramvir Rtahee