Robots.txt SEO
My SEO, crawling, indexing, and robots.txt experience with Blogger.
Saturday, July 20, 2013
Blog gets indexed and then unindexed | Blogger robots.txt weakness | Blogger SEO
How my site got indexed first.
I started a large blog, usmleqb.com, with 500+ posts every day. As my website was new, I was eager to see it in Google, and within three days 47 pages got indexed (checked with the site:usmleqb.com query in Google), while I had posted about 1500 posts in those 3 days. Hoping to get it indexed quickly, I signed up for Google Webmaster Tools and submitted my Blogger blog's sitemaps using the atom.xml method:
atom.xml?redirect=false&start-index=1&max-results=500
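These paginated sitemap URLs follow a fixed pattern, so they can be generated instead of typed by hand. A minimal Python sketch; the domain, the 500-posts-per-page size, and the increasing max-results values simply mirror the URLs used in this post:

```python
# Build paginated Blogger atom.xml sitemap URLs, 500 posts per page,
# following the same start-index/max-results pattern as in this post.
BASE = "http://www.usmleqb.com/atom.xml?redirect=false"
PAGE_SIZE = 500

def sitemap_urls(total_posts, base=BASE, page_size=PAGE_SIZE):
    urls = []
    for start in range(1, total_posts + 1, page_size):
        # max-results here grows with start-index, matching the post's own URLs
        urls.append(f"{base}&start-index={start}&max-results={start + page_size - 1}")
    return urls

for url in sitemap_urls(1500):
    print(url)
```

Running it with 1500 posts prints three sitemap URLs, one per 500-post page.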
How Webmaster Tools worked.
I submitted about 4 atom.xml sitemaps to Webmaster Tools, each of them containing 500 pages.
- The next day, Webmaster Tools showed me that out of these 1867 pages, 34 were indexed, but when I ran the site:usmleqb.com query I found 107 pages actually indexed.
How my site got unindexed.
- The day after that, 210 pages were indexed according to Webmaster Tools, while Google was showing only 11 posts. Actually, the whole site had been unindexed.
WHAT I THOUGHT ABOUT THE UNINDEXING OF THE SITE.
- Maybe too many posts on a new site marked it as spam in the Google bot's eyes.
- Or submitting the sitemaps caused the unindexing.
- Or maybe there was a problem with the site content, template, etc.
I also submitted this problem to Google here.
BLOGGER ROBOTS.TXT DEFAULT SITEMAP PROBLEM.
Today I found the problem and tried to fix it.
This was actually Blogger's fault: my site gets 300+ posts daily, but the default Blogger robots.txt sitemap only lets Google crawl about 26 of them per day.
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED
Meanwhile, I had submitted other sitemaps in Webmaster Tools, which led to a conflict (first Google quickly indexed about 130 pages and traffic jumped; after that, because of the robots.txt sitemap, those pages quickly got unindexed, and from then on only about 20 pages got indexed daily).
Fix for the default Blogger robots.txt sitemap.
I updated the robots.txt as follows:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000
Let's see the result tomorrow.
Result of the atom.xml sitemap in the robots.txt file.
Oops: the next morning I found just 67 pages indexed out of 3104 pages.
However, there was some improvement in indexing, because the previous day I had been pinging Google after every 20 posts, and even that rate was not sufficient for a fast and furious, impatient blogger.
How I fixed the slow crawl rate of a large blog.
This is what I finally did to quickly bring my large Blogger blog into the Google index.
I found the solution to this problem by coincidence, while I was trying to fetch the Google bot to one of my Blogger label pages:
http://www.usmleqb.com/search/label/Anatomy
and I got a "Denied by robots.txt" error.
I immediately understood the problem behind the whole story: all pages with an address under /search were getting this error, and the problem was with robots.txt.
ROBOTS.TXT /SEARCH PROBLEM IN BLOGGER - large blogs.
On my blog, the Google bot needs to use /search to crawl through the quickly grown big blog (to follow the older-posts/newer-posts links in the navigation),
while in robots.txt that path was not allowed.
See the default robots.txt of Blogger:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow: /search
Allow: /
Sitemap: http://www.usmleqb.com/feeds/posts/default?orderby=UPDATED
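You can check what this default file blocks without waiting for a crawl. Here is a small sketch using Python's standard urllib.robotparser; the label URL is this blog's real one, while the post URL is a made-up example:

```python
from urllib import robotparser

# The default Blogger robots.txt quoted above (Sitemap line omitted;
# it does not affect allow/disallow decisions).
DEFAULT_ROBOTS = """\
User-agent: Mediapartners-Google
Disallow:

User-agent: *
Disallow: /search
Allow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(DEFAULT_ROBOTS.splitlines())

# Ordinary post pages are allowed...
print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/2013/07/some-post.html"))  # True
# ...but every label and older-posts page under /search is denied.
print(rp.can_fetch("Googlebot", "http://www.usmleqb.com/search/label/Anatomy"))    # False
```

This is exactly the "Denied by robots.txt" result that Fetch as Google reported for the label page.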
This /search rule was the problem, along with the small sitemap.
How I finally fixed the Blogger index problem caused by /search.
Of course, I had to remove this /search story from there. Being frustrated, I deleted everything in robots.txt (you don't need to do so). Now my blog was wide open for any search engine to crawl, as there was no restriction.
But you may simply update your robots.txt like this:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow:
Allow: /
However, if you want, you can add the atom.xml sitemaps as well, like this:
User-agent: Mediapartners-Google
Disallow:
User-agent: *
Disallow:
Allow: /
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1&max-results=500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=501&max-results=1000
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1001&max-results=1500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=1501&max-results=2000
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2001&max-results=2500
Sitemap: http://www.usmleqb.com/atom.xml?redirect=false&start-index=2501&max-results=3000
The next time Google tries to crawl your site, it will download the new robots.txt, and this open signal together with the big, complete sitemap will help you get crawled quickly. I also forced Google to crawl my content quickly.
How I forced Google to quickly crawl my large blog.
As my blog was too large to rely on slow automatic crawling, I planned to get indexed quickly. So I used Fetch as Google to fetch my sitemaps (6 sitemaps) and my next/previous pages, and then submitted the URLs and their linked URLs to the Google index.
Now there was still a small problem.
The "Denied by robots.txt" error kept appearing even after the robots.txt file update.
Since Google still hadn't downloaded the new robots.txt, this problem kept coming up, and I couldn't wait for Google to download it, so I tried another trick.
I submitted a fetch for http://www.usmleqb.com//search?updated-max=2013-07-16T12:08:00-07:00&max-results=276&start=479&by-date=false
This way, the "Denied by robots.txt" error does not come again. (You can also use this trick if you cannot alter the robots.txt.)
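The trick appears to rely on the doubled slash before /search in the fetched URL. Assuming Google's crawler matches Disallow rules as simple path prefixes (as Python's standard urllib.robotparser does), a path of //search does not begin with the literal prefix /search, so the old rule no longer applies. A hedged sketch:

```python
from urllib import robotparser

# The old rules still cached by Google, with /search disallowed.
rp = robotparser.RobotFileParser()
rp.parse("""\
User-agent: *
Disallow: /search
Allow: /
""".splitlines())

blocked = "http://www.usmleqb.com/search?updated-max=2013-07-16T12:08:00-07:00"
trick = "http://www.usmleqb.com//search?updated-max=2013-07-16T12:08:00-07:00"

print(rp.can_fetch("Googlebot", blocked))  # False: path /search matches the Disallow prefix
print(rp.can_fetch("Googlebot", trick))    # True: path //search does not
```

Whether the server actually treats //search the same as /search is up to Blogger; here it evidently served the same older-posts page while dodging the cached robots.txt rule.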
How 1000+ pages got crawled in 12 hours.
After all this, I was sure that Google was going to crawl me now.
And after 24 hours, when I ran the site:usmleqb.com query in Google,
I was very happy to see my site rocking again (at least the pages got indexed).
All this was because of the Fetch as Google submissions.
Now let me report this solution to the Webmaster product forum.
YOUR ADVICE WILL BE RESPECTED.
By - Pramvir Rtahee