I haven’t seen this mentioned anywhere yet, but it looks as though Google is now providing exact numbers in its Crawl Stats report in Google Webmaster Tools.
I’m not sure how accurate these numbers are, but since they don’t appear to be rounded, I’m going to assume that they’re fairly accurate.
So far Google has not provided an export function for this data, but based on how they handled indexation data in the past, I’m guessing it’s only a matter of time before you can export it from GWT.
If you don’t want to wait for Google, however, you’re in luck. Back when Google first added numbers to their Indexation report, I created a little script that would extract the data from the report, and format it nicely for export into Excel. Today I retooled the script so that it works with the new Crawl Report, and you can find it below.
Instructions
To run the script, go into Health > Crawl Stats in GWT and view the source code for the page. Select all the text and copy it (command-A, then command-C; use control instead of command if you’re on a PC), paste it into the form below, click the button, and voilà, instantly formatted data! Enjoy!
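For anyone curious how this kind of extraction might work under the hood, here is a minimal Python sketch. It is not the script behind the form above. It assumes, purely for illustration, that the chart data sits in the page source as rows shaped like ["2013-01-05", 1234]. The real GWT markup may look quite different, so the pattern (and the hypothetical names extract_crawl_stats and to_tsv) would need adjusting to match what you actually see in the source.

```python
import csv
import io
import re

# ASSUMPTION: rows look like ["2013-01-05", 1234]. This format is hypothetical;
# inspect the real page source and adapt the pattern before relying on it.
ROW_PATTERN = re.compile(r'\[\s*"(\d{4}-\d{2}-\d{2})"\s*,\s*(\d+)\s*\]')


def extract_crawl_stats(page_source: str) -> list[tuple[str, int]]:
    """Return (date, pages_crawled) pairs found in the pasted page source."""
    return [(date, int(count)) for date, count in ROW_PATTERN.findall(page_source)]


def to_tsv(rows: list[tuple[str, int]]) -> str:
    """Format rows as tab-separated text, which pastes cleanly into Excel."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(["date", "pages_crawled"])
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Made-up sample input standing in for the pasted page source.
    sample = 'var data = [["2013-01-05", 1234], ["2013-01-06", 987]];'
    print(to_tsv(extract_crawl_stats(sample)))
```

Tab-separated output is used here because it can be pasted straight into a spreadsheet without an import step; swapping the delimiter for a comma would give you a regular CSV file instead.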
Very clever hack, thanks for sharing!
This is really smart and simple. I really like it…thank you very much for this little script, great stuff.
Would be nice if Google would show which pages actually got crawled so we could try to tackle some internal problems.
Thanks
That would be nice, we’re stuck digging through our log files for now.
Thanks for sharing Takeshi. Very interesting to see how frequently Googlebot crawls a website compared with how often it provides a cached version in the SERPs.