#1
Hello:

I read about robots exclusion here: http://www.robotstxt.org/wc/exclusion.html

I am wondering how search engine bots are implemented. Let's assume I have Disallow: /foobar in my robots.txt. On the main page of my site, I link to content at, say, /foobar/pictures.html.

So will the search engine bot index /foobar/pictures.html or not? If not, does that mean it keeps the information it read from robots.txt for the entire duration of the crawl?

Thank you for your time.
#2
On Sun, 21 Oct 2007 18:42:41 -0000, khabri put finger to keyboard and typed:

>Hello:
>
>I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
>I am wondering how are search engine bots implemented. Lets assume, I
>have Disallow: /foobar in the robots.txt. On the main page of my site,
>I link to content say /foobar/pictures.html
>
>So will the search engine bot index /foobar/pictures.html or not ?

It won't, if it correctly follows the standards.

>If
>not, does it mean that during the entire period of crawling, it
>maintains the information that it has read in robots.txt ?

It should cache the contents of robots.txt at the start of every crawl and obey it thereafter, until it next checks it.

Mark
--
http://www.BritishSurnames.co.uk - What does your surname say about you?
"All I want is to find an easier way to get out of our little heads"
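A minimal sketch of the check a well-behaved bot would make before fetching a link, using Python's standard `urllib.robotparser`. The agent name `ExampleBot` and the helper `allowed` are made up for illustration, and the rules are fed in directly rather than downloaded; real crawlers may be implemented quite differently.

```python
from urllib.robotparser import RobotFileParser

# Parse the rules from the question directly, without a network fetch.
rules = RobotFileParser()
rules.parse([
    "User-agent: *",
    "Disallow: /foobar",
])


def allowed(path, agent="ExampleBot"):
    """Return True if the (hypothetical) agent may fetch the given path."""
    return rules.can_fetch(agent, path)


print(allowed("/foobar/pictures.html"))  # False: path starts with /foobar
print(allowed("/index.html"))            # True: no rule matches
```

Note that the match is a plain prefix test, so `Disallow: /foobar` also covers a path such as `/foobarbaz.html`.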
#3
In article <1192992161.373908.214400@y27g2000pre.googlegroups.com>,
khabri <khabri@gmail.com> wrote:

> Hello:
>
> I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
> I am wondering how are search engine bots implemented. Lets assume, I
> have Disallow: /foobar in the robots.txt. On the main page of my site,
> I link to content say /foobar/pictures.html
>
> So will the search engine bot index /foobar/pictures.html or not ?

Mark Goodge is correct; correctly programmed bots should not access that file.

> If
> not, does it mean that during the entire period of crawling, it
> maintains the information that it has read in robots.txt ?

It's up to the bot how often they re-read your robots.txt file. Note that you can send an 'Expires' header along with your robots.txt file and it *should* be respected. (No guarantees, though!)

Good luck

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more
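One way a bot might cache robots.txt and honour an 'Expires' header is sketched below. Everything here is an assumption for illustration: the class `CachedRobots`, the helper `expiry_from_headers`, the injected `fetch` callable (which should return the file body and a dict of response headers), and the one-day fallback when no usable header is present. It is not how any particular search engine does it.

```python
import time
from email.utils import parsedate_to_datetime
from urllib.robotparser import RobotFileParser

DEFAULT_TTL = 24 * 60 * 60  # assumed fallback: re-read once a day


def expiry_from_headers(headers, now):
    """Return the Unix time at which the cached copy should be re-read."""
    value = headers.get("Expires")
    if value:
        try:
            return parsedate_to_datetime(value).timestamp()
        except (TypeError, ValueError):
            pass  # malformed date: fall through to the default
    return now + DEFAULT_TTL


class CachedRobots:
    """Keeps parsed robots.txt rules and refetches them once they expire."""

    def __init__(self, fetch, clock=time.time):
        self._fetch = fetch    # hypothetical callable: () -> (body, headers)
        self._clock = clock    # injectable so the expiry logic can be tested
        self._parser = None
        self._expires_at = 0.0
        self.fetch_count = 0

    def can_fetch(self, agent, url):
        now = self._clock()
        if self._parser is None or now >= self._expires_at:
            body, headers = self._fetch()
            self.fetch_count += 1
            parser = RobotFileParser()
            parser.parse(body.splitlines())
            self._parser = parser
            self._expires_at = expiry_from_headers(headers, now)
        return self._parser.can_fetch(agent, url)
```

A crawler would then call `can_fetch` before each request; the file is only downloaded again once the cached copy has expired.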