robots.txt - Question

Posted 21/10/2007, 19:42   #1
khabri

Hello:

I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html

I am wondering how search engine bots are implemented. Let's assume I
have Disallow: /foobar in the robots.txt. On the main page of my site,
I link to content, say /foobar/pictures.html

So will the search engine bot index /foobar/pictures.html or not? If
not, does it mean that during the entire period of crawling, it
maintains the information that it has read in robots.txt?

Thank you for your time.

Posted 21/10/2007, 20:39   #2
Mark Goodge

On Sun, 21 Oct 2007 18:42:41 -0000, khabri put finger to keyboard and
typed:

>Hello:
>
>I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
>I am wondering how search engine bots are implemented. Let's assume I
>have Disallow: /foobar in the robots.txt. On the main page of my site,
>I link to content, say /foobar/pictures.html
>
>So will the search engine bot index /foobar/pictures.html or not?


It won't, if it correctly follows the standards.
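To make the matching concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt body, the `ExampleBot` user agent, and the `example.com` host are assumptions made up for illustration. A rule like `Disallow: /foobar` is a path prefix match, so it covers `/foobar/pictures.html` (and, incidentally, `/foobar.html` as well).

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body matching the rule from the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /foobar
"""

def build_parser(robots_body: str) -> RobotFileParser:
    """Parse a robots.txt body that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_body.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "ExampleBot" and example.com are placeholders, not a real crawler or site.
pictures_ok = parser.can_fetch("ExampleBot", "http://example.com/foobar/pictures.html")
index_ok = parser.can_fetch("ExampleBot", "http://example.com/index.html")

print(pictures_ok)  # False: the path starts with /foobar
print(index_ok)     # True: no rule matches
```

This only shows what a well-behaved bot would decide; nothing forces a badly written bot to run a check like this at all.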

>If
>not, does it mean that during the entire period of crawling, it
>maintains the information that it has read in robots.txt?


It should cache the contents of robots.txt at the start of every crawl
and obey it thereafter, until it next checks it.
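As a rough sketch of what "cache and obey until the next check" could look like inside a bot: the class below keeps one parsed robots.txt per host and only fetches it again after `max_age` seconds. `RobotsCache`, `fetch_body`, and the one-hour `max_age` are hypothetical names and values chosen for this example, not how any particular search engine does it.

```python
import time
from urllib.robotparser import RobotFileParser

class RobotsCache:
    """Keep one parsed robots.txt per host and reuse it until max_age seconds pass."""

    def __init__(self, fetch_body, max_age=3600.0, clock=time.time):
        self._fetch_body = fetch_body  # callable: host -> robots.txt text (assumed helper)
        self._max_age = max_age        # assumed refresh interval, in seconds
        self._clock = clock
        self._entries = {}             # host -> (fetched_at, parser)

    def allowed(self, user_agent, host, path):
        now = self._clock()
        entry = self._entries.get(host)
        if entry is None or now - entry[0] > self._max_age:
            # First visit to this host, or the cached copy is stale: fetch and parse again.
            parser = RobotFileParser()
            parser.parse(self._fetch_body(host).splitlines())
            entry = (now, parser)
            self._entries[host] = entry
        return entry[1].can_fetch(user_agent, f"http://{host}{path}")

# Stand-in for a real HTTP download, so the sketch runs without network access.
fetch_calls = []

def fake_fetch(host):
    fetch_calls.append(host)
    return "User-agent: *\nDisallow: /foobar\n"

cache = RobotsCache(fake_fetch)
results = [
    cache.allowed("ExampleBot", "example.com", p)
    for p in ("/", "/foobar/pictures.html", "/about.html")
]
print(results)           # [True, False, True]
print(len(fetch_calls))  # 1: robots.txt was fetched once and reused for all three checks
```

So the bot does not re-read robots.txt for every page; it remembers the rules for the duration of the crawl (or some interval) and checks each URL against them.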

Mark
--
http://www.BritishSurnames.co.uk - What does your surname say about you?
"All I want is to find an easier way to get out of our little heads"
Posted 22/10/2007, 23:53   #3
Nikita the Spider

In article <1192992161.373908.214400@y27g2000pre.googlegroups.com>,
khabri <khabri@gmail.com> wrote:

> Hello:
>
> I read about robots exclusion here http://www.robotstxt.org/wc/exclusion.html
>
> I am wondering how search engine bots are implemented. Let's assume I
> have Disallow: /foobar in the robots.txt. On the main page of my site,
> I link to content, say /foobar/pictures.html
>
> So will the search engine bot index /foobar/pictures.html or not?


Mark Goodge is correct; correctly programmed bots should not access that
file.

> If
> not, does it mean that during the entire period of crawling, it
> maintains the information that it has read in robots.txt?


It's up to each bot how often it re-reads your robots.txt file. Note
that you can send an 'Expires' header along with your robots.txt file
and it *should* be respected. (No guarantees, though!)
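For illustration, a bot that does honour 'Expires' might turn the header into a cache lifetime roughly like this. It is a sketch under assumptions: `seconds_until_expiry` is a made-up helper, and the 24-hour fallback used when no header is sent is just a default picked for the example. An unparsable value is treated as already expired, which is how HTTP caches are expected to handle invalid dates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def seconds_until_expiry(expires_header, now, default=86400.0):
    """Return how long a cached robots.txt may be reused, in seconds."""
    if not expires_header:
        return default  # no header sent: fall back to an assumed 24 hours
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        return 0.0      # unparsable date: treat as already expired
    if expires_at.tzinfo is None:
        # A date without a usable zone is parsed as naive; assume UTC.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return max(0.0, (expires_at - now).total_seconds())

now = datetime(2007, 10, 22, 23, 53, tzinfo=timezone.utc)
lifetime = seconds_until_expiry("Tue, 23 Oct 2007 23:53:00 GMT", now)
print(lifetime)  # 86400.0: the cached copy is good for one more day
```

A bot could feed this value into whatever interval it uses for re-reading robots.txt, but as noted above, whether it actually does so is entirely up to the bot.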

Good luck

--
Philip
http://NikitaTheSpider.com/
Whole-site HTML validation, link checking and more