robots.txt - what is it?

For requests for help from CG administrators, Wranglers, and experienced CG members. Please read the FAQ before posting. Also look at CG Wiki for tutorials and how-tos written by other CG webtoonists.
Josh IV
Newbie
Posts: 16
Joined: Fri Jun 27, 2003 3:42 am


Post by Josh IV »

It popped up around the time the custom 404 and 403 pages appeared, but I'm wondering: what does it do?

VileTerror
Anti-Villain
Posts: 3437
Joined: Wed Sep 17, 2003 11:16 am
Location: n. 1 a place where something is located. 2 the action of location. - DERIVATIVES locational adj.

Hmm . . .

Post by VileTerror »

As far as I can tell: it disallows 'bots.

What that means . . . I have no freakin' idea.
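VileTerror's reading is right as far as it goes: a robots.txt can simply ask every robot to stay away from the whole site. Below is a minimal sketch of such a file, checked with Python's standard `urllib.robotparser` module. The two-line file and the example.com URLs are assumptions for illustration, not the host's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file: "User-agent: *" addresses every robot,
# and "Disallow: /" asks them to stay out of every path on the site.
DISALLOW_ALL = """\
User-agent: *
Disallow: /
"""

parser = RobotFileParser()
parser.parse(DISALLOW_ALL.splitlines())

# A well-behaved robot asks before fetching each URL.
print(parser.can_fetch("AnyBot", "http://example.com/"))               # → False
print(parser.can_fetch("AnyBot", "http://example.com/comic/1.html"))   # → False
```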
Haughty spirit and pride make for a wild roller coaster ride!
I mean, as long as you like fairly final endings.

Johndisko
Regular Poster
Posts: 207
Joined: Thu Oct 24, 2002 12:00 pm

Post by Johndisko »

the bots to which it refers are (according to my belief) spider bots that scan your webpage for email addresses to send spam and stuff. usually used by metasearchers and stuff

i think

-jd
im the "j" "o" to the "h" "n" and i cant even spell the rest. it takes too long and i need a friggin cigarette.

VileTerror
Anti-Villain
Posts: 3437
Joined: Wed Sep 17, 2003 11:16 am
Location: n. 1 a place where something is located. 2 the action of location. - DERIVATIVES locational adj.

COOL!

Post by VileTerror »

In that case: I'm going to allow them!

Prism
Regular Poster
Posts: 125
Joined: Fri Jan 01, 1999 4:00 pm

Post by Prism »

johndisko wrote:
"the bots to which it refers are (according to my belief) spider bots that scan your webpage for email addresses to send spam and stuff. usually used by metasearchers and stuff

i think"
You think correctly; it does refer to spiders that crawl over your site and all the pages within it (by following links from page to page, starting from, say, the front page), though not necessarily ones harvesting email addresses. Usually they belong to search engines.
In fact, 'malicious' spiders built for harvesting probably ignore any robots.txt file and just look at whatever they want on your site~
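Prism's two points, that a spider reaches every page by following links and that only a well-behaved spider bothers to consult robots.txt, can be sketched as a toy crawler. This is a minimal illustration under stated assumptions: the "site" is a hypothetical in-memory dictionary (no real network requests), the robots.txt rules and the `ExampleBot` name are made up, and the rule check uses Python's standard `urllib.robotparser`. The hypothetical `obey_robots` flag shows that the check lives entirely in the crawler; nothing on the server enforces it.

```python
from collections import deque
from urllib.robotparser import RobotFileParser

# Hypothetical in-memory "site": each page maps to the pages it links to.
SITE = {
    "/": ["/comics/", "/about.html"],
    "/comics/": ["/comics/1.html", "/comics/2.html"],
    "/comics/1.html": ["/comics/2.html"],
    "/comics/2.html": ["/"],
    "/about.html": ["/private/notes.html"],
    "/private/notes.html": [],
}

# Assumed robots.txt for this sketch: every robot should stay out of /private/.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def crawl(start="/", obey_robots=True, agent="ExampleBot"):
    """Follow links breadth-first from `start`; return pages in the order visited."""
    rules = RobotFileParser()
    rules.parse(ROBOTS_TXT.splitlines())
    seen = {start}
    queue = deque([start])
    visited = []
    while queue:
        path = queue.popleft()
        # A polite robot asks robots.txt before each fetch; a rude one skips this check.
        if obey_robots and not rules.can_fetch(agent, "http://example.com" + path):
            continue
        visited.append(path)
        # Queue every link on this page that hasn't been queued before.
        for link in SITE.get(path, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


print(crawl())                   # polite: never requests /private/notes.html
print(crawl(obey_robots=False))  # rude: reaches /private/notes.html anyway
```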

VileTerror
Anti-Villain
Posts: 3437
Joined: Wed Sep 17, 2003 11:16 am
Location: n. 1 a place where something is located. 2 the action of location. - DERIVATIVES locational adj.

Niiiiiiiice.

Post by VileTerror »

That means I should put a list of every Hotmail account which belongs to someone I dislike on my page. Should be nice and effective.

Kyouryuu
Regular Poster
Posts: 29
Joined: Sat Jul 12, 2003 2:08 pm

Post by Kyouryuu »

Heh.

Yeah, as summarized, robots.txt is a plain-text file you put in your site's top-level directory (the main folder you upload to over FTP), so that it's served at /robots.txt. It tells legitimate robots what they may and may not do if they happen upon your site. Every day, the Internet is crawled by robots: little automated programs that recursively follow links through those billions of pages in search of new content. robots.txt can be used, for example, to tell a search engine's robot to stay out of your site, or just out of parts of it.

The operative phrase here is "legitimate." As said, malicious robots will completely ignore robots.txt. ;)
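Rules can also target a single robot by its user-agent name, or just one folder. A sketch, assuming a hypothetical `ExampleSearchBot` and a hypothetical `/drafts/` folder (neither comes from the thread), again checked with Python's standard `urllib.robotparser`. The parser only reports what the file asks for; as the post says, a malicious robot simply never asks.

```python
from urllib.robotparser import RobotFileParser

# Assumed example file with two groups separated by a blank line:
#   1. the hypothetical ExampleSearchBot is asked to stay out of the whole site;
#   2. every other robot ("*") is only asked to skip /drafts/.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Disallow: /

User-agent: *
Disallow: /drafts/
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# Ask the same questions a polite robot would ask before each fetch.
for agent in ("ExampleSearchBot", "OtherBot"):
    for path in ("/", "/drafts/page3.html"):
        allowed = rules.can_fetch(agent, "http://example.com" + path)
        print(f"{agent:16} {path:20} {'allowed' if allowed else 'blocked'}")
```

ExampleSearchBot comes out blocked on both paths, while any other robot may fetch `/` but not `/drafts/page3.html`.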

Johndisko
Regular Poster
Posts: 207
Joined: Thu Oct 24, 2002 12:00 pm

Post by Johndisko »

holy shit! all that book learnin actually paid off :)

im not bein sarcastic. i was making an educated guess, but a guess nonetheless :) so thank you all

id like to dedicate this award to... *fades*

-jd
