<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Ruby Web Crawler</title>
	<atom:link href="http://blog.netphase.com/2007/04/19/ruby-web-crawler/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/</link>
	<description>for a connected world</description>
	<lastBuildDate>Thu, 21 Jan 2010 13:34:49 -0500</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: Andre Durao</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-1245</link>
		<dc:creator>Andre Durao</dc:creator>
		<pubDate>Thu, 21 Jan 2010 13:34:49 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-1245</guid>
		<description>thanks for that!
Helped a lot.</description>
		<content:encoded><![CDATA[<p>thanks for that!<br />
Helped a lot.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Nik</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-1204</link>
		<dc:creator>Nik</dc:creator>
		<pubDate>Tue, 30 Jun 2009 20:02:22 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-1204</guid>
		<description>Hello, thanks for this script. I have been wanting to experiment with building my own web indexes for a few websites.

Now that we have the crawler, do you happen to know how to come up with a list of websites for it to crawl if we wanted to index the whole World Wide Web? Should we run through every IP address from 0.0.0.0 to 255.255.255.255?

Or is the idea to give it one seed website, a rather big one, and let the spider follow link after link until it hopefully covers every web page? And if it stalls somewhere, do we pick another seed website and start again?

Thanks!</description>
		<content:encoded><![CDATA[<p>Hello, thanks for this script. I have been wanting to experiment with building my own web indexes for a few websites.</p>
<p>Now that we have the crawler, do you happen to know how to come up with a list of websites for it to crawl if we wanted to index the whole World Wide Web? Should we run through every IP address from 0.0.0.0 to 255.255.255.255?</p>
<p>Or is the idea to give it one seed website, a rather big one, and let the spider follow link after link until it hopefully covers every web page? And if it stalls somewhere, do we pick another seed website and start again?</p>
<p>Thanks!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Manjunath</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-1198</link>
		<dc:creator>Manjunath</dc:creator>
		<pubDate>Tue, 05 May 2009 10:14:00 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-1198</guid>
		<description>Hi, I am new to Ruby and would like some help with this.

I ran this code in SciTE, and when I execute it I get this error:

updateversionofsitecrawl.rb:6:in `initialize&#039;: uninitialized constant WebCrawler::URL (NameError)
	from updateversionofsitecrawl.rb:53:in `new&#039;
	from updateversionofsitecrawl.rb:53

Can you please help me out with this?</description>
		<content:encoded><![CDATA[<p>Hi, I am new to Ruby and would like some help with this.</p>
<p>I ran this code in SciTE, and when I execute it I get this error:</p>
<p>updateversionofsitecrawl.rb:6:in `initialize&#039;: uninitialized constant WebCrawler::URL (NameError)<br />
	from updateversionofsitecrawl.rb:53:in `new&#039;<br />
	from updateversionofsitecrawl.rb:53</p>
<p>Can you please help me out with this?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Carsten</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-53</link>
		<dc:creator>Carsten</dc:creator>
		<pubDate>Wed, 23 Jan 2008 11:04:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-53</guid>
		<description>Great code,

Will have to try it out.

Thanks man!</description>
		<content:encoded><![CDATA[<p>Great code,</p>
<p>Will have to try it out.</p>
<p>Thanks man!</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shig</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-26</link>
		<dc:creator>Shig</dc:creator>
		<pubDate>Thu, 13 Sep 2007 15:16:38 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-26</guid>
		<description>Sounds good. Thanks for the response.</description>
		<content:encoded><![CDATA[<p>Sounds good. Thanks for the response.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: scott</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-25</link>
		<dc:creator>scott</dc:creator>
		<pubDate>Thu, 13 Sep 2007 15:09:06 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-25</guid>
		<description>While it would be easy to add this to a controller, that&#039;s probably not the best place for it.  One possibility would be to create an &lt;a href=&quot;http://www.railsmanual.org/class/ActiveRecord%3A%3AObserver&quot; rel=&quot;nofollow&quot;&gt;Observer&lt;/a&gt; to handle it; however, I would be more inclined to create a task (in lib/tasks) to handle the crawling and set up cron to run it periodically.  That would ensure a constant level of resource usage and make it easier to process the requests serially.</description>
		<content:encoded><![CDATA[<p>While it would be easy to add this to a controller, that&#8217;s probably not the best place for it.  One possibility would be to create an <a href="http://www.railsmanual.org/class/ActiveRecord%3A%3AObserver" rel="nofollow">Observer</a> to handle it; however, I would be more inclined to create a task (in lib/tasks) to handle the crawling and set up cron to run it periodically.  That would ensure a constant level of resource usage and make it easier to process the requests serially.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Shig</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-24</link>
		<dc:creator>Shig</dc:creator>
		<pubDate>Thu, 13 Sep 2007 03:46:58 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-24</guid>
		<description>Scott - I&#039;m currently in the process of learning Ruby and RoR. Is this something that would integrate easily into an RoR app? I&#039;m wondering how this would fit into the MVC architecture.</description>
		<content:encoded><![CDATA[<p>Scott &#8211; I&#8217;m currently in the process of learning Ruby and RoR. Is this something that would integrate easily into an RoR app? I&#8217;m wondering how this would fit into the MVC architecture.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: scott</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-5</link>
		<dc:creator>scott</dc:creator>
		<pubDate>Sun, 27 May 2007 20:15:13 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-5</guid>
		<description>You could store it wherever you like.  In this example, the contents of the page are in the page_text variable.  You can see an &lt;a href=&quot;http://rubyforge.org/snippet/detail.php?type=snippet&amp;id=155&quot; rel=&quot;nofollow&quot;&gt;updated version&lt;/a&gt; at RubyForge where I&#039;m storing the results in a hashmap.</description>
		<content:encoded><![CDATA[<p>You could store it wherever you like.  In this example, the contents of the page are in the page_text variable.  You can see an <a href="http://rubyforge.org/snippet/detail.php?type=snippet&#038;id=155" rel="nofollow">updated version</a> at RubyForge where I&#8217;m storing the results in a hashmap.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: webug</title>
		<link>http://blog.netphase.com/2007/04/19/ruby-web-crawler/comment-page-1/#comment-4</link>
		<dc:creator>webug</dc:creator>
		<pubDate>Sun, 27 May 2007 03:02:18 +0000</pubDate>
		<guid isPermaLink="false">http://blog.netphase.com/2007/04/19/ruby-web-crawler/#comment-4</guid>
		<description>Dear sir,
I am a Ruby beginner. In your program above, where do you store the downloaded HTML pages?
Thanks</description>
		<content:encoded><![CDATA[<p>Dear sir,<br />
I am a Ruby beginner. In your program above, where do you store the downloaded HTML pages?<br />
Thanks</p>
]]></content:encoded>
	</item>
</channel>
</rss>
