Nokogiri

  1. Parses HTML and XML.

  2. Uses XPath or CSS3 selectors to extract data.

screen-scraping

  1. Reading data from a computer terminal's screen.

  2. Extracting data from the human-readable output of another program.

Ruby

# /lib/tasks/product_prices.rake
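require 'nokogiri'
require 'open-uri' # adds URI.open for fetching the contents of a URL

url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=Batman&Find.x=0&Find.y=0&Find=Find"
doc = Nokogiri::HTML(URI.open(url)) # Kernel#open no longer accepts URLs in Ruby 3.0+
doc.css(".item").each do |item|
  link = item.at_css(".prodLink")
  price_node = item.at_css(".PriceXLBold, .PriceCompare .BodyS")
  next unless link && price_node # skip items missing a title or price

  # Pull the first dollar amount (e.g. "$12.96") out of the price text
  price = price_node.text[/\$[0-9.]+/]
  puts "#{link.text} - #{price}"
end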
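
The price extraction in the script above relies on `String#[]` with a regular expression, which returns the first match or `nil`. Below is a minimal sketch of that step in isolation. The `extract_price` helper, the `PRICE_PATTERN` constant, and the sample strings are hypothetical names and data chosen for illustration.

```ruby
# Matches a dollar sign followed by digits and dots, as in the script above
PRICE_PATTERN = /\$[0-9.]+/

# Hypothetical helper: returns the first price-like substring, or nil
def extract_price(text)
  text[PRICE_PATTERN]
end

puts extract_price("Now only $12.96 each").inspect # => "$12.96"
puts extract_price("Out of stock").inspect         # => nil
puts extract_price("$1,299.00").inspect            # => "$1"
```

The last case shows a limitation: the pattern stops at a thousands separator, so prices above $999.99 would need a pattern that also allows commas.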

Rails

http://railscasts.com/episodes/190-screen-scraping-with-nokogiri?view=asciicast
