Ruby/Rails syntax

Nokogiri

  1. Parses HTML and XML documents.

  2. Extracts data from them with XPath or CSS3 selectors.

screen-scraping

  1. Originally: reading data from a computer terminal's screen.

  2. More generally: extracting data from the human-readable output of another program, such as a rendered web page.

Ruby

# test.rb
require 'nokogiri'
require 'open-uri' # adds URI.open, used to get the contents of a URL

url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=Batman&Find.x=0&Find.y=0&Find=Find"
doc = Nokogiri::HTML(URI.open(url)) # Kernel#open stopped handling URLs in Ruby 3.0
doc.css(".item").each do |item|
  text = item.at_css(".prodLink").text
  price = item.at_css(".PriceXLBold, .PriceCompare .BodyS").text[/\$[0-9\.]+/]
  puts "#{text} - #{price}"
end
$ ruby test.rb
Batman - $6.86
Batman: No Man's Land - $11.50
Batman: No Man's Land - Vol 03 - $11.50
...
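
The price is pulled out of the node text with String#[] and a regular expression. Here is a small sketch of that step on its own, using only the standard library; the sample strings are hypothetical stand-ins for scraped text.

```ruby
# Hypothetical strings, standing in for the text of a scraped price node.
samples = ["Now $6.86", "List price: $11.50 each", "Out of stock"]

PRICE_WITH_SIGN = /\$[0-9.]+/ # keeps the dollar sign, for display
PRICE_DIGITS    = /[0-9.]+/   # digits and dots only, for storing a number

# String#[] with a regexp returns the first match, or nil if there is none.
with_sign = samples.map { |s| s[PRICE_WITH_SIGN] }
digits    = samples.map { |s| s[PRICE_DIGITS] }

samples.zip(with_sign, digits).each do |text, signed, plain|
  puts "#{text.inspect}: #{signed.inspect} / #{plain.inspect}"
end
```

A string without a price yields nil rather than an error, so the caller still has to decide what to do when nothing was matched.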

Rails

# lib/tasks/product_prices.rake
desc "Fetch product prices"
task fetch_prices: :environment do # :environment loads the Rails app first
  require 'nokogiri'
  require 'open-uri'
  # find_all_by_* dynamic finders were deprecated in Rails 4; use where instead.
  Product.where(price: nil).find_each do |product|
    escaped_product_name = CGI.escape(product.name) # make it safe to embed in a URL
    url = "http://www.walmart.com/search/search-ng.do?search_constraint=0&ic=48_0&search_query=#{escaped_product_name}&Find.x=0&Find.y=0&Find=Find"
    doc = Nokogiri::HTML(URI.open(url)) # Kernel#open stopped handling URLs in Ruby 3.0
    node = doc.at_css(".PriceXLBold, .PriceCompare .BodyS") # nil when no price is found
    price = node && node.text[/[0-9\.]+/]
    product.update_attribute(:price, price) if price
  end
end
$ rake fetch_prices
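
The CGI.escape call in the task matters because product names can contain spaces, ampersands, and other characters that would otherwise break the query string. This is a minimal sketch of that step in isolation; the base URL and the search_url helper are hypothetical and exist only for this example.

```ruby
require 'cgi'

# Hypothetical endpoint, for illustration only.
BASE_URL = "http://example.com/search"

# Hypothetical helper: builds a search URL with the name escaped for a query string.
def search_url(name)
  "#{BASE_URL}?search_query=#{CGI.escape(name)}"
end

url = search_url("Batman: No Man's Land & more")
puts url
```

Without escaping, the `&` in the name would be read as the start of a new query parameter and the search term would be cut short.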

Last updated 5 years ago

http://railscasts.com/episodes/190-screen-scraping-with-nokogiri?view=asciicast