Woot and Hpricot
I want to learn more about one of my favorite sites, Woot. My stats friend Drew had the idea that he might be able to predict a Woot-Off if he had data about past Woot items. Parsing a website is easy enough. After looking around Woot for a few minutes, I found that the blog holds all the data I needed. Luckily, the data goes back to May 2006.

Let's start.
#!/usr/bin/env ruby
require "rubygems"
require "hpricot"
require "open-uri"
#my base blog url
uri_base = "http://www.woot.com/Forums/Default.aspx?p="
#Parsing the Woot pages raised an error; a Google search showed that I needed to increase the Hpricot buffer size
Hpricot.buffer_size = 262144
Each blog page holds about 20 Woot items, and it turns out that there are 58 pages of them. Woot-Off days are the days with more than one item.
master_list = []
(0..58).each{|page|
 #open the page
 doc = Hpricot(open(uri_base+page.to_s))
 #get all item rows
 rows = (doc/'tr.itemRow')
 #build a [date, title] pair for each row
 data = rows.map{|row| ["#{(row/'div.saleMonth').text} #{(row/'div.saleDay').text} #{(row/'div.saleYear').text}","#{(row/'div.saleTitle/a').text}"] }
 #append the pairs themselves, not the enclosing array
 master_list.push(*data)
 #don't kill the server
 sleep(5)
}
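The "more than one item per day" rule can be turned into a few lines of Ruby. This is only a rough sketch: the sample entries are made up for illustration, and the helper names items_per_day and woot_off_days are my own, not part of the scraper. It assumes the list has the same [date, title] shape as master_list and a reasonably modern Ruby, where Hash#select returns a Hash.

```ruby
# Sample rows in the same [date, title] shape as master_list.
# These entries are invented for illustration, not scraped data.
sample_list = [
  ["May 1 2006", "Item A"],
  ["May 2 2006", "Item B"],
  ["May 2 2006", "Item C"],
  ["May 2 2006", "Item D"],
  ["May 3 2006", "Item E"]
]

# Count how many items were posted on each date.
def items_per_day(list)
  counts = Hash.new(0)
  list.each { |date, _title| counts[date] += 1 }
  counts
end

# Treat any date with more than one item as a Woot-Off day.
def woot_off_days(list)
  items_per_day(list).select { |_date, count| count > 1 }.keys
end

puts woot_off_days(sample_list).inspect
```

Passing master_list instead of sample_list should give the candidate Woot-Off dates, provided every item on a given day carries an identical date string.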
Once the list is built, I write it to a CSV file. This lets me re-parse the data later for anything.

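#positions of the date and title within each entry
DATE = 0
ITEM = 1
File.open("woot.data","w+"){|file|
 file<<"Date,Item\n"
 master_list.each{|woot_item|
  #wrap the title in quotes (doubling any quotes inside it) so its commas stay in one column
  file.puts woot_item[DATE]+',"'+woot_item[ITEM].gsub('"','""')+'"'
 }
}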
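As an alternative to building each line by hand, Ruby's csv library can handle the quoting and the later re-parsing. The sketch below is not what the script above does; it just shows the round trip on two made-up entries. The helper name to_csv_text is hypothetical.

```ruby
require "csv"

# Invented sample entries; the second title contains a comma on purpose.
sample_list = [
  ["May 1 2006", "Plain Item"],
  ["May 2 2006", "Item, With A Comma"]
]

# Build the CSV text; the library quotes any field that contains a comma.
def to_csv_text(list)
  CSV.generate do |csv|
    csv << ["Date", "Item"]
    list.each { |date, title| csv << [date, title] }
  end
end

csv_text = to_csv_text(sample_list)
puts csv_text

# Reading the text back yields the original pairs with the comma intact.
rows = CSV.parse(csv_text, headers: true).map { |r| [r["Date"], r["Item"]] }
puts rows.inspect
```

Writing csv_text to woot.data with File.write, and reading it back later with CSV.read, would follow the same pattern.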
My code and data files are in my SVN repository.