In a previous post I described how I'm keeping up with APRESS' eBook Deal of the Day. At the time I was using a Firefox extension, Update Scanner, to watch the web page for changes. It worked well enough, but it still required me to actually visit the page, and I'd noticed a couple of quirks. Since then I got excited about a tangent, the zenhabits Twitterbot Challenge. I never actually participated in the challenge, because the Twitterbot thread took me in a different direction.
While exploring some of the ways people have been using Twitter, I came across a w00t twitterbot that tweets w00t items, which is handy during a w00t-off. That wasn't far from what I wanted from APRESS. I would have been happy with RSS, but I use Twitter regularly enough that it's a decent solution. All I need to do is fetch the daily deal page, scrape it for the book title and link, and then tweet it. How hard can it be?
Not that hard, really: about fifteen lines of Ruby code and a few gems:
1: #!/usr/bin/ruby
2:
3: require 'rubygems'
4: require 'hpricot'
5: require 'shorturl'
6: require 'twitter'
7: require 'open-uri'
8: require 'date'
9: doc = Hpricot(open('http://www.apress.com/info/dailydeal'))
10: deal = doc.search("//div.bookdetails")
11: book = (deal/"h3/a").first
12: description = (deal/"p")
13: details = (description/"div")
14: (description/"div").remove
15:
16: root_url = 'http://www.apress.com'
17: book_url = ShortURL.shorten(root_url + book.attributes['href'])
18: book_title = book.inner_html
19:
20: tweet_start = "[#{Date.today}]: "
21: tweet_end = " (#{book_url})"
22: book_title_shortened = book_title[0, 140 - tweet_start.length - tweet_end.length]
23: tweet = tweet_start + book_title_shortened + tweet_end
24:
25: twitter = Twitter::Base.new("ApressDailyDeal", "my_super_secret_password")
26: twitter.post tweet
So, let's break it down.
Line 1 is the shebang line. I'm going to run this as a cron job later, and I'd rather invoke it as ./dailydeal.rb than as ruby dailydeal.rb. That also means making the file executable with chmod +x dailydeal.rb.
Lines 3-7 load RubyGems and the libraries the script needs. Gems are similar to Perl's CPAN modules. The script uses open-uri (from Ruby's standard library) to fetch the page, Hpricot to scrape it, shorturl to shorten the link to the book, and twitter to send the tweet.
Lines 9-14 could probably be tightened up, but I rather like the "clarity over cleverness" meme. To my surprise, APRESS' page was easy to scrape. It was well marked up and used classes for their semantic meaning (div class="bookdetails" actually held... book details).
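If you don't have Hpricot handy, here is a minimal sketch of the same lookup using REXML, which ships with Ruby. It assumes the page is well-formed XHTML. The markup in SAMPLE_PAGE is a hypothetical stand-in for the real Daily Deal page. It is modeled only on the div class="bookdetails" / h3 / a structure that the scraper above relies on, and find_deal is likewise a hypothetical helper name.

```ruby
require 'rexml/document'

# Hypothetical markup modeled on the structure the scraper expects;
# the real Daily Deal page may differ.
SAMPLE_PAGE = <<~HTML
  <html>
    <body>
      <div class="bookdetails">
        <h3><a href="/book/view/1234">Example Book Title</a></h3>
      </div>
    </body>
  </html>
HTML

# Returns [title, href] for the first book link inside div.bookdetails,
# or nil if the page does not contain one.
def find_deal(xhtml)
  doc = REXML::Document.new(xhtml)
  link = REXML::XPath.first(doc, "//div[@class='bookdetails']/h3/a")
  return nil unless link

  [link.text, link.attributes['href']]
end

title, href = find_deal(SAMPLE_PAGE)
puts "#{title} -> #{href}" # prints "Example Book Title -> /book/view/1234"
```

The XPath predicate [@class='bookdetails'] only matches when the class attribute is exactly bookdetails. A CSS class selector like div.bookdetails also matches a div that carries other classes alongside it.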
In lines 16-18 I stitch the URL together, pass it to shorturl's shorten method, and pull the book title out of the link. shorturl supports a bunch of services. rubyurl is the default, and since tinyurl wasn't working for me, I went with it. I actually forked the project on GitHub to add support for is.gd, because every character counts on Twitter. I'm waiting for a gem update before I switch to it.
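Concatenating root_url and the href works as long as the href is a path starting with a slash. A slightly more defensive sketch uses URI.join from Ruby's standard library, which also copes with an href that is already absolute. The helper absolute_book_url and the constant ROOT_URL are hypothetical names that are not part of the script above.

```ruby
require 'uri'

ROOT_URL = 'http://www.apress.com'

# Hypothetical helper: resolve a scraped href against the site root.
# Root-relative paths are joined to ROOT_URL; absolute URLs pass through unchanged.
def absolute_book_url(href)
  URI.join(ROOT_URL, href).to_s
end

puts absolute_book_url('/book/view/1234')              # prints http://www.apress.com/book/view/1234
puts absolute_book_url('http://example.com/book/1234') # prints http://example.com/book/1234
```

The result could then be handed to ShortURL.shorten exactly as line 17 does with the concatenated string.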
In lines 20-23 I assemble the tweet. I broke it into a few parts because the whole message must not exceed Twitter's 140-character limit. The date and the book's URL are more important than the full title, so the title is the part I truncate when necessary.
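Here is the same truncation logic pulled out into a small function so it can be checked in isolation. This is a sketch: build_tweet and MAX_TWEET are hypothetical names, and the date is passed in so the output is predictable. The sketch also adds one guard that lines 20-23 lack. If the date and URL alone exceed the limit, book_title[0, negative] returns nil and the string concatenation fails, so this version clamps the title budget to zero instead.

```ruby
require 'date'

MAX_TWEET = 140

# Builds "[date]: title (url)", truncating only the title so the whole
# message fits in MAX_TWEET characters. The date and URL are never cut.
def build_tweet(title, url, date = Date.today)
  tweet_start = "[#{date}]: "
  tweet_end = " (#{url})"
  budget = MAX_TWEET - tweet_start.length - tweet_end.length
  budget = 0 if budget < 0 # guard: String#[0, negative] would return nil
  tweet_start + title[0, budget] + tweet_end
end

tweet = build_tweet('A' * 200, 'http://is.gd/abc', Date.new(2009, 3, 5))
puts tweet.length # prints 140
```

In this example the prefix "[2009-03-05]: " is 14 characters and the suffix " (http://is.gd/abc)" is 19, which leaves 107 characters for the title.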
The last two lines (25-26) create a Twitter client for the ApressDailyDeal account and send the tweet.
On my Linux box I used crontab -e to add the following job:
30 08 * * * /home/jeff/development/apress_twitterbot/dailydeal.rb
So at 8:30 every morning, give or take a bit of drift either way, I'll tweet out today's new Daily Deal.
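For anyone who doesn't read crontab fluently, the five fields are minute, hour, day of month, month, and day of week. So 30 08 * * * means 08:30 every day. Below is a small Ruby sketch of that schedule that computes the next run after a given time. next_run, RUN_HOUR, and RUN_MINUTE are hypothetical names. The sketch uses the machine's local time zone, which is what cron normally uses.

```ruby
require 'date'

RUN_HOUR = 8
RUN_MINUTE = 30

# Returns the next local Time matching "30 08 * * *" strictly after +now+.
def next_run(now)
  today = now.to_date
  candidate = Time.new(today.year, today.month, today.day, RUN_HOUR, RUN_MINUTE)
  return candidate if candidate > now

  # Today's slot has passed (or is right now), so the next one is tomorrow.
  tomorrow = today.next_day
  Time.new(tomorrow.year, tomorrow.month, tomorrow.day, RUN_HOUR, RUN_MINUTE)
end

puts next_run(Time.now)
```

The sketch moves to the next day through Date#next_day rather than adding 86,400 seconds, so month boundaries and daylight-saving changes fall out naturally.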
So, if you're interested in getting APRESS' Daily Deal tweeted to you every morning (EST), follow my bot: http://twitter.com/ApressDailyDeal
eBooks are becoming a lot more interesting with the introduction of the Plastic Logic reading device (http://www.plasticlogic.com/).