In a previous post I described how I'm keeping up with APRESS' eBook Deal of the Day.  At the time I was using a Firefox extension, Update Scanner, to watch the web page for changes.  It worked well enough, but it still required me to actually go out to the page, and I've noticed a couple of quirks.  Since then I got sidetracked by a tangent, the zenhabits Twitterbot Challenge.  I never actually participated in the challenge, because the Twitterbot thread took me in a different direction.

While exploring some of the ways people use Twitter, I came across a w00t twitterbot that tweets w00t items, which is handy during a w00t-off.  That wasn't far from what I wanted for APRESS (I would have been happy with RSS, but I use Twitter regularly enough that it's a decent solution).  All I need to do is go out to the daily deal page, scrape it for the book name and link, then tweet it.  How hard can it be?

Not that hard, really.  It takes about twenty lines of Ruby code and a few gems:

  1: #!/usr/bin/ruby
  2: 
  3: require 'rubygems'
  4: require 'hpricot'
  5: require 'shorturl'
  6: require 'twitter'
  7: require 'open-uri'
  8: 
  9: doc = Hpricot(open('http://www.apress.com/info/dailydeal')) 
 10: deal = doc.search("//div.bookdetails")
 11: book = (deal/"h3/a").first
 12: description = (deal/"p")
 13: details = (description/"div")
 14: (description/"div").remove
 15: 
 16: root_url = 'http://www.apress.com'
 17: book_url = ShortURL.shorten(root_url + book.attributes['href'])
 18: book_title = book.inner_html
 19: 
 20: tweet_start = "[#{Date.today}]: "
 21: tweet_end = " (#{book_url})"
 22: book_title_shortened = book_title[0, 140 - tweet_start.length - tweet_end.length]
 23: tweet = tweet_start + book_title_shortened + tweet_end
 24: 
 25: twitter = Twitter::Base.new("ApressDailyDeal", "my_super_secret_password")
 26: twitter.post tweet

So, let's break it down.

Line 1 is the shebang.  I'm going to run this as a cron job later, and I'd rather call ./dailydeal.rb than ruby dailydeal.rb (the file also needs to be marked executable with chmod +x).

Lines 3-7 load the libraries.  Line 3 loads rubygems so the gems can be found (gems are similar to Perl's CPAN packages), then open-uri (part of the standard library) fetches the page, Hpricot scrapes it, shorturl shortens the link to the book, and twitter sends the tweet.

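Lines 9-14 fetch and parse the page, find the bookdetails div, grab the first link inside its h3 (the book), and take the description paragraphs, setting aside their nested divs as details and stripping them out.  These lines could probably be tightened up, but I rather like the "clarity over cleverness" meme.  Surprisingly, APRESS' page was easy to scrape because it is well marked up, using classes for their semantic meaning (div class="bookdetails" actually held... book details).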
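Hpricot is no longer maintained, so here is a rough equivalent of lines 9-14 using REXML from Ruby's standard library instead. This is a swapped-in parser, not what the bot uses. It runs against a small, made-up snippet that only assumes the structure described above: a div with class bookdetails holding an h3 link and a description paragraph. REXML needs well-formed markup, so a real page would probably call for a more forgiving HTML parser.

```ruby
require 'rexml/document'

# Hypothetical, simplified markup: only the structure described in the post
# (a div.bookdetails holding an h3 link and a description paragraph) is assumed.
SAMPLE_HTML = <<~HTML
  <html><body>
    <div class="bookdetails">
      <h3><a href="/book/view/1234">Pro Example Programming</a></h3>
      <p>A book about examples.<div class="extra">Price info</div></p>
    </div>
  </body></html>
HTML

# Returns [title, href, description] for the deal block.
def scrape_deal(html)
  doc = REXML::Document.new(html)
  # Roughly Hpricot's "//div.bookdetails", except this matches the class
  # attribute exactly rather than as one of several space-separated classes.
  deal = REXML::XPath.first(doc, "//div[@class='bookdetails']")
  raise 'no bookdetails div found' unless deal

  link = REXML::XPath.first(deal, 'h3/a')         # the book's title link
  description = REXML::XPath.first(deal, 'p')     # first paragraph only
  # Strip nested divs, like line 14 of the bot, so only the prose text remains.
  REXML::XPath.match(description, 'div').each { |div| description.delete_element(div) }

  [link.text, link.attributes['href'], description.texts.map(&:value).join.strip]
end

title, href, description = scrape_deal(SAMPLE_HTML)
puts "#{title} (#{href}): #{description}"
```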

In lines 16-18 I stitch the URL together, pass it to ShortURL.shorten, and pull the book title out of the link text.  shorturl supports a bunch of services; RubyURL is the default, and with TinyURL not working for me, I went with it.  I actually forked the project on GitHub to add support for is.gd (every character counts on Twitter), but I'm waiting for a gem update before I switch to it.
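For illustration, here is a minimal sketch of the same step using only Ruby's standard library rather than the shorturl gem: URI.join builds the book URL, and a second helper talks to is.gd directly. The create.php endpoint and its format/url parameters are my assumption based on is.gd's public API, so check their documentation first; the /book/view/1234 path and the helper names are hypothetical.

```ruby
require 'uri'
require 'net/http'

ROOT_URL = 'http://www.apress.com'

# Resolves the scraped href against the site root the way a browser would;
# unlike plain string concatenation, an absolute href is also handled.
def book_url(href)
  URI.join(ROOT_URL, href).to_s
end

# Assumption: is.gd's plain-text API at create.php with format=simple and
# url=<long url> parameters. Check is.gd's own docs before relying on it.
def isgd_request_uri(long_url)
  uri = URI('https://is.gd/create.php')
  uri.query = URI.encode_www_form(format: 'simple', url: long_url)
  uri
end

# Makes the network call; with format=simple the body should be just the short URL.
def shorten_with_isgd(long_url)
  Net::HTTP.get(isgd_request_uri(long_url)).strip
end

long_url = book_url('/book/view/1234')   # hypothetical href
puts isgd_request_uri(long_url)
# puts shorten_with_isgd(long_url)       # needs network access
```

Keeping the request building separate from the network call means the URL handling can be checked offline.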

In lines 20-23 I assemble the tweet.  I broke it into a few parts because I need to make sure the message doesn't exceed 140 characters.  The date and the URL to the book are more important than the full title, so if necessary I truncate the title to whatever room is left.
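To make the truncation arithmetic concrete, here is a minimal, testable version of lines 20-23 pulled into a hypothetical compose_tweet method (the method name and the is.gd URL in the example are made up). With a date prefix like "[2009-03-05]: " (14 characters) and a 19-character " (http://is.gd/abc)" suffix, the title gets 140 - 14 - 19 = 107 characters.

```ruby
require 'date'

TWEET_LIMIT = 140

# Same idea as lines 20-23: the date prefix and URL suffix are kept whole and
# only the title is truncated so the finished tweet fits within the limit.
def compose_tweet(title, url, date = Date.today, limit = TWEET_LIMIT)
  tweet_start = "[#{date}]: "
  tweet_end = " (#{url})"
  room = limit - tweet_start.length - tweet_end.length
  # Clamp at zero: String#[] with a negative length returns nil.
  tweet_start + title[0, [room, 0].max] + tweet_end
end

day = Date.new(2009, 3, 5)
puts compose_tweet('Short Title', 'http://is.gd/abc', day)  # prints [2009-03-05]: Short Title (http://is.gd/abc)
puts compose_tweet('x' * 300, 'http://is.gd/abc', day).length  # prints 140
```

Passing the date in, rather than calling Date.today inside, just makes the output predictable when testing.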

Finally, lines 25-26 create a Twitter client with the bot account's credentials and post the tweet.

On my Linux box I used crontab -e to add the following job:

  30 08 * * * /home/jeff/development/apress_twitterbot/dailydeal.rb

So at 8:30 every morning, give or take a little drift, the bot tweets the new Daily Deal:

[Screenshot: the bot's Daily Deal tweet]

The original page:

[Screenshot: the original APRESS Daily Deal page]

 

So, if you'd like APRESS' Daily Deal tweeted to you every morning (EST), follow my bot: http://twitter.com/ApressDailyDeal

eBooks are becoming a lot more interesting with the introduction of the Plastic Logic reading device (http://www.plasticlogic.com/).