Frizzle Tech Blog: How I created my weekly feed digest

If you read the frizzlefry blog, you probably noticed that today's issue was all about my weekly shared items in Google Reader. I have been thinking about reorganizing all of my shared feeds into a series of digests. The nice part is that other people did all the work for me, so all I need to do is put their work together into a script I can run.

So, where to start? First, I want to parse an Atom feed, so I chose the Universal Feed Parser, a nice little utility that parses RSS/Atom feeds, among other stuff I'm not currently interested in. Because the library I wanted to use is written in Python, I decided to write a little Python script to get the work done. I guess the language of a library is as good a reason as any to choose your language.

Anyway, how does this thing work? Really simply, as it turns out.
To start off, I import the feedparser library and create a feed object:



import feedparser



d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')

That creates a variable d, which holds the parsed feed. All I want from each entry is its title and description, which I can access like this:



import feedparser



d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')



d.entries[itemnumber].title

d.entries[itemnumber].description

However, I want to print out those values, which can be done with print like this:



import feedparser



d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')



print(d.entries[itemnumber].title)

print(d.entries[itemnumber].description)

Next, I want itemnumber to mean something, so I'll put the whole thing together in a for loop that prints all entries like this:



import feedparser



d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')



for itemnumber in range(len(d.entries)):
    print(d.entries[itemnumber].title)
    print(d.entries[itemnumber].description)
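As a side note, a loop like this does not strictly need the index. Here is a minimal sketch that walks the list directly. It uses stand-in entries built with types.SimpleNamespace instead of a real feed, so format_entries and the sample titles are made up for illustration and are not part of feedparser. It is written with print() calls so it runs on Python 3.

```python
from types import SimpleNamespace

# Hypothetical stand-ins for d.entries; a real list would come from feedparser.parse().
entries = [
    SimpleNamespace(title='First shared item', description='Summary one'),
    SimpleNamespace(title='Second shared item', description='Summary two'),
]


def format_entries(entries):
    """Return the title and description of every entry, in order."""
    lines = []
    for entry in entries:  # iterate over the entries themselves, no index needed
        lines.append(entry.title)
        lines.append(entry.description)
    return lines


for line in format_entries(entries):
    print(line)
```

The index-based loop is still handy when the position itself matters, so this is a matter of taste more than correctness.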

Next, I wanted to limit the list to items shared on or after a start date that I could pass as a command-line argument, so I added a couple more lines:



import sys
from datetime import datetime

import feedparser

startdate = datetime.strptime(sys.argv[1], '%Y-%m-%d').date()
d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')

for itemnumber in range(len(d.entries)):
    # Assumes the feed lists the newest items first: stop at the first one older than startdate.
    if datetime.strptime(d.entries[itemnumber].published, '%Y-%m-%dT%H:%M:%SZ').date() < startdate:
        break
    print(d.entries[itemnumber].title)
    print(d.entries[itemnumber].description)

So, I created the startdate variable, which is a date provided on the command line in the format YYYY-MM-DD. The loop then breaks as soon as it reaches an item whose published date is before (less than) startdate.
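The date check is easier to try out as a small function. The sketch below is only an illustration under a few assumptions: the entries are stand-ins built with types.SimpleNamespace rather than real feedparser results, the names entries_since and PUBLISHED_FORMAT are made up, and the list is assumed to be ordered newest first. It tests each entry's own date before keeping it, so the last entry needs no special handling.

```python
from datetime import date, datetime
from types import SimpleNamespace

# Same timestamp layout the script parses from the feed.
PUBLISHED_FORMAT = '%Y-%m-%dT%H:%M:%SZ'


def entries_since(entries, startdate):
    """Return the leading entries published on or after startdate.

    Assumes entries are ordered newest first, so the scan can stop at the
    first entry that is older than startdate.
    """
    kept = []
    for entry in entries:
        published = datetime.strptime(entry.published, PUBLISHED_FORMAT).date()
        if published < startdate:
            break
        kept.append(entry)
    return kept


# Hypothetical sample data, newest first.
sample = [
    SimpleNamespace(title='Newest', published='2011-02-27T09:30:00Z'),
    SimpleNamespace(title='Middle', published='2011-02-26T18:00:00Z'),
    SimpleNamespace(title='Oldest', published='2011-02-20T12:00:00Z'),
]
startdate = datetime.strptime('2011-02-26', '%Y-%m-%d').date()
print([entry.title for entry in entries_since(sample, startdate)])
```

With the sample data, the entry from 2011-02-20 is dropped and the two newer ones are kept, including the one shared on the start date itself.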

Finally, I wanted to add some basic HTML formatting to make it easy for me to copy and paste the text into Blogger. That led me to the final version of the Python script, which looks a little like this:



import sys
from datetime import datetime

import feedparser

# Echo the start date (YYYY-MM-DD) given on the command line, then parse it.
print(sys.argv[1])
startdate = datetime.strptime(sys.argv[1], '%Y-%m-%d').date()

d = feedparser.parse('http://www.google.com/reader/public/atom/user%2F11424318483849164397%2Fstate%2Fcom.google%2Fbroadcast')

print('<html><head><title>')
print(d.feed.title)
print('</title></head><body>')

for itemnumber in range(len(d.entries)):
    published = datetime.strptime(d.entries[itemnumber].published, '%Y-%m-%dT%H:%M:%SZ').date()
    # Assumes the feed lists the newest items first: stop at the first one older than startdate.
    if published < startdate:
        break

    print('<h1>')
    print(d.entries[itemnumber].title)
    print('</h1>')
    print(d.entries[itemnumber].description)
    print('<br /><br />Date Shared: ')
    print(published)

    print('<hr />')
    print('<hr />')
    print('<hr />')

print('</body></html>')
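For comparison, the same page layout can be built as a single string and printed once, which makes it easier to check. This is only a sketch under stated assumptions: render_digest is a made-up helper, the items are plain tuples rather than feed entries, and it uses html.escape from the Python 3 standard library to escape titles. Whether escaping is actually needed depends on what the feed delivers, so treat that part as optional.

```python
from html import escape


def render_digest(feed_title, items):
    """Build the digest page as one string.

    items is a list of (title, description, shared_date) tuples. Titles are
    escaped; descriptions are assumed to be HTML already and pass through as-is.
    """
    parts = ['<html><head><title>%s</title></head><body>' % escape(feed_title)]
    for title, description, shared_date in items:
        parts.append('<h1>%s</h1>' % escape(title))
        parts.append(description)
        parts.append('<br /><br />Date Shared: %s' % shared_date)
        parts.append('<hr />')
    parts.append('</body></html>')
    return '\n'.join(parts)


# Hypothetical sample item.
page = render_digest(
    'Shared items',
    [('Tips & tricks', '<p>Some text</p>', '2011-02-27')],
)
print(page)
```

Returning a string instead of printing line by line also leaves the caller free to decide where the page goes.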

So, the first part of my script was done. I named the file GeneratePage.py and proceeded to create three more files. The second file was SaveDate.py, which simply outputs today's date in the format I need, using two lines:



from datetime import date

print(date.today())

I figured the best way to track the previous time the script was run would be to save the date to a file, so I created a file called date.txt that contained only the string "2011-02-26".

Then I created a shell script to tie everything together. It looks like this:



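#!/bin/bash

# Read the date of the previous run.
D=$(cat date.txt)

# Build the digest for everything shared since then.
python GeneratePage.py "$D"

# Record today's date for the next run.
python SaveDate.py > date.txt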

This script reads the date from date.txt into a variable called D, runs GeneratePage.py with the value of D as its argument, and then runs SaveDate.py, whose output replaces the contents of date.txt.
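The same bookkeeping could also live in Python rather than in the wrapper. The following is a rough sketch, not what the script above does: read_last_run and save_run_date are made-up helper names, and the demo writes to a temporary file instead of the real date.txt.

```python
import os
import tempfile
from datetime import date, datetime


def read_last_run(path):
    """Return the date stored in path, which holds one YYYY-MM-DD line."""
    with open(path) as handle:
        return datetime.strptime(handle.read().strip(), '%Y-%m-%d').date()


def save_run_date(path, when):
    """Overwrite path with the given date in YYYY-MM-DD form."""
    with open(path, 'w') as handle:
        handle.write(when.isoformat() + '\n')


# Demo against a temporary file rather than the real date.txt.
demo_path = os.path.join(tempfile.mkdtemp(), 'date.txt')
save_run_date(demo_path, date(2011, 2, 26))
print(read_last_run(demo_path))
```

In a real run, save_run_date(path, date.today()) would take the place of SaveDate.py, and read_last_run would replace the cat in the wrapper.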

At this point, the script can be run every week, and the output can be copied and pasted into Blogger.

The next step will be to automate the process of putting the digest onto the site, which will actually take a little effort, but I'm pretty sure someone else has already done the work for me; I just have to figure out who did it.

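Oh, on one other note, in case you are curious why I didn't just redirect all the output to a file: some of the feed items use Unicode characters that cannot be converted to ASCII, so the script errors out when its output is redirected. (Strictly speaking, it is most likely Python rather than Bash that gets angry: when output goes to a file instead of a terminal, Python 2 falls back to ASCII for whatever it prints.) I might be able to fix this easily; however, I really don't mind having to copy and paste the characters into the blog anyway.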
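One possible way around the redirect problem, offered as an assumption rather than something the script above does, is to have Python write the page to a file itself with an explicit UTF-8 encoding. In this sketch, write_page and the sample text are made up for illustration.

```python
import io
import os
import tempfile


def write_page(path, text):
    """Write text to path as UTF-8, independent of the terminal's encoding."""
    with io.open(path, 'w', encoding='utf-8') as handle:
        handle.write(text)


# Hypothetical page fragment containing non-ASCII characters.
sample_text = u'<h1>Caf\u00e9 na\u00efve r\u00e9sum\u00e9</h1>'

# Demo against a temporary file.
demo_path = os.path.join(tempfile.mkdtemp(), 'digest.html')
write_page(demo_path, sample_text)

with io.open(demo_path, encoding='utf-8') as handle:
    print(handle.read() == sample_text)
```

Another commonly used option is to set the PYTHONIOENCODING environment variable to utf-8 before running the script, which changes the encoding Python uses for its standard output.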

Later,

SteveO

Sunday, February 27, 2011
