Recently in Blog & Tweet I explained why I wanted to make my Twitter history a part of the publication you are now reading. Along the same lines, read Dave Winer on the importance of the historic record and the general goodness of static files behind an Apache server. This post outlines how it works, with source code, and draws a conclusion.
But First, the Conclusion · There’s one tweets-blob a week, posted early Monday and covering the period from the previous Monday morning to midnight on the just-ended Sunday. The title is always “Short-form Fragments” because, you know, microblogging might not always mean just Twitter. Also, they have their own category, which you might find worth a glance.
Each tweet is flagged by weekday and time, and gets its own fragment identifier, thus its own URI, thus is a Web Resource, which was necessary for my happiness. The shortened URIs have been lengthened whenever possible, and an attempt has been made to use the results both as title and link.
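For the curious, lengthening a shortened URI amounts to following its redirects. What follows is a minimal sketch under assumptions, not the code from my script: the names `lengthen` and `HTTP_LOOKUP`, the five-hop limit, and the choice of a HEAD request are all illustrative. The lookup step is passed in as a parameter so the redirect-following logic can be exercised without touching the network. It covers only the lengthening, not the title-and-link part.

```ruby
require 'net/http'
require 'uri'

# Assumed default lookup: issue a HEAD request and return the redirect
# target, or nil when the response is not a redirect.
HTTP_LOOKUP = lambda do |url|
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end
  response.is_a?(Net::HTTPRedirection) ? response['location'] : nil
end

# Follow redirects from +url+ until a lookup reports no redirect or
# +max_hops+ lookups have been made. On any error, return the last
# URL reached, so a dead shortener never loses the original link.
def lengthen(url, max_hops = 5, lookup = HTTP_LOOKUP)
  max_hops.times do
    target = lookup.call(url)
    break if target.nil?
    url = URI.join(url, target).to_s # a Location header may be relative
  end
  url
rescue StandardError
  url
end
```

Calling `lengthen` with just a URL uses the network-backed lookup; passing a lambda backed by a hash of known redirects gives the same behavior offline.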
I’m happy I did this. Clearly it’s irritated some people, and at least one has actually unsubscribed. To those people, sorry. I think by batching ’em up so that there’s only one intrusion a week, and giving them a distinctive title & category, I’ve made the task of ignoring them acceptably lightweight. So, once again my apologies to those who are offended, but I’m not going to take them out; it’s important to me to have my contributions to the Net here under my own control, to the extent possible.
Step One: Backup and Dump · The Twitter API is straightforward, and I was getting ready to write the code to pull out my Tweet history when I ran across BackupMyTweets, which does just what it says.
I downloaded their XML dump and wrote a script to transmogrify that into the Monday-to-Sunday batches. I didn’t even use an XML reader; the format was regular enough to be line-oriented. The only nit was that angle-brackets were double-escaped. Stomp that roach, Keith!
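The batching itself is simple date arithmetic. Here is a rough sketch of the idea, assuming each tweet has already been parsed into a timestamp plus text; the `Tweet` struct and the method names are made up for illustration and say nothing about the dump's actual fields.

```ruby
require 'date'
require 'time'

# Hypothetical tweet record; the real dump's fields may differ.
Tweet = Struct.new(:time, :text)

# Return the Date of the Monday that starts the week containing +time+.
def week_start(time)
  date = time.to_date
  date - ((date.wday + 6) % 7) # wday: Sunday is 0, Monday is 1, ...
end

# Group tweets into Monday-to-Sunday batches, oldest week first,
# with the tweets inside each batch in chronological order.
def weekly_batches(tweets)
  tweets.group_by { |t| week_start(t.time) }
        .sort_by { |monday, _batch| monday }
        .map { |monday, batch| [monday, batch.sort_by(&:time)] }
end
```

Each batch comes back as a pair: the Monday that opens the week, and that week's tweets. A tweet stamped late on Sunday lands in the week just ending, and one stamped at midnight on Monday opens the next.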
Step Two: Archaeology · The XML backup reached back to Valentine’s Day, 2008. But I was pretty sure I’d started tweeting before then.
So I spelunked around the Twitter API and, being too lazy to code, just used curl(1) to reach back in time. This got me to January 30. Since Twitter’s own XML is different from BackupMyTweets’, I had to write another script to generate a few more weekly blobs.
There are more; my first Tweet was in March 2007. But I can’t get the Twitter API to show ’em to me.
The correct answer, of course, is for Twitter to just bloody well give me a copy of my own damn written words which I donated to their app. Since they seem to be reasonable people, I’m assuming they agree in principle and it should become possible.
Step Three: Production · As of this week, there’s a script that runs early every Monday, to pull the previous Monday-to-Sunday span out and turn it into the input form for my publishing system.
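Working out which span the Monday job should cover is a one-liner or two. A sketch, assuming the job is handed the date it runs on; `previous_week` is an illustrative name, not something lifted from the script.

```ruby
require 'date'

# Given the date the job runs on, return [monday, sunday] for the most
# recently completed Monday-to-Sunday week.
def previous_week(today)
  this_monday = today - ((today.wday + 6) % 7) # wday: Sunday is 0
  [this_monday - 7, this_monday - 1]
end

monday, sunday = previous_week(Date.today)
puts "Covering #{monday} through #{sunday}"
```

Anchoring on the current week's Monday rather than on "today minus seven" means a late run, say on Wednesday, still picks up the same span.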
Here it is. It’s under 200 lines of Ruby and I assert no copyright nor do I claim any rights, and if you ask questions or for other support, I may just laugh cruelly.
There are no unit tests and the error-handling is laughable. It is not an example of excellence in XML processing or HTTP wrangling or Ruby engineering or anything else.
Well, there are a couple of things about it that might bring a smile to the odd geek face.