Recently in Blog & Tweet I explained why I wanted to make my Twitter history a part of the publication you are now reading. Along the same lines, read Dave Winer on the importance of the historic record and the general goodness of static files behind an Apache server. This post outlines how it works, with source code, and draws a conclusion.

But First, the Conclusion · There’s one tweets-blob a week, posted early Monday and covering the period from the previous Monday morning to midnight on the just-ended Sunday. The title is always “Short-form Fragments” because, you know, microblogging might not always mean just Twitter. Also, they have their own category, which you might find worth a glance.

Each tweet is flagged by weekday and time, and gets its own fragment identifier, thus its own URI, and thus is a Web Resource, which was necessary for my happiness. The shortened URIs have been lengthened whenever possible, and an attempt has been made to use the results both as title and link.
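To make the weekday-and-time flag and the per-tweet fragment identifier concrete, here is a minimal sketch in Ruby. It is not the code that generates these pages: the helper names, the identifier format (a `t` followed by a UTC timestamp), and the example page URI are all assumptions made for illustration.

```ruby
require 'time'

# Hypothetical helper: the "weekday and time" flag shown beside a tweet,
# rendered in whatever zone the Time object already carries.
def tweet_label(time)
  time.strftime('%a %H:%M')
end

# Hypothetical helper: a fragment identifier derived from the timestamp.
# The leading "t" keeps it a valid HTML id (which must start with a letter).
# getutc returns a new Time, so the caller's object is not modified.
def tweet_fragment_id(time)
  time.getutc.strftime('t%Y%m%dT%H%M%SZ')
end

# Page URI plus fragment gives each tweet its own URI.
def tweet_uri(page_uri, time)
  "#{page_uri}##{tweet_fragment_id(time)}"
end

stamp = Time.parse('2009-08-17T09:05:00Z')
puts tweet_label(stamp)                                  # Mon 09:05
puts tweet_uri('http://example.org/fragments', stamp)
```

Any scheme works as long as the identifier is unique within the weekly page and stable across regenerations; a timestamp is one easy way to get both, assuming no two tweets share the same second.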

I’m happy I did this. Clearly it’s irritated some people, and at least one has actually unsubscribed. To those people, sorry. I think by batching ’em up so that there’s only one intrusion a week, and giving them a distinctive title & category, I’ve made the task of ignoring them acceptably lightweight. So, once again my apologies to those who are offended, but I’m not going to take them out; it’s important to me to have my contributions to the Net here under my own control, to the extent possible.

Step One: Backup and Dump · The Twitter API is straightforward, and I was getting ready to write the code to pull out my Tweet history when I ran across BackupMyTweets, which does just what it says.

I downloaded their XML dump and wrote a script to transmogrify that into the Monday-to-Sunday batches. I didn’t even use an XML reader; the format was regular enough to be processed line by line. The only nit was that angle-brackets were double-escaped. Stomp that roach, Keith!
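The line-oriented approach might look something like the sketch below. The post does not show the BackupMyTweets format, so the one-record-per-line shape and the `created_at` and `text` element names here are assumptions, as are all the method names. It also buckets by the timestamp's own zone, where the real batches presumably use local time.

```ruby
require 'date'
require 'time'

# Assumed (hypothetical) record shape: one tweet per line, with the
# timestamp and text in <created_at> and <text> elements.
LINE = %r{<created_at>(.*?)</created_at>.*?<text>(.*?)</text>}

ENTITIES = {
  '&lt;' => '<', '&gt;' => '>', '&quot;' => '"', '&apos;' => "'", '&amp;' => '&'
}.freeze

# Undo one level of XML escaping in a single pass, so "&amp;lt;" becomes
# "&lt;" rather than "<".
def unescape_once(text)
  text.gsub(/&(?:lt|gt|quot|apos|amp);/, ENTITIES)
end

# Normal unescape, then repair the angle brackets that were escaped twice.
def clean_text(text)
  unescape_once(text).gsub('&lt;', '<').gsub('&gt;', '>')
end

# The Monday that starts the week containing the given date.
def week_start(date)
  date - ((date.wday + 6) % 7)
end

# Group [timestamp, text] pairs by the Monday of their week.
def weekly_batches(lines)
  batches = Hash.new { |hash, key| hash[key] = [] }
  lines.each do |line|
    match = LINE.match(line)
    next unless match # skip lines that are not tweet records
    stamp = Time.parse(match[1])
    batches[week_start(stamp.to_date)] << [stamp, clean_text(match[2])]
  end
  batches
end
```

Note that the second pass in `clean_text` cannot tell a double-escaped `<` from a tweet that literally contained the characters `&lt;`; for a personal archive that ambiguity is probably tolerable.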

Step Two: Archaeology · The XML backup reached back to Valentine’s Day, 2008. But I was pretty sure I’d started tweeting before then.

So I spelunked around the Twitter API and, being too lazy to code, just used curl(1) to reach back in time. This got me to January 30. Since Twitter’s own XML is different from BackupMyTweets’, I had to write another script to generate a few more weekly blobs.

There are more; my first Tweet was in March 2007. But I can’t get the Twitter API to show ’em to me.

Google Reader’s view of my tweet stream’s Atom feed includes them all, presumably because it started caching them way back when. While I can see them there in the browser, I can’t manage to save them; “View Source” shows a nauseating tangle of JavaScript. Someone suggested visiting Yahoo! and using YQL to plumb that Atom feed, but I couldn’t get that to work either.

The correct answer, of course, is for Twitter to just bloody well give me a copy of my own damn written words which I donated to their app. Since they seem to be reasonable people, I’m assuming they agree in principle and it should become possible.

Step Three: Production · As of this week, there’s a script that runs early every Monday, to pull the previous Monday-to-Sunday span out and turn it into the input form for my publishing system.
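The date arithmetic behind “the previous Monday-to-Sunday span” is small enough to sketch. This is an illustration under assumptions, not an excerpt from the production script: the method names are hypothetical, and it works on calendar dates only, leaving time zones aside.

```ruby
require 'date'

# Given the day the job runs, return [monday, sunday] for the most
# recent fully completed Monday-to-Sunday week.
def previous_week_span(today)
  this_monday = today - ((today.wday + 6) % 7) # Monday of the current week
  [this_monday - 7, this_monday - 1]
end

# True when a tweet's date falls inside the span (both ends inclusive).
def in_span?(date, span)
  date >= span[0] && date <= span[1]
end

monday, sunday = previous_week_span(Date.new(2009, 8, 24))
puts "#{monday} to #{sunday}" # 2009-08-17 to 2009-08-23
```

Computing the span from the current week’s Monday, rather than assuming the job always fires on a Monday, means a late or manual run on, say, Wednesday still picks up the same week.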

Here it is. It’s under 200 lines of Ruby and I assert no copyright nor do I claim any rights, and if you ask questions or for other support, I may just laugh cruelly. There are no unit tests and the error-handling is laughable. It is not an example of excellence in XML processing or HTTP wrangling or Ruby engineering or anything else. Well, there are a couple of things about sunday.rb that might bring a smile to the odd geek face.



Contributions

From: Rob Knight (Aug 24 2009, at 02:25)

If your tweets are in Google's feed cache, you might be able to get at them via the Google Ajax Feed API (http://code.google.com/apis/ajaxfeeds/documentation/). Annoyingly, it doesn't support any kind of range queries on feeds, but it might be able to give you everything that Google has from the feed in question.

As far as I'm aware, this uses the same technology that powers Google Reader (and almost definitely the same data store), so if it's in Reader it should be available here too.

From: John Cowan (Aug 24 2009, at 03:45)

Overall, what you've come up with is a Good Thing. I think the main remaining problem is the visual noise induced by your CSS, and I'd strongly urge you to go to something less noisy for the Short-form Fragments pages. In particular, the oversize bold type you use for run-in subject headings is unsuitable for short form, as there winds up being too much of it. Use something else (color?) to distinguish the timestamp from the text.

Likewise, don't make the links large or red, given that there are so many of them. Instead of making them stand out, it makes them dominate the page. In fact, I think it would be good to reduce each Twitter link to a small icon or Unicode symbol, given that you don't actually have any useful anchor text.

I really would like to read all of what you have to say, but not at the expense of a headache.

From: Jon Ellis (Aug 24 2009, at 03:49)

Is there any reason you didn't just use something like:

http://www.loudtwitter.com/

which can be atom based?

From: Pamela (Aug 24 2009, at 06:40)

As someone who follows you on twitter, I would ask you to reconsider the "short form fragments" plan.

Please give those posts a specific category that I can ignore - if you mix up other new writing under the same banner, I can't effectively filter. My preference in fact would be for somebody out there to define a common tag for tweet rollups so that I can ignore them all in one fell swoop. If you're wondering why people are annoyed, I can tell you -- not only have we already seen the data, but the last time we saw it, it was in context. The second time around, it's like reading transcriptions of one side of a person's daily telephone record. The words are there, stripped of all of the timing and community that made the words interesting in the first place.

At least, that's my opinion :)

Thanks,

Pamela

From: Kevin Scaldeferri (Aug 24 2009, at 09:21)

A thought: could you set this up so that it archives on the web site, but doesn't appear in your RSS feed?

From: Zach (Aug 24 2009, at 09:27)

I understand your desire to save your tweets and archive them on ongoing, but do you really need to pollute the feed with this? Why not simply exclude them from your atom feed? That will fix the problem for most people who object to seeing them come across their news feeds. (At least it would fix it for me. ;)

From: Kevin Spencer (Aug 24 2009, at 11:18)

Well, if nothing else, at least the scroll wheel on my mouse gets a decent workout once a week as I do my best to ignore the enormous twitter post in Google Reader ;-)

From: daniel.sandbecker@gmail.com (Aug 24 2009, at 12:10)

I'm sorry, but republishing your tweets is a waste of time for people who like to read them (and thus follow you on twitter) as well as for those of us who don't. I really can't understand why you (who are obviously much smarter than me) are (re-)*publishing* when what you seem to intend is *archiving*.

The category is all well and good if anyone wants to subscribe to only your short form fragments, but of no help if I want all BUT the tweets.

Please reconsider. One way might just be to look at one of those summaries and pretend that you haven't seen it before. To me, who hasn't, it's a hardly even readable mess of bold (Bold, BOLD) timestamps and links.

I will not unsubscribe, I'll settle for running your feed through a Yahoo! Pipe. (Maybe you could clone it and embed the link ... ;-)

From: Noah Slater (Aug 25 2009, at 08:35)

"While I can see them there in the browser, I can’t manage to save them; “View Source” shows a nauseating tangle of JavaScript. Someone suggested visiting Yahoo! and using YQL to plumb that Atom feed, but I couldn’t get that to work either."

If you use Firebug, you can right-click an element and inspect it. From the HTML inspector, you can copy the HTML of any element and its children. May come in handy.

August 23, 2009

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.