I’ve decided that mod_atom really needs to be a blog-publishing system, not just an Atom Store. And furthermore, based mostly on the comments to that Sanitation piece, I’ve made two design decisions. First, the sanitizing happens only on the HTML output; the Atom-store part will persist the data as close as possible to the way it was sent upstream. Second, I’m going to try using the TidyLib parser to pick apart type="html" text constructs so I can clean ’em up.
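To make the first decision concrete, here is a minimal sketch in C of "store verbatim, sanitize only when rendering HTML". Everything in it is an assumption for illustration: `entry_t`, `store_entry`, `render_html` and `sanitize_html` are hypothetical names, not mod_atom's actual code, and the sanitizer is a deliberately naive stand-in that only drops `<script>` elements. It does not use TidyLib; a real cleaner would sit behind the same function.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stored entry: the bytes exactly as the client sent them. */
typedef struct {
    char *raw_content;  /* type="html" text construct, unmodified */
} entry_t;

/* Store path: copy the posted content verbatim, with no cleaning at all. */
int store_entry(entry_t *e, const char *posted)
{
    assert(e != NULL && posted != NULL);
    e->raw_content = malloc(strlen(posted) + 1);
    if (e->raw_content == NULL)
        return -1;
    strcpy(e->raw_content, posted);
    return 0;
}

/* Case-insensitive test: does s begin with prefix? */
int starts_with_ci(const char *s, const char *prefix)
{
    while (*prefix) {
        if (tolower((unsigned char)*s) != tolower((unsigned char)*prefix))
            return 0;
        s++;
        prefix++;
    }
    return 1;
}

/* Toy stand-in sanitizer (an assumption, not TidyLib): copies the input
 * but skips everything from "<script" through "</script>". An unterminated
 * script element causes the rest of the input to be dropped.
 * The caller frees the returned string. */
char *sanitize_html(const char *in)
{
    char *out = malloc(strlen(in) + 1);  /* output is never longer than input */
    char *w = out;
    if (out == NULL)
        return NULL;
    while (*in) {
        if (starts_with_ci(in, "<script")) {
            while (*in && !starts_with_ci(in, "</script>"))
                in++;
            if (*in)
                in += strlen("</script>");
        } else {
            *w++ = *in++;
        }
    }
    *w = '\0';
    return out;
}

/* Output path: the only place where sanitizing happens. */
char *render_html(const entry_t *e)
{
    return sanitize_html(e->raw_content);
}
```

The point of the split is that the stored bytes stay untouched, so the cleaning rules can change later without losing anything the client originally posted.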
Why Tidy? · The other candidate was libxml2, and online research failed to reveal any hands-on comparisons of the two, but it also failed to turn up anyone seriously dissing either HTML parser. So then I noticed that the libxml2 binary was like 3.8M, while TidyLib is under 400K. Of course, to be fair, libxml2 does tons of other useful stuff that I don’t care about.
So after a couple of days’ part-time poking around, I figured out how to compile TidyLib and mod_atom together and load the result into httpd.
As soon as I stop blogging I’m going to try to wire it up. Surely I have some big thick books or corporate strategies or social-software trends to review first?