
XML’s 15th Birthday · Whether you like XML or not, we’re stuck with it for a long time. These days, the only new XML-based projects being started up are document-centric and publishing-oriented. Thank goodness, because that’s a much better fit than all the WS-* and Java EE config puke and so on that has given those three letters a bad name among so many programmers. XML for your document database is actually pretty hard to improve on ...
Wrong on the Internet · I was lying in bed this Sunday morning, checking the Net before coming downstairs to make scrambled eggs (with mushrooms and snap peas, yum) for the family, and ran across a bit of random snark from Aaron Swartz. Any Sunday morning is improved by a chance to argue about markup languages and how the Web works ...
On “Custom XML” · I see that Microsoft lost an appeal in the “Custom XML” litigation, and may be forced to disable that functionality in Microsoft Office. This is a short backgrounder explaining what “Custom XML” is about, and why nobody should care ...
Fixing XML · A week or two ago, I was reading something which included a really silly statement hyperlinked to the Wikipedia entry for XML. I followed the link and discovered that the entry was appallingly bad. I looked with a shudder at the size and complexity of the brokenness and just failed to convince myself that it was somebody else’s problem. So we fixed it ...
XML in Oxford · That’s the XML Summer School in September at St. Edmund. I can’t make it, in part because my wife is co-ordinating, which means I do child-care. I’ve been to these and they’re totally great, intense and interactive and focused; then you get to go drinking around Oxford in the evening. If you’re within reach and work with XML and want to upgrade yourself, I totally recommend it.
 
Neither Father nor Inventor · It’s nice when The Reg covers my speeches, but it raises an issue that I guess I’m going to have to address every year until I die. I am neither the father nor inventor of XML! ...
Missing-Font Messages in Keynote · There are a variety of situations in which, when you start iWork tools, for example Keynote, you get a bunch of whining about missing fonts. This can be fixed by hand ...
XML Trouble · Last week, I sent an email to one of the XML standardization lists at the W3C; my first presence in that conversation in quite some number of years. This short piece, of interest only to XML obsessives, gives a bit of background ...
OOXML: Everything’s Just Fine · Or at least that’s what ISO’s Secretary General says. [I had hoped to stop writing about this subject, sigh]. There are multiple appeals against OOXML; let’s try to read the tea-leaves without too many guttural snickers ...
RX and 1.9 and Pain · This fragment is mostly a note to myself and a placeholder, and might prove useful to someone slashing through the XML undergrowth with bleeding-edge Ruby. Briefly: I revived my “RX” Ruby tokenizer (see here, here, and here) to contribute to Antonio Cangiano’s proposed Ruby benchmark suite, which I think is a Really Good Idea. I had a bit of pain getting the code to run on both Ruby 1.8 and 1.9, and then when I tried sanity-checking the output by comparing it to REXML on 1.9, REXML blew chunks. There are, apparently, issues about REXML and 1.9. Read on for details in the unlikely event that you care about any of this ...
ISO Fantasy · There has been much rejoicing recently at the process whereby, apparently, an ISO committee takes full control of OOXML. But you know, that story is entirely irrelevant. It will have no effect on what implementors of OOXML, including Microsoft, should or will actually do. The story’s ending will, I think, be mostly tawdry. Oh, and I have some OOXML news that I think is important, but that I don’t think anyone else has reported ...
On OOXML · I hadn’t really planned to become well-informed about OOXML, but I have. So I thought I’d build my own personal list of reasons for and against OOXML becoming an ISO standard ...
BRM Narrative · Now that the BRM is over, I feel I can write about it a bit more; there are some restrictions, but I’ll lay them out. Summary: A lot of good work was done, but the process is irretrievably broken ...
OOXML Batch Converter? · Here’s a program that would be useful. You point it at a directory and it runs around finding all the .doc, .xls, and .ppt files, and generates an OOXML version of each. In the old days, you’d use VBA to do this kind of thing. I’m behind on this kind of technology, but I assume there’s something on Windows that would make this tractable?
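The converter wished for above can be sketched in a few lines of Python. This is a minimal sketch under one loud assumption: the actual conversion is delegated to LibreOffice’s headless command-line converter (`soffice --headless --convert-to`), a tool swapped in here for illustration, not something the post names. The function names are hypothetical, and by default the script only builds the command lines (a dry run) rather than executing them.

```python
import os
import subprocess
from pathlib import Path

# Legacy binary Office extensions mapped to their OOXML counterparts.
OOXML_TARGETS = {".doc": "docx", ".xls": "xlsx", ".ppt": "pptx"}


def find_legacy_files(root):
    """Walk `root` recursively and yield every .doc, .xls, and .ppt file."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = Path(dirpath) / name
            if path.suffix.lower() in OOXML_TARGETS:
                yield path


def convert_command(path):
    """Build a command line for the assumed converter (LibreOffice).

    The OOXML file is written next to the source, e.g. report.doc -> report.docx.
    """
    target = OOXML_TARGETS[path.suffix.lower()]
    return ["soffice", "--headless", "--convert-to", target,
            "--outdir", str(path.parent), str(path)]


def batch_convert(root, dry_run=True):
    """Return the commands for every legacy file; run them unless dry_run."""
    commands = [convert_command(p) for p in find_legacy_files(root)]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands
```

Calling `batch_convert("/path/to/docs")` lists what would be done; `dry_run=False` actually runs the converter, which must be installed and on the `PATH`. Keeping discovery separate from conversion means the converter could be swapped for something Windows-native without touching the directory walk.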
XML People · XML is ten years old today. It feels like yesterday, or a lifetime. I wrote this that year (1998). It’s really long ...
Upcoming Gig: ISO OOXML BRM · I’ve been invited to join the Canadian delegation to the DIS 29500 Ballot Resolution Meeting in Geneva in February. This is a consequence of having joined the expert group supporting the Canadian National Standards Body; I haven’t quite figured out the forest of acronyms and organizations yet, or how things fit together. Given the white heat of politics and verbiage around this process, I’m going to accede to the request of a couple of Very Smart People who’ve asked me to hold off on real-time blogging, which I’m comfy with, since I’m an ISO newbie and don’t know the process or the culture. I will say, though, that I am not representing Sun officially. The Canadian Standards people contacted me, and I checked with our corporate Standards group and said that I wanted to go, but would only go if I were free to offer my own technical opinions on technical issues; they were OK with that. I’ve been stuffing my brain with the OOXML comments and proposed resolutions, and the picture is interesting; I’ll write at length once I figure out how to do so without breaking anything.
 
Now That’s a Patch · I refer to Sam Ruby’s massive patch to make REXML work properly with the latest Ruby. I’ve long disliked REXML (see here and here), but it’s here and it works. Only the way it works changed in 1.9, and there were some horrible regressions, and it gets patched very slowly. (I’m actually wondering why Ruby needs to have a weird regex-based parser when Expat is plenty good enough for Perl and Python, and in fact if you look at xmlparser.rb, you can switch parsers, just as Nick Sieger has done for JRuby with JREXML. But I digress.) In the short term, we need to see if the REXML maintainers are responsive to Sam’s patch.
Year-End Sweep — Tech · Over the course of the year, in browser tabs, bookmarks, and del.icio.us, I’ve built up a huge list of things that I felt I should write about, at least at the time I saw them. Well, dammit, I’m not gonna let 2007 end without at least making a try. Here goes. Categorized, even ...
XBRL News · Last week I gave a talk at the 16th International XBRL Conference here in Vancouver. XBRL is an XML-based system for packing up companies’ financial information, and I think it’s real important. But its take-off has been kind of protracted and arduous. I was there as an Ambassador From the Web. Here’s a quick XBRL news overview ...
Markup Thanks · I really didn’t pay that much attention to the first OOXML round at ISO, but I’ve developed a sort of sick fascination with it, leading up to the potentially-apocalyptic Geneva BRM. I read the Kyoto meeting report from the excellent Alex Brown and it dawns on me that a lot of us owe some huge debts to people whose names I bet most of you don’t know: James Mason, Martin Bryan, and Ken Holman. Here are a few words on them ...
The OOXML News · I was really wrong about the OOXML/ISO story; I told everyone “It’ll sail through ISO, don’t bother with the process.” Boy, was I wrong. At the moment, that process is hurtling toward the mildly-historic “Ballot Resolution Meeting” in Geneva in February (read about it here and here). Anyhow, all those tens of thousands of comments on the first draft, which were previously invisible behind some ISO veil, are now out there for all to view, tag, hyperlink, annotate, and enhance, at the unofficial but excellent DIS29500 Comments site (tagline: “Help the OOXML BRM concentrate on issues of substance”). The person behind it seems to be Alan Bell, whom I don’t think I know, but the world owes him a vote of thanks. Obviously, this whole thing does retain a grimy side; see the excellent Martin Bryan’s fairly-despondent Report on WG1 activity for December 2007 Meeting of ISO/IEC JTC1/SC34/WG1 in Kyoto. Sigh. Nobody ever said history was clean.
 
Tab Sweep — Tech · This goes back weeks and weeks; I’ve been wide-finding and doing Sun stuff and the Web-watching has suffered ...
Bad, Feed Readers, Bad! · Piles of junk, I say. Pardon me, but I’m feeling grumpy. After much more work than it should have been, mod_atom is now generating reasonably coherent (not done yet, but getting there) HTML output and human-oriented (as opposed to APP-oriented) Atom feeds. It’s slightly idiosyncratic XML, with lots of namespace prefixes. The Feed Validator says it’s OK and I think it’s OK. But none of NetNewsWire or Vienna or Bloglines [Update: or Blogbridge or Safari, or My Yahoo!, or Sage] can read it correctly. I fart in their general direction. [Update: Google Reader, Planet Venus, Snarfer, SimplePie, Liferea, Awasu, Shrook, and Flock get it right! Good on ya, guys.] [Ah, Brent sent me a pointer to the latest beta of NetNewsWire 3.1, and it’s fine. I know other people rave about GoogleReader and Vienna and so on, but for me NNW is still way ahead of the pack in letting me scan a whole lot of news in almost no time at all.] Are there any other feed-reader implementors out there who think they can, you know, read XML correctly?!?! If so, get in touch, and if you process my little bundle of joy properly, I’ll lavish praise and links. Or if there’s a bug in the feed that neither I nor the validator can see, I’ll apologize humbly to the whole world. In any case, I’m going to have to go back and patch up the code so it doesn’t emit any of those nasty colons and relative URI references that apparently hurt implementors’ fragile feelings. This does not improve my mood. [Update: Just to be clear, I’m not talking about the ongoing feed; if you want to test your feed reader, contact me and I’ll point you at the test feed.]
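What a reader has to get right here is that namespace prefixes are arbitrary (elements are identified by namespace URI, not by whether the prefix is `a:` or empty) and that relative URI references are resolved against `xml:base` or the feed’s own URI. A minimal sketch with Python’s standard library, assuming a hypothetical sample feed and base URL; for brevity it honors `xml:base` only on the root element, which a full reader would track per element.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

ATOM = "http://www.w3.org/2005/Atom"
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"

# A hypothetical feed that uses an "a:" prefix instead of a default namespace,
# plus a relative link that must be resolved against xml:base.
FEED = """<a:feed xmlns:a="http://www.w3.org/2005/Atom"
        xml:base="http://example.org/blog/">
  <a:title>Test feed</a:title>
  <a:entry>
    <a:title>Hello</a:title>
    <a:link href="2007/hello"/>
  </a:entry>
</a:feed>"""


def read_entries(xml_text, document_uri=""):
    """Return (title, absolute link) pairs, matching elements by namespace URI."""
    root = ET.fromstring(xml_text)
    # Simplification: only xml:base on the root element is honoured here.
    base = urljoin(document_uri, root.get(XML_BASE, ""))
    entries = []
    for entry in root.findall(f"{{{ATOM}}}entry"):
        title = entry.findtext(f"{{{ATOM}}}title")
        link = entry.find(f"{{{ATOM}}}link")
        href = urljoin(base, link.get("href")) if link is not None else None
        entries.append((title, href))
    return entries


print(read_entries(FEED))  # [('Hello', 'http://example.org/blog/2007/hello')]
```

The same function gives identical results for a feed that declares Atom as the default namespace, which is the whole point: the prefix is a serialization detail.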
Following the OOXML Story · From the beginning of the story up to last week’s crucial vote, it seemed that Andy Updegrove’s blog was the best place to follow it. I think that as of now, the man to read would be Alex Brown; in particular, check out his authoritative OOXML ballot comments and OOXML - what just happened?. He’s also maintaining chunks of the moving-target OOXML Wikipedia entry. For an interesting sidelight, though, see David Berlind’s ZDNet reader: Don’t let OOXML vs. ODF shenanigans tarnish other standards setters; useful perspective, I’d say. And then for the real absolute unvarnished truth, check out Steve’s ISO document standards.
ISO OOXML Craziness · I’ve generally been ignoring all the fuss & bother about OOXML’s well-greased path to ISO anointment. I’d assumed that after ECMA had applied rigorous and impartial scrutiny to all six thousand pages, ensuring that this was straightforwardly implementable by all interested parties, then the ISO rubber stamp wouldn’t be long in following, giving us an International Standard no less, plus fresh insight into the level of respect such things deserve; and we could all get on with life. Now, the ISO process seems to be turning into the most entertaining kind of standards mosh-pit, with loud accusations of corruption and malpractice. Canadians in the crowd will be reminded of the flavor of a Liberal Party nomination meeting. Groklaw’s coverage is predictably overamped, but still fun; here’s news from France, Sweden, and Norway. That’s just one day’s worth. [Update: Hey, Denmark too!] ...
Tab Sweep — Tech · August is supposed to be the slow time of year. Not! Is there ever a lot of interesting stuff out there. Today we have WS-funnies, OOXML Purdah, Web names, Internet Registry structures, and Ruby metaprogramming craziness ...
Tab Sweep — Tech · Today we have some Atomic Apple love, iPhone Web friendliness, RELAX NG praise, and JVM Language widening ...
Color Commentary · Read the excellent play-by-play from Andy Updegrove: Update on the US Vote on OOXML (and What Happens Next). He seems to have all the public facts, but speaking as one who’s been through a few of those processes, I thought I should highlight something that’s going on right now, but won’t be talked about much. The problem of figuring out the US vote is in the hands of the 16 members of the INCITS committee. So does that mean that everyone’s sitting still waiting for them to make up their minds? Nope. What’s happening right now is that the big players with skin in the game are applying executive-to-executive pressure, behind the scenes, to the committee members’ bosses’ bosses’ bosses. In a few cases it’ll work, and the members will be issued here’s-your-vote marching orders. I’ve seen it happen. In fact, when the intensity level gets up there, I’ve never seen it not happen. Nobody will ever know the whole story on what’s happening right now under the covers. I really don’t envy the committee members.
What XML Means · XML’s tenth birthday is coming up next spring; here’s my sound-bite on What It All Means. XML is the first successful instance of a data packaging system that is simultaneously (human) language-independent and (computer) system-independent. It’s the existence proof that such a thing can be built and be useful. Is it the best choice for every application? Is it the most efficient possible way to package up data? Is it the last packaging system we’ll ever need? Silly questions: no, no, and no. JSON is already a better choice for packaging up arrays and hashes and tuples. RNC is a better choice for writing schemas. A classic Unix-flavor file containing ordinary lines of ordinary text is the best choice of all, whenever you can get away with it. XML’s still a decent option, probably the best, for interchanging things that are (at least in part) meant to be read by humans. It could be improved. It might be replaced. Wouldn’t surprise me, either way.
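To make the JSON-versus-XML point concrete, here is the same small hash-containing-an-array packaged both ways with Python’s standard library. The record contents and the XML element names (`record`, `tags`, `tag`) are hypothetical choices made for this sketch; the point is that XML needs an invented convention for list items and hands every value back as a string, while JSON maps straight onto the data structure.

```python
import json
import xml.etree.ElementTree as ET

# A hypothetical record: a hash containing a number and an array.
record = {"name": "example", "version": 2, "tags": ["xml", "json"]}

# JSON maps directly onto the hash/array structure.
as_json = json.dumps(record)

# XML needs invented conventions: a wrapper element for the list,
# a name for each item, and text for every value.
root = ET.Element("record")
ET.SubElement(root, "name").text = record["name"]
ET.SubElement(root, "version").text = str(record["version"])
tags = ET.SubElement(root, "tags")
for tag in record["tags"]:
    ET.SubElement(tags, "tag").text = tag
as_xml = ET.tostring(root, encoding="unicode")

print(as_json)
print(as_xml)

# Round-tripping: JSON restores the types; XML hands back strings.
assert json.loads(as_json) == record
parsed = ET.fromstring(as_xml)
assert parsed.findtext("version") == "2"
assert [t.text for t in parsed.find("tags")] == ["xml", "json"]
```

For text meant partly for humans, say a paragraph with inline emphasis, the balance tips the other way, since mixed content is exactly what XML handles and JSON doesn’t.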
Any Damn Fool · This is real news: James Clark has a blog, and in it he says “Any damn fool could produce a better data format than XML”. Um, James was designated Technical Lead of the original XML Working Group and is the single largest contributor to the design of XML. Also, perhaps, the finest computer programmer I’ve ever had the privilege of working with ...
Tech Tab Sweep · I break with my no-underlying-theme theme and do an all-technology tab sweep; in fact, almost all XML ...
XML 2.0? · Anne van Kesteren suggests an XML 2.0 mostly defined by less-Draconian error handling, provoking further discussion over chez Sam Ruby ...
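“Draconian” refers to the XML 1.0 rule that once a processor detects a well-formedness error, it must report it and stop normal processing rather than guess at what was meant. Python’s standard parser behaves this way, as the sketch below shows; the helper name and the sample snippets are hypothetical. The last two samples are things an HTML browser would shrug off.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return (True, None) if `text` parses as XML, else (False, error message).

    The parser is draconian in the XML 1.0 sense: at the first
    well-formedness error it raises ParseError instead of recovering.
    """
    try:
        ET.fromstring(text)
        return True, None
    except ET.ParseError as err:
        return False, str(err)


samples = [
    "<p>fine <b>bold</b></p>",
    "<p>unclosed <b>bold</p>",   # mismatched end tag
    "<p>AT&T</p>",               # bare ampersand
    "<p>&nbsp;</p>",             # HTML entity, never declared in XML
]
for s in samples:
    ok, msg = is_well_formed(s)
    print("OK  " if ok else "FAIL", s, "" if ok else f"({msg})")
```

An “XML 2.0” with less-Draconian error handling would, in effect, have to specify what a parser produces for each of those failing inputs instead of refusing them.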
Life Is Complicated · My goodness, even CNN picked up the story about Microsoft trying to retain Rick Jelliffe to update the Wikipedia articles on ODF and OOXML for them, just as the ISO process around OOXML is getting in gear. This raises complicated issues about document formats and transparency and conflict of interest; and there’s at least one elephant in the room ...
Tab Sweep · This is going to be big and have month-old news in it; a consequence of the long southern-hemisphere posting interruption. I’ll even group ’em into paragraphs ...
 
JSON and XML · I hear people saying “JSON is great, XML is over”, but I don’t hear XML partisans saying anything bad about JSON. There are two arguments that are over, though ...
Microsoft XML, the Mac Angle · There’s been a lot of noise these last few days about the Microsoft Office XML file formats; the world doesn’t need my opinion again. I’d vaguely noted that Mac Office would be a little behind on the new XML, and then Simon Phipps shot me links to a couple of closer looks, which shed an instructive light ...
Choose RELAX Now · Elliotte Rusty Harold’s RELAX Wins may be a milestone in the life of XML. Everybody who actually touches the technology has known the truth for years, and it’s time to stop sweeping it under the rug. W3C XML Schemas (XSD) suck. They are hard to read, hard to write, hard to understand, have interoperability problems, and are unable to describe lots of things you want to do all the time in XML. Schemas based on RELAX NG, also known as ISO/IEC 19757-2, are easy to write, easy to read, are backed by a rigorous formalism for interoperability, and can describe immensely more different XML constructs. To Elliotte’s list of important XML applications that are RELAX-based, I’d add the Atom Syndication Format and, pretty soon now, the Atom Publishing Protocol. It’s a pity; when XSD came out people thought that since it came from the W3C, same as XML, it must be the way to go, and it got baked into a bunch of other technology before anyone really had a chance to think it over. So now lots of people say “Well, yeah, it sucks, but we’re stuck with it.” Wrong! The time has come to declare it a worthy but failed experiment, tear down the shaky towers with XSD in their foundation, and start using RELAX for all significant XML work. [Update: Piling on are Don Park, Gabe Wachob, Mike Hostetler and some commenters. There’s thoughtful input from Dare Obasanjo, and now the comments have some push-back too. And oh my goodness gracious, a Rick Jelliffe must-read.]
XML 2006 · We’re only three weeks away from XML 2006. Which brings to mind that it was ten years ago at that same conference (different name then) that we showed the world the first draft of the XML spec. It was a carefully-staged event, and one of the most intense 45 minutes in my life. Ah... Back to this year. Looks like David Megginson has put together a program that is notably free of the usual suspects and rich with new stuff. I see presenters from Google and the Motley Fool, and talks on PHP and (my goodness) JSON. Looks very good.
 
OOXML Hoo-Hah · Bob Sutor and Rob Weir (both of IBM) have been whacking away at the standards lipstick being painted on the Microsoft Office Internal Data Structure XML Dump pig. Oops, officially, that’s “ECMA Office Open XML”. In A Leap Back, Rob describes Excel’s well-known date-representation bug being encoded in an alleged International Standard. Then, again, in A bit about the bit with the bits, he talks about bitmasks and offal (really). But it’s Bob’s point, in Is Open XML a one way specification for most people?, that’s central: this is just a six-thousand-page data dump describing a particular XML serialization of a particular commercial application’s object model, completely oblivious to the universe of publishing-related standards that have been hammered out and put to work while MSOffice was being tended in Redmond. You can write “STANDARD” on it in letters as big as you want, but there will only ever be one full implementation, and if you standardize on this standard you’ve locked yourself in. Shame, shame on the other companies on the committee, helping Microsoft perpetuate this travesty. There’s just no excuse.
Upcoming Gig: SEC · This has been booked for months, but I just found out that it’s an open-announcement thing. I’ll be participating in an Interactive Data Roundtable at the US Securities and Exchange Commission in Washington DC on Wednesday Oct. 3rd. The SEC’s Interactive Data initiative is, I think, going to be huge. As an investor, a businessman, and an open-data fan, I’m 100% sure that it’ll pay back the investment many times over.
GullFOSS · I’ve never been 100% comfortable with this notion of a “group blog”, but I guess I should stop worrying. The Aquarium seems to have been a major success for the GlassFish people, and now there’s GullFOSS, OpenOffice.org’s home on the blogospheric range. As I write this, the latest post is their weekly development schedule snapshot, something that more Open-Source projects would do well to post. I may end up doing a 180° turn and thinking that every substantial development project should have a group blog.
 
Making Markup Correctly · I’ve encountered three different Ruby libraries for generating markup: there’s one in the CGI library, there’s Builder, and there’s Markaby. To some degree, all are heavily informed by the special case of generating HTML; and maybe they’re OK for that. But if you want to go further and generate XML, they’re all pointing in the same, wrong, direction. Maybe I’m missing something, but I do have an alternative to offer. Plus, I find a chance to laugh at myself gleefully. [Update: Ouch! Refuted!] [Update: And again, more seriously.] ...
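The alternative the post offers isn’t reproduced here. For contrast with string-building generators, here is one common way to produce XML correctly: build a namespaced tree and let the serializer handle escaping and namespace declarations. This sketch uses Python’s standard `xml.etree.ElementTree` rather than any Ruby library, and the `atom_entry` helper and its sample strings are hypothetical.

```python
import xml.etree.ElementTree as ET

ATOM = "http://www.w3.org/2005/Atom"
# Ask the serializer to emit Atom as the default namespace rather than "ns0:".
ET.register_namespace("", ATOM)


def atom_entry(title, body):
    """Build an Atom-style entry as a tree; escaping is the serializer's job."""
    entry = ET.Element(f"{{{ATOM}}}entry")
    ET.SubElement(entry, f"{{{ATOM}}}title").text = title
    content = ET.SubElement(entry, f"{{{ATOM}}}content", type="text")
    content.text = body
    return entry


xml_text = ET.tostring(atom_entry("Markup & <Tags>", 'He said "hi"'),
                       encoding="unicode")
print(xml_text)
```

Because the title never passes through string concatenation, `&` and `<` come out as `&amp;` and `&lt;` without the caller having to remember, and the namespace is declared once on the root element.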
 
Johnson on Feeds · Dave Johnson gave a talk this morning at a local XML interest group. His slides (PDF) are the single best introduction and overview I’ve ever seen about feeds and syndication and RSS and Atom and all that stuff.
 
The DOM Song · Weirdly enough, although I’ve been around XML for so long, the last couple of days have marked my first exposure to actual DOM wrangling in code. This experience has driven many computer programmers to gloom, and even negative utterances. Not me! I even composed a song about it ...
 
Microsoft & ODF · I’ve been wondering how to react to this Microsoft ODF Announcement. Andy Updegrove points out that the news isn’t that new, but still I see this as significant. From a glass-half-empty point of view, I could object, as Bob Sutor does, to the misdirection and outright lies in the Microsoft spin. Or I could echo Mark Pilgrim in pointing out that this is currently largely vaporware (more details here). But I think that on balance the big story is that Redmond has moved from a “There’s no demand for ODF” stance to admitting that, in fact, there is. Currently, it’s largely a public-sector thing; and reading between the mellifluous lines of Chris Capossela’s A Foundation for the New World of Documents, I sense a tone of barely-suppressed fear: “We encourage public sector organizations to move to XML file formats but not to mandate a particular format or implementation.” We can all agree on implementation (that’s the point, after all), but to refuse to bless a format seems to me to ignore the lesson of the Web, written in letters of fire 500 feet high: agree on the smallest-possible number of data formats, and compete on what you do with them.
 
Roundup · Once again I’m drowning in little tech-news tidbits that I think the world needs to look at; hence a Friday linkfest. Item: John Cowan’s TagSoup has reached 1.0. This is going to be an essential tool for so many people. Item: Assaf Arkin, in Why Blogs Work, explains it all. Item: Kimbro Staken provides 10 things to change in your thinking when building REST XML Protocols. Item: InfoQ has launched; does the world need another software-news site? Quite possibly. Item: From Mark Nottingham, HInclude; this is pointing in the same direction as Ingy’s Jemplate, and unless I’m missing something obvious, it’s an important direction.
 
XML 2006 - Get Yer Papers In · Yow, just realized that the XML 2006 call for papers is upon us; for details see David Megginson, who’s the chair. Hanging with the XML tribe is always good fun, and I expect David to do a great job of running the shindig; so send him your good ideas.
 
New Neo · I’ve been kind of quiet, and that’s because the JavaOne people lowered the boom on me: they told me that if I didn’t get the slides for my session in, they were going to cancel it. So I’ve been spending quality time with OpenOffice, in particular the NeoOffice flavor. They’ve got an alpha of their version of OO.o 2 up, and it’s a vast improvement over 1.2, with a bunch of useful sidebar navigators and better view-switching. Also, it’s all-ODF. There’s some interesting business model innovation; although Neo is GPL’ed, you have to sign up and pay to join the Early Access program if you want to use the 2.0 alpha pre-release. I didn’t hit a single bug with the alpha in two days of hard editing; I assume the Neo boys are slaving away over performance, because it’s pretty slow at the moment.
 
More FUD · Andy Updegrove quotes a flurry of egregious Microsoft bullshit about ODF from Jason Matusow. In particular: “The ODF format is limited to the features and performance of OpenOffice and StarOffice and would not satisfy most of our Microsoft Office customers today.” In your dreams, Jason. [Update: What Andy Updegrove said.]
 
SAML On The March · I tell people I’m a software generalist, but there are lots of holes in my knowledge. One of them is identity, and I really must fix that, because it’s a hot pain point both for businesses and individual people. (How many passwords do you have?) Anyhow, our own Eve Maler is one of the people you want to watch in this space, and she’s pointing us at a bunch of action over in SAML-land, here, here, and here. For my money, the hot story is the Danish requirement that if you want to do federation, you should bloody well use SAML. The Danes have had positive experiences with shared standardized XML vocabularies, having scored a big win with UBL. I can’t imagine anything in the short term that would be of greater benefit for everyone than ubiquitous shareable identity services.
 
XML Automaton · In December of 1996 I released a piece of software called Lark, which was the world’s first XML Processor (as the term is defined in the XML Specification). It was successful, but I stopped maintaining it in 1998 because lots of other smart people, and some big companies like Microsoft, were shipping perfectly good processors. I never quite open-sourced it, holding back one clever bit in the moronic idea that I could make money out of Lark somehow. The magic sauce is a finite state machine that can be used to parse XML 1.0. Recently, someone out there needed one of those, so I thought I’d publish it, with some commentary on Lark’s construction and an amusing anecdote about the name. I doubt there are more than twelve people on the planet who care about this kind of parsing arcana. [Rick Jelliffe has upgraded the machine.] ...
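Lark’s machine itself isn’t reproduced here. To give a flavor of what “a finite state machine that can be used to parse XML” means, here is a deliberately tiny, table-driven recognizer for a single XML start tag with attributes. Everything in it is a hypothetical simplification: ASCII-only names, no entity-reference checking inside attribute values, and state names invented for this sketch.

```python
# Character classes the toy machine distinguishes.
SINGLE_CHAR_CLASSES = {"<": "lt", ">": "gt", "/": "slash", "=": "eq",
                       '"': "dq", "'": "sq"}


def char_class(ch):
    """Map a character to a class; only ASCII names are recognised here."""
    if ch in SINGLE_CHAR_CLASSES:
        return SINGLE_CHAR_CLASSES[ch]
    if ch in " \t\r\n":
        return "ws"
    if ch.isascii() and (ch.isalpha() or ch in "_:"):
        return "namestart"
    if ch.isascii() and (ch.isdigit() or ch in "-."):
        return "namechar"
    return "other"


# (state, character class) -> next state; anything missing is an error.
TRANSITIONS = {
    ("start", "lt"): "tag_open",
    ("tag_open", "namestart"): "elem_name",
    ("elem_name", "namestart"): "elem_name",
    ("elem_name", "namechar"): "elem_name",
    ("elem_name", "ws"): "between",
    ("elem_name", "gt"): "done",
    ("elem_name", "slash"): "empty_close",
    ("between", "ws"): "between",
    ("between", "namestart"): "attr_name",
    ("between", "gt"): "done",
    ("between", "slash"): "empty_close",
    ("attr_name", "namestart"): "attr_name",
    ("attr_name", "namechar"): "attr_name",
    ("attr_name", "ws"): "before_eq",
    ("attr_name", "eq"): "after_eq",
    ("before_eq", "ws"): "before_eq",
    ("before_eq", "eq"): "after_eq",
    ("after_eq", "ws"): "after_eq",
    ("after_eq", "dq"): "value_dq",
    ("after_eq", "sq"): "value_sq",
    ("value_dq", "dq"): "after_value",
    ("value_sq", "sq"): "after_value",
    # XML requires whitespace between attributes, so no namestart here.
    ("after_value", "ws"): "between",
    ("after_value", "gt"): "done",
    ("after_value", "slash"): "empty_close",
    ("empty_close", "gt"): "done",
}


def step(state, ch):
    """Advance one character; return None if the input is not a start tag."""
    cls = char_class(ch)
    nxt = TRANSITIONS.get((state, cls))
    if nxt is None and state in ("value_dq", "value_sq") and cls != "lt":
        nxt = state  # any other character stays inside the quoted value
    return nxt


def is_start_tag(text):
    """Run the machine over `text`; accept only if it ends in state 'done'."""
    state = "start"
    for ch in text:
        state = step(state, ch)
        if state is None:
            return False
    return state == "done"
```

For example, `is_start_tag('<entry xml:lang="en" id=\'1\'/>')` is accepted, while `<a b="x"c="y">` (no space between attributes) and `<a b=x>` (unquoted value) are rejected. The appeal of the table-driven style is that the whole grammar lives in one data structure, and the driver loop never changes.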
 
Jon and the Minotaur · Jon Bosak (father of XML, terrific photographer, good person, etc.) was in Vancouver for some meetings having to do with UBL (and be warned, there’s going to be some more UBL tub-thumping around here), and encountered a monster ...
 
Dr. Macro · That would be the handle of Eliot Kimber, a member of the original XML Working Group. I count myself among the more prolific and pedantic members of the markup community, but Eliot sets A Higher Standard; indeed, those who know him find his entry to the blogosphere long overdue. His tagline: “All tools suck.” He has recently published a rant preview which may help you decide whether you want to subscribe, as I have.
 
On XML Language Design · If you’re going to be designing a new XML language, first of all, consider not doing it. But if you really have to, this piece discusses the problems you’re apt to face and offers some advice on improving your chances of success ...
 
Don’t Invent XML Languages · The X in XML stands for “Extensible”; one big selling point is that you can invent your own XML languages to help you solve your own problems. But I’ve become convinced, over the last couple of years, that you shouldn’t. Unless you really have to. This piece explains why. And, there’s a companion piece entitled On XML Language Design, in case you do really have to ...
 
Upcoming Gig: NYS CTG, Albany · Those acronyms stand for “New York State Center for Technology in Government”, which is under the umbrella of the University at Albany (part of SUNY). On January 25th in Albany they’re having a conclave entitled Thinking Beyond Your Web Site: Lessons from the XML Testbed Project, and I’m really looking forward to being a part of it. Obviously, the intersection of XML and State Government is very, very interesting territory at this moment in history. I’ll try to evangelize, of course, but it’s more important that I do some listening; I think most of us in the industry don’t have a good enough understanding of the issues the end-users down in the trenches are facing.
 
Swiss Bank Account · I’ve been casting around trying to find something to write about the ECMA rubber-stamp Microsoft is buying for their Office file formats but have been unable to rise much above “blecch”. Simon Phipps, in saluting IBM’s wise refusal to play the game, manages to bring some grace and even a little humor to bear.
 
Drop the <!DOCTYPE> · Back when we cooked up XML in 1996-97, there were good reasons to have that ugly upper-case gibberish at the top of your XML documents. That was almost ten years ago; now it’s time to do away with it, and also time to have a spec for Doctype-free XML ...
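For a document that doesn’t lean on anything declared in its DOCTYPE, dropping the declaration changes nothing a non-validating parser reports. The catch is entities: a document that uses one declared in its internal subset stops being well-formed without it. A minimal illustration with Python’s standard library; the sample documents and the `root_summary` helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# The same hypothetical document, with and without a DOCTYPE.
WITH_DOCTYPE = """<?xml version="1.0"?>
<!DOCTYPE note [<!ELEMENT note (#PCDATA)>]>
<note>Hello</note>"""

WITHOUT_DOCTYPE = """<?xml version="1.0"?>
<note>Hello</note>"""

# The catch: a document that uses an entity declared in its DOCTYPE.
USES_ENTITY = """<!DOCTYPE note [<!ENTITY co "Example Corp">]>
<note>&co;</note>"""


def root_summary(text):
    """Parse with a non-validating parser and report the root's tag and text."""
    root = ET.fromstring(text)
    return root.tag, root.text


print(root_summary(WITH_DOCTYPE))
print(root_summary(WITHOUT_DOCTYPE))
print(root_summary(USES_ENTITY))

try:
    root_summary(USES_ENTITY.split("\n", 1)[1])  # same body, DOCTYPE removed
except ET.ParseError as err:
    print("without its DOCTYPE:", err)
```

The first two lines of output are identical; the entity case is why a Doctype-free XML spec would need to say what happens to entity references beyond the five predefined ones.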
 
John Cowan · He’s a legend in the XML community, is the author of TagSoup, is ridiculously erudite on any number of things, and is looking for a new job. I think he’d be a good bet.
 
RFC 4287 · I hadn’t seen the announcement, but this looks like a stable official IETF link to RFC 4287, The Atom Syndication Format. A little more work and we’ll have the publishing protocol done and I can return to my plow (or equivalent). The work of the WG and editors was just outstanding, and the IETF did, as advertised, provide a useful quality-control process without unduly getting in the way. Thanks, everyone. The world now has a general-purpose syndication format that is small, stable, based on the last decade’s lessons, clean, and widely implemented. I feel happy.
 
Catcalls · It seems like my little thought experiment has touched a nerve. Scoble, Dare Obasanjo, and Randy Holloway all push back, amazingly enough all making the same argument: how can I be against duplication in office-document XML formats while at the same time being mixed up in the Atom Project? The argument is fallacious, but at least Robert and Randy made it in grown-up, polite terms, leaving the childish name-calling to Dare. Now, as for RSS and Atom: when I came on the scene in 2003, RSS was already hopelessly fragmented, and there was exactly zero chance of any of the large-egoed thin-skinned proponents of the various versions deciding to make nice with each other. Atom is precisely an attempt to reduce the number of vocabularies that implementors feel they have to support. Turning to the office-document space: right now the world has exactly one finished, delivered, standardized, totally-unencumbered, multiply-implemented XML-based office document format. You are the guys who want to introduce another, incompatible one. And I think that’s OK; but restrict your invention to the specialized Microsoft stuff that ODF can’t do, and don’t re-invent the basics. Why is this controversial?
 
The Saga Continues · The Mas­sachusetts Of­fice XML File For­mats saga, that is. The lat­est news is that the Mi­crosoft an­nounce­ments last week are play­ing well in Bos­ton. Com­mon­wealth sec­re­tary Thomas Tri­mar­co stat­ed “we are op­ti­mistic that Of­fice Open XML will meet our new standards”, and I’m op­ti­mistic too. Ob­vi­ous­ly the key word is “will”, since we haven’t seen what’s get­ting sub­mit­ted to ECMA and nobody’s seen what will come out of ECMA. Our own chief stan­dards geek Carl Cargill wrote Mr. Tri­mar­co a let­ter, which you can read over at Piper Cole’s we­blog.
 
Thought Experiments · I see that Mi­crosoft has post­ed a lit­i­ga­tion covenant on the Of­ficeXML for­mats (al­so read Bri­an Jones’ ex­e­ge­sis). In re­spon­se, there’s a bunch of le­gal pok­ing and prod­ding here and here; I don’t un­der­stand the le­gal ar­gu­ments, and I don’t think they’re the in­ter­est­ing part of the sto­ry any­how. So, let’s do two thought ex­per­i­ments. First, what if Mi­crosoft re­al­ly is do­ing the right thing? Se­cond, how can we avoid hav­ing two in­com­pat­i­ble file for­mat­s? [Up­date: There’s been a lot of re­ac­tion to this piece, and I ad­dressed some of those points here.] ...
 
Microsoft XML News · The newswires are buzzing today with Microsoft XML action. So, what do you want from an XML-based standard, whether it’s about synchronization or spreadsheets? First, you want it to be stable. Second, you want it to be legally unencumbered, so anyone can use it in their software. These things are really essential. Less essential, but important: you’d like it to have community involvement, some sort of open process; and finally, you’d like it to be, you know, technically good. So let’s look at today’s headliners, SSE and MSFT Office XML. Stable? SSE at the moment is just something Ozzie and Winer are kicking around, but who knows? As for OfficeXML, yup, this move to ECMA/ISO will make it stable. Unencumbered? SSE’s Creative-Commons license looks pretty good to me. Today, Jean Paoli told Scoble that they’d be doing some sort of “covenant not to sue” over OfficeXML. This would be great news, and we hope that, unlike the current license, it’s GPL-friendly. This is really important, because neither ECMA nor ISO has problems with standardizing heavily-encumbered technology. Open, transparent processes? Well, er, not exactly a Microsoft strength. I honestly don’t know whether ECMA will provide for meaningful input, or whether the process’ outcome, as for example OASIS allows, is completely predetermined. You have to admire the chutzpah in pre-announcing that the ECMA and ISO processes will finish before Office 12 ships, if only by minutes, especially since one assumes that the idea is that Office 12 is going to comply with those standards. Remarkable process-management and software development skills are evidently involved. Finally, are these technologies actually any good? As for SSE, I don’t know a thing about synchronization and Ray Ozzie knows lots, so I’ll hold my peace.
On the Of­ficeXML side I have lots of opin­ion­s, but the opin­ion that’ll mat­ter is that of ISO JTC1 (I’d guess more specif­i­cal­ly SC34), which will soon be deal­ing with two at­tempts to stan­dard­ize a so­lu­tion to the same prob­lem. Should be fun to watch. Oh yes, and since we’re talk­ing about stan­dard­s, would MSDN please get a clue!?!?.
 
Bosworth in ACM · I rec­om­mend that ev­ery­one go read Learn­ing From the Web, a sub­stan­tial es­say by Adam Bos­worth, in the lat­est ACM Queue. It doesn’t say any­thing new that Adam hasn’t been telling ev­ery­one for the last cou­ple of years, but it’s nice to have a canon­i­cal ver­sion of his mes­sage writ­ten down some­where, for the world to point to and learn from.
 
Boston ODF Day · I spent Thurs­day the 27th in Bos­ton; I was in­vit­ed by Har­vard Law School’s Berk­man Cen­ter to par­tic­i­pate in a round-table dis­cus­sion of in­ter­op­er­abil­i­ty and stan­dards in gen­er­al, and the cur­rent Mas­sachusetts/ODF brouha­ha in par­tic­u­lar. It was in­ter­est­ing and in­struc­tive, and I rec­om­mend that any­one who cares about this check out the record­ing. To help un­der­stand the con­tex­t, there is this guy in the room from ACT who was push­ing back pret­ty hard against the new Mas­sachusetts pol­i­cy. His ar­gu­ments are lift­ed pret­ty well word for word from the Mi­crosoft talk­ing points, which was use­ful as the event might oth­er­wise have been a love-in.
 
XML 2005 · I just spent some qual­i­ty time with NXML-mode and a DocBook-derived tag-set (if the whole world would learn Emac­s, the prob­lems around XML edit­ing would dry right up­), pulling to­geth­er my pa­per, On Lan­guage Creation, for the XML 2005 con­fer­ence, next month in At­lanta. Can’t wait to hang with my tribe; be there or be ❏.
 
Rick Jelliffe · He’s been work­ing on XML since be­fore it was in­vent­ed, he knows ap­prox­i­mate­ly ev­ery­thing about XML and pub­lish­ing tech­nol­o­gy, he in­vent­ed Schema­tron (which you should be us­ing if you need to val­i­date XML in a com­plex or sub­tle way), he’s a nice guy, and he’s look­ing for a job. Go get him.
 
OpenOffice Mac Sanity · A week ago, in my OpenOf­fice.org con­fer­ence re­port, I wrote that the X11 Mac Port was be­ing aban­doned in fa­vor of a Co­coa ver­sion. Every bloody Mac site in the world picked this up as though it were a ma­jor news sto­ry, and now I hear from Pa­trick Luby, chief main­tain­er of NeoOf­fice/J, that as a re­sult, the peo­ple who’ve been sup­port­ing his work are threat­en­ing to cut him loose. This is mad­ness; at the mo­men­t, Neo/J is the on­ly ac­tu­al ship­ping ver­sion of OpenOf­fice that you can run on a Mac with the menus in the right place, with drag-&-drop and fonts that Just Work, and so on. This is go­ing to re­main the case for some time, be­cause the task of switch­ing over the cur­rent X11 ver­sion is go­ing to be huge, slow, and high-risk. (Pa­trick was al­so mad be­cause I said Neo/J was “behind”, and, with­out go­ing in­to de­tails of Ja­va and OO.o ver­sion­s, he’s got a point). So for the time be­ing, I’m go­ing to go on us­ing and sup­port­ing and prob­a­bly blog­ging about Neo/J, be­cause that’s all there is. And I still think that Ap­ple should take an in­ter­est in this work.
 
Some OO.oCon Lessons · Yeah, at the con­fer­ence there were speech­es and press brief­in­gs and so on, but the main thing was all the good stuff there to be learned, some of which is re­lat­ed here. Plus a rare live pho­to of a slash­dot­ting ex­pe­ri­ence from the in­sid­e. [Up­date: They fixed the video.] ...
 
New England Town Meeting · On the 16th of this month, the Massachusetts Technology Leadership Council hosted a meeting at which Eric Kriss, the state’s Secretary for Administration and Finance, and Peter Quinn, the CIO, discussed the state’s recent proposal to standardize on the Open Document Format. I received a set of meeting notes, which I reproduce almost as-is (spell-checked, with personal names and editorializing removed). They represent one attendee’s informal capture of the proceedings and have no official standing. But there is some eye-opening stuff here. [Update: via David Berlind, there’s online audio of the meeting.] [Update: Aha! Bob Sutor reports that the Massachusetts decision is now final. This is just the beginning of a long, long road, and you know what? Microsoft is too smart not to go down it; the only question is when they start. See also Sam Ruby on Brays, Fairness and Doublespeak.] ...
 
Apple File Formats · The whole world has been giv­ing Mi­crosoft a hard time over their Of­fice XML file for­mat­s; it turns out that there are far worse sin­ner­s. Ap­ple, for one. Derek Beat­ty here at Sun ran across this write-up on their iWork (Keynote, Pages, and so on) file for­mat­s, which are XML-based. Item: there’s no at­tempt to con­form to OpenDoc­u­ment or any oth­er stan­dard. Item: they change them at will: “With the in­tro­duc­tion of Keynote 2.x, this schema file is out of date.” Item: They don’t ex­act­ly en­cour­age us­ing their specs to build soft­ware: “Although the in­for­ma­tion in this tech­ni­cal note may ap­pear use­ful, you should not re­ly on it for de­vel­op­ing or mod­i­fy­ing your own products.” And, to cap it al­l: “This doc­u­ment does not de­scribe the com­plete XML schema for ei­ther Pages 1.x or Keynote 2.x. The com­plete XML schema for both ap­pli­ca­tions is not avail­able and will not be made public.” [Em­pha­sis Apple’s.] Charm­ing stuff. [Up­date: Apple’s Ernie Prab­hakar push­es back pas­sion­ate­ly. I still don’t think any­one should store in­for­ma­tion that mat­ters in a da­ta for­mat that’s not open and doc­u­ment­ed, but Ernie makes some good points.] [Up­date: Oo­h! My own gen­uine Ap­ple Leak, on how the iWork XML got that way. Read on.]  ...
 
Got XML News? · In­ter­est­ing times in the world of XML. Now, At­lanta in Novem­ber may not be Paris in the spring, but it’s go­ing to be XML World Head­quar­ters Nov. 14-18, and the dead­line for late-breaking-news pa­pers is this Fri­day the 16th, so if you’ve got a sto­ry to tel­l, now’s the time.
 
Massachusetts Back-Room · The com­ment pe­ri­od for the new draft Mas­sachusetts office-file-format pol­i­cy end­ed last Fri­day the 9th. Dur­ing the week be­fore that date, there was some pret­ty in­tense back-room pol­i­tics go­ing on. There are a ton of in­dus­try as­so­ci­a­tions and lob­by­ing group­s, in­clud­ing: Mass Soft­ware Coun­cil, Tech­net New Eng­land, Mass High Tech Coun­cil, Mass Net­work Com­mu­ni­ca­tions Coun­cil, As­so­ci­at­ed In­dus­tries of Mas­sachusetts, and AeA. You can bet that ev­ery one of them was com­ing un­der pres­sure last week to speak up pro or con­tra the state’s po­si­tion. Since you have IBM and Sun on one side of this is­sue and Mi­crosoft on the oth­er, you can al­so bet that they were get­ting pulled both ways. I’m pret­ty sure that a lot of them end­ed up with a state­ment along the lines of “On the sub­ject of the new draft from the Com­mon­wealth of Mas­sachusetts, we’re in fa­vor of moth­er­hood and ap­ple pie.” But, I got my hands on a copy of the oth­er side’s talk­ing points, and I think they make in­ter­est­ing read­ing. [Up­date: I hear un­of­fi­cial­ly from some­one at Adobe cor­po­rate that they’re “generally hap­py with how things went”, so I was wrong, sor­ry. Fixed.] ...
 
Scott to Massachusetts · Since Scott McNealy doesn’t have his own blog, I’ll post the email he sent to the Com­mon­wealth of Mas­sachusetts to­day ...
 
Massachusetts XML · This Massachusetts-office-file-format sto­ry has legs, it’s still echo­ing around a week af­ter it broke. Oddly, there’s been rel­a­tive­ly lit­tle cov­er­age of the “this is a good move because...” for­m, so: This is a re­al­ly smart move by Mas­sachusetts... Be­cause this way, they max­i­mize the chances that the da­ta is re-usable by lots of dif­fer­ent pro­gram­s, and not just of­fice suites. Be­cause they are en­tire­ly 100% free of le­gal en­tan­gle­ments. Be­cause they max­i­mize the chances that the da­ta will still be us­able by their grand-children, in­de­pen­dent of the for­tunes of any soft­ware com­pa­ny. Be­cause if there’s some­thing that needs adding to the for­mat, there’s a stan­dards com­mit­tee whose job that is. I’m go­ing to close by quot­ing, once again, a para­graph from a let­ter that the Euro­pean Com­mis­sion sent to Sun last year, that I think says what needs to be said: Trans­paren­cy and ac­ces­si­bil­i­ty re­quire­ments dic­tate that pub­lic in­for­ma­tion and gov­ern­ment trans­ac­tions avoid de­pend­ing on tech­nolo­gies that im­ply or im­pose a spe­cif­ic prod­uct or plat­form on busi­ness­es or cit­i­zen­s. Amen.
 
Republished · At some point in the transition to Debian Sarge, something broke in the ongoing software. The Perl code reads text using an XML processor and various pieces of it get stashed in a MySQL database. Only somewhere along the line, non-ASCII UTF-8 characters were getting trashed. I tried all sorts of stupid dodges, and was whining away at Sam Ruby via instant messenger, and he said “of course, you could do it all as seven-bit ASCII via &#xBabe;... or you could rewrite it in Ruby and It Would Be Much Better”. I shrieked “Get thee behind me foul tempter!” and have now jammed everything into 7-bit ASCII as it comes out of the XML parser, and of course all the problems have gone away. Actually, the code got simpler; lots of XML escaping/unescaping calls are no longer necessary. This is one of the nice things about XML, I guess: it allows you to be a good internationalization citizen even when your software infrastructure isn’t. It still feels evil. Anyhow, the whole site’s been republished; let me know if anything’s busted. (By the way, if you’re reading this in my RSS feed and all the entries show up as new, switch to the Atom feed and that problem will go away, because Atom actually has unique IDs and datestamps that work.) [Updated: Tony Coates (interesting new blog there, BTW) reports that Opera 8.02 gets it backwards, which means that it’s one of the rare pieces of software that respects guids in RSS, but that it’s doing Atom 1.0 wrong.]
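
For the curious, here’s roughly what “jamming everything into 7-bit ASCII” amounts to. This is a minimal Python sketch, not the actual ongoing code (which is Perl), and the function name `to_ascii_xml` is made up for illustration. It assumes the input is the already-unescaped text a parser hands you, so it re-escapes the XML markup characters and then turns every non-ASCII character into a hexadecimal character reference.

```python
from xml.sax.saxutils import escape


def to_ascii_xml(text: str) -> str:
    """Hypothetical helper: make parser output safe for an ASCII-only store.

    First escape &, < and > (escape() does all three), then replace every
    character above U+007F with a hex numeric character reference.
    """
    escaped = escape(text)
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in escaped
    )


print(to_ascii_xml("Café <b> ☕"))  # Caf&#xE9; &lt;b&gt; &#x2615;
```

A reference like `&#xE9;` means exactly the same thing to any conforming XML parser as a literal é, so nothing downstream loses information; the database simply never sees a non-ASCII byte.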
 
Massachusetts Ripples · While I was pon­der­ing what to write about this OpenDoc­u­ment sto­ry, it spilled all over the In­ter­net and gen­er­at­ed oceans of cov­er­age (thanks to Bob Su­tor for the link round-up). I won­der if Gov. Rom­ney has heard from Steve or Bill yet? To my eye, per­haps the best last word is this lead­er from ZDNet UK.
 
See You in Slovenia · I’m going to be doing a keynote at the next OpenOffice.org conference on September 29th, in Koper-Capodistria. I love what OpenOffice is trying to do, and I’m really looking forward to my first visit to Slovenia. Also, it’ll be a chance to do a speech that’s (mostly) not about blogging or syndication. Hope to see you there.
 
Summer School · I’ll be spending the week in Oxford, participating in the CSW XML Summer School, held at Wadham College. Ostensibly I’m here to lecture, but my real objective is to do a quick catch-up on what the XML application space looks like in A.D. 2005. My only real gripe is that my session is scheduled opposite XML in Healthcare, which I’d really like to attend. Oxford is ridiculously photogenic; I’ve included a couple of snaps of Wadham ...
 
XML and Religion · I sus­pect that most peo­ple who read me al­so read Adam Bos­worth. But if you don’t, do.
 
New Office XML · The pop­u­lar wis­dom is that it takes Mi­crosoft un­til Re­lease 3 of any­thing to get it right; but the ear­ly word on the new Of­fice XML for­mat makes Re­lease 2 look pret­ty good. Read­ing be­tween the lines, the big news is, first, that the de­fault file-save for­mat is XML and, sec­ond, that the XML cov­er­age is com­plete (In the cur­rent Of­fice XML, Pow­erPoint is en­tire­ly ab­sent and Ex­cel has big holes). As­sum­ing Mi­crosoft pulls this of­f, it’s a ma­jor achieve­men­t. Along with patch­ing those holes, work­ing around the ba­sic OLE-container-ness of ev­ery­thing has to be trick­y; one of the nice things about MS Of­fice is that you can jam pret­ty well any­thing that talks OLE in­to the mid­dle of pret­ty well any Of­fice doc and it just work­s. I have ques­tions around the li­cens­ing: Bri­an Jones, linked above, says “royalty-free” but the cur­rent li­cens­ing lan­guage has some claus­es that make lawyers ner­vous, so let’s wait and see on that one. At one lev­el, it’s sad that while the rest of the world (in­clud­ing, late­ly, Adobe and IBM) has been hard at work on one wide-open, share­able, portable, stan­dard­ized XML of­fice doc­u­ment for­mat, Mi­crosoft put their en­er­gy in­to in­vent­ing an­oth­er one. Stil­l, this ought to be a step for­ward for Microsoft’s cus­tomer­s. The news cov­er­age says “late 2006”; good luck to the team in the tough job of get­ting it shipped.
 
OpenDocument! · On Mon­day there was what seems to me like a ma­jor news sto­ry: the an­nounce­ment that OpenDoc­u­ment 1.0 has been ap­proved as an OASIS Stan­dard. As I’ve said be­fore, OpenDoc­u­ment is al­most ex­act­ly what we had in mind when we built XML, start­ing back in 1996. Right now, it is the on­ly XML of­fice doc­u­ment for­mat that is stan­dard­ized, and it is al­so the on­ly one that is com­plete; Microsoft’s of­fer­ing is full of holes, start­ing with the ab­sence of Pow­erPoin­t. It’s al­so com­plete­ly 100% free of intellectual-property is­sues, any­one can use it for any­thing any­time any­where with­out ask­ing any­one first. Let me put it this way: if you oc­ca­sion­al­ly cre­ate doc­u­ments or spread­sheets or pre­sen­ta­tion­s, and if you think that you’d like to own them, in­de­pen­dent of your Of­fice soft­ware ven­dor, well, you have ex­act­ly one choice: OpenDoc­u­men­t. If those doc­s/spread­sheet­s/pre­sos might be long-lived, or con­tain high-value da­ta that you might want to re-use lat­er, and you don’t use OpenDoc­u­men­t, well there’s a word for that but I’m not go­ing to put it up on the front page at on­go­ing. By the way, at the re­quest of our friends in the Euro­pean Com­mis­sion, we’ve com­mit­ted to get­ting be­hind mak­ing OpenDoc­u­ment an ISO Stan­dard, too.
 
Not An April Fool’s Joke · Just fool­ish­ness. The XML Bi­nary Char­ac­ter­i­za­tion Work­ing Group has is­sued their fi­nal re­port which rec­om­mends (sur­prise, sur­prise) that the W3C pro­duce a “Binary XML” spec­i­fi­ca­tion. El­liotte Rusty Harold nails it. I don’t care if any­one wants to go off and pro­duce their own da­ta in­ter­change for­mat, bi­na­ry or not, open or not, stan­dard­ized or not, mapped to XML or not; as long as they don’t call it XML. “Binary XML” is an oxy­moron. And I should point out that the peo­ple at Sun who are build­ing a bi­na­ry da­ta for­mat with a map­ping to XML are call­ing it some­thing else en­tire­ly. Th­ese Binary-XML peo­ple are charg­ing head­long on­to the top of a very long, very steep, very slip­pery slope. [Up­date: Fur­ther joy. I see that this poorly-labelled ta­ble as­serts that XML pre­vents both “processing efficiency” and “forward compatibility”. Glad to hear it.]
 
Not An April Fool’s Joke · Norm Walsh has a densely-technical post show­ing a nasty prob­lem that’s cropped up in the in­ter­ac­tion be­tween XIn­clude, xm­l:base, and XML val­i­da­tion. Un­less you’re a se­ri­ous XML geek you prob­a­bly don’t want to wade through the de­tail­s, but in his con­clu­sion, Norm rais­es a startling point: “I think what pains me most about this sit­u­a­tion is that XIn­clude was in de­vel­op­ment for just over five years. It went through eleven drafts in­clud­ing three Can­di­date Rec­om­men­da­tion­s. Why didn’t we no­tice this un­til sev­er­al months af­ter XIn­clude was a Rec­om­men­da­tion? I’ll grant that XIn­clude is a fair­ly odd spec­i­fi­ca­tion, in the sense that it’s pro­vid­ing func­tion­al­i­ty that you’d ex­pect to oc­cur down in the pars­er (like en­ti­ties), but it’s on­ly 8,563 words long. If we can’t get a 16 page spec right in three CRs, what hope do we have of get­ting the XSL/XML Query fam­i­ly of spec­i­fi­ca­tions right? By the same met­ric I used on XIn­clude, I get just over a half mil­lion words (505,779) in those doc­u­ments. ” Half a mil­lion word­s... pret­ty scary.
 
Megginson Linkage · Another one of the key peo­ple around the birth of XML has joined the con­ver­sa­tion. This time it’s Dave Meg­gin­son, who’s best known as the chief de­sign­er of SAX, but has made con­tri­bu­tions large and small all over the uni­verse of de­scrip­tive markup. In­ter­est­ing­ly, one of his first en­tries calls for a key sim­pli­fi­ca­tion to XLink, one of the best XML ideas nev­er to have hit the big-time. Within an hour of read­ing David’s sug­ges­tion, which is of course ex­cel­len­t, I ran across Norm Walsh hold­ing forth on the same sub­ject; ap­par­ent­ly, chances are XLink will be­come more lightweight, which would be A Good Thing and might change the world, slight­ly.
 
Fast (They Say) and Open · There are a lot of peo­ple at Sun who are con­vinced that some sort of bi­na­ry XML rep­re­sen­ta­tion is a good idea. I’ve nev­er been con­vinced, but they’re se­ri­ous; they’ve draft­ed a pro­pos­al and are work­ing on get­ting it stan­dard­ized; in­for­mal­ly it’s called the “Fast Infoset” and of­fi­cial­ly it’s “ITU-T Rec. X.891 | ISO/IEC 24824-1”. I’ve been par­tic­u­lar­ly du­bi­ous be­cause it’s built on ASN.1, which I’ve had bad ex­pe­ri­ences with. But those most­ly had to do with bro­ken or un­avail­able soft­ware, and that ob­jec­tion may be moot, be­cause as Ed­uar­do Pelegri-Llopart writes, they’re ship­ping an Open Source im­ple­men­ta­tion. Ed­uar­do al­so tells me they’re get­ting lots of in­ter­est from out­side of Sun. Hey, as long as when­ev­er some­one tells me “I in­ter­change XML” that means they’re will­ing to in­ter­change streams of Uni­code char­ac­ters with an­gle brack­et­s, I’m OK.
 
Office Doc Format News · A cou­ple of low-key news items in the Of­fice Doc­u­ment XML space, worth high­light­ing be­cause I think this area is sig­nif­i­can­t, as do some im­por­tant peo­ple. First of­f, the peo­ple stan­dard­iz­ing this stuff over at OASIS (and soon, ISO) pub­lished a sec­ond draft, and, with­out any fan­fare, they changed the name from “OpenOffice.org XML Format” to “OpenDocument”, which is short­er, bet­ter, and not tied to any par­tic­u­lar im­ple­men­ta­tion. There’s ac­tion on the Mi­crosoft front too; check the mi­crosoft­.pub­lic.of­fice.xml and mi­crosoft­.pub­lic.xml news­group­s, where there are flur­ries of ques­tions dig­ging through the knot­ty cor­ners of WordML and Ex­celML; you nev­er re­al­ly un­der­stand a di­alect un­til you have to write a pro­gram to gen­er­ate it. I’m sure the de­tails will come out de­spite some cur­rent ir­ri­ta­tion, but this is a rea­son why Mi­crosoft should cast a friend­ly eye at the bor­ing, bu­reau­crat­ic, painful stan­dard­iza­tion pro­cess.
 
XTech, Time to Pitch In · This is the big Euro­pean sum­mer XML event; the call for pa­pers is clos­ing in a cou­ple days. Edd Dum­bil­l, the chair, tells me that the sub­mis­sions are ex­cel­lent on the serv­er side but there’s room on the client side, so all you XULoids and .NET-hacks and SVGers and XForm­lings, get with it. Lau­ren and I are go­ing to try to go (haven’t been to Am­s­ter­dam in ages); I’m par­tic­u­lar­ly in­ter­est­ed by the Open Da­ta track, which sounds like some­thing new in the world.
 
UBL by the Numbers · Via Jon Bosak, a point­er to this XML 2004 pre­sen­ta­tion (Pow­erPoin­t, sigh), about the Dan­ish Government’s de­ploy­ment of a bunch of XML tech­nolo­gies in­clud­ing UBL. Check out slides 4 & 5: they es­ti­mate the an­nu­al sav­ings achiev­able from in­voic­ing in UBL at some­where be­tween €100M and €160M. I may be out of step with the crowd but it seems painful­ly ob­vi­ous to me that UBL is go­ing to be huge and I don’t un­der­stand why more tech­nol­o­gy ven­dors (in­clud­ing my em­ploy­er) aren’t re­fo­cus­ing their e-business strat­e­gy around it.
 
Ms Maler · Hey, Eve’s here! I ex­pect great things. What she doesn’t high­light in her ba­sic bio is that she helped in­vent XML, and has great hair, and is fun­ny. Oh, and dig her URL. Now she’s a WordPress geek too; seems to be a grow­ing tribe.
 
xfy · The full name is xfy Technology; it’s one of the most interesting pieces of new XML software I’ve seen in a long time. On the surface it’s an editing system, but the world has lots of those. There are three things that are interesting here. First of all, it’s from Justsystem, a Japanese software vendor which has gone toe-to-toe with Microsoft for a decade, carving out their fair share (and then some) of the office-suite market; so they should be taken seriously. Second, it’s got the slickest SVG-editing demo I’ve ever seen: you stretch shapes and watch the XML source code change, or vice versa. Finally, they told us it was all-Java, and I was watching the demo and was really impressed at the snappy, attractive UI, but then I got puzzled and said “I thought you were using Swing but, uh, what’s that?” “Well,” the soft-spoken young Japanese engineer allowed, “we are, but then we created some custom controls.” It just may be that the world headquarters of Java UI innovation is currently on the other side of the Pacific.
 
More Relax · I often caution people against relying too heavily on schema validation. “After all,” I say, “there is lots of obvious run-time checking that schemas can’t do, for example, verifying a part number.” It turns out I was wrong; with a little extra work, you can wire part-number validation, or pretty well anything else, into RelaxNG. Elliotte Rusty Harold explains how. Further evidence, if any were required, that RelaxNG is the world’s best schema language, and that anyone who’s using XML but not RelaxNG should be nervous.
 
Extending Ruby · There’s a nice ar­ti­cle by Gar­rett Rooney over at O’Reilly’s ONLamp site. I’d al­ready men­tioned Garrett’s Ru­by wrap­per for Genx; in this piece, he us­es it as a case study on how to ex­tend Ru­by with C code. Neat.
 
Three Questions on XSD and WSDL · Last week at the Colorado Soft­ware Sum­mit, dur­ing my keynote I asked three ques­tions of the at­ten­dees, who were a few hun­dred most­ly se­nior de­vel­op­er­s, most­ly from the Ja­va ecosys­tem. (I’ve tucked a pic­ture in the body of this piece.) Do you use XML Schema? Pret­ty well ev­ery hand went up. Do you think you un­der­stand XML Schema? One hand went up. Do you like XML Schema? A scat­ter­ing of hand­s, maybe 20%. I asked the same three ques­tions about WSDL; sim­i­lar pat­tern, not quite as uni­ver­sal ex­po­sure, a few more thought they un­der­stood it. Just re­port­ing ...
 
The Fifth XML Developers’ Conference · I spent Wed­nes­day and Thurs­day at Chris Sells’ fifth XML Dev­con. This is a high-level gath­er­ing of the .NET XML (and thus Web-Services) com­mu­ni­ty. It’s be­ing blogged to the max (nice­ly ag­gre­gat­ed on the con­fer­ence site), and there’s an eWeek per­son here jour­nal­iz­ing in re­al­time. It’s been fun and ed­u­ca­tion­al ...
 
Applied XML · I’ll be at Chris Sells’ XML thing near Port­land this week, com­bin­ing the phys­i­cal risk of be­ing near an ac­tive vol­cano with the moral per­il of be­ing sur­round­ed by WS-* evan­ge­lists from Red­mond who think that the nat­u­ral lifes­pan of an XML doc­u­ment is mea­sured in mi­crosec­ond­s. I’m cranked, have ac­tu­al­ly been ly­ing awake at night think­ing of things I’d like to say to this gang and won­der­ing if they can or should be said. I’m of­ten guilty of ar­riv­ing at a con­fer­ence in time to speak and be­ing on the next plane out, but I’ll take in most of this one. It’s a way to dis­cov­er a ma­jor con­ti­nent on plan­et XML that, for me, has re­mained large­ly un­ex­plored. Cur­rent work­ing ti­tle of my talk: Bits on the Wire: Les­sons From the Syn­di­ca­tion Ex­plo­sion.
 
Smart EC · Be­cause of the way on­go­ing works I need fair­ly short head­li­nes, which is a pity, be­cause for this piece I want­ed to use The Euro­pean Com­mis­sion Makes Ex­treme­ly Smart Moves Con­cern­ing Open XML-Based Of­fice Doc­u­ment For­mats and Brow­beats Ven­dors Deft­ly; As a Re­sult the Open Of­fice XML For­mat Will Prob­a­bly Be­come an ISO Stan­dard ...
 
Genx Status · This is the per­ma­nent sta­tus page for Genx (tar­ball · docs). Genx is a li­brary, writ­ten in the C lan­guage, for gen­er­at­ing XML. Its goals are high per­for­mance, a sim­ple and in­tu­itive API, and out­put that is guar­an­teed to be well-formed; the out­put is al­so guar­an­teed to be Canon­i­cal XML, suit­able for use with digital-signature tech­nol­o­gy. There is a Python wrap­per. Genx comes with a GPL-Compatible but non-viral Open-Source li­cense. Lat­est news: In pro­duc­tion, car­ry­ing hun­dreds of thou­sands of sub­ti­tles per day; think­ing of tak­ing off the “beta” stam­p ...
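
To make “guaranteed to be well-formed” concrete, here’s a toy Python analogue of the idea. It is not Genx’s API (Genx is C, and its real function names differ); the class and method names are invented for this sketch, names are restricted to a simplified ASCII pattern, and namespaces are ignored. It shows the two tricks that matter: the writer, not the caller, owns the tag stack and the escaping; and it follows a few Canonical XML conventions (attributes sorted by name and double-quoted, empty elements written as start/end pairs).

```python
import re

# Simplified name rule (an assumption): real XML names allow many more characters.
_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*\Z")


def _check_chars(s):
    # Reject control characters XML 1.0 forbids; tab, LF and CR are allowed.
    for ch in s:
        if ord(ch) < 0x20 and ch not in "\t\n\r":
            raise ValueError(f"character U+{ord(ch):04X} not allowed in XML")


def _escape_text(s):
    # Canonical XML escaping for character data.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace(">", "&gt;").replace("\r", "&#xD;"))


def _escape_attr(s):
    # Canonical XML escaping for attribute values.
    return (s.replace("&", "&amp;").replace("<", "&lt;")
             .replace('"', "&quot;").replace("\t", "&#x9;")
             .replace("\n", "&#xA;").replace("\r", "&#xD;"))


class SketchWriter:
    """Hypothetical writer that refuses any call sequence producing bad XML."""

    def __init__(self):
        self._out = []
        self._stack = []
        self._pending = None  # (name, attrs) of a start tag not yet written
        self._done = False    # True once the root element has been closed

    def start_element(self, name):
        if self._done:
            raise ValueError("only one root element allowed")
        self._check_name(name)
        self._flush()
        self._pending = (name, {})
        self._stack.append(name)

    def add_attribute(self, name, value):
        if self._pending is None:
            raise ValueError("attributes must directly follow start_element")
        self._check_name(name)
        _check_chars(value)
        attrs = self._pending[1]
        if name in attrs:
            raise ValueError(f"duplicate attribute {name!r}")
        attrs[name] = value

    def add_text(self, text):
        if not self._stack:
            raise ValueError("text outside the root element")
        _check_chars(text)
        self._flush()
        self._out.append(_escape_text(text))

    def end_element(self):
        if not self._stack:
            raise ValueError("no open element to end")
        self._flush()
        self._out.append(f"</{self._stack.pop()}>")
        self._done = not self._stack

    def result(self):
        if not self._done:
            raise ValueError("no root element, or elements left open")
        return "".join(self._out)

    def _flush(self):
        # Emit a buffered start tag with its attributes sorted by name.
        if self._pending is not None:
            name, attrs = self._pending
            parts = [f'{k}="{_escape_attr(v)}"' for k, v in sorted(attrs.items())]
            self._out.append("<" + " ".join([name, *parts]) + ">")
            self._pending = None

    @staticmethod
    def _check_name(name):
        if not _NAME.match(name):
            raise ValueError(f"bad XML name {name!r}")


w = SketchWriter()
w.start_element("greeting")
w.add_attribute("lang", "en")
w.add_attribute("id", "g1")
w.add_text("Fish & <chips>")
w.end_element()
print(w.result())  # <greeting id="g1" lang="en">Fish &amp; &lt;chips&gt;</greeting>
```

Buffering the start tag until the next call is what lets attributes arrive in any order and still come out sorted, which is one of the things a signature-friendly canonical form needs.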
 
On Custom Schemas · Not so long ago, I wrote a piece about open doc­u­ment for­mats. Just to­day there was an in­ter­est­ing (as al­ways) follow-up from Jon Udell, but what I want­ed to ad­dress here is Dare Obasanjo’s take, which is pret­ty well the Mi­crosoft par­ty line (not that Dare’s al­ways a party-line guy): the Of­fice soft­ware and its doc­u­ment for­mats are win­ners be­cause they al­low the use of cus­tom schemas for of­fice doc­u­ments. That’s more im­por­tan­t, they say, than the dodgy li­cens­ing terms and the miss­ing pieces. I used to be­lieve that cus­tom schemas for of­fice doc­u­ments were gen­er­al­ly a good idea, but I no longer do. Here’s why ...
 
Another Year’s Harvest · There’s a gen­er­al air of gath­er­ing ex­pec­ta­tion around the house; Lau­ren chairs the big an­nu­al XML con­fer­ence and this week­end is the clos­ing date for pa­per sub­mis­sion­s. She keeps van­ish­ing be­hind the near­est com­put­er to check the in­box, and com­ing back with a smile. Then the assembly-line takes over: re­view­er as­sign­ments, re­view­er chivvy­ing, pa­per se­lec­tion, and then it’s get­ting in­to lo­gis­tics time. If you’ve nev­er been to one of the­se, you should go; you can get deep­er in­to the deep XML is­sues in a cou­ple hours in the ho­tel bar than in a week of sem­i­nars any­where else. And if you’re do­ing some­thing red-hot and ex­cit­ing with XML that the world needs to hear about, it’s (bare­ly) not too late to put vir­tu­al pen to vir­tu­al pa­per.
 
OpenOffice · I spent the day Thurs­day at StarOf­fice in Ham­burg and came away with some of my ideas about XML & blog­ging changed. It was a side-trip; oth­er busi­ness took me to Brus­sels and OpenOf­fice wasn’t that far away and I had an agen­da there, which we’ll get to. But this is im­por­tant stuff, I think. [Up­dat­ed with a point­er.] [And again with Ge­of Glass’ OO.o-to-blog gate­way.] ...
 
TCP is So Over · Most of us have been hear­ing ru­mors about this skunkworks XCP thing for some time, but now they seem to be open to the pub­lic. As they say, “Light the Fiber!” Think about it this way: I first went to the mat with TCP/IP in 1984, when 4.2b­sd hit the street­s. A twenty-year run is plen­ty for most tech­nolo­gies, and I’d say TCP has pret­ty well had its time in the sun.
 
<’s Pointy End · Dave Walk­er over at freeform good­ness catch­es me with my XML pants, fig­u­ra­tive­ly speak­ing, down. I wrote a piece about leav­ing the W3C TAG en­ti­tled (clev­er­ly I thought) </TAG>. Un­for­tu­nate­ly that < in the ti­tle caused all sorts of grief and break­age, both here at on­go­ing and down­stream in the world of syn­di­ca­tion and ag­gre­ga­tion. I can fix my own prob­lem­s, but it’s deep­er down­stream; long ter­m, the an­swer is Atom. Here­with some thoughts on good pro­gram­ming prac­tices and the larg­er prob­lem. [Up­date: A cou­ple of notes on the “href problem.”] ...
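
The basic failure is easy to reproduce. This Python sketch (the `parses` helper is made up for illustration) splices the title into a feed element by plain string concatenation, then does it again with escaping, and asks a real XML parser which result is well-formed.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

title = "</TAG>"

naive = "<title>" + title + "</title>"          # markup leaks into the feed
safe = "<title>" + escape(title) + "</title>"   # < and > become &lt; and &gt;


def parses(xml_text):
    """Return True if xml_text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(naive, parses(naive))        # <title></TAG></title> False
print(safe, parses(safe))          # <title>&lt;/TAG&gt;</title> True
print(ET.fromstring(safe).text)    # </TAG>
```

Escaping fixes well-formedness, but in RSS a consumer still has to guess whether a title’s content is plain text or HTML; Atom removes the guess by letting `<title>` carry `type="text"`, `"html"`, or `"xhtml"`.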
 
Office Markup Languages · I’ve been at Sun less than a day and this guy gets in touch. Guy: “You’re the XML ex­pert, right?” Tim: “Well, er.” Guy: “Open Office’s XML is bet­ter than Mi­crosoft Of­fice XML, right?” Tim: “Uh, I don’t know.” ...
 
Genx Alpha · I just post­ed a Genx tar­ball; the doc­u­men­ta­tion is sep­a­rate­ly avail­able here. This is Al­pha code, not be­cause it’s all that bug­gy (it doesn’t do that much, af­ter al­l) but be­cause it’ll quite like­ly change once some oth­er smart peo­ple see the prob­lems I haven’t. There are quite a few de­par­tures from the de­signs I post­ed ear­li­er and where the en­su­ing dis­cus­sion got to, sim­ply be­cause I’ve now writ­ten the code; and I’m nev­er smart enough to un­der­stand the prob­lem un­til I’ve writ­ten the code. For those who care about such things, dis­cus­sion will prob­a­bly be most­ly on the XML-dev mail­ing list. Genx cur­rent­ly has an ultra-minimal copy­right state­ment but I plan to adopt the lat­est rev of the Apache copy­right be­fore I do an­oth­er re­lease. [Up­dat­ed: Oop­s, tar­ball was mis-placed; it’s there now.]
 
Writing Genx · In be­tween beach time and rain­for­est time, I’ve been cod­ing away on genx; here­with some im­pres­sions with one im­por­tant les­son and an in­ter­est­ing bit of his­to­ry ...
 
Reflexive Naming · I’m work­ing right now on the de­sign of an XML vo­cab­u­lary with an el­e­ment whose name is at­tribute which has an at­tribute whose name is name. It makes it hard to talk about the XPaths with­out a lot of stut­ter­ing.
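To show the stutter concretely, the snippet below builds a hypothetical vocabulary in which an `<attribute>` element carries a `name` attribute. In full XPath the names are selected by `//attribute/@name`. Python's `xml.etree.ElementTree` supports only a subset of XPath and cannot return attribute nodes, so this sketch selects the elements and reads the attribute from each one.

```python
import xml.etree.ElementTree as ET

# Hypothetical vocabulary: <attribute> elements with a name attribute.
doc = ET.fromstring(
    '<schema>'
    '<attribute name="id"/>'
    '<attribute name="lang"/>'
    '</schema>'
)

# Equivalent of //attribute/@name: every <attribute> element's name attribute.
names = [el.get("name") for el in doc.iter("attribute")]
print(names)  # ['id', 'lang']

# Predicate form: the attribute element whose name attribute is "lang".
lang = doc.find(".//attribute[@name='lang']")
```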
 
Genx · It seems there’s some con­sid­er­able de­mand for a C-callable API which will write XML safe­ly and ef­fi­cient­ly. I sketched out an in­ter­face de­sign which you may pe­ruse here; I think it’ll be pret­ty self-evident to the C-literate. It com­piles and I wrote and test­ed the genxS­canUTF8() method, so it’s not en­tire­ly va­por. Upon con­sid­er­a­tion, I think it will be vir­tu­al­ly no ex­tra work to make it emit Canon­i­cal XML, ready to be signed, sealed and de­liv­ered (and Rich Salz said he would help) so why not? Ma­jor thanks to An­tho­ny J. Starks for the name—I am not a mem­ber of Gen X my­self, but I do share a city with Cou­p­land, so there you go. Since on­go­ing doesn’t have com­ments, I’ll post a point­er to this item over in the xml-dev mail­ing list, which is a nat­u­ral place to dis­cuss it. It would be very sur­pris­ing if this first-cut sketch didn’t con­tain some stupid er­rors, so go get ’em.
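Genx is a C library, and its own calls are not reproduced here. The sketch below uses Python's `xml.etree.ElementTree.canonicalize` (Python 3.8 and later) only to show why canonical output matters for signing. Two serializations that differ in attribute order, quoting, spacing and empty-element syntax come out byte-identical. Note that this function implements C14N 2.0, which is not necessarily the exact canonical form Genx targets.

```python
import xml.etree.ElementTree as ET

# Same information, two different serializations.
a = "<doc b='2'  a=\"1\"/>"
b = '<doc a="1" b="2"></doc>'

ca = ET.canonicalize(xml_data=a)
cb = ET.canonicalize(xml_data=b)
print(ca)  # <doc a="1" b="2"></doc>

# Identical canonical bytes mean a digest over one equals a digest over the other.
assert ca == cb
```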
 
On Writing XML · In a re­cent es­say I of­fered, giv­en de­mand, to au­thor some XML-writing soft­ware. There’s been quite a bit of feed­back, and the con­sen­sus seems to be that the Ja­va com­mu­ni­ty is fair­ly well-served with XML writ­ing soft­ware, but that this would be re­al use­ful at the C lev­el. So that’ll be my cod­ing fun for the month of Fe­bru­ary. The rest of this es­say lists some of the Ja­va op­tions that peo­ple told me about, and in­tro­duces some is­sues around the C im­ple­men­ta­tion ...
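The core idea behind an XML-writing library is that the application hands over names and text, and the writer takes responsibility for escaping and well-formedness. Python's standard library happens to ship one such writer, `xml.sax.saxutils.XMLGenerator`. The sketch below uses it purely as an illustration; it is not the C interface under discussion. The element names are made up.

```python
import io
import xml.etree.ElementTree as ET
from xml.sax.saxutils import XMLGenerator

buf = io.StringIO()
gen = XMLGenerator(buf, encoding="utf-8")
gen.startDocument()                      # writes the XML declaration
gen.startElement("entry", {"lang": "en"})
gen.startElement("title", {})
gen.characters("</TAG> & friends")       # escaped by the writer, not the caller
gen.endElement("title")
gen.endElement("entry")
gen.endDocument()

out = buf.getvalue()
print(out)

# Round-trip check: the output parses and the text comes back unchanged.
assert ET.fromstring(out.encode("utf-8")).find("title").text == "</TAG> & friends"
```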
 
History of XML Error Handling · I encourage everyone to go and read Mark Pilgrim’s remarkable overview of the history of XML error-handling. His summary: “In the end, Tim basically said ‘there are two camps here, they both have good points, we aren’t going to convince each other on this one’ and then proceeded to compromise by doing it his way.” Mark’s selection of out-takes from the debate would seem to support that narrative. Excuse me while I go off in a corner and shake off the megalomania. Let’s get real: even my Mom wouldn’t believe that I could single-handedly impose so fundamental a policy decision on so large and passionate a community by saying “Make it so.” What happened was, we had a really big, really long, really passionate argument on the subject; the camps came to be called “Draconians” and “Tolerants.” After this had gone on for some weeks and some hundreds of emails, we took a vote and the Draconians won 7-4. And indeed, some among the Tolerants cried foul over that vote. This was a good example of what we mean when we say “rough consensus,” in that even those on the short side of the vote were willing to defend the process and the outcome; see Hollander and Sperberg-McQueen. Other interesting glimpses into this history may be found here and, giving the last word, as is appropriate, to Jon Bosak, here.
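The Draconian outcome is written into XML 1.0. Once a processor detects a well-formedness (fatal) error, it must not continue normal processing or keep passing content to the application. The sketch below shows how Python's stdlib parser behaves: either you get a tree, or you get an error and nothing else. The wrapper `parse_or_report` is a hypothetical name chosen for this example.

```python
import xml.etree.ElementTree as ET

def parse_or_report(text: str):
    """Hypothetical wrapper: return the root element, or an error string.
    On a well-formedness error no partial tree is returned."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        return f"fatal error: {err}"

print(parse_or_report("<p>ok</p>").tag)  # p
print(parse_or_report("<p>oops</b>"))    # mismatched end tag, reported as fatal
```

A "Tolerant" design would instead try to guess the intended structure. The spec chose to make that guessing non-conforming.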
 
The Three-Legged Future · There’s a re­al in­ter­est­ing note from Camp­bell and Swigart lament­ing the fact that, down in the cod­ing trench­es, the worlds of ob­jects and of RDBMSes and of XML are far from uni­fied, and that at­tempts in that di­rec­tion have been less than en­thralling. I think we just have to get used to it, and from here on in, the prac­tice of soft­ware en­gi­neer­ing is a three-legged dis­ci­pline ...
 
Deep XML · At the re­cent XML con­fer­ence, Norm Walsh host­ed a noc­turne on Prac­ti­cal RDF, the high­light of which was his tour through the nor­man.wal­sh.­name se­tup. From the out­side you may think this is a mere blog, but it’s ac­tu­al­ly a side-effect of a fright­en­ing­ly gnarly con­flu­ence of meta­da­ta streams which are shak­en and stirred to pro­duce a sprawl­ing net­work of re­sources a small part of which you might want to pe­ruse for Norm’s news & views. I have a pic­ture that made the au­di­ence at the ses­sion gasp in dis­be­lief ...
 
Notes on Bosworth · Adam Bosworth has been discussing what he calls a “Web Services Browser” for months over at his blog, but I was really having trouble getting the point. After his speech here at XML 2003, I think I sort of get it ...
 
A Day Late in Philly · I got to the big XML conference here on its second day and it looks like I missed lots of interesting stuff. Oh well, I’m just here to hang out and chat, and I see from what I read that there are a few folks here that I know but haven’t met. Flag me down for a talk, y’all; Wednesday I’ll be wearing a pink sports jacket so I’m easy to spot.
 
On Search: XML · Search­ing is all about tex­t, and the pro­por­tion of all the world’s text that is XML keeps get­ting high­er and high­er. So if you’re go­ing to do search, at some point you’re go­ing to have to think about search­ing XML. Here­with a sur­vey of some of the is­sues and prob­lems (which, like oth­er es­says as we ap­proach the end of On Search, con­tains opin­ions among the re­portage) ...
 
UTF-8+names · Here’s the problem. You want to put “funny” characters in your XML, ones that aren’t on your keyboard, the way “ñ” isn’t on a Greek keyboard and “Δ” isn’t on a Mexican one. XML has a bunch of ways to do this; some of them require sophisticated software, others are really ugly, and if you want to avoid both the ugliness and the fancy software, you can use a DTD. Except that people don’t want to use DTDs either. This set of issues has been darkening the XML skies for years now, but we may have stumbled on a way out of the box. (Warning: Bit-banging technicalia of interest only to XML obsessives) ...
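To make the trade-off concrete, here is a small illustration of the problem only; it does not implement the proposal the essay goes on to describe. Numeric character references such as `&#xF1;` work in any XML document but are ugly to type. A friendly name like `&ntilde;` is rejected unless a DTD declares it, and only the five predefined entities are always available. The helper `well_formed` is hypothetical and built on Python's stdlib parser.

```python
import xml.etree.ElementTree as ET

def well_formed(text: str) -> bool:
    """Hypothetical helper: True if the stdlib parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Numeric references need no DTD: hex F1 is U+00F1 (ñ), decimal 916 is U+0394 (Δ).
word = ET.fromstring("<w>Ma&#xF1;ana &#916;</w>").text
print(word)  # Mañana Δ

# The five predefined entities always work.
print(well_formed("<w>&lt;&amp;&gt;&quot;&apos;</w>"))  # True

# An HTML-style name with no DTD declaring it is a well-formedness error.
print(well_formed("<w>Ma&ntilde;ana</w>"))  # False
```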
 
Emacs, XML, Unicode · I was struck by Norm Walsh’s es­say Good­bye DTDs, in which he talks of go­ing to an all-RelaxNG en­vi­ron­men­t, no more DTDs. Within sec­onds of see­ing it I IM’d him ask­ing “What about spe­cial characters?” and he point­ed out that there would still be some en­ti­ty dec­la­ra­tions around. on­go­ing has a DTD too, but I’d rather it didn’t, so I de­cid­ed to see if I could wres­tle Emacs to the ground so I wouldn’t need one. Of pos­si­ble in­ter­est on­ly to the eleven peo­ple in the world who ed­it XML in Emacs and know what “i18n” stands for. [Up­dat­ed; skip to the end for a neato char-insertion func­tion.] ...
 
nXML Oh My · I just went and got James Clark’s new nXML Emacs XML edit­ing mod­e. I poked around a bit, won­der­ing which XML pars­er and Re­laxNG en­gine he was us­ing, and wor­ry­ing how much trou­ble I’d have get­ting this run­ning and hooked in to my hand-compiled Emacs here on OS X. No, it’s not like that. There are 12,587 lines of elisp here ap­par­ent­ly im­ple­ment­ing a com­plete XML 1.0 pro­ces­sor and Re­laxNG val­i­da­tion en­gine. Words fail me.
 
Spaghetti Doesn’t Want to be Free · A bril­liant note from Rick Jel­liffe of Topolo­gi, on the sub­ject of W3C XML Schemas, from which I ex­cerp­t: Any suf­fi­cient­ly mono­lith­ic tech­nol­o­gy is in­dis­tin­guish­able from spaghet­ti. Once a large tech­nol­o­gy is made from suf­fi­cient­ly in­ter­twined part­s, there is no way to or­der an ex­po­si­tion of it such that strongly-connected ideas are al­ways close to­geth­er. Spaghet­ti doesn't want to be free. (At least, "no way" to or­der the ex­po­si­tion with HTML-style pages: maybe WXS needs some­thing more like Nelson's tran­sclu­sion, where you can pull in frag­ments (with­out los­ing their con­tex­t) and em­bed them in­to run­ning tex­t, with­out the main­te­nance penal­ty of du­pli­cat­ed sec­tion­s.) In­deed, I think that is a for­got­ten ra­tio­nale for XML over SGML: dumb­ing down an in­ter­twined tech­nol­o­gy so that it could have a spec straightforward-enough that peo­ple could con­ve­nient­ly read it.
 
Dracon and Postel · There’s been a flur­ry of de­bate over in the PEAW mail­ing list about how to deal with bro­ken feed­s. Si­mul­ta­ne­ous­ly, Aaron Swartz as­serts Postel’s Law Has No Ex­cep­tions. Here­with a bit of back-fill on the rel­e­vant his­to­ry and trib­al knowl­edge, an ex­cur­sus in­to Athe­ni­an ju­rispru­dence, and opin­ions on what PEAW should do ...
 
Markup, Namespaces, and Meaning · Jon Udell has been think­ing so fu­ri­ous­ly about mix­ing names­paces and the mean­ing of markup that I imag­ine a vis­i­ble swirl of su­per­heat­ed brain en­er­gy above his home of­fice. I think that this whole area of thought is what over in the W3C TAG we re­fer to as a “rat-hole”. I.e., some­thing you can van­ish down nev­er to re-appear, or at least a place where you can waste a lot of time scur­ry­ing along twisty lit­tle pas­sages. Here­with some (I hope) de­mys­ti­fi­ca­tion ...
 
Namespace Pedantry · There is one as­pect of XML names­paces that keeps con­fus­ing peo­ple, and since I wrote the spec­i­fi­ca­tion, it’s at least part­ly my fault. In the last week alone it’s caught both Jon Udell and Aaron Swartz. There is no such thing as the “blank namespace” or the “empty namespace” or the “unqualified namespace.” An el­e­ment or an at­tribute is ei­ther in a names­pace or not, and if it’s in a names­pace, the names­pace has a name, and the name isn’t blank. I en­close an ex­am­ple and a bit more ex­pla­na­tion ...
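The sketch below illustrates the "either in a namespace or not" point using Python's `xml.etree.ElementTree`. This library writes a namespaced name as `{namespace-name}local` and a name in no namespace as a bare local name. The namespace URI `http://example.com/ns` is a made-up placeholder. `xmlns=""` does not put `<extra>` into some "empty namespace"; it undeclares the default namespace, so `<extra>` is in no namespace at all.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring(
    '<feed xmlns="http://example.com/ns">'
    '<title>in a namespace</title>'
    '<extra xmlns="">in no namespace</extra>'
    '</feed>'
)

tags = [child.tag for child in doc]
print(tags)  # ['{http://example.com/ns}title', 'extra']

# Looking up the namespaced element requires its namespace name...
title = doc.find("ns:title", {"ns": "http://example.com/ns"})
# ...while the un-namespaced one is found by its local name alone.
extra = doc.find("extra")
```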
 
XML Tribal Bash, Get Yer Papers In · Way back be­fore there was XML, there was SGML, and there was one big SGML con­fer­ence a year, with unimag­i­na­tive names: “SGML 1990”, “SGML 1991”, and so on. 1990 is when I start­ed go­ing. XML was an­nounced to the world at SGML ’96, an oc­ca­sion I’ll re­mem­ber till I die. All this is a lead-in to a plug for today’s ver­sion of that con­fer­ence, still unimag­i­na­tive­ly named: XML 2003. In par­tic­u­lar, I’d like to en­cour­age the kind of peo­ple who read me here to think about send­ing in a pa­per and get­ting on stage. Read on for de­tail­s ...
 
Generating Word · Via Don Box, I see that Oleg Tkachenko is gen­er­at­ing WordML and feed­ing the out­put to Word, and it's work­ing! ...
 
On Semantics and Markup · The term “Semantic Markup” is bandied about freely, and with ev­ery year that pass­es, it makes me more and more ner­vous. Here­with an ex­plo­ration of what, if any­thing, those two terms mean when placed side by side. (Warn­ing: way too long.) ...
 
Infopath · c|net says that Microsoft won't be including InfoPath (formerly known as XDocs) in the basic MS Office bundle. This seems all wrong; I don't get it ...
 
Why XML Doesn't Suck · Re­cent­ly in this space I com­plained that XML is too hard for pro­gram­mers. That ar­ti­cle got Slash­dot­ted and was sub­se­quent­ly read by over thir­ty thou­sand peo­ple; I got a lot of feed­back, quite a bit of it in­tel­li­gent and thought-provoking. This note will ar­gue that XML doesn't suck, and dis­cuss some of the is­sues around the dif­fi­cul­ties en­coun­tered by pro­gram­mer­s ...
 
XML Is Too Hard For Programmers · XML is a bouncing, thriving five-year-old now, and yet I've been feeling unsatisfied with it, particularly in recent times, and particularly in my capacity as a programmer ...
 
Let's Move XML-dev Now! · To the ex­tent that there is such a thing as an XML com­mu­ni­ty, it's found at a few con­fer­ences and on the xml-dev mail­ing list. Like many elec­tron­ic com­mu­ni­ties, xml-dev suf­fers from a few te­dious per­math­read­s, from reg­u­lar child­ish rant­ing, and from side-trips in­to the ab­struse. But if you ask a hard tech­ni­cal ques­tion on XML there, you'll prob­a­bly get an an­swer, al­most im­me­di­ate­ly. The prob­lem is that the mail­ing list is mis­man­aged, bro­ken, un­re­li­able, in­ac­ces­si­ble, and re­al­ly ought to find a new home with com­pe­tent grownup min­der­s ...
 
When Is it OK To Invent New Tags? · Tan­tek Çelik, smart Mi­crosoft brows­er guy, is blog­ging from the big W3C meet­ing now go­ing on in Bos­ton. Among oth­er things, he's mad be­cause some W3C spec­i­fi­ca­tions are writ­ten not in HTML but in a com­plete­ly dif­fer­ent XML lan­guage called xml­spec, and that lan­guage has some tags that are a lot like HTML tags, so why don't we just use HTML tags? I'll ad­dress some of the his­tor­i­cal back­ground and specific­s, but Tan­tek is point­ing at a re­al im­por­tant is­sue in the world of XML: when do you in­vent your own lan­guage, and when do you re-use some­one else's? Warn­ing: long, and load­ed with markup de­sign the­o­ry and ob­scure stan­dards his­to­ry. ...
 
Bosworth et al on XML, SOAP, Binary Data · To­day sees the pub­li­ca­tion of a thought piece by six au­thors the first of whom, al­pha­bet­i­cal­ly and (in my view any­how) in in­tel­lec­tu­al stand­ing, is Adam Bos­worth ...
 
Small XML-dev Flame War · I am a mem­ber of the xml-dev mail­ing list, the orig­i­nal XML-zealot con­clave and home to most of the peo­ple in the world who wor­ry se­ri­ous­ly about XML in gen­er­al; a very spe­cial and for­tu­nate­ly small shared ob­ses­sion ...
 
Examples, Examples, Examples! · Today there was a release of the UBL draft specifications. I pulled them down and there are legalisms, and definitions, and schemas, and UML, but not one single example of a UBL message.... Argh!!!!! ...
 
XML and Me · I helped in­vent XML. It hap­pened (most­ly) be­tween Ju­ly and De­cem­ber 1996. There were 11 peo­ple who did the heavy work on the XML Work­ing Group. There were three co-editors of the of­fi­cial XML spec­i­fi­ca­tion. I was one of the eleven and one of the three ...
 
The Importance of Bits on the Wire · Originally a ranting email to Dave Winer, provoked by some silly statement out of Apple, I think. ...
 

I am an employee of Amazon.com, but the opinions expressed here are my own, and no other party necessarily agrees with them.

A full disclosure of my professional interests is on the author page.