What happened was, I needed a small improvement to Expat, probably the most widely used XML parsing engine on the planet, so I coded it up and sent off a PR and it’s now in release 2.3.0. There’s nothing terribly interesting about the problem or the solution, but it certainly made me think about coding and tooling and so on. (Warning: Of zero interest to anyone who isn’t a professional programmer.)

Back story · As I mentioned last month, I took on a little programming job, partly as a favor to a friend: writing a parser to transmute a huge number of antique IBM GML files into XML. It wasn’t terribly hard, but there was quite a bit of input variation, so I couldn’t be confident unless I checked that every single output file was proper XML (“well-formed”, we XML geeks say).

Fortunately there’s an Expat-based command-line tool called xmlwf that can scan XML files for errors and produce useful human-readable complaints, and it operates at obscene speed. So what I wanted to do was run my parser over a few hundred GML files and then say, essentially, xmlwf * in the output directory.

Which didn’t work because, until very recently, xmlwf would just stop when it encountered the first non-well-formed file. So I added a -k option (“k” for “keep going”) so it could run over a thousand or so files and helpfully complain about the two that were broken.
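For flavor, here is a minimal sketch of the keep-going idea. It is not the xmlwf code (that’s C, and does a lot more); it’s Python leaning on the standard library’s `xml.parsers.expat` binding to Expat. The names `check_file` and `check_all` are made up for illustration.

```python
import xml.parsers.expat


def check_file(path):
    """Return None if the file is well-formed, else a one-line complaint."""
    # An Expat parser object can only parse one document, so make a fresh one per file.
    parser = xml.parsers.expat.ParserCreate()
    try:
        with open(path, "rb") as f:
            parser.ParseFile(f)
    except xml.parsers.expat.ExpatError as err:
        return f"{path}: {err}"
    return None


def check_all(paths):
    """Check every file and collect complaints instead of stopping at the first."""
    complaints = []
    for path in paths:
        message = check_file(path)
        if message is not None:
            complaints.append(message)
    return complaints
```

Feeding `check_all` the file names from the output directory and printing whatever comes back gets you roughly the behavior described above: every file gets looked at, and only the broken ones produce complaints.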

Lessons from the PR · Most important, I hadn’t realized how great the programming environment is inside Amazon. It’s all git, but there’s no need for branches or PRs. You make your changes, you commit, you use the tooling to launch a code review, you argue, you make more changes, you (probably) commit --amend (unless you think multiple commits are more instructive for some reason), and this repeats until everyone’s happy and you push into the CI/CD vortex.

Obviously other people might be working on the same stuff so you might have to do a git pull --rebase and there might be pain sorting out the results but that’s what they pay us for. (Right?)

Anyhow, you end up with a nice clean commit sequence in your codebase history and nobody ever has to think about branches or PRs. (Obviously some larger tasks require branches but you’d be amazed how much you can live without them.)

Finding: Pull requests · Now that I’m out in the real world, they’re How Things Are Done. For good reasons. Doesn’t mean I have to like them. As evidence, I offer How to Rebase a Pull Request. Ewwww.

Finding: Coding tools · The last time I edited actual C code, nobody’d ever heard of JetBrains and “VS Code” would have sounded like a mainframe thing. I found the back corner of my brain where those memories lived, shook it vigorously, and Emacs fell out. The thing I’m now using to type the text you’re now reading. Oh, yeah; that was then.

[Image: C code in Emacs in 2021. Caption: It’s 2021. No, really.]

It worked fine. I mean, no autocomplete, but there was syntax coloring and indentation and whole cubic centimeters (probably) of brain cells woke up and remembered C. Dear reader, back in the day I wrote hundreds and hundreds of thousands of lines of the stuff, and I guess it doesn’t go away. In fact, the number of syntax errors was pretty well zero because the fingers just did the right thing.

Finding: The Mac as open-source platform · It’s not that great. Expat maintainer Sebastian Pipping quite properly drop-kicked my PR because it had coding-standards violations and a memory leak, revealed by the Travis CI setup. I lazily tried to avoid learning Travis and, with Sebastian’s help, figured out the shell incantations to run the CI. Except that on the Mac they only sort of worked, and in particular Clang failed to spot the memory leak.

The best way to deal with this is probably to learn enough Docker (Docker Compose, probably) to make a fake Linux environment. I was well along the path to doing that when I realized I had a real Linux environment, namely tbray.org, the server sending you the HTML you are now reading.

(Except that it’s a Debian box that couldn’t do the clang-format coding-standards test, but that’s OK, my Mac could manage after I used Homebrew to install coreutils and moreutils and gnu-sed and various other handsome ecosystem fragments.)

I mean, I got it to go. But if I do it again, I’ll definitely wrestle Docker to the ground first. Which is irritating; this stuff should Just Work on a Mac. Without having a Homebrew dance party.

C · Well, yeah. We shouldn’t diss it too much; basically every useful online service you interact with is running on it. But after my -k option was added, Clang found a memory leak in xmlwf. Which I tracked down and yeah, it was real, but it had also been there before my changes. And it wouldn’t be a problem in normal circumstances, until it suddenly was, and then you’d be unhappy. Which is why, in the fullness of time, most C should be replaced by Go (if you can tolerate garbage-collection latency) or Rust (if you can’t). Won’t happen in my lifetime.

Anyhow · Thanks to Sebastian, who was polite in the face of my repeated out-of-practice cluelessness. And hey, if you need to syntax-check huge numbers of XML files, your life just got a little easier.



Contributions


From: TimD (Mar 25 2021, at 10:54)

Get virtual box and stand up a real linux distribution, if you're going to do more of this. Nothing like using the real thing.


From: Thom Hickey (Mar 25 2021, at 15:00)

I suppose it's nice to have the functionality, but I would have done it with an external script.

--Th



March 24, 2021
