Matthew-6:9.txt#line=,1

That would be Our Father, who art in heaven. Which is to say, when you have a URI that has #whatever on the end, what #whatever means depends on what kind of thing it is. Up till now, if it was just a chunk of text (text/plain in web-geek lingo), it didn’t mean anything. Now it does. I have a special interest in this one.

The special interest isn’t that I actually particularly need to do this; in fact, my first reaction to the draft was “Whatever for!?!?” but they had some decent motivating use-cases.

Once I got past that I had a big problem with the draft I read. It had an OK model of character/line addressing, as it still does. But it also had addressing by regex match, and furthermore, you could combine all the different selectors so as to point at a more or less arbitrary collection of discontinuous chunks of text.

I wrote back “YAGNI. Decent idea, but why not just slice it back to characters and lines and see if that’ll do it?” Some people who know me will be snickering at this point, because I am notorious for advising everybody to slash their draft specs brutally. It usually doesn’t work. Well, for the first time ever in my entire career, after a couple of back-and-forths with Erik and Martin, they said, basically, “OK”. I owe them a beer next time I see them.

Hey there, browser guys, you might as well implement this one, how hard can it be?
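
For a rough sense of "how hard", here is a minimal Python sketch of the line/character part, based on my reading of the draft's addressing model. The name `select_fragment` is made up for illustration. The sketch assumes positions are zero-based gaps between lines or characters, skips the draft's integrity checks (anything after a `;`), and leans on Python's own idea of line endings and characters, so treat it as an illustration under those assumptions, not a conforming implementation.

```python
def select_fragment(text: str, fragment: str) -> str:
    """Return the part of `text` addressed by a `line=` or `char=` fragment.

    Hypothetical helper: not taken from the draft or from any library.
    """
    # Assumption: anything after ';' is an integrity check, which we ignore.
    scheme, sep, spec = fragment.split(";", 1)[0].partition("=")
    if not sep or scheme not in ("line", "char"):
        raise ValueError(f"unsupported fragment: {fragment!r}")

    # Address either whole lines (line endings kept) or single characters.
    units = text.splitlines(keepends=True) if scheme == "line" else text

    # "a,b" is a range and either side may be omitted; a bare "a" is a position.
    start_s, comma, end_s = spec.partition(",")
    start = int(start_s) if start_s else 0
    if comma:
        end = int(end_s) if end_s else len(units)
    else:
        end = start  # a position selects nothing, it only marks a spot

    return "".join(units[start:end])


uri = "Matthew-6:9.txt#line=,1"
text = "Our Father, who art in heaven.\nHallowed be thy name.\n"
fragment = uri.partition("#")[2]          # "line=,1"
print(repr(select_fragment(text, fragment)))
# prints 'Our Father, who art in heaven.\n'
```

A bare position such as `line=1` names a point between lines rather than a span, so the sketch returns an empty string for it.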

Contributions

From: Arve (Dec 03 2007, at 15:51)

I find the line addressing to be decent, and I might incorporate it into the UserJS I use for enhancing text output, but anything beyond that is IMO overkill.

From: Michael C. Harris (Dec 03 2007, at 17:45)

It should be Matthew-6:9.txt#line=,1 (with the equals sign), shouldn't it?

From: Seth A. Roby (Dec 03 2007, at 23:48)

So, using the plain text draft you linked to, how are we to pull up (for example) the abstract, without getting all the extraneous spacing added to format the document? As far as I can tell, you can't.

The root cause here is the draft being plain text, when it should be something like HTML. It's closely related to the issue you raised regarding Bill de hÓra's name.

(Of course, making the document HTML still wouldn't be enough, in most cases. Linking to #abstract will probably just get me the title of the abstract; the actual text is generally not in that anchor. You'd need an XPath expression to be really exact.)

Still, even if it's not Turing Complete or anything, it's loads better than not being able to link to text/plain docs at all, so it's a step in the right direction.

From: Erik Wilde (Dec 04 2007, at 01:04)

http://dret.typepad.com/dretblog/2007/11/fragment-identi.html has some more information about the draft. amaya already implements it, and i hope to see some support in at least the not-so-slow-moving major browsers pretty soon. it's really not hard to implement.

http://dret.net/netdret/docs/draft-wilde-text-fragment-09.html is the current version of the draft, and it will be the text of the RFC (minus section 8). the actual publication of the RFC can take quite a bit more time, but there will be no more changes to the draft's text.

From: Sander (Dec 04 2007, at 01:13)

Seth: HTML version here: http://dret.net/netdret/docs/draft-wilde-text-fragment-09.html (probably one of many places, but this was the first one google pulled up for me).

From: Daniel Veillard (Dec 04 2007, at 04:55)

Unfortunately XInclude forbids using fragment identifiers for included resources. That and parse="text" would have allowed importing the enormous amount of text-only data into modern XML data processing toolchains.

I don't see a good way to use this from an XML toolchain; maybe I'm blind and missing something!

Daniel

From: Erik Wilde (Dec 04 2007, at 12:09)

daniel: xinclude does not really "forbid" it, but it is unaware of the fact that there might be a fragment identifier for plain text documents. therefore, allowing fragment inclusion would violate the specification (which requires complete inclusion) and would have to be explicitly requested. such a request could be made in any combination of the following methods:

- the xml document containing xinclude elements specifies that fragments should be extracted. this could be done through a global setting (attribute, processing instruction) or through an attribute on the xinclude element.

- the application calling xipr could make the request, the two possible methods would be to use a special mode (xipr:xipr-with-text-fragments) or to use a parameter to control the behavior.

xinclude separates uris and fragment identifiers (at least for the xml case, where there is a separate @xpointer attribute). so the same scheme could be used for text fragments. this would be completely transparent to plain xinclude processors. xinclude allows other attributes than xinclude 1.0 attributes for include elements:

"Attributes other than those listed above may be placed on the xi:include element. Unprefixed attribute names are reserved for future versions of this specification, and must be ignored by XInclude 1.0 processors."

the (non-normative) xml schema even has an explicit <xs:anyAttribute namespace="##other" processContents="lax"/> for the include element.

From: Aidan Kehoe (Dec 07 2007, at 03:38)

YAGNI? Interesting, “go to http://..., search for [vaguely-specific text]” was and will probably remain my normal approach to referring people to paragraphs and so on on large HTML pages. Because, well, very little web text has hard line breaks :-) .

From: Michael C. Harris (Dec 11 2007, at 20:06)

@aidan, except that this is about text/plain documents, not "large HTML pages".

December 03, 2007

By Tim Bray.
