codedread

More On XPointer

As an update to yesterday's post I thought I'd elucidate what I've learned about XPointer.

First, some high-level overview: XPointer is "an extensible system for XML addressing", meaning that with it, you can address arbitrary portions of XML documents. XPointer is specified by the XPointer Framework and then realized through various "schemes".

The element XPointer scheme

The element scheme allows relatively simple addressing within an XML document. You can address an element that has id="foobar" like this: element(foobar). You can also use step-wise navigation within the element scheme to get at an arbitrary element. For example, element(/1/3/5) would address the fifth child element of the third child element of the first element (i.e. the root). Another example would be element(baz/2), which addresses the second child element of the element with id="baz".
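To make the step-wise navigation concrete, here is a minimal sketch of a resolver for the two forms described above. The function name resolve_element_pointer and the sample document are my own inventions for illustration, and the sketch assumes that any attribute literally named "id" counts as an ID (a real processor would consult the DTD or schema to decide that).

```python
import re
import xml.etree.ElementTree as ET

# Matches element(name), element(/1/3/5) and element(name/2).
POINTER = re.compile(r"element\(([^/()]*)((?:/[1-9][0-9]*)*)\)")


def resolve_element_pointer(root, pointer):
    """Return the element addressed by an element() pointer, or None.

    Hypothetical helper: an attribute named "id" is assumed to be the ID.
    """
    match = POINTER.fullmatch(pointer)
    if match is None or not (match.group(1) or match.group(2)):
        raise ValueError(f"not an element() pointer: {pointer!r}")
    name, steps = match.group(1), match.group(2)
    indexes = [int(step) for step in steps.split("/")[1:]] if steps else []

    if name:
        # Start from the element carrying the requested id.
        current = next((el for el in root.iter() if el.get("id") == name), None)
    else:
        # A document has exactly one root element, so the first step must be 1.
        current = root if indexes.pop(0) == 1 else None

    for index in indexes:
        if current is None:
            return None
        children = list(current)  # child elements only, in document order
        current = children[index - 1] if index <= len(children) else None
    return current


doc = ET.fromstring('<a><b/><b/><c id="baz"><d/><e/></c></a>')
print(resolve_element_pointer(doc, "element(/1/3/2)").tag)  # e
print(resolve_element_pointer(doc, "element(baz/2)").tag)   # e
```

Note that the steps are 1-based and count only child elements, which is why text between the tags never shifts the numbering.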

Now, how can you use an XPointer address? The way I'm interested in using it is as a URL fragment. For instance, in yesterday's post I talked about linking to the "NodeList" section of this page. To do that with the XPointer Framework using the element scheme, my URL would be:

http://www.w3.org/TR/DOM-Level-2-Core/ecma-script-binding.html#element(/1/2/3/5/1/15)
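For anyone who wants to pick such a URL apart, a rough sketch of the split looks like this. The helper name split_pointer and the example.org URL are made up for illustration; the sketch only handles a single scheme part and ignores XPointer's circumflex escaping, so treat it as a starting point rather than a conforming parser.

```python
from urllib.parse import unquote, urldefrag


def split_pointer(url):
    """Split a URL into (document_url, scheme_name, scheme_data).

    Hypothetical helper: handles one pointer part only. A bare fragment
    such as #some-id comes back with scheme_name set to None.
    """
    document_url, fragment = urldefrag(url)
    fragment = unquote(fragment)  # undo percent-encoding such as %20
    if fragment.endswith(")") and "(" in fragment:
        scheme_name, _, scheme_data = fragment[:-1].partition("(")
        return document_url, scheme_name, scheme_data
    return document_url, None, fragment


url = ("http://www.w3.org/TR/DOM-Level-2-Core/"
       "ecma-script-binding.html#element(/1/2/3/5/1/15)")
print(split_pointer(url)[1:])  # ('element', '/1/2/3/5/1/15')
print(split_pointer("http://example.org/page.html#some-id")[1:])  # (None, 'some-id')
```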

There are some problems with the above, not least of which is that the above document is NOT well-formed XML (see the <link> tags for starters). No, unfortunately the document is HTML tag soup. Could an XPointer implementation support an HTML document?
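The well-formedness problem is easy to demonstrate with any strict XML parser. The snippet below is only an illustration using Python's standard library; the two markup strings and the is_well_formed helper are invented for the example and are not taken from the W3C page.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# HTML style: <link> is never closed, so </head> arrives too early for XML.
html_style = "<html><head><link rel='stylesheet' href='s.css'></head><body/></html>"
# XML style: the same markup with a self-closing <link/>.
xml_style = "<html><head><link rel='stylesheet' href='s.css'/></head><body/></html>"

print(is_well_formed(html_style))  # False
print(is_well_formed(xml_style))   # True
```

An entity reference such as &nbsp; that no DTD declares trips the parser in the same way.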

This brings me to another topic: Implementation Support of XPointer. Anne let me know that Mozilla has some form of XPointer support. I guess this bug was used to introduce it. It could enable some really cool features. But there are some problems with it. And also, someone wants to drop support because no one uses it yet (you have to love the referenced YouTube video in that bug, though).

So I took the HTML page and reformatted it to be proper XML (defined the &nbsp; entity, closed all <link>, <hr>, <br> elements and wrapped the script in a <![CDATA[ ]]> section). Once I did this, the following URL worked in Firefox 2.0:

file://d/ecma-script-binding.xml#element(/1/2/3/5/1/15)

The fact that this works on XML documents but not on XHTML documents (and especially HTML documents) is a crying shame. I would love to use this feature to direct people to static, non-changing web pages like specifications...

The xpointer XPointer scheme

The element scheme is arguably the least complicated XPointer scheme. The other scheme of interest is the confusingly named xpointer XPointer scheme. This scheme uses the XPath language to allow richer addressing into documents. For instance, to address the first instance of the text "foobar", I could use: xpointer(string-range(/, "foobar")). (The quotes matter: in XPath an unquoted foobar would be read as an element name, not a string.) Now wouldn't that be cool? Then I could change my URL to:

http://.../ecma-script-binding.html#xpointer(string-range(/,"square%20bracket"))

so that the reader is instantly drawn to the words "square bracket" in that document.
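As a rough idea of what a browser would have to do with such a pointer, here is a deliberately simplified sketch that finds the first text node containing the search string. It is not an implementation of string-range: the real function works on the string-value of a location and can match across element boundaries, while this version (with its made-up helpers text_nodes and first_match, and a made-up sample document) only looks inside individual text nodes.

```python
import xml.etree.ElementTree as ET
from urllib.parse import unquote


def text_nodes(element):
    """Yield (parent_element, text) pairs in document order."""
    if element.text:
        yield element, element.text
    for child in element:
        yield from text_nodes(child)
        if child.tail:
            # Text after a child belongs to the enclosing element.
            yield element, child.tail


def first_match(root, needle):
    """Return (parent_element, offset) of the first text node containing
    needle, or None. Simplification: no matching across node boundaries."""
    for parent, text in text_nodes(root):
        offset = text.find(needle)
        if offset != -1:
            return parent, offset
    return None


doc = ET.fromstring(
    "<doc><p>Use <code>a[0]</code> square bracket notation</p>"
    "<p>square bracket again</p></doc>"
)
needle = unquote("square%20bracket")  # the fragment arrives percent-encoded
parent, offset = first_match(doc, needle)
print(parent.tag, offset)  # p 1
```

A user agent would then scroll that position into view, and ideally highlight the matched text.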

Now I realize that XPointer is inherently brittle and can break on even the simplest document change (hell, regular web links are similarly brittle when you come down to it). I also realize that the preferred mechanism is for authors to use the id attribute wherever possible, since all browsers support the simple #some-id anchor. Furthermore, I realize there are problems with trying to use XPointer (an XML technology) on HTML documents. However, despite all that, I still think this would be a cool feature to be able to use on those web pages that are static and not likely to change over time ("frozen" specs, blog entries). The benefit would be particularly noticeable to people browsing the web on smaller devices with potentially painful-to-access "text search" facilities (hint: iPhone and the like). Of course, as with any technology, both implementors and authors have to get behind it to make its benefits known.

§398 · October 10, 2007 · Firefox, Software, Technology, Web, XML

5 Comments to “More On XPointer”

Jeff Schiller says:

October 10, 2007 at 1:06 pm

Follow-Up Question #1: How would XPointer work with documents that are updated dynamically? Is it complete chaos?

Follow-Up Question #2: Wikipedia claims that XPointer is encumbered by a Sun Microsystems patent. Any truth to this? What are the ramifications?

Jeroen says:

October 21, 2007 at 10:25 am

You can get some XPointer support in Opera. I just wrote a userJS for the element and xmlns schemes (and a subset of the xpointer scheme). Adding support for the full xpointer scheme is harder, since you need some way to implement the string-range function. I don’t really know a way to do that unless I try to parse the XPath myself (which is a lot of work). The script is also not tested in an XML context. It should work though (maybe with some changes).

Examples that work: Whatwg History.

Your element example.

It’s also not dynamic, although you could maybe do that using getters/setters. It’s not that hard to port it to greasemonkey.

Jeff says:

October 21, 2007 at 4:17 pm

Thanks Jeroen – it works great! I had thought that Opera supported XPath in some way (or maybe that was XQuery). Anyway, I agree with you – the next step would be to get some greasemonkey action.

Btw, your Whatwg History link was broken (you didn’t fill in an href), so post that if you get a chance. Thanks!

Jeff says:

October 21, 2007 at 4:22 pm

Oh, one other thing – the user.js would need to be updated to support #element(some_id/3) (the third child of the element with id=”some_id”)

Jeroen says:

October 22, 2007 at 7:27 am

Ah, sorry, I guess I didn’t close my link properly.

I’ve set up some tests.

#element(some_id/3) worked for me, but it’s possible that I forgot to upload my latest version.

Opera supports XPath, but string-range is not part of XPath; it is a function added by XPointer, which is why it doesn’t work. There’s no way I know of to add your own functions to the XPath evaluation context.