[XOM-interest] XOM enforces whitespace for xpath text() expression

Discussion:

Pieper, Aaron

2011-05-23 22:19:58 UTC

Hello,

I am working on some unit tests which use XPath to make assertions about
a document. I encountered some surprising behavior when evaluating the
following document.

<endpoint>
<service>getData
<errors/>
</service>
</endpoint>

The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J), but returns zero nodes
in XOM. This is because XOM preserves leading and trailing whitespace:
the first text node under service is "getData" followed by a line break,
which does not equal 'getData'. I'm able to work around this by embedding
the NormalizingFactory sample code into my tests. At first I was
frustrated that XOM behaved differently from other frameworks; but,
after experimenting a little, I found this behavior is consistent with
the XSD specification, where leading/trailing whitespace will invalidate
a document in some cases. So, I think XOM is doing the right thing here.
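
To make the whitespace point concrete, here is a minimal sketch. It
assumes the JDK's built-in DOM parser and XPath 1.0 engine rather than
XOM (a substitution made so the example runs without extra jars), and
the names WhitespaceXPathDemo and count are hypothetical. The JDK's DOM
parser also keeps the line breaks by default, so the exact comparison
selects nothing, while normalizing each text node inside the predicate
selects the service element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses the JDK's DOM and XPath 1.0 engine, not XOM.
class WhitespaceXPathDemo {

    // The document from the message, with its line breaks preserved.
    static final String XML =
        "<endpoint>\n<service>getData\n<errors/>\n</service>\n</endpoint>";

    // Parses XML and returns how many nodes the XPath expression selects.
    static int count(String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first text node is "getData\n", so an exact comparison fails.
        System.out.println(count("/endpoint/service[text()='getData']"));
        // normalize-space strips the line break before comparing.
        System.out.println(
            count("/endpoint/service[text()[normalize-space(.)='getData']]"));
    }
}
```

Run as written, this prints 0 and then 1. Putting normalize-space in the
predicate leaves the parsed tree untouched, which may be preferable to
stripping whitespace at build time when only a few assertions need it.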

It might make sense to streamline this NormalizingFactory and package it
with XOM.

- Aaron

Michael Kay

2011-05-23 23:06:43 UTC

Post by Pieper, Aaron
Hello,
I am working on some unit tests which use XPath to draw assertions on a
document. I encountered some surprising behavior, when evaluating the
following document.
<endpoint>
<service>getData
<errors/>
</service>
</endpoint>
The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J)

That's outrageously wrong. Gratuitously removing whitespace in mixed
content can have no possible excuse. It's not a violation of the XPath
spec, which allows you to construct the input tree any way you like, but
it's totally against the accepted semantics of XML.

Even if it weren't mixed content, for example <service> getData
</service>, it would be highly questionable. It would be justified only
if there's a schema that tells you something about the data type of the
service element.

Michael Kay
Saxonica

Pieper, Aaron

2011-05-23 23:49:55 UTC

Post by Michael Kay

Post by Pieper, Aaron
<endpoint>
<service>getData
<errors/>
</service>
</endpoint>
The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J)

That's right. Like I said in my original message, I think XOM is doing
the right thing here. You've added a sprinkling of hyperbole to my
original statement, but we're in agreement.

It sounds like you strongly believe that the idea of normalizing a
document (getting rid of whitespace) is dangerous, and that XOM
shouldn't encourage it. Some other frameworks facilitate this with
methods like Dom4J's builder.setStripWhitespaceText() method, or JDom's
Format.TextMode class, so I was surprised at first when XOM didn't offer
similar out-of-the-box functionality. But, I understand why XOM doesn't
want to go in that direction. It sounds like if someone is writing code
which is dependent on whitespace stripping, they should either avoid
XOM, or they should continue implementing their own NormalizingFactory.

- Aaron
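
For readers who do want whitespace stripping in their tests, the idea of
a do-it-yourself normalizing step can be sketched as below. This is not
XOM's NormalizingFactory sample; it is a stand-in that assumes the JDK's
DOM and XPath classes, and the names TextTrimmer, trim, and
countAfterTrim are hypothetical. It trims every text node after parsing
and drops the ones left empty, which is only safe when the whitespace is
known to be insignificant (it would damage real mixed content).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; a JDK DOM stand-in, not XOM's NormalizingFactory.
class TextTrimmer {

    // Trims every text node under 'node' and removes those left empty.
    // String.trim() drops all characters <= U+0020, a superset of XML whitespace.
    static void trim(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // capture before possible removal
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else {
                trim(child); // recurse into elements
            }
            child = next;
        }
    }

    // Parses xml, trims its text nodes, and counts the nodes xpath selects.
    static int countAfterTrim(String xml, String xpath) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            trim(doc);
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(xpath, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }
}
```

With the document from this thread, calling countAfterTrim with the
expression /endpoint/service[text()='getData'] selects one node, because
the trimmed text node is exactly "getData".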