Discussion:
[XOM-interest] XOM enforces whitespace for xpath text() expression
Pieper, Aaron
2011-05-23 22:19:58 UTC
Hello,

I am working on some unit tests which use XPath to draw assertions on a
document. I encountered some surprising behavior when evaluating the
following document:

<endpoint>
<service>getData
<errors/>
</service>
</endpoint>

The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J), but returns zero nodes
in XOM. This is because XOM preserves leading/trailing whitespace, so
the text node holds "getData" plus the line break that follows it. I'm
able to work around this by embedding the NormalizingFactory sample code
into my tests. At first I was frustrated that XOM behaved differently
from other frameworks; but after experimenting a little, I found that
this behavior is consistent with the XSD specification, where
leading/trailing whitespace will invalidate a document in some cases.
So, I think XOM is doing the right thing here.
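
The mismatch described above can be reproduced without XOM. The sketch
below is only an illustration: it uses the JDK's built-in DOM parser and
XPath engine (which also keep the whitespace) rather than XOM, and the
class name WhitespaceXPathDemo and its count() helper are made up for
this example. It also shows one possible way to write the assertion so
that it tolerates the whitespace, using the standard XPath 1.0
normalize-space() function.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class; uses JDK DOM + XPath, not XOM.
public class WhitespaceXPathDemo {

    // The document from this message, with its line breaks preserved.
    public static final String XML =
        "<endpoint>\n<service>getData\n<errors/>\n</service>\n</endpoint>";

    // Parses XML and returns how many nodes the XPath expression selects.
    public static int count(String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(XML)));
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // The first text node is "getData" followed by a line break,
        // so an exact comparison selects nothing.
        System.out.println(count("/endpoint/service[text()='getData']"));

        // normalize-space() strips the surrounding whitespace of the
        // first text node before comparing, so <service> is selected.
        System.out.println(
            count("/endpoint/service[normalize-space(text()[1])='getData']"));
    }
}
```

Written this way, the test leaves the parsed tree untouched and moves
the whitespace handling into the assertion itself.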

It might make sense to streamline this NormalizingFactory and package it
with XOM.

- Aaron
Michael Kay
2011-05-23 23:06:43 UTC
Post by Pieper, Aaron
Hello,
I am working on some unit tests which use XPath to draw assertions on a
document. I encountered some surprising behavior when evaluating the
following document:
<endpoint>
<service>getData
<errors/>
</service>
</endpoint>
The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J)
That's outrageously wrong. Gratuitously removing whitespace in mixed
content can have no possible excuse. It's not a violation of the XPath
spec, which allows you to construct the input tree any way you like, but
it's totally against the accepted semantics of XML.

Even if it weren't mixed content, for example <service> getData
</service>, it would be highly questionable. It would be justified only
if there's a schema that tells you something about the data type of the
service element.

Michael Kay
Saxonica
Pieper, Aaron
2011-05-23 23:49:55 UTC
Post by Michael Kay
Post by Pieper, Aaron
<endpoint>
<service>getData
<errors/>
</service>
</endpoint>
The XPath expression /endpoint/service[text()='getData'] returns a
single node in some XML frameworks (like Dom4J)
That's outrageously wrong. Gratuitously removing whitespace in mixed
content can have no possible excuse. It's not a violation of the XPath
spec, which allows you to construct the input tree any way you like, but
it's totally against the accepted semantics of XML.
That's right. Like I said in my original message, I think XOM is doing
the right thing here. You've added a sprinkling of hyperbole to my
original statement, but we're in agreement.

It sounds like you strongly believe that the idea of normalizing a
document (getting rid of whitespace) is dangerous, and that XOM
shouldn't encourage it. Some other frameworks facilitate this with
features like Dom4J's builder.setStripWhitespaceText() method or JDOM's
Format.TextMode class, so I was surprised at first when XOM didn't offer
similar out-of-the-box functionality. But I understand why XOM doesn't
want to go in that direction. It sounds like if someone is writing code
which is dependent on whitespace stripping, they should either avoid
XOM, or they should continue implementing their own NormalizingFactory.
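
For readers wondering what such a stripping step amounts to, here is a
rough sketch. It is not the NormalizingFactory sample and does not use
XOM at all: it assumes the JDK's DOM API, and the class TrimTextNodes
with its trim() and countAfterTrim() helpers is hypothetical. It trims
every text node after parsing and drops the ones that become empty,
which is the kind of whitespace stripping the unit tests depend on.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper; a post-parse pass over a JDK DOM tree, not XOM code.
public class TrimTextNodes {

    // Recursively trims each text node under 'node' and removes
    // text nodes that contain nothing but whitespace.
    public static void trim(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            // Remember the sibling first, since 'child' may be removed.
            Node next = child.getNextSibling();
            if (child.getNodeType() == Node.TEXT_NODE) {
                String trimmed = child.getNodeValue().trim();
                if (trimmed.isEmpty()) {
                    node.removeChild(child);
                } else {
                    child.setNodeValue(trimmed);
                }
            } else {
                trim(child);
            }
            child = next;
        }
    }

    // Parses 'xml', trims its text nodes, and counts the XPath matches.
    public static int countAfterTrim(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            trim(doc);
            NodeList hits = (NodeList) XPathFactory.newInstance()
                .newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<endpoint>\n<service>getData\n<errors/>\n</service>\n</endpoint>";
        // After trimming, the exact comparison from the original
        // message selects the <service> element.
        System.out.println(
            countAfterTrim(xml, "/endpoint/service[text()='getData']"));
    }
}
```

Note that a pass like this changes the content of the tree, which is
exactly the objection raised earlier in the thread, so it is probably
best confined to test code where the whitespace is known to be
insignificant.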

- Aaron
