Discussion:
[XOM-interest] Xpath parsing problem
Mike Miller
2011-03-03 00:01:22 UTC
Not sure if this went through the first time:

Hi,

I'm trying to use XPath to drill into an XML document. I have already done
this successfully with other, simple XML documents; I say simple because
they had no namespaces or other such things. This document has some
namespace info, but the tags I'm initially trying to get to don't have
namespace prefixes.

So my doc looks like this:
<root a bunch of namespace crap>
<warnings>
<warning>
some other nested tags in here
</warning>
</warnings>
<result>
<space>
<nb:entitycollection>
<nb:entityset> // may be multiple in doc
<nb:entity>
bunch of other tags in here
</nb:entity>
<nb:entity>
...
</nb:entity>
</nb:entityset>
</nb:entitycollection>
</space>
</result>
</root>

So I tried something like:
nodes = doc.query("/root/result/space");
When I dumped the size of nodes, it was 0, but I expected 1 because there
is one <space> element in the document. I also tried:
nodes = doc.query("/root/result/space/entitycollection");
and the size was 0 as well.

I want to loop through the entitycollection children (maybe only one) and
from there loop through the entityset and entity structures. But the XPath
query keeps returning a list of size 0, as if it can't find any of my tags.
If I instead traverse the XML using getChildElements at each level, I can
see all the tags and children. Any ideas what the problem is?

Thanks,
Mike
Dave Pawson
2011-03-03 07:56:09 UTC
Post by Mike Miller
Hi,
I'm trying to use Xpath
<root a bunch of namespace crap>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Post by Mike Miller
nodes = doc.query("/root/result/space");
then when I dumped the size of nodes, it is 0, but i expected 1 because
there is 1
Your 'bunch of namespace crap' isn't crap
when you want to use xpath.

Perhaps you need a little more understanding there prior
to using xpath?
--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
Mike Miller
2011-03-03 14:52:41 UTC
I somewhat realized that namespace issues might be causing my dilemma, and
to tell you the truth, I think they are more of a PITA than they're worth.
Nevertheless, stating that I don't understand how namespaces fit into XPath
queries is not really a help; that's obvious.

So to help me out here: it was my impression that tags within an XML
document that are not preceded by <something: do not have namespaces
applied. Is this incorrect? If so, then I assume there is some default
namespace that applies to all tags when one is not specified? If it's too
much to explain this to me, then could I get a link that would clarify?

Or if it's not too difficult, can I get a simple example of using the query
method with XPath to drill into a document when namespaces are involved?

Thanks,
Mike
_______________________________________________
XOM-interest mailing list
XOM-interest at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/xom-interest
Dave Pawson
2011-03-03 14:58:58 UTC
Post by Mike Miller
So to help me out here, it was my impression that tags within an xml
document that are not preceded by <something: do not have namespaces
applied, is this incorrect?
Depends :-)
Namespaces are inherited, so you need to look up the tree
from the point in question to see what namespaces are in scope.


Post by Mike Miller
If so, then I assume there is some default namespace that applies to
all tags when one is not specified? If it's too much to explain this to me,
then could I get a link that would clarify?
The W3C site? Better still, go buy Mike Kay's book and learn about using
namespaces with XPath and XSLT. You won't regret it... unless you have a
weak back.
Post by Mike Miller
Or if its not too difficult, can I get a simple example of using the query
method
with xpath to drill into a document when namespaces are involved?
<xsl:template match="prefix:elname/prefix:elName">
Easy as that... when you know what namespace the element is in?


Honest, while they are a PITA, they are like.... something you
hate, but need to do something else?

No, not much immediate help I know.

Call it an investment in the future?
--
Dave Pawson
XSLT XSL-FO FAQ.
Docbook FAQ.
http://www.dpawson.co.uk
Michael Kay
2011-03-03 15:01:46 UTC
Post by Mike Miller
Stating that I don't understand how namespaces fit into xpath queries
is not really a help, that's obvious.

I thought it would help point you to the right place in the
FAQ/documentation.
Post by Mike Miller
So to help me out here, it was my impression that tags within an xml
document that are not preceded by <something: do not have namespaces
applied, is this incorrect?
No, that's not right. If an element name contains no colon, it takes on the
default namespace, which is the one introduced by xmlns="...." on that or
the nearest ancestor element that has such a declaration.
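
To make that rule concrete, here is a minimal sketch. It uses the JDK's built-in DOM parser and XPath 1.0 engine rather than XOM, purely so it runs without extra jars, and both namespace URIs are invented placeholders because the original post never shows the real declarations. Under those assumptions it shows an unprefixed <space> element reporting the inherited default namespace, and the unprefixed path from the original question selecting nothing.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DefaultNamespaceDemo {
    // Hypothetical stand-in for the poster's document; both URIs are invented.
    static final String XML =
        "<root xmlns='http://example.org/default' xmlns:nb='http://example.org/nb'>"
        + "<result><space><nb:entitycollection/></space></result>"
        + "</root>";

    static Document parse(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // the JDK parser ignores namespaces unless asked
            return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Namespace URI of the first element with this local name, whatever its namespace.
    static String namespaceOf(String xml, String localName) {
        return parse(xml).getElementsByTagNameNS("*", localName).item(0).getNamespaceURI();
    }

    // Number of nodes an XPath 1.0 expression selects when no prefixes are bound.
    static int count(String xml, String xpath) {
        try {
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, parse(xml), XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <space> has no prefix, yet it sits in the default namespace declared on <root>.
        System.out.println(namespaceOf(XML, "space"));        // http://example.org/default
        // In XPath 1.0 an unprefixed name only matches elements in no namespace.
        System.out.println(count(XML, "/root/result/space")); // 0
    }
}
```

The zero is the same symptom as in the original question: the elements are there, but the path asks for elements in no namespace.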
Post by Mike Miller
If so, then I assume there is some default namespace that applies to
all tags when one is not specified? If it's too much to explain this to me,
then could I get a link that would clarify?
Look for the namespaces chapter in the introduction of any basic XML
book, or google for "Namespaces tutorial".
Post by Mike Miller
Or if it's not too difficult, can I get a simple example of using the query
method with xpath to drill into a document when namespaces are involved?
Yes, it's not too difficult, just google for "XPath default namespace FAQ".

Michael Kay
Saxonica
Mike Miller
2011-03-03 15:10:24 UTC
OK, I suppose some research is in order. I'm getting lazy as I get older
and really wanted something like: well, just do this or put that in your
XPath query and it will work ;-)

BTW: I did see an XOM example somewhere, where a person showed that you can
do something like:
doc.query("/*:tag1/*:tag2");
so as to wildcard or ignore namespace issues (which is what I wanted to
do). But XOM threw an exception about having a colon in the XPath.
Is this doable?

Thanks,
Mike
Michael Kay
2011-03-03 15:17:37 UTC
Post by Mike Miller
BTW: I did see an XOM example somewhere, where a person showed you do
something
like: doc.query("/*:tag1/*:tag2"); so as to wildcard or ignore namespace
issues (which is
what I wanted to do). But XOM threw an exception about having a colon in
the xpath.
Is this doable?
This is XPath 2.0 syntax. I think XOM still only comes with an XPath 1.0
engine built-in. You can use an external XPath 2.0 engine though: Saxon
works with XOM input.

Michael Kay
Saxonica
Mike Miller
2011-03-03 15:24:21 UTC
See, life could be simple ;-) But if I want it to be, I now need to
configure XOM with another XPath engine, which results in more challenges.
I just want my document data, is that too much to ask ;-)

Thanks for the input and quick response time, much appreciated,
Mike
Michael Kay
2011-03-03 15:34:58 UTC
Post by Mike Miller
See, life could be Simple ;-) But if I want it to be, I now need to
configure XOM
with another XPATH engine, which results in more challenges. I just want my
document data, is that too much to ask ;-)
You could always go back to the old way of doing things, and pay for
your software ;-)

Michael Kay
Saxonica
Peter Murray-Rust
2011-03-03 15:36:42 UTC
Post by Mike Miller
See, life could be Simple ;-) But if I want it to be, I now need to
configure XOM
with another XPATH engine, which results in more challenges. I just want my
You don't *have* to move to XPath 2.0. By using XPathContext
http://www.xom.nu/apidocs/nu/xom/XPathContext.html you can bind your
namespace to a prefix of your choice.
Post by Mike Miller
document data, is that too much to ask ;-)
You *have* to understand what the namespaces in your document are. You
don't give details on the "crap" but I guess it includes

xmlns=""

which leaves unprefixed elements in no namespace, so a plain /a/b/c still
matches them

or xmlns="http://foo.org"

which puts every unprefixed element in its scope into that namespace

By using XPathContext you can create an explicit prefix bound to the
namespace and include it in your XPath. Alternatively, and I think more
simply, you can use local-name(), which does the same job as the *:
wildcard in XPath 2.0.
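
The prefix-binding approach can be sketched as follows. This is an illustration under assumptions, not the poster's actual setup: the namespace URIs and the prefix "d" are invented, and the code uses the JDK's NamespaceContext as a stand-in for XOM's XPathContext so that it runs without the XOM jar. In XOM the corresponding call would look something like doc.query("/d:root/d:result/d:space", new XPathContext("d", "http://example.org/default")), with addNamespace used for further prefixes.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class PrefixBindingDemo {
    // Invented URIs; substitute whatever the real root element declares.
    static final String DEFAULT_NS = "http://example.org/default";
    static final String NB_NS = "http://example.org/nb";

    static final String XML =
        "<root xmlns='" + DEFAULT_NS + "' xmlns:nb='" + NB_NS + "'>"
        + "<result><space><nb:entitycollection><nb:entityset>"
        + "<nb:entity/><nb:entity/>"
        + "</nb:entityset></nb:entitycollection></space></result>"
        + "</root>";

    // Prefixes used in the XPath only; they need not match the document's prefixes.
    static final NamespaceContext BINDINGS = new NamespaceContext() {
        @Override public String getNamespaceURI(String prefix) {
            if ("d".equals(prefix)) return DEFAULT_NS;
            if ("nb".equals(prefix)) return NB_NS;
            return XMLConstants.NULL_NS_URI;
        }
        @Override public String getPrefix(String uri) { return null; }
        @Override public Iterator<String> getPrefixes(String uri) {
            return Collections.emptyIterator();
        }
    };

    // Number of nodes the expression selects with the bindings above in effect.
    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(BINDINGS);
            NodeList hits = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, "/d:root/d:result/d:space")); // 1
        System.out.println(count(XML,
            "/d:root/d:result/d:space/nb:entitycollection/nb:entityset/nb:entity")); // 2
    }
}
```

Note that the default namespace needs a prefix in the XPath even though it has none in the document; XPath 1.0 has no notion of a default namespace for element names.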
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Peter Murray-Rust
2011-03-03 08:49:50 UTC
Although yours isn't primarily a XOM-related question it's worth noting that
XOM has good support for namespaces.

Firstly it's easy to go wrong in this area. The XOM serialization should
give you a pointer to what namespaces are carried by which nodes.

You can also get them programmatically:

public final class Namespace extends Node <http://www.xom.nu/apidocs/nu/xom/Node.html>

Represents a namespace in scope. It is used by XOM's XPath implementation
for the namespace axis. However, it is not really part of the XOM data
model. Namespace objects are only created as needed when evaluating XPath.
While a namespace node has a parent element (which may be null), that
element does not know about these namespace nodes and cannot remove them.
(This is an inconsistency in the XPath data model, and is shared with
attributes which also have parents but are not children.)

and you can get them programmatically through:

Element.getNamespaceDeclarationCount

public final int getNamespaceDeclarationCount()

Returns the number of namespace declarations on this element. This counts
the namespace of the element itself (which may be the empty string), the
namespace of each attribute, and each namespace added by
addNamespaceDeclaration. However, prefixes used multiple times are only
counted once; and the xml prefix used for xml:base, xml:lang, and
xml:space is not counted even if one of these attributes is present on
the element.

The return value is almost always positive. It can be zero if and only if
the element itself has the prefix xml; e.g. <xml:space />. This is not
endorsed by the XML specification. The prefix xml is reserved for use by the
W3C, which has only used it for attributes to date. You really shouldn't do
this. Nonetheless, this is not malformed so XOM allows it.

Returns: the number of namespaces declared by this element

and also:
Element.getNamespacePrefix(int index)
<http://www.xom.nu/apidocs/nu/xom/Element.html#getNamespacePrefix%28int%29>
Returns the indexth namespace prefix *declared* on this element.
and
Element.getNamespaceURI(String prefix)
<http://www.xom.nu/apidocs/nu/xom/Element.html#getNamespaceURI%28java.lang.String%29>
Returns the namespace URI mapped to the specified prefix within
this element.

Apply these to your elements and you will probably be surprised, bewildered
and enlightened in that order.
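
As a rough illustration of that kind of inspection, the sketch below walks a small document and reports, for each element, the namespace it is in and how many xmlns declarations appear on it. It is written against the JDK's DOM API rather than the XOM methods listed above, and the document and URIs are invented. The count here covers only literal xmlns attributes, whereas per the Javadoc quoted above XOM's getNamespaceDeclarationCount also counts the element's own namespace, so the numbers are not directly comparable.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class NamespaceDump {
    // Invented sample; only the shape mirrors the document in the original post.
    static final String XML =
        "<root xmlns='http://example.org/default' xmlns:nb='http://example.org/nb'>"
        + "<space><nb:entity/></space>"
        + "</root>";

    // One line per element, in document order.
    static List<String> describe(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Element root = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement();
            List<String> lines = new ArrayList<>();
            walk(root, lines);
            return lines;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static void walk(Element element, List<String> lines) {
        // Namespace declarations show up in the DOM as xmlns / xmlns:prefix attributes.
        int declared = 0;
        NamedNodeMap attributes = element.getAttributes();
        for (int i = 0; i < attributes.getLength(); i++) {
            String name = attributes.item(i).getNodeName();
            if (name.equals("xmlns") || name.startsWith("xmlns:")) {
                declared++;
            }
        }
        lines.add(element.getNodeName() + " in {" + element.getNamespaceURI()
            + "}, declarations here: " + declared);
        for (Node child = element.getFirstChild(); child != null; child = child.getNextSibling()) {
            if (child instanceof Element) {
                walk((Element) child, lines);
            }
        }
    }

    public static void main(String[] args) {
        describe(XML).forEach(System.out::println);
    }
}
```

The line for <space> is the instructive one: no declarations on the element itself, yet it is still in the namespace declared on <root>.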

As a matter of practice I often avoid "/a/b/c" in XPath, because it is easy
to overlook empty namespaces and similar problems, and use:
*[local-name()='a']/*[local-name()='b']/*[local-name()='c']
if there is no chance of collision (if there is, add namespace-uri() tests
as well).
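
A minimal sketch of the local-name() pattern, again with an invented document and URIs, and again run through the JDK's XPath 1.0 engine so that no extra jars are needed. local-name() and namespace-uri() are core XPath 1.0 functions, so the same expression strings should be usable with XOM's query method.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class LocalNameDemo {
    // Invented sample document; the URIs are placeholders.
    static final String XML =
        "<root xmlns='http://example.org/default' xmlns:nb='http://example.org/nb'>"
        + "<result><space><nb:entitycollection><nb:entityset>"
        + "<nb:entity/><nb:entity/>"
        + "</nb:entityset></nb:entitycollection></space></result>"
        + "</root>";

    // Matches on local name only, so no prefix bindings are required.
    static final String SPACE_BY_LOCAL_NAME =
        "/*[local-name()='root']/*[local-name()='result']/*[local-name()='space']";

    // Adds a namespace-uri() test to rule out same-named elements from other namespaces.
    static final String ENTITY_IN_NB =
        "//*[local-name()='entity' and namespace-uri()='http://example.org/nb']";

    static int count(String xml, String expression) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expression, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(count(XML, SPACE_BY_LOCAL_NAME)); // 1
        System.out.println(count(XML, ENTITY_IN_NB));        // 2
    }
}
```

The trade-off is verbosity: each step becomes a predicate, which is why the prefix-binding route tends to read better for long paths.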

The problem is that failure to find nodes in XPath is often not a machine
error, simply a human error. And XOM, great as it is, cannot look into the
mind of the programmer.

P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Michael Kay
2011-03-03 08:56:35 UTC
Post by Mike Miller
<root a bunch of namespace crap>
If you imagine that you can treat namespaces as "crap" when using XPath,
you are very badly mistaken. It's like trying to send me mail addressed
as "mike" and treating the rest of the email address as "crap". It won't
work.

Michael Kay
Saxonica