[XOM-interest] Namespace validation bug
Dan Pollitt
2012-05-21 06:56:08 UTC
Hi,

I understand this to be a valid namespace URI:
"urn://schemas-microsoft-com:office:office"

However parsing an XML document containing this fragment:

...
<comment>
<div:div xmlns:div="http://www.w3.org/1999/xhtml"
xmlns="http://www.w3.org/1999/xhtml">
<div xmlns:o="urn://schemas-microsoft-com:office:office"
xmlns:st1="urn://schemas-microsoft-com:office:smarttags"
xmlns:v="urn://schemas-microsoft-com:vml"
xmlns:w="urn://schemas-microsoft-com:office:word">Stated criteria is
too ambiguous. We need clear direction on how to validate.</div>
</div:div>
</comment>
...

yields the following error:

Caused by: nu.xom.MalformedURIException: Bad port: office:office
at nu.xom.Verifier.checkPort(Verifier.java:610)
at nu.xom.Verifier.checkAuthority(Verifier.java:453)
at nu.xom.Verifier.checkAbsoluteURIReference(Verifier.java:906)
at nu.xom.Element.addNamespaceDeclaration(Element.java:1164)
at nu.xom.NonVerifyingHandler.startElement(NonVerifyingHandler.java:103)
at org.apache.xerces.parsers.AbstractSAXParser.startElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.handleStartElement(Unknown Source)
at org.apache.xerces.impl.XMLNamespaceBinder.startElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanStartElement(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl$FragmentContentDispatcher.dispatch(Unknown Source)
at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
at nu.xom.Builder.build(Builder.java:1127)
... 11 more

I am using XOM 1.1; however, looking at the source for 1.2.8, I think
this bug is still present?

Thanks,
Dan
Elliotte Rusty Harold
2012-05-22 00:01:29 UTC
Post by Dan Pollitt
Hi,
"urn://schemas-microsoft-com:office:office"
Interesting. I'm going to have to go to the specs on this one. See
https://tools.ietf.org/html/rfc3986#section-3.2.3

According to that, the port is indeed malformed:

The port subcomponent of authority is designated by an optional port
number in decimal following the host and delimited from it by a
single colon (":") character.

port = *DIGIT

A scheme may define a default port. For example, the "http" scheme
defines a default port of "80", corresponding to its reserved TCP
port number. The type of port designated by the port number (e.g.,
TCP, UDP, SCTP) is defined by the URI scheme. URI producers and
normalizers should omit the port component and its ":" delimiter if
port is empty or if its value would be the same as that of the
scheme's default.
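
To make the "port = *DIGIT" rule concrete, here is a minimal sketch of how the quoted namespace URI splits into host and port once the "//" marks an authority. It is an illustration only, not XOM's Verifier code: the class name PortCheck and both method names are hypothetical, and the parsing is deliberately simplified (no percent-decoding, no validation of the host itself).

```java
// Hypothetical helper, not part of XOM: extracts the port subcomponent
// of a URI's authority and checks it against RFC 3986 "port = *DIGIT".
class PortCheck {

    /** Returns the text after the host's colon, or null if there is no port. */
    public static String portOf(String uri) {
        int schemeEnd = uri.indexOf(':');
        // An authority is present only when "//" directly follows "scheme:".
        if (schemeEnd < 0 || !uri.startsWith("//", schemeEnd + 1)) {
            return null;
        }
        int start = schemeEnd + 3;
        int end = uri.length();
        // The authority ends at the first '/', '?' or '#', or at end of string.
        for (int i = start; i < uri.length(); i++) {
            char c = uri.charAt(i);
            if (c == '/' || c == '?' || c == '#') {
                end = i;
                break;
            }
        }
        String authority = uri.substring(start, end);
        // Drop any userinfo ("user@").
        String hostPort = authority.substring(authority.lastIndexOf('@') + 1);
        int colon;
        if (hostPort.startsWith("[")) {
            // IP literal such as [::1]: the port colon comes after ']'.
            int close = hostPort.indexOf(']');
            colon = close < 0 ? -1 : hostPort.indexOf(':', close);
        } else {
            colon = hostPort.indexOf(':');
        }
        return colon < 0 ? null : hostPort.substring(colon + 1);
    }

    /** A missing or empty port is allowed; otherwise every character must be a digit. */
    public static boolean isValidPort(String port) {
        if (port == null) {
            return true;
        }
        for (char c : port.toCharArray()) {
            if (c < '0' || c > '9') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String ns = "urn://schemas-microsoft-com:office:office";
        System.out.println(portOf(ns));              // office:office
        System.out.println(isValidPort(portOf(ns))); // false
    }
}
```

Under these assumptions the host is "schemas-microsoft-com" and everything after the first colon, "office:office", is treated as the port, which is the same string the exception reports.
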

Furthermore, it's malformed as a URN since // is not allowed in the NID:

https://tools.ietf.org/html/rfc2141
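
The NID rule can be sketched the same way. RFC 2141 gives the form "urn:" NID ":" NSS, where the NID starts with a letter or digit and continues with up to 31 letters, digits, or hyphens. The check below is a simplified illustration under that reading: the class name UrnCheck and the method name are hypothetical, and the NSS is only required to be non-empty rather than checked against the full RFC 2141 character rules.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of XOM: a simplified RFC 2141 shape check.
class UrnCheck {

    // "urn:" (case-insensitive), then an NID of 1 to 32 characters
    // (letter or digit first, then letters, digits or hyphens),
    // then ":" and a non-empty NSS. NSS characters are NOT validated here.
    private static final Pattern URN_SHAPE =
            Pattern.compile("(?i)urn:[a-z0-9][a-z0-9-]{0,31}:.+");

    public static boolean looksLikeUrn(String s) {
        return URN_SHAPE.matcher(s).matches();
    }

    public static void main(String[] args) {
        // '/' cannot start an NID, so the form with "//" is rejected.
        System.out.println(looksLikeUrn("urn://schemas-microsoft-com:office:office")); // false
        // The same string without "//" has the NID "schemas-microsoft-com".
        System.out.println(looksLikeUrn("urn:schemas-microsoft-com:office:office"));   // true
    }
}
```
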

So, yes, there's a bug here but it's not in XOM. XOM is correctly
informing you of an error in what looks like a Microsoft-defined
format.
--
Elliotte Rusty Harold
elharo at ibiblio.org
Michael Kay
2012-05-22 07:24:58 UTC
Post by Elliotte Rusty Harold
So, yes, there's a bug here but it's not in XOM. XOM is correctly
informing you of an error in what looks like a Microsoft-defined format.

Actually, I think XOM is telling you that you spelt the namespace
incorrectly:


http://msdn.microsoft.com/en-us/library/ms875215(v=exchg.65).aspx

Michael Kay
Saxonica
