Discussion:
[XOM-interest] XOM and jar: URIs
Michael Whapples
2016-12-07 12:56:44 UTC
When building a Document in XOM, one can pass the builder.build method a
string for the base URI. If I pass this method a jar: URI (e.g.
jar:file:///D:/mydir/mybook.zip!/mybook.xml) as the base URI, I get an
exception from XOM. It appears that XOM's Verifier, which checks the
URI, does not handle the jar: URI scheme correctly (it complains
about a double slash (//) in the path).


In Java (at least Java 7 and higher), the jar: protocol is fully supported,
and such a URI can be used to access files inside a zip file.
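As a minimal illustration of that JDK support (not code from the thread): the sketch below writes a throwaway one-entry zip in place of D:/mydir/mybook.zip, builds a URL of the form jar:<archive URI>!/<entry>, and reads the entry through the JDK's built-in jar: handler. The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

public class JarEntryReader {

    /** Builds "jar:<archive URI>!/<entry>" and reads the entry via the JDK's jar: handler. */
    public static String readEntry(Path zip, String entryName) throws IOException {
        URL url = new URL("jar:" + zip.toUri() + "!/" + entryName);
        URLConnection conn = url.openConnection();
        conn.setUseCaches(false); // don't keep the archive open in the JDK's JarFile cache
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Writes a one-entry zip so the example does not depend on a real mybook.zip. */
    public static Path makeSampleZip() throws IOException {
        Path zip = Files.createTempFile("mybook", ".zip");
        try (ZipOutputStream out = new ZipOutputStream(Files.newOutputStream(zip))) {
            out.putNextEntry(new ZipEntry("mybook.xml"));
            out.write("<book/>".getBytes(StandardCharsets.UTF_8));
            out.closeEntry();
        }
        return zip;
    }

    public static void main(String[] args) throws IOException {
        Path zip = makeSampleZip();
        System.out.println(readEntry(zip, "mybook.xml")); // prints <book/>
        Files.delete(zip);
    }
}
```

Turning caching off is a design choice for the demo: with the default cache, the JDK may keep the archive open, which on Windows would block deleting the temporary file.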


As far as I can tell, allowing such a URI to be passed in would not cause
issues; the base URI could be used to resolve external resources
referenced from the XML file. Maybe I am missing something inside XOM
where this might cause an issue.
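A small sketch of that resolution step, assuming plain java.net.URL does the work (the class name and the entry paths images/fig1.xml, chapters/ch1.xml and shared/common.xml are hypothetical): the JDK's jar: handler resolves a relative reference against the entry path after "!/" and leaves the archive part untouched. No file is opened, so the archive need not exist.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class JarBaseResolve {

    /** Resolves a relative reference against a jar: base URI using java.net.URL. */
    public static String resolve(String base, String relative) throws MalformedURLException {
        // The jar: handler resolves only the part after "!/" and normalises "../" there.
        return new URL(new URL(base), relative).toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(resolve("jar:file:///D:/mydir/mybook.zip!/mybook.xml", "images/fig1.xml"));
        // prints jar:file:///D:/mydir/mybook.zip!/images/fig1.xml
        System.out.println(resolve("jar:file:///D:/mydir/mybook.zip!/chapters/ch1.xml", "../shared/common.xml"));
        // prints jar:file:///D:/mydir/mybook.zip!/shared/common.xml
    }
}
```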


Would it be possible for this to be fixed in future versions of XOM?
Alternatively is there another way for me to have the base URI reference
a location in a zip file?


I tried creating a custom URL protocol handler for Java and modifying the
base URI to use this protocol (e.g.
zipfile:///D:/mydir/mybook.zip!/mybook.xml), but I could not get the JDK
to pick up this protocol handler (I set the
java.protocol.handler.pkgs system property, but with no success). This
option seems too complicated, particularly when the jar: protocol is
supported by the JDK by default.


My only solution at the moment seems to be to modify the base URI to
something XOM is happy with (e.g.
jar-file:///D:/mydir/mybook.zip!/mybook.xml) and pass this modified base
URI to build. Then, within an entity resolver, catch the use of the jar-
prefix in the base URI, change it back to jar:, and manually resolve the
external entity. This is a lot of work spread over many parts of the code,
so it is far from satisfactory, which is why I would like a better solution.
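The resolver half of this workaround might look roughly like the following JDK-only sketch (SAX's EntityResolver, no XOM classes). It assumes the disguised prefix is exactly jar-file: and that the parser passes the resolver system IDs already resolved against the disguised base; the class, method and entity names are hypothetical. How it would be wired into XOM (presumably via the XMLReader handed to Builder) has not been checked here.

```java
import java.io.IOException;
import java.net.URL;
import org.xml.sax.EntityResolver;
import org.xml.sax.InputSource;

public class JarPrefixResolver implements EntityResolver {

    // Hypothetical disguised prefix handed to XOM, and the real prefix it stands for.
    static final String FAKE_PREFIX = "jar-file:";
    static final String REAL_PREFIX = "jar:file:";

    /** Maps a disguised system ID back to a jar: URL; any other ID is returned unchanged. */
    public static String restore(String systemId) {
        if (systemId != null && systemId.startsWith(FAKE_PREFIX)) {
            return REAL_PREFIX + systemId.substring(FAKE_PREFIX.length());
        }
        return systemId;
    }

    @Override
    public InputSource resolveEntity(String publicId, String systemId) throws IOException {
        if (systemId == null || !systemId.startsWith(FAKE_PREFIX)) {
            return null; // not a disguised ID: let the parser resolve it as usual
        }
        // Open the entity through the JDK's own jar: handler.
        InputSource source = new InputSource(new URL(restore(systemId)).openStream());
        source.setPublicId(publicId);
        // Keep the disguised ID so relative references inside this entity are again
        // resolved against a base the parser accepts and come back through here.
        source.setSystemId(systemId);
        return source;
    }

    public static void main(String[] args) {
        System.out.println(restore("jar-file:///D:/mydir/mybook.zip!/chapter2.ent"));
        // prints jar:file:///D:/mydir/mybook.zip!/chapter2.ent
    }
}
```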


Michael Whapples
Michael Kay
2016-12-07 15:33:39 UTC
The problem is that the JAR URI scheme (invented by Sun, I believe) doesn't follow the generic URI syntax defined in the RFCs - it's non-conformant. This causes problems all over the place. In Saxon I special-case it to try and make it work (java.net.URI can't handle it properly, so I use java.net.URL instead). By allowing it, you're providing a convenience to users, but you're also opening a can of worms.
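To see the java.net.URI behaviour mentioned here, the following sketch (class name hypothetical) parses the jar: base with java.net.URI. Because the part after "jar:" does not begin with "/", URI treats the whole string as opaque, and URI.resolve is documented to return the reference unchanged when the base is opaque. java.net.URL, by contrast, hands relative references to the JDK's jar: handler, which resolves them against the entry path after "!/".

```java
import java.net.URI;

public class JarUriOpaque {

    /** True if java.net.URI parses the string as an opaque (non-hierarchical) URI. */
    public static boolean isOpaque(String uri) {
        return URI.create(uri).isOpaque();
    }

    /** Resolves with java.net.URI; an opaque base yields the relative reference unchanged. */
    public static String resolveWithUri(String base, String relative) {
        return URI.create(base).resolve(relative).toString();
    }

    public static void main(String[] args) {
        String base = "jar:file:///D:/mydir/mybook.zip!/mybook.xml";
        System.out.println(isOpaque(base));                          // prints true
        System.out.println(URI.create(base).getPath());              // prints null (no hierarchical path)
        System.out.println(resolveWithUri(base, "images/fig1.xml")); // prints images/fig1.xml
    }
}
```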

Michael Kay
Saxonica
_______________________________________________
XOM-interest mailing list
http://lists.ibiblio.org/mailman/listinfo/xom-interest
Ed Davies
2016-12-07 18:39:47 UTC
Doesn't jar:file:///D:/mydir/mybook.zip!/mybook.xml match the
path-rootless form defined in RFC 3986? OK, it has “:” in the
first segment and empty segments, which are a bit odd, but they
are both allowed by the syntax.
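This reading can be checked mechanically. The sketch below encodes only the scheme ":" path-rootless alternative of the RFC 3986 grammar (scheme from section 3.1; pchar, segment and segment-nz from section 3.3; query and fragment omitted) as a regular expression. It is an illustration, not a full URI validator, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class Rfc3986PathRootless {

    // pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
    private static final String PCHAR = "(?:[A-Za-z0-9\\-._~!$&'()*+,;=:@]|%[0-9A-Fa-f]{2})";

    // scheme ":" path-rootless, where path-rootless = segment-nz *( "/" segment ),
    // segment-nz = 1*pchar and segment = *pchar (so empty segments are allowed).
    private static final Pattern SCHEME_PATH_ROOTLESS = Pattern.compile(
            "[A-Za-z][A-Za-z0-9+.\\-]*:" + PCHAR + "+(?:/" + PCHAR + "*)*");

    public static boolean matches(String uri) {
        return SCHEME_PATH_ROOTLESS.matcher(uri).matches();
    }

    public static void main(String[] args) {
        // "file:" is a legal first segment, and "//" just yields two empty segments.
        System.out.println(matches("jar:file:///D:/mydir/mybook.zip!/mybook.xml")); // prints true
        // A literal space is not a pchar, so this one is rejected.
        System.out.println(matches("jar:file:///D:/my dir/mybook.zip!/mybook.xml")); // prints false
    }
}
```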
Michael Whapples
2016-12-08 14:36:17 UTC
I agree with that reading of RFC 3986, so the jar URI scheme does seem
to be valid.


My other thought is that we need to consider that XOM works only in Java,
and whether the base URI passed to the build method will ever be
seen by anything else. As far as I can tell, this base URI never actually
gets inserted into the XOM document model, so it will not go into the
XML; it only seems to be used by other Java classes when processing the
document and resolving relative URIs. As the Java documentation says
that all Java implementations must supply a default URL handler for the
jar protocol, this seems fine to me, and such URIs should be allowed.


Michael Whapples
Elliotte Rusty Harold
2016-12-12 15:31:44 UTC
Maybe. Please file an RFE in the GitHub repo. It would be convenient
if we could keep the discussion there.

https://github.com/elharo/xom/issues
--
Elliotte Rusty Harold
***@ibiblio.org