Peter Murray-Rust
2012-05-31 12:28:38 UTC
I am reading SVG files with XOM, some of which have very long strings (e.g.
4 MB) for attribute values. For example, images (bitmaps) are encoded as
attribute values, such as
<image x="0" y="0" transform="matrix(0.144,0,0,0.1439,251.521,271.844)"
clip-path="url(#clipPath2)" width="1797"
xlink:href="data:image/png;
base64,iVBORw0KGgoAAAANSUhEUgAABwUAAAV4CAMAAAB2DvLsAAADAFBM...
...JRU5ErkJggg==" height="1400" preserveAspectRatio="none"
stroke-width="0" xmlns:xlink="http://www.w3.org/1999/xlink"/>
My code is
Document doc = new Builder().build(file);
For a file with one attribute value of 3.9 MB the parse takes 9 seconds,
while if the same string is PCDATA content it takes 0.1 seconds.
Is this expected, and is there anything I can do to improve the parsing
performance? I don't actually want the value - I simply want to read it in
and throw it away.
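To help narrow down where the time goes, the comparison above can be reproduced without XOM, using only the SAX parser bundled with the JDK. This is a minimal sketch, not a diagnosis: the class and method names (AttributeParseTiming, payload, asAttribute, asText, timeParse) are made up for illustration, the payload is synthetic letters standing in for the real base64 data, and the parser that XOM picks up on a given classpath may differ from the JDK default used here.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: times a SAX parse of the same string placed either
// in an attribute value or in element (PCDATA) content.
class AttributeParseTiming {

    // Counts the characters delivered as attribute values and as text,
    // then discards them (the values themselves are not kept).
    static class CountingHandler extends DefaultHandler {
        long chars = 0;

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            for (int i = 0; i < atts.getLength(); i++) {
                chars += atts.getValue(i).length();
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            chars += length;
        }
    }

    // Synthetic stand-in for base64 data: plain letters, valid in both positions.
    static String payload(int length) {
        StringBuilder sb = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            sb.append((char) ('A' + (i % 26)));
        }
        return sb.toString();
    }

    static String asAttribute(String data) {
        return "<image href=\"" + data + "\"/>";
    }

    static String asText(String data) {
        return "<image>" + data + "</image>";
    }

    // Parses the document with the JDK's default SAX parser and returns
    // the elapsed wall-clock time in milliseconds.
    static long timeParse(String xml, DefaultHandler handler) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
        long start = System.nanoTime();
        parser.parse(new ByteArrayInputStream(bytes), handler);
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) throws Exception {
        String data = payload(4_000_000); // roughly the size mentioned above
        System.out.println("attribute: " + timeParse(asAttribute(data), new CountingHandler()) + " ms");
        System.out.println("text:      " + timeParse(asText(data), new CountingHandler()) + " ms");
    }
}
```

If the two timings differ here as well, the cost is probably in the parser's attribute handling rather than in XOM's tree building; if they do not, the parser XOM is actually using would be the next thing to check.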
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069