[XOM-interest] Streaming Serializer

Discussion:

Bruno Oliveira

2010-04-12 18:46:34 UTC

Hi,
I am using the Serializer class of the XOM API, but when I try to serialize
large XML files, my VM runs out of memory. How can I use the
Serializer in streaming mode?

Best regards

Kevin S. Clarke

2010-04-12 19:39:17 UTC

This might interest you (depending on the structure of your XML)...
take a look at the second solution:

http://stackoverflow.com/questions/967288/how-to-stream-xml-data-using-xom

Kevin

_______________________________________________
XOM-interest mailing list
XOM-interest at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/xom-interest

Elliotte Rusty Harold

2010-04-13 20:30:38 UTC

I never really planned for streaming serialization because I didn't
think it was very useful. However people seem to keep asking for it,
so maybe it's time to think about this.

It's not obvious to me what this would look like. Maybe

serializer.startDocument("rootElement name")
serializer.write(node)
serializer.write(node)
serializer.write(node)
serializer.write(node)
...
serializer.endDocument()

Are you able to hold the document in memory but run out of memory only
when you serialize? Or is something else going on here?

--
Elliotte Rusty Harold
elharo at ibiblio.org

Ian Phillips

2010-04-13 22:15:54 UTC

Haven't really given this much thought, but off the top of my head?

Post by Elliotte Rusty Harold
I never really planned for streaming serialization because I didn't
think it was very useful. However people seem to keep asking for it,
so maybe it's time to think about this.

Maybe like this:

serializer.startDocument("rootElement name");
for (int i = 0; i < 1000000000; ++i) {
    serializer.write(createNodeWithRandomContent());
}
serializer.endDocument();

?

Or maybe with something that has an efficient (== small) in memory representation but an inefficient (== defined by an external standard) XML representation?

I'm not sure, but I can think of examples where it would be useful.

Also, perhaps, writing out the contents of a DB (embedded or otherwise) to disk in XML format - what happens if the DB is fairly large?

Cheers,
Ian.

#ifndef __COMMON_SENSE__ | Ian Phillips
#include <std_disclaimer> | ianp at ianp.org
#endif | http://ianp.org/

Peter Stibrany

2010-04-14 05:44:44 UTC

I use XML streaming to send results while they are still being
computed. The computation takes some time and can produce a lot of
results, but as soon as a few results are available, I start streaming
them to the client, and the client can then show them in the UI. Both
client and server use XOM. The server simply sends the root element and
then one element (with mixed content) for each result. The client uses
its own NodeFactory subclass to parse and "report" partial results to
the upper layer. It works very well on both sides. The server doesn't
need to keep results in memory, and for the client it isn't
all-or-nothing.

As for the API, a simple writeXMLDeclaration, writeStartTag/writeEndTag
(for the root element), write(Element) and flush() worked fine for me.
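
For illustration, the write pattern described above (declaration, root
start tag, one element per result, flush, root end tag) might look
roughly like the sketch below. It deliberately uses the JDK's StAX
XMLStreamWriter rather than XOM's Serializer, so it runs without any
extra library. The class name, method name and the "results"/"result"
element names are assumptions made up for this example.

```java
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

// Hypothetical sketch: streams results one at a time instead of building
// the whole document in memory first. Uses StAX, not XOM.
final class StreamingResultsSketch {

    /** Writes a root element with one child per result, flushing after each child. */
    static void streamResults(Iterator<String> results, Writer out)
            throws XMLStreamException {
        XMLStreamWriter xml = XMLOutputFactory.newInstance().createXMLStreamWriter(out);
        xml.writeStartDocument();             // XML declaration
        xml.writeStartElement("results");     // root start tag, written up front
        while (results.hasNext()) {
            xml.writeStartElement("result");
            xml.writeCharacters(results.next());
            xml.writeEndElement();
            xml.flush();                      // make the partial output visible to the reader
        }
        xml.writeEndElement();                // root end tag, written once results run out
        xml.writeEndDocument();
        xml.close();                          // does not close the underlying Writer
    }

    public static void main(String[] args) throws XMLStreamException {
        StringWriter out = new StringWriter();
        streamResults(Arrays.asList("first", "second").iterator(), out);
        System.out.println(out);
    }
}
```

Only the result currently being written is held in memory; the
Iterator could just as well be backed by a running computation or a
database cursor.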

I had bigger problems with NodeFactory, which has only
startMakingElement(String name, String namespace), at which point the
attributes are not yet known, and finishMakingElement(Element element),
at which point the entire element content has been built. My code needed
to react to attributes on the root element, but at no point do you get
an element with only its attributes, before its entire content is built.

-Peter

Asgeir Frimannsson

2010-04-14 06:45:55 UTC

Post by Peter Stibrany
I had bigger problems with NodeFactory which has only
startMakingElement(String name, String namespace) -- at this point,
attributes are not yet known, and finishMakingElement(Element element)
-- at this point, entire element content is built. My code needed to
react to attributes in root element, but at no point do you get element
with attributes only, without building entire element content.

You get this with NodeFactory.makeAttribute(...).

cheers,
asgeir

Peter Stibrany

2010-04-14 12:07:10 UTC

Post by Asgeir Frimannsson

You get this with NodeFactory.makeAttribute(...).

That gives you only one attribute name/value at a time, and no element
information. My code simply needed to do something based on the
attributes of the root element before reading the rest of the document.
I'm not saying it's impossible to implement, but it is not as
straightforward as it could be.

My implementation looks like this: I remember the root element created
in makeRootElement. I also have a flag called "rootAttributesReported".
In the startMakingElement and finishMakingElement methods, if
rootElement is set and rootAttributesReported is false, I pass the root
element to the code which checks its attributes. At this point, the root
element has its attributes set. I hook into finishMakingElement as well
because my root element may sometimes be empty, in which case
startMakingElement would never be called; it is only called for elements
inside the root element.
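
The control flow of this workaround can be sketched as below. This is
not XOM code: RootAttributeReporter is a hypothetical stand-in that only
models the callback order described above (root created, attributes
attached by the parser, then child or root callbacks), so that it runs
without the library. In real code the three methods would be overrides
in a NodeFactory subclass, and the attribute map would be the root
Element itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Hypothetical stand-in for a NodeFactory subclass; models callback order only.
final class RootAttributeReporter {
    private final Consumer<Map<String, String>> listener;
    private Map<String, String> rootAttributes;    // null until the root is created
    private boolean rootAttributesReported;

    RootAttributeReporter(Consumer<Map<String, String>> listener) {
        this.listener = listener;
    }

    /** Analogue of makeRootElement: remember the root; attributes arrive afterwards. */
    Map<String, String> makeRootElement() {
        rootAttributes = new LinkedHashMap<>();
        return rootAttributes;
    }

    /** Analogue of startMakingElement: first called for a child of the root. */
    void startMakingElement(String name) {
        reportRootAttributesOnce();
    }

    /** Analogue of finishMakingElement: also covers a root with no children. */
    void finishMakingElement(String name) {
        reportRootAttributesOnce();
    }

    // Reports the root attributes on the first callback after the root exists.
    private void reportRootAttributesOnce() {
        if (rootAttributes != null && !rootAttributesReported) {
            rootAttributesReported = true;
            listener.accept(rootAttributes);
        }
    }

    public static void main(String[] args) {
        List<Map<String, String>> reports = new ArrayList<>();
        RootAttributeReporter factory = new RootAttributeReporter(reports::add);
        factory.makeRootElement().put("version", "2"); // parser attaches root attributes
        factory.startMakingElement("result");          // first child: attributes reported here
        factory.finishMakingElement("result");
        factory.finishMakingElement("results");        // root finished: no second report
        System.out.println(reports);
    }
}
```

The flag is what keeps the report to a single call, whichever of the two
callbacks happens to fire first.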

As I said, it is possible to implement. I'm not even sure whether a
method like "checkElementWithAttributes" would make sense in
NodeFactory, but it sure would help me.

-Peter