Discussion:
[XOM-interest] Serialization using explicit numeric Unicode points
Peter Murray-Rust
2012-08-28 13:43:12 UTC
Is there a way to use XOM to serialize high code points as explicit numbers
(e.g.

• [that should read ampersand-hash-8-2-2-6-semicolon in case this
mail translates it])

or do I need to subclass Serializer?

[I am doing this so the XML is easier to inspect by hand - the various
em-dashes, etc. are difficult to distinguish and non-Unicode-compliant
tools can corrupt them].

P.
--
Peter Murray-Rust
Reader in Molecular Informatics
Unilever Centre, Dep. Of Chemistry
University of Cambridge
CB2 1EW, UK
+44-1223-763069
Elliotte Rusty Harold
2012-09-17 10:07:08 UTC
Post by Peter Murray-Rust
Is there a way to use XOM to serialize high code points as explicit numbers
(e.g.
• [that should read ampersand-hash-8-2-2-6-semicolon in case this
mail translates it])
or do I need to subclass Serializer?
If you use UTF-8, you need to subclass Serializer. XOM deliberately
shields you from syntactic details such as how characters are encoded.

If you set the serializer's encoding to ASCII or Latin-1, though, then
characters outside that encoding should be numerically escaped.
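
To illustrate what numeric escaping of high code points looks like, here is a minimal Java sketch that uses only the standard library. The class NumericEscapeSketch and its escapeNonAscii helper are hypothetical names made up for this example and are not part of XOM. The sketch emits the decimal form used in the question, which may differ from the exact form a serializer writes. It is only safe for text and attribute values, since character references are not allowed in element or attribute names. With XOM itself, the ASCII route described above would presumably mean passing an encoding name such as "US-ASCII" to the Serializer constructor.

```java
public class NumericEscapeSketch {

    // Hypothetical helper (not an XOM API): replaces every code point
    // above U+007F with a decimal numeric character reference.
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            // Read a full code point so surrogate pairs are escaped as one reference.
            int cp = text.codePointAt(i);
            if (cp > 0x7F) {
                out.append("&#").append(cp).append(';');
            } else {
                out.append((char) cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A bullet (U+2022) and an em dash (U+2014) inside element content.
        System.out.println(escapeNonAscii("<p>a \u2022 b \u2014 c</p>"));
        // prints: <p>a &#8226; b &#8212; c</p>
    }
}
```

Applied to already serialized output, a filter like this leaves the ASCII markup untouched and makes each high code point visible by number, which is the hand-inspection property asked for in the original question.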
--
Elliotte Rusty Harold
elharo at ibiblio.org