[XOM-interest] (no subject)

Discussion:

Jason McKendry

2010-09-30 09:48:45 UTC

Hello everyone,

I have built an application on top of XOM, and the last thing I haven't been able to iron out over the past few weeks has been a problem with character references and escape sequences to represent non-standard characters in my XML data files. I found information about how to use the xsl:output tag to add a character map, but since the result of each transformation is a XOM Nodes object, the xsl:output tag is never processed. I had to solve a similar problem using the DocType object, but I wasn't having any luck figuring out how to use that knowledge to solve this problem.

So far the best I've been able to come up with is either a way of having the character reference in my XML/XSL resolve to the character in the HTML source (which of course displays garbage in the browser), or to have something like &copy; in the source, which we know renders in the browser as ©. I have only been able to find advice that uses the xsl:output tag at this point.

My (hopefully temporary) workaround has been to cobble together some JavaScript that looks for custom tags in the HTML source via body onload, and then inserts the character with a Unicode reference based on that location. It's okay for now, but my client wants to expand to multi-lingual versions of their site, and that will create a maintenance nightmare. Aside from that, I feel as though the XML should be where the actual data lives, not some pointer for a piece of client-side script to look for when I don't even have any assurance that it will always fire.

I've thought about going the other way, and having a custom piece of Java manipulate the output as text files, but that seems kludgy to me, too, and similarly painful to maintain.

I'd be very grateful for any guidance on where to look to solve this problem. Any advice or suggestions would be very welcome. I'm sure in all this time I'm not the only person to run into this!

Thank you,

Jason McKendry
namelessOperation();

Michael Kay

2010-09-30 10:18:39 UTC


I think you're confused in your requirements.

XOM is a tree model, an abstract view of an XML document. In the tree
view of XML, characters are simply characters (things that correspond
one-to-one with Unicode codepoints). The copyright symbol is one
character, one codepoint, and it is represented as such; you aren't
concerned with expansions such as &copy; because those don't exist in
the tree view; they are only devices for serializing the XML within the
constraints of a restricted character repertoire. Creating entity or
character references is something that should only happen when you
serialize from the tree model to lexical XML; you should never attempt
to have such references present in the tree itself.

Michael Kay
Saxonica
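
The split described above, between characters in the tree and references that appear only at serialization time, can be sketched with nothing but the JDK. This is an illustration under assumptions, not code from the thread: it uses the JDK's DOM and identity Transformer rather than XOM (XOM's own Serializer plays the corresponding role), and the class and method names are made up for the example. The tree holds the copyright sign as a single character; only the choice of output encoding decides whether a numeric character reference shows up in the bytes.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical demo class; uses the JDK's XML APIs, not XOM.
class CharRefDemo {

    // Build a one-element tree whose text holds U+00A9 as a plain character.
    static Document buildTree() {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder().newDocument();
            Element p = doc.createElement("p");
            p.setTextContent("\u00A9 2010");
            doc.appendChild(p);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Serialize the tree to lexical XML in the requested encoding.
    // Any escaping of characters the encoding cannot represent happens here.
    static String serialize(Document doc, String encoding) {
        try {
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.ENCODING, encoding);
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), Charset.forName(encoding));
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = buildTree();
        // US-ASCII cannot represent U+00A9, so the serializer must escape it.
        System.out.println(serialize(doc, "US-ASCII"));
        // UTF-8 can represent it, so the character is written directly.
        System.out.println(serialize(doc, "UTF-8"));
    }
}
```

The same tree is serialized twice and nothing in it changes between the two calls; the character reference is purely a property of the output.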


Elliotte Rusty Harold

2010-09-30 11:01:08 UTC



You should never have to worry about character references. Just use
the characters you want to use like © in your text and let XOM decide
how to encode them on output. Unless by "non-standard characters" you
mean characters that aren't even in Unicode--e.g. Klingon--in which
case there's not a lot XOM can do for you.

--
Elliotte Rusty Harold
elharo at ibiblio.org
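
For the "garbage in the browser" symptom from the original question, one common cause (an assumption here, since the thread does not say which encodings were involved) is that the output is written in one encoding and read in another. A minimal sketch, with a hypothetical helper name, of what happens when UTF-8 bytes are decoded as ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; the encodings chosen are assumptions for illustration.
class MojibakeDemo {

    // Encode text with one charset, then decode the bytes with another,
    // as a browser does when the declared and actual encodings differ.
    static String decodeAs(String text, Charset written, Charset assumed) {
        return new String(text.getBytes(written), assumed);
    }

    public static void main(String[] args) {
        String copyright = "\u00A9";
        // Mismatch: the two UTF-8 bytes C2 A9 are read as two Latin-1 characters.
        System.out.println(decodeAs(copyright,
                StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
        // Match: the character survives the round trip.
        System.out.println(decodeAs(copyright,
                StandardCharsets.UTF_8, StandardCharsets.UTF_8));
    }
}
```

If this is what is happening, the fix is to make the encoding the serializer writes agree with the one the page declares, rather than to put references into the tree.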
