Discussion:
[XOM-interest] invalid byte 2 of 3 byte UTF-8 error received
Sandy Mustard
2013-09-24 00:15:41 UTC
I have some text that I need to convert to XML. The text is:
"Ing. Libor Vos\xe1hlo\nSenior project manager\nforteq Czech\nKopisty 1-
areal MUS,a.s.\nCZ - 43401, Most - Kopisty\nTel: +420 476 203 826\nFax:
+420 476 203 835\nMobil: +420 602 109 446\n"

I am converting the "\n" sequences to newlines (0x0A).
Note the character sequence \xe1. I convert this sequence to
a single character with a value of 225 (0xE1).
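For reference, the conversion described above could look roughly like the following. This is only a sketch: the class and method names (`EscapeDecoder`, `decode`) are made up for illustration, and it assumes the only escapes in the input are `\n` and two-digit `\xHH`.

```java
class EscapeDecoder {
    // Hypothetical helper: turns literal "\n" and "\xHH" escapes into real characters.
    // Anything else, including a lone backslash, is copied through unchanged.
    static String decode(String raw) {
        StringBuilder out = new StringBuilder(raw.length());
        int i = 0;
        while (i < raw.length()) {
            char c = raw.charAt(i);
            if (c == '\\' && i + 1 < raw.length()) {
                char next = raw.charAt(i + 1);
                if (next == 'n') {
                    out.append('\n');
                    i += 2;
                    continue;
                }
                if (next == 'x' && i + 3 < raw.length()) {
                    // Two hex digits give a value in 0..255; parseInt throws
                    // NumberFormatException if they are not valid hex.
                    int value = Integer.parseInt(raw.substring(i + 2, i + 4), 16);
                    out.append((char) value);
                    i += 4;
                    continue;
                }
            }
            out.append(c);
            i++;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("Vos\\xe1hlo\\nSenior");
        System.out.println((int) decoded.charAt(3)); // prints 225
    }
}
```

Since 225 is also the Unicode code point of the accented letter (U+00E1), the in-memory Java string produced this way should already hold the right character.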
I am using XOM to create the XML
Attribute attr = new Attribute("Text", textString);
xmlClass.addAttribute(attr);
and here is the output:

<MyElem Text="Ing. Libor Vos?hlo&#x0A;Senior project manager&#x0A;forteq
Czech&#x0A;Kopisty 1- areal MUS,a.s.&#x0A;CZ - 43401, Most -
Kopisty&#x0A;Tel: +420 476 203 826&#x0A;Fax: +420 476 203
835&#x0A;Mobil: +420 602 109 446&#x0A;" />

Note that the last name is correct in the output (the accented character is shown here as ?).

However, when reading this in various XML parsers, I get a
"java.io.UTFDataFormatException: Invalid byte 2 of 3-byte UTF-8
sequence" error.
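One plausible reading of this error, offered as an assumption about what happened rather than a diagnosis: the file contains the single byte 0xE1 (the ISO-8859-1 form of the character) while the parser reads it as UTF-8. In UTF-8, 0xE1 announces a three-byte sequence, and the following byte, "h" (0x68), is not a valid continuation byte. The sketch below uses only the JDK; the class name `Utf8Check` is made up for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the bytes form well-formed UTF-8.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "Vos\u00e1hlo";
        // One byte (0xE1) for the accented letter.
        byte[] latin1 = name.getBytes(StandardCharsets.ISO_8859_1);
        // Two bytes (0xC3 0xA1) for the accented letter.
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(latin1)); // prints false
        System.out.println(isValidUtf8(utf8));   // prints true
    }
}
```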

How do I pass this string to XOM to get the correct UTF-8 encoding?

Thank you.

Sandy Mustard
Sandy Mustard
2013-09-24 00:58:02 UTC
Well, perhaps I figured it out.

I added the following after my character conversions:

byte[] chrs = textString.getBytes("UTF-8"); // <<<<<<<<<<<<<<<<<<<<<<
textString = new String(chrs);              // <<<<<<<<<<<<<<<<<<<<<<<
Attribute attr = new Attribute("Text", textString);

I'm hoping this is correct.
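A caution on this round trip, as a sketch rather than a verdict: `new String(chrs)` decodes with the platform default charset, so the result depends on the machine. If the default is UTF-8, the string comes back unchanged; if it is a single-byte charset, the accented letter turns into two characters. The usual alternative is to leave the string alone and name the encoding at the point where bytes are written (with XOM that would presumably mean a `Serializer` constructed on an `OutputStream` with "UTF-8", not shown here). The example below uses only the JDK, with ISO-8859-1 standing in for a non-UTF-8 default; `ExplicitCharsetDemo`, `roundTrip`, and `write` are illustrative names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class ExplicitCharsetDemo {
    // Mimics getBytes("UTF-8") followed by new String(bytes), with the
    // decoding charset made explicit instead of taken from the platform.
    static String roundTrip(String text, Charset decodeWith) {
        return new String(text.getBytes(StandardCharsets.UTF_8), decodeWith);
    }

    // Encodes text through a Writer bound to an explicitly named charset.
    static byte[] write(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(sink, charset)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        String name = "Vos\u00e1hlo";
        // Same charset both ways: the round trip changes nothing.
        System.out.println(roundTrip(name, StandardCharsets.UTF_8).equals(name)); // prints true
        // Mismatched charset: the string grows from 7 to 8 characters.
        System.out.println(roundTrip(name, StandardCharsets.ISO_8859_1).length()); // prints 8
        // Writing the untouched string with an explicit UTF-8 writer yields 8 bytes.
        System.out.println(write(name, StandardCharsets.UTF_8).length); // prints 8
    }
}
```

If the round trip does appear to fix the output, one possible explanation is that the file is being written with a non-UTF-8 default encoding, in which case the two mismatches cancel out on that machine only.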

Sandy Mustard
_______________________________________________
XOM-interest mailing list
XOM-interest at lists.ibiblio.org
http://lists.ibiblio.org/mailman/listinfo/xom-interest