[XOM-interest] tagsoup+XOM+text nodes?

Misha Koshelev

2010-05-23 15:02:54 UTC

Dear All:

My apologies if this is a simple question. I am working on a web automation framework similar to WebDriver, except that it allows window hiding (it uses the SWT browser). More information is available here:
http://www.mkosh.com/
along with a current (unstable) version that uses NekoHTML/Xerces at http://www.mkosh.com/unstable/

XOM+TagSoup has a great interface for XPath queries and seems to be well regarded among professionals.

However, the last time I tried to use it (a week ago), I ran into the following problem. I have an HTML document:
<html>
<head></head>
<body>
<div>Hello <a href="somewhere">Some Text</a></div>
</body>
</html>

Given an Element object for the <div> tag, I would like to extract only the text "Hello " above, without the contents of the nested <a> element.
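
To make the goal concrete, here is a minimal sketch of the traversal I have in mind. It deliberately uses the JDK's built-in DOM parser rather than TagSoup/XOM so that it is self-contained, and the class and method names (DirectText, directText, firstDiv) are made up for illustration. It only works here because the sample document happens to be well-formed XML. As far as I understand, the XOM counterpart would loop with getChildCount() and getChild(i), check for instanceof nu.xom.Text, and call getValue(), but I have not verified that against TagSoup output.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class; all names here are for illustration only.
class DirectText {

    // Concatenates only the text nodes that are direct children of the
    // element, skipping child elements such as <a> and their contents.
    static String directText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    // Parses a well-formed (X)HTML string and returns its first <div>.
    // Assumes the input is valid XML; real-world HTML would need TagSoup
    // or a similar tolerant parser in front of it.
    static Element firstDiv(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            return (Element) doc.getElementsByTagName("div").item(0);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse sample document", e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head></head><body>"
                + "<div>Hello <a href=\"somewhere\">Some Text</a></div>"
                + "</body></html>";
        // Brackets make the trailing space visible; prints [Hello ]
        System.out.println("[" + directText(firstDiv(xhtml)) + "]");
    }
}
```

The point of the sketch is that "Hello " should be reachable as a direct child text node of the <div>, without serializing the element and stripping tags by hand.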

Now, for some reason, when I iterated over the Nodes produced by TagSoup/XOM in my program, I did not find any Text nodes.

In that case, the only way I could extract the text was to call the toXML() method and parse the result myself to get rid of the <a href...

Is there a way to turn on Text node parsing, or am I missing something simple?

Thank you!
Misha

P.S. Please use reply-all.