Misha Koshelev
2010-05-23 15:02:54 UTC
Dear All:
My apologies if this is a simple question. I am working on a web automation framework similar to WebDriver, but one that allows window hiding (it uses the SWT browser). You can find info here:
http://www.mkosh.com/
along with a current (unstable) version that uses nekohtml/Xerces under http://www.mkosh.com/unstable/
XOM+TagSoup has a great interface for doing XPath and seems to be quite favored among professionals.
However, the last time I tried to use it (a week ago), I ran into the following problem. I have an HTML document:
<html>
<head></head>
<body>
<div>Hello <a href="somewhere">Some Text</a></div>
</body>
</html>
I would like to extract the text "Hello " above if I have an Element object for the <div> tag.
Now, for some reason, when I was going through the Nodes produced by TagSoup/XOM in my program, there were no Text nodes.
In that case, the only way I could extract the text was by calling the toXML() method and parsing the result myself to get rid of the <a href...
Is there a way to turn on Text node parsing, or am I missing something simple?
Thank you!
Misha
p.s. Please reply all
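
To make the goal above concrete, here is a minimal sketch of the extraction I have in mind: walk the direct children of the <div> element and keep only the text nodes, so "Hello " is returned without the content of the nested <a>. To stay self-contained it uses the JDK's built-in org.w3c.dom parser on a well-formed copy of the document rather than TagSoup/XOM; the class name DirectText and the helpers directText and firstDiv are made up for illustration. In XOM I would expect the same walk to use Element.getChildCount(), getChild(int), an instanceof nu.xom.Text check and getValue(), but that part is an assumption, since it is exactly what did not work for me.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectText {
    /** Concatenates only the text nodes that are direct children of the element. */
    public static String directText(Element element) {
        StringBuilder sb = new StringBuilder();
        NodeList children = element.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            // Skip child elements such as <a>; keep plain text only.
            if (child.getNodeType() == Node.TEXT_NODE) {
                sb.append(child.getNodeValue());
            }
        }
        return sb.toString();
    }

    /** Hypothetical helper: parses well-formed markup and returns its first <div>. */
    public static Element firstDiv(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return (Element) doc.getElementsByTagName("div").item(0);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse document", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<html><head></head><body>"
                + "<div>Hello <a href=\"somewhere\">Some Text</a></div>"
                + "</body></html>";
        // Brackets make the trailing space visible.
        System.out.println("[" + directText(firstDiv(xml)) + "]");
    }
}
```

Run on the document above, this prints [Hello ], which is the result I am hoping to get from the TagSoup/XOM tree without going through toXML().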