Jason Novotny
2009-10-29 00:23:03 UTC
Hi,
I'm completely new to XPath, but after finding an interesting post on
using XOM, Saxon and TagSoup, I put together the simple example below:

    import nu.xom.Builder;
    import nu.xom.Document;
    import nu.xom.Nodes;
    import nu.xom.XPathContext;
    import org.xml.sax.XMLReader;
    import org.xml.sax.helpers.XMLReaderFactory;

    // Parse (possibly malformed) HTML with TagSoup and build a XOM document
    XMLReader tagsoup = XMLReaderFactory.createXMLReader("org.ccil.cowan.tagsoup.Parser");
    tagsoup.setFeature("http://xml.org/sax/features/namespace-prefixes", true);
    Builder builder = new Builder(tagsoup);
    String url = "http://myurl.com";
    Document doc = builder.build(url);

    // Bind the "html" prefix to the XHTML namespace for use in queries
    XPathContext context = new XPathContext("html", "http://www.w3.org/1999/xhtml");

    // THIS ONE WORKS
    Nodes estimated = doc.query("//html:span[@class='small_title'][2]", context);
    for (int i = 0; i < estimated.size(); i++) {
        System.err.println(estimated.get(i).toXML());
    }

    // THIS DOESN'T
    Nodes results = doc.query("//html/body/table/tbody/tr/td/table/tbody/tr/td/table[2]/tbody/tr/td[2]/table/tbody/tr/td/table[4]/tbody/tr[3]/td[3]", context);
    for (int i = 0; i < results.size(); i++) {
        System.err.println(results.get(i).toXML());
    }

So my problem is the syntax: the first query works fine but the second
one returns nothing. I pulled the second query string straight from
Firebug using "Copy XPath".
Any guidance on libraries I can use, or on the proper syntax for
querying HTML, would be greatly appreciated!
Thanks, Jason
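
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in XPath 1.0 an unprefixed step such as "body" only matches elements in no namespace, while the elements here are in the XHTML namespace, so every step of the second query would need the "html:" prefix. In addition, Firebug reports the browser's DOM, where tbody elements are inserted automatically, and they may not be present in the tree the parser builds. The example below uses the JDK's built-in javax.xml.xpath API instead of XOM so that it runs without extra jars; the class name, the count helper, and the sample markup are all made up for illustration.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class NamespacedXPathDemo {
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: counts the nodes an expression selects in a tiny
    // XHTML-namespaced sample document (no tbody in the markup).
    static int count(String expression) throws Exception {
        String xml = "<html xmlns='" + XHTML + "'><body><table><tr>"
                   + "<td>a</td><td>b</td></tr></table></body></html>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for namespace-sensitive XPath
        Document doc = factory.newDocumentBuilder()
                              .parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        // Bind the "html" prefix to the XHTML namespace, like XPathContext in XOM
        xpath.setNamespaceContext(new NamespaceContext() {
            public String getNamespaceURI(String prefix) {
                return "html".equals(prefix) ? XHTML : XMLConstants.NULL_NS_URI;
            }
            public String getPrefix(String uri) { return null; }
            public Iterator<String> getPrefixes(String uri) { return null; }
        });

        NodeList nodes = (NodeList) xpath.evaluate(expression, doc, XPathConstants.NODESET);
        return nodes.getLength();
    }

    public static void main(String[] args) throws Exception {
        // Unprefixed steps look for elements in no namespace: prints 0
        System.out.println(count("//html/body/table/tr/td"));
        // Every step prefixed: prints 2
        System.out.println(count("//html:html/html:body/html:table/html:tr/html:td"));
        // Prefixed, but with a tbody step the sample does not contain: prints 0
        System.out.println(count("//html:table/html:tbody/html:tr/html:td"));
    }
}
```

If the same reasoning carries over to the XOM query, the second expression would become "//html:html/html:body/html:table/..." with each step prefixed, and the tbody steps would be dropped wherever the parsed tree does not contain them. Printing doc.toXML() should show which elements are actually present.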