Posted on October 7, 2014

Website scraping with JSoup and XMLBeam — Part 3

In the last two articles I introduced website scraping with XMLBeam and JSoup, respectively.

In this article I'll do a final comparison of the two libraries, but do not expect anything professional: it is just a simple run-and-measure-time analysis of both libraries on my dataset, on two machines.

Posted by GHajba on September 30, 2014 under Java, Software Development · 2 Comments

Website scraping with JSoup and XMLBeam — Part 2

In the last article I used XMLBeam to scrape a not-so-well-formed HTML site, which gave me a lot of pain. Now I'll look at the same task implemented with JSoup. I can say up front that the ill-formed HTML caused no pain with this tool.

Posted by GHajba on September 23, 2014 under Java, Software Development · 4 Comments

Website scraping with JSoup and XMLBeam — Part 1

Last time I wrote about XMLBeam, a new tool for parsing XML documents based on XPath. At the end of that article I mentioned that I would write about parsing an HTML website with both XMLBeam and JSoup to see which one is better to use.

This article is the first part: it introduces the task and covers the XMLBeam implementation. The next article will say more about JSoup and compare the two tools.

JaPy Software

We're just barely good enough for better software

Tag Archives: try-with-resources
