Website scraping with JSoup and XMLBeam — Part 3

In the last two articles I introduced website scraping with XMLBeam and JSoup respectively.

In this article I’ll do a final comparison of the two libraries, but do not expect anything professional. It is just a simple run-and-measure-time analysis of both libraries on my dataset, run on two machines.

Website scraping with JSoup and XMLBeam — Part 2

In the last article I covered using XMLBeam to scrape a poorly formed HTML site, which caused me a lot of pain. Now I’ll look at the same task implemented with JSoup. I can say up front that the ill-formed HTML caused no pain with this tool.

Website scraping with JSoup and XMLBeam — Part 1

Last time I wrote about XMLBeam, a new tool for parsing XML documents based on XPath. At the end of that article I mentioned that I would write about parsing an HTML website with both XMLBeam and JSoup, to compare which one is better to use.

This article is the first part: it introduces the task and covers the XMLBeam implementation. The next article will cover JSoup and a comparison of the two tools.
