Here we will illustrate the use of the crawler4j (https://github.com/yasserg/crawler4j) web crawler. We will use an adapted version of the basic crawler found at https://github.com/yasserg/crawler4j/tree/master/src/test/java/edu/uci/ics/crawler4j/examples/basic. We will create two classes: CrawlerController and SampleCrawler. The former sets up the crawler, while the latter contains the logic that controls which pages will be processed.
As with our previous crawler, we will crawl the Wikipedia article dealing with Bishop Rock. This crawler will return fewer results because many extraneous pages are ignored.
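To give a feel for how extraneous pages can be ignored, here is a minimal sketch of the kind of URL filter a crawler class might apply before visiting a page. It uses only the standard Java library rather than crawler4j itself, so it runs on its own. The class name UrlFilterSketch, the list of excluded extensions, the allowed prefix, and the sample URLs are all assumptions made for illustration. In crawler4j, a comparable test would normally live in the crawler's overridden shouldVisit method.

```java
import java.util.List;
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical stand-alone helper; not part of crawler4j.
class UrlFilterSketch {

    // Assumed list of file extensions to skip (style sheets, scripts, media, archives).
    private static final Pattern EXCLUDED =
            Pattern.compile(".*\\.(css|js|gif|jpe?g|png|mp3|mp4|zip|gz)$");

    // Assumed prefix: only English Wikipedia article pages are of interest.
    private static final String ALLOWED_PREFIX = "https://en.wikipedia.org/wiki/";

    // Returns true when the URL has no excluded extension and starts with the allowed prefix.
    static boolean shouldVisit(String url) {
        String href = url.toLowerCase(Locale.ROOT);
        return !EXCLUDED.matcher(href).matches() && href.startsWith(ALLOWED_PREFIX);
    }

    public static void main(String[] args) {
        // Illustrative candidate links, as might be found on a crawled page.
        List<String> candidates = List.of(
                "https://en.wikipedia.org/wiki/Bishop_Rock,_Isles_of_Scilly",
                "https://en.wikipedia.org/static/images/logo.png",
                "https://www.example.com/index.html");
        for (String url : candidates) {
            System.out.println(shouldVisit(url) + "  " + url);
        }
    }
}
```

Only the first candidate passes both checks. The image is rejected by its extension, and the last link is rejected because it lies outside the allowed prefix.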
Let's look at the CrawlerController class first. There are several parameters that are used ...