Current Features
- Ability to download over HTTP and FTP
- User-configurable multithreading on three levels: Application, Web, and Download
- Chainable link filtering. The options -i, -e, -als, and -re can be combined; their results are ANDed, so a link is accepted only if every specified filter accepts it.
- A custom BeanShell script can be specified, via the -als option, to decide whether a link is accepted or rejected. The script extends the base functionality of jGetFile; it is not required to crawl a site.
- More info to come...
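
The ANDed filter chaining described above can be sketched in a few lines of Java. This is only an illustration, not jGetFile's own code: the class `LinkFilterChain` and its helper methods are hypothetical names, and the three stand-in filters merely guess at what include, exclude, and regular-expression options might do. The actual semantics of -i, -e, and -re are not specified here.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical sketch: none of these names come from jGetFile itself.
final class LinkFilterChain {
    private LinkFilterChain() {}

    /** Combines filters so that a link is accepted only if every filter accepts it. */
    static Predicate<String> allOf(List<Predicate<String>> filters) {
        Predicate<String> chain = link -> true; // an empty chain accepts every link
        for (Predicate<String> filter : filters) {
            chain = chain.and(filter); // logical AND, evaluated left to right
        }
        return chain;
    }

    /** Assumed include-style filter: the link must contain the given text. */
    static Predicate<String> contains(String text) {
        return link -> link.contains(text);
    }

    /** Assumed exclude-style filter: the link must not contain the given text. */
    static Predicate<String> excludes(String text) {
        return link -> !link.contains(text);
    }

    /** Assumed regex-style filter: the whole link must match the pattern. */
    static Predicate<String> matches(String regex) {
        Pattern pattern = Pattern.compile(regex);
        return link -> pattern.matcher(link).matches();
    }
}
```

Calling `LinkFilterChain.allOf(List.of(contains("example.org"), excludes("/private/"), matches(".*\\.pdf")))` yields a single predicate whose `test` method rejects a link as soon as any one filter rejects it. A script-based filter could be added to the same list as one more predicate.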
Future Features
- Ability to select a single-threaded crawler model instead of the multithreaded one, to reduce threading overhead. Even when configured to crawl new links with only one thread, jGetFile still creates a new thread for each link. This adds overhead that a non-threaded approach could avoid.
- Allow the use of Jython or other scripting languages alongside BeanShell.
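
One possible shape for the non-threaded crawler mentioned above is a plain work queue processed on the calling thread. The sketch below is an assumption about how that could look, not a description of jGetFile's design: `SingleThreadedCrawl`, `crawl`, `fetchLinks`, and `maxPages` are all hypothetical names, and `fetchLinks` stands in for whatever downloads a page and extracts its links.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

// Hypothetical sketch: not jGetFile code; all names are invented for illustration.
final class SingleThreadedCrawl {
    private SingleThreadedCrawl() {}

    /**
     * Visits links breadth-first on the calling thread, creating no new threads.
     * Returns the links in the order they were visited, up to maxPages entries.
     */
    static List<String> crawl(String start,
                              Function<String, List<String>> fetchLinks,
                              int maxPages) {
        Deque<String> pending = new ArrayDeque<>(); // links waiting to be visited
        Set<String> seen = new HashSet<>();         // links already queued or visited
        List<String> visited = new ArrayList<>();

        pending.add(start);
        seen.add(start);
        while (!pending.isEmpty() && visited.size() < maxPages) {
            String link = pending.poll();
            visited.add(link);
            for (String next : fetchLinks.apply(link)) {
                if (seen.add(next)) { // add() returns false for links seen before
                    pending.add(next);
                }
            }
        }
        return visited;
    }
}
```

Because each link is handled inside the same loop, the per-link thread creation cost disappears. The trade-off is that downloads no longer overlap, so this model would suit small sites or slow machines better than large crawls.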