jGetFile


Examples

Simple download scenario

Conditions: Download all top rated zip files from Project Gutenberg

In this scenario, only two options are required for jgetfile: -r and -ext.
Using -d or -it is always advised unless an infinitely recursive search is desired.

java -jar jgetfile.jar -r http://www.gutenberg.org/browse/scores/top -ext .zip -d 2

Note that due to the way Project Gutenberg names files, the -ffn option (Flat File Namer) will not always result in an easily recognizable file name. One example of a downloaded file name from the above configuration is www.gutenberg.org__files_10609_10609.zip.
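
That flattened name appears to correspond to a URL such as http://www.gutenberg.org/files/10609/10609.zip. Purely as a rough sketch (not jGetFile's actual code), and assuming the flat namer simply joins the host and the path with slashes replaced by underscores, the same name can be reproduced in a couple of lines of BeanShell:

url = new URL("http://www.gutenberg.org/files/10609/10609.zip");
// join host and path, replacing path separators with underscores
flatName = url.getHost() + "_" + url.getPath().replace('/', '_');
print(flatName); // www.gutenberg.org__files_10609_10609.zip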

Download with URL inclusion/exclusion filters

Conditions: Download all .doc and .pdf files from http://www.foo.com/pathfoo, and only
traverse links that match http://www.foo.com/pathfoo

By using the -i option, you are telling jgetfile that you only want to follow links
that include the specified address; this does not have to be a fully qualified address.
The inclusion can be http://www.foo.com or even http://fo; both will work depending
on the needs at hand.

The inclusion filter comes in handy when you want to traverse only links from a specific web site, and not waste time traversing advertisement links or other non-relevant links.

java -jar jgetfile.jar -r http://www.foo.com/pathfoo -ext .pdf,.doc -d 2 -i http://www.foo.com/pathfoo
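
The same restriction could also be expressed with the -als script filter described in the next example; as a minimal sketch, assuming the link and acceptLink variables used in the script below, an equivalent inclusion filter would be:

acceptLink = (link.indexOf("http://www.foo.com/pathfoo") != -1);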

Using a custom BeanShell script filter to determine whether to traverse a link

Conditions: Download all .doc and .pdf files from http://www.foo.com/pathfoo, traverse
only links that begin with http://www.foo.com/dogs AND don't traverse any links that
begin with http://www.foo.com/cats OR any links that begin with http://www.foo.com/birds.

By using the -als (Accept Link Script) option and specifying a custom BeanShell script, you can
create arbitrarily complex link filters. Note that the -als option is not intended to define
which extensions to download, but to define whether a link is traversed or discarded internally by jgetfile. See
the documentation for the list of variables that are available for use in the script.

somescript.bsh:

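// acceptLink tells jgetfile whether to traverse the link currently being examined:
// reject links under /cats or /birds, accept links under /dogs, reject everything else.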
if(link.startsWith("http://www.foo.com/cats") || link.startsWith("http://www.foo.com/birds"))
	acceptLink = false;
else if(link.startsWith("http://www.foo.com/dogs"))
	acceptLink = true;
else
	acceptLink = false;

java -jar jgetfile.jar -r http://www.foo.com/pathfoo -ext .pdf,.doc -d 2 -als somescript.bsh

A Power User's configuration

For those who want to optimize jGetFile, there are plenty of options to do so. Here is a step-by-step explanation of the configuration below. First, we want to download all .exe files from http://www.gnarlyapps.com/applisting to the local directory C:\downloads\apps. jGetFile should use the custom BeanShell script myFilterScript.bsh to determine which links to traverse or exclude. Ten crawler threads should be used, which means that ten internal threads will be used to handle parsing out new links. A maximum of 4 downloads per connection is allowed, and the maximum number of connections to unique hosts is 5. A flat file naming model will be used, which names files in the format host_path_file. Files smaller than 4 kilobytes should be deleted. jGetFile will dynamically exclude links from traversal when the content type of the HTTP response is not text/html. Finally, a Max Depth model will be used with a max depth of 2.

java -jar jgetfile.jar -r http://www.gnarlyapps.com/applisting -dir C:\downloads\apps -als myFilterScript.bsh -ext .exe -ct 10 -md 4 -mc 5 -ffn -df 4000 -de -d 2
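
The contents of myFilterScript.bsh are not shown here; purely as a hypothetical illustration, assuming only the link and acceptLink variables used in somescript.bsh above, a filter that keeps the crawl on the gnarlyapps.com site might look like:

myFilterScript.bsh:

// Hypothetical example only; adjust to the links you actually want to traverse.
acceptLink = link.startsWith("http://www.gnarlyapps.com");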

Downloading all Fedora Core 4 ISOs

In this scenario we are downloading all ISOs from one of the FC 4 mirrors. It is important to note that the -md (max downloads per connection) option is set to 1. This means that we only want to download one file at a time, so as not to overload the mirror.

java -jar jgetfile.jar -r ftp://ftp.linux.ncsu.edu/pub/fedora/linux/core/4/i386/iso -ext .iso -d 1 -dir C:\downloads -md 1 -ct 1