Norconex File System Crawler

Getting Started

Command Line Usage

usage: collector-fs[.bat|.sh]
 -a,--action <arg>      Required: one of start|resume|stop|checkcfg
 -c,--config <arg>      Required: File System Crawler configuration file.
 -v,--variables <arg>   Optional: variable file.
 -k,--checkcfg          Validates XML configuration. When combined with
                        -a, prevents execution on configuration error.

The above File System Crawler startup script is found in the root directory of your installation (where you extracted the downloaded Zip file). Refer to the Flow Diagram and Configuration pages for documentation on all configuration options. Refer to the ConfigurationLoader Javadoc for details on the optional variables file.
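If you need to launch the startup script from another Java program, the flags listed above can be assembled into an argument list. The following is only a minimal sketch using the standard library: the class name CrawlerCommand, the script path, and the file names are hypothetical, and only the -a, -c and -v flags come from the usage text.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the File System Crawler distribution.
class CrawlerCommand {

    /**
     * Builds the argument list for the startup script from the documented
     * flags: -a (action), -c (configuration file), -v (variables file).
     */
    static List<String> build(String script, String action,
            String configFile, String variablesFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add(script);
        cmd.add("-a");
        cmd.add(action);
        cmd.add("-c");
        cmd.add(configFile);
        if (variablesFile != null) { // -v is optional
            cmd.add("-v");
            cmd.add(variablesFile);
        }
        return cmd;
    }

    /** Runs the command, sharing this process's console; returns the exit code. */
    static int run(List<String> cmd) throws IOException, InterruptedException {
        return new ProcessBuilder(cmd).inheritIO().start().waitFor();
    }

    public static void main(String[] args) {
        // Assumed script location and configuration file name.
        List<String> cmd = build("./collector-fs.sh", "start",
                "my-config.xml", null);
        // Only prints the command; call run(cmd) to actually execute it.
        System.out.println(String.join(" ", cmd));
    }
}
```

Keeping the argument assembly separate from the process launch lets you inspect or log the command before running it.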

Java Integration

If you are using Maven, simply add the project dependency to your pom.xml. If you are not using Maven, you can add all JAR files found in your installation "lib" folder to your application classpath. Configure the FilesystemCollector class by passing it a FilesystemCollectorConfig. You can build the configuration in Java, or load it from an XML configuration file using the CollectorConfigLoader class. Below is a sample usage:

/* XML configuration: */
//FilesystemCollectorConfig config = (FilesystemCollectorConfig)
//        new CollectorConfigLoader(FilesystemCollectorConfig.class)
//            .loadCollectorConfig(myXMLFile, myVariableFile);
 
/* Java configuration: */
FilesystemCollectorConfig collectorConfig = new FilesystemCollectorConfig();
collectorConfig.setId("MyFilesystemCollector");
collectorConfig.setLogsDir("/tmp/logs/");
...
FilesystemCrawlerConfig crawlerConfig = new FilesystemCrawlerConfig();
crawlerConfig.setId("MyFilesystemCrawler");
crawlerConfig.setStartPaths(
        new String[]{"/home/joe/myfiles", "/home/jack/hisfiles"});
...
collectorConfig.setCrawlerConfigs(crawlerConfig);
 
FilesystemCollector collector = new FilesystemCollector(collectorConfig);
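// The boolean argument asks the collector to resume a previous
// incomplete crawl, if there is one.
collector.start(true);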

Refer to the File System Crawler Javadoc for more documentation or the Configuration page for XML configuration options.

Extend the File System Crawler

To create your own feature implementations, create a new Java project in your favourite IDE. Use Maven, or add all the files contained in the "lib" folder of the File System Crawler installation to your classpath. Configure your project so that its binary output directory is the "classes" folder of the File System Crawler installation. Code compiled into "classes" is then picked up automatically by the File System Crawler when you run it.

SMB/CIFS Support

To fetch documents using the SMB/CIFS protocol, you will need to manually download and install the following library: jcifs-1.3.17.jar. Command-line users can simply add it to the Collector's "lib" folder. Maven users can use the following:

    <dependency>
      <groupId>jcifs</groupId>
      <artifactId>jcifs</artifactId>
      <version>1.3.17</version>
    </dependency>

This extra step is required due to JCIFS licensing incompatibilities affecting distribution.

You should be using File System Crawler version 2.7.0 or higher for SMB/CIFS support.