public abstract class AbstractMongoCrawlDataStoreFactory extends Object implements ICrawlDataStoreFactory, IXMLConfigurable
Mongo implementation of ICrawlDataStore.
All references are stored in a collection named "references". They move through the "QUEUED", "ACTIVE" and "PROCESSED" stages.
The cached references are stored in a separate collection named "cached".
As of 1.8.0, password can take a password that has been encrypted using EncryptionUtil (or command-line encrypt.[bat|sh]). In order for the password to be decrypted properly by the crawler, you need to specify the encryption key used to encrypt it. The key can be stored in a few supported locations, and a combination of passwordKey and passwordKeySource must be specified to properly locate the key. The supported sources are:
passwordKeySource | passwordKey |
---|---|
key | The actual encryption key. |
file | Path to a file containing the encryption key. |
environment | Name of an environment variable containing the key. |
property | Name of a JVM system property containing the key. |
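The lookup described by the table above can be sketched as follows. This is a minimal illustration, not the crawler's own code: the class name PasswordKeyResolver, its Source enum, and the resolve method are hypothetical names introduced here, and the sketch assumes the key file is UTF-8 text whose surrounding whitespace can be trimmed.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class PasswordKeyResolver {

    /** Mirrors the four documented passwordKeySource values. */
    enum Source { KEY, FILE, ENVIRONMENT, PROPERTY }

    private PasswordKeyResolver() { }

    /**
     * Returns the encryption key that the passwordKey/passwordKeySource
     * pair points to, or null if the variable or property is not set.
     */
    static String resolve(String passwordKey, String passwordKeySource) {
        if (passwordKey == null || passwordKeySource == null) {
            throw new IllegalArgumentException(
                    "Both passwordKey and passwordKeySource are required.");
        }
        Source source = Source.valueOf(
                passwordKeySource.trim().toUpperCase(Locale.ROOT));
        switch (source) {
            case KEY:
                // passwordKey is the key itself.
                return passwordKey;
            case FILE:
                // passwordKey is a path to a file holding the key
                // (assumed here to be UTF-8 text).
                try {
                    byte[] bytes = Files.readAllBytes(Paths.get(passwordKey));
                    return new String(bytes, StandardCharsets.UTF_8).trim();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            case ENVIRONMENT:
                // passwordKey is the name of an environment variable.
                return System.getenv(passwordKey);
            case PROPERTY:
                // passwordKey is the name of a JVM system property.
                return System.getProperty(passwordKey);
            default:
                throw new IllegalStateException("Unsupported source: " + source);
        }
    }
}
```

For example, with the JVM started using -Ddemo.mongo.key=..., calling PasswordKeyResolver.resolve("demo.mongo.key", "property") would return the value of that system property.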
Implementing classes should contain the following XML configuration usage:
<crawlDataStoreFactory class="(class name)">
    <host>(Optional Mongo server hostname. Default to localhost)</host>
    <port>(Optional Mongo port. Default to 27017)</port>
    <dbname>(Optional Mongo database name. Default to crawl id)</dbname>
    <username>(Optional user name)</username>
    <password>(Optional user password)</password>
    <cachedCollectionName>(Custom "cached" collection name)</cachedCollectionName>
    <referencesCollectionName>(Custom "references" collection name)</referencesCollectionName>
    <mechanism>(Optional authentication mechanism)</mechanism>
    <sslEnabled>[false|true]</sslEnabled>
    <sslInvalidHostNameAllowed>[false|true]</sslInvalidHostNameAllowed>
    <!-- Use the following if password is encrypted. -->
    <passwordKey>(the encryption key or a reference to it)</passwordKey>
    <passwordKeySource>[key|file|environment|property]</passwordKeySource>
</crawlDataStoreFactory>
If "username" is not provided, no authentication will be attempted. The "username" must be a valid user that has the "readWrite" role over the database (set with "dbname").
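The documented defaults can be illustrated with a small stand-alone parser. This is only a sketch using the JDK's XML API, not the loadFromXML(Reader) implementation: MongoConfigSketch and its fields are hypothetical names, and it assumes the crawl id is used unchanged as the default database name and that an empty tag falls back to the default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical value holder for illustration; not part of the Norconex API.
final class MongoConfigSketch {
    final String host;
    final int port;
    final String dbname;
    final String referencesCollectionName;
    final String cachedCollectionName;

    private MongoConfigSketch(String host, int port, String dbname,
            String referencesCollectionName, String cachedCollectionName) {
        this.host = host;
        this.port = port;
        this.dbname = dbname;
        this.referencesCollectionName = referencesCollectionName;
        this.cachedCollectionName = cachedCollectionName;
    }

    /** Parses a crawlDataStoreFactory element, applying documented defaults. */
    static MongoConfigSketch parse(String xml, String crawlId) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Element root = doc.getDocumentElement();
            return new MongoConfigSketch(
                    text(root, "host", "localhost"),
                    Integer.parseInt(text(root, "port", "27017")),
                    // Assumption: the crawl id is used as-is.
                    text(root, "dbname", crawlId),
                    text(root, "referencesCollectionName", "references"),
                    text(root, "cachedCollectionName", "cached"));
        } catch (Exception e) {
            throw new IllegalArgumentException("Invalid configuration XML", e);
        }
    }

    /** Returns the trimmed text of the first matching tag, or the default. */
    private static String text(Element root, String tag, String defaultValue) {
        NodeList nodes = root.getElementsByTagName(tag);
        if (nodes.getLength() == 0) {
            return defaultValue;
        }
        String value = nodes.item(0).getTextContent().trim();
        return value.isEmpty() ? defaultValue : value;
    }
}
```

Parsing a configuration that only sets the port leaves the host at localhost, the database name at the crawl id, and the collection names at "references" and "cached".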
As of 1.8.1, it is now possible to specify the MongoDB authentication mechanism to use. The following are supported:
When no mechanism is specified, the default mechanism is Challenge Response (MONGODB-CR) for MongoDB 2 and SCRAM SHA1 (SCRAM-SHA-1) for MongoDB 3+. The following is an example forcing MONGODB-CR authentication:
<username>joe_user</username>
<password>joe_pwd</password>
<mechanism>MONGODB-CR</mechanism>
As of 1.9.0, you can define your own collection names with setReferencesCollectionName(String) and setCachedCollectionName(String).
As of 1.10.0, you can enable SSL.
See also: BaseMongoSerializer
Constructor and Description |
---|
AbstractMongoCrawlDataStoreFactory() |
Modifier and Type | Method and Description |
---|---|
ICrawlDataStore | createCrawlDataStore(ICrawlerConfig config, boolean resume) Creates a new crawl data store. |
protected abstract IMongoSerializer | createMongoSerializer() |
boolean | equals(Object other) |
String | getCachedCollectionName() Gets the cached collection name. |
MongoConnectionDetails | getConnectionDetails() |
String | getReferencesCollectionName() Gets the references collection name. |
int | hashCode() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setCachedCollectionName(String cachedCollectionName) Sets the cached collection name. |
void | setReferencesCollectionName(String referencesCollectionName) Sets the references collection name. |
String | toString() |
public ICrawlDataStore createCrawlDataStore(ICrawlerConfig config, boolean resume)
Specified by: createCrawlDataStore in interface ICrawlDataStoreFactory
Parameters:
config - crawler configuration
resume - whether the crawler was started or resumed

public MongoConnectionDetails getConnectionDetails()

public String getReferencesCollectionName()

public void setReferencesCollectionName(String referencesCollectionName)
Parameters:
referencesCollectionName - collection name

public String getCachedCollectionName()

public void setCachedCollectionName(String cachedCollectionName)
Parameters:
cachedCollectionName - collection name

protected abstract IMongoSerializer createMongoSerializer()

public void loadFromXML(Reader in) throws IOException
Specified by: loadFromXML in interface IXMLConfigurable
Throws: IOException

public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
Copyright © 2014–2021 Norconex Inc. All rights reserved.