Norconex Neo4j Committer

Configuration

To commit crawled documents to Neo4j, use the following XML as the <committer> section of your Norconex Crawler configuration.

<committer class="com.norconex.committer.neo4j.Neo4jCommitter">
 
  <!-- Mandatory settings: -->
  <uri>...</uri>
  <user>...</user>
  <password>...</password>
  <authentType>...</authentType>
  <multiValuesJoiner>...</multiValuesJoiner>
 
  <!-- Other settings: -->
  <nodeTopology>[ONE_NODE|NO_CONTENT|SPLITTED]</nodeTopology>
  <primaryLabel>...</primaryLabel>
  <relationships>
    <relationship type="..." direction="[NONE|INCOMING|OUTGOING|BOTH]">
      <sourcePropertyKey>...</sourcePropertyKey>
      <targetPropertyKey>...</targetPropertyKey>
    </relationship>
  </relationships>
 
  <additionalLabels>
    <sourceField keep="[false|true]">...</sourceField>
  </additionalLabels>
 
  <!-- Use the following if password is encrypted. -->
  <passwordKey>...</passwordKey>
  <passwordKeySource>[key|file|environment|property]</passwordKeySource>
 
  <sourceReferenceField keep="[false|true]">...</sourceReferenceField>
  <targetReferenceField>...</targetReferenceField>
  <sourceContentField keep="[false|true]">...</sourceContentField>
  <targetContentField>...</targetContentField>
  <queueDir>...</queueDir>
  <queueSize>...</queueSize>
  <commitBatchSize>...</commitBatchSize>
  <maxRetries>...</maxRetries>
  <maxRetryWait>...</maxRetryWait>
</committer>
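
As a minimal sketch of the template above, the following fills in only the settings listed as mandatory. The URI is the example value given in the tag descriptions, BASIC is the only supported authentication type, and "|" is the documented default joiner. The username and password are placeholders chosen for illustration, not values required by the Committer.

```xml
<committer class="com.norconex.committer.neo4j.Neo4jCommitter">
  <!-- Assumes a Neo4j instance listening locally on the Bolt port. -->
  <uri>bolt://localhost:7687</uri>
  <!-- Placeholder credentials; replace with your own. -->
  <user>neo4j</user>
  <password>changeme</password>
  <!-- Only BASIC is supported for now. -->
  <authentType>BASIC</authentType>
  <!-- Same as the documented default joiner. -->
  <multiValuesJoiner>|</multiValuesJoiner>
</committer>
```

With only these settings, the remaining options should fall back to the defaults listed in the tag descriptions.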

Tag descriptions:

Tag Description
uri Connection URI, e.g., "bolt://localhost:7687".
user The Neo4j username.
password The Neo4j password.
passwordKey Optional encryption key, used when the password is encrypted. Refer to the API Documentation for more details.
passwordKeySource Optional password encryption key source. One of key, file, environment, or property. Refer to the API Documentation for more details.
authentType Authentication type. Only BASIC is supported for now.
multiValuesJoiner One or more characters used to join the values of multi-value fields. Default is "|".
nodeTopology The node structure created for a committed document. Possible values:
  ONE_NODE Default. Creates a node with metadata and content.
  NO_CONTENT Creates a node without content.
  SPLITTED Creates three nodes: a main node holding the ID of the committed document, a content node, and a metadata node. The content and metadata nodes are each linked to the main node.
primaryLabel Primary label name used for all created nodes.
additionalLabels Additional labels to add to a newly created node. Specify one or more metadata fields using sourceField elements.
relationships Defines relationships between nodes. If a source field/property or target field/property does not exist, it is created automatically. Possible values for the "direction" attribute are: NONE, INCOMING, OUTGOING, or BOTH. The "type" attribute is an identifier/name for your relationship.
sourceReferenceField Name of the source field that is mapped to the Neo4j target ID field. Defaults to the document reference, which the Committer stores as committer.reference. Once re-mapped, the metadata source field is deleted unless keep is set to true.
targetReferenceField Name of the target ID field. Default is id. It typically acts as the unique key of the node.
sourceContentField Name of the source field containing the document content/body. By default no field is used and the content is taken from the document body. Once re-mapped, the metadata source field is deleted unless keep is set to true.
targetContentField Neo4j target field name for a document content/body. Default is: content.
queueDir Path of the directory where files are queued before being sent to Neo4j. Default is: ./committer-queue
queueSize Number of document additions or deletions to queue before sending them to Neo4j. Default is: 1000.
commitBatchSize Maximum number of documents to send to Neo4j at once. Default is: 100.
maxRetries Maximum number of retries upon commit failures. Default is: 0 (no retry).
maxRetryWait Delay between retries. Default is: 0 (no delay).
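
To show how the optional tags described above fit together, here is a hedged sketch of a fuller configuration. The label name "WebPage", the relationship type "LINKS_TO", the property keys "referencedUrls" and "id", and the metadata field "docType" are hypothetical names chosen for illustration; they are not defined by the Committer. The retry values are arbitrary, and since the unit of maxRetryWait is not stated here, 5000 assumes milliseconds.

```xml
<committer class="com.norconex.committer.neo4j.Neo4jCommitter">
  <!-- Connection settings (placeholder credentials). -->
  <uri>bolt://localhost:7687</uri>
  <user>neo4j</user>
  <password>changeme</password>
  <authentType>BASIC</authentType>
  <multiValuesJoiner>|</multiValuesJoiner>

  <!-- Create nodes without the document content. -->
  <nodeTopology>NO_CONTENT</nodeTopology>
  <!-- Hypothetical label applied to all created nodes. -->
  <primaryLabel>WebPage</primaryLabel>

  <relationships>
    <!-- Hypothetical relationship type and property keys. -->
    <relationship type="LINKS_TO" direction="OUTGOING">
      <sourcePropertyKey>referencedUrls</sourcePropertyKey>
      <targetPropertyKey>id</targetPropertyKey>
    </relationship>
  </relationships>

  <additionalLabels>
    <!-- Hypothetical metadata field; the keep value is illustrative. -->
    <sourceField keep="true">docType</sourceField>
  </additionalLabels>

  <!-- Queue settings, shown with their documented defaults. -->
  <queueDir>./committer-queue</queueDir>
  <queueSize>1000</queueSize>
  <commitBatchSize>100</commitBatchSize>

  <!-- Arbitrary retry values; the wait unit is an assumption. -->
  <maxRetries>3</maxRetries>
  <maxRetryWait>5000</maxRetryWait>
</committer>
```

Check the relationship property keys and the label source field against the metadata your crawler actually produces before relying on a configuration like this one.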