Class ExternalTransformer
- java.lang.Object
-
- com.norconex.importer.handler.AbstractImporterHandler
-
- com.norconex.importer.handler.transformer.AbstractDocumentTransformer
-
- com.norconex.importer.handler.transformer.impl.ExternalTransformer
-
- All Implemented Interfaces:
IXMLConfigurable, IImporterHandler, IDocumentTransformer
public class ExternalTransformer extends AbstractDocumentTransformer
Transforms a document using an external application.
This class relies on ExternalHandler for most of the work. Refer to ExternalHandler for full documentation.
To parse or extract raw text from files, it is recommended to use ExternalParser instead.
XML configuration usage:
<handler class="com.norconex.importer.handler.transformer.impl.ExternalTransformer">
  <!-- multiple "restrictTo" tags allowed (only one needs to match) -->
  <restrictTo>
    <fieldMatcher>(field-matching expression)</fieldMatcher>
    <valueMatcher>(value-matching expression)</valueMatcher>
  </restrictTo>
  <command>
    c:\Apps\myapp.exe ${INPUT} ${OUTPUT} ${INPUT_META} ${OUTPUT_META} ${REFERENCE}
  </command>
  <metadata inputFormat="[json|xml|properties]" outputFormat="[json|xml|properties]">
    <!-- Pattern only used when no output format is specified. Repeat as needed. -->
    <pattern>(regular expression)</pattern>
  </metadata>
  <environment>
    <!-- repeat variable tag as needed -->
    <variable name="(environment variable name)">
      (environment variable value)
    </variable>
  </environment>
  <tempDir>
    (Optional directory where to store temporary files used for transformation.)
  </tempDir>
</handler>
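The ${...} tokens in the command are placeholders. As a rough illustration of how such placeholders could be expanded before the command runs, here is a minimal sketch using only the Java standard library. The class name CommandTokenSketch, the expand method, and the file paths are hypothetical and are not part of the ExternalTransformer API; the actual substitution logic lives in ExternalHandler and may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class CommandTokenSketch {

    // Replaces every ${NAME} token found in the command with its mapped value.
    // Tokens without a mapping are left untouched.
    static String expand(String command, Map<String, String> values) {
        String expanded = command;
        for (Map.Entry<String, String> entry : values.entrySet()) {
            expanded = expanded.replace("${" + entry.getKey() + "}", entry.getValue());
        }
        return expanded;
    }

    public static void main(String[] args) {
        // Hypothetical temporary file paths, for illustration only.
        Map<String, String> values = new LinkedHashMap<>();
        values.put("INPUT", "/tmp/in.bin");
        values.put("OUTPUT", "/tmp/out.bin");
        System.out.println(expand("/path/transform/app ${INPUT} ${OUTPUT}", values));
    }
}
```

Because each token is matched with its braces, ${INPUT} does not accidentally match inside ${INPUT_META}.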
XML usage example:
<handler class="ExternalTransformer">
  <command>/path/transform/app ${INPUT} ${OUTPUT}</command>
  <metadata>
    <pattern field="docnumber" valueGroup="1">DocNo:(\d+)</pattern>
  </metadata>
</handler>
The above example invokes an external application that accepts two files as arguments: the first is the file to transform, and the second holds the transformation result. It also extracts a document number from STDOUT, found as "DocNo:1234", and stores it as "docnumber".
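To make the role of valueGroup concrete, the following sketch applies the same regular expression to sample output using only java.util.regex. It is an illustration of regex group selection, not the library's own extraction code; the class name DocNumberExtraction, the extract helper, and the sample output text are assumptions.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class DocNumberExtraction {

    // Returns the text captured by the given group of the first match, if any.
    // Group 0 is the whole match; group 1 is the first parenthesized group.
    static Optional<String> extract(String output, String regex, int valueGroup) {
        Matcher matcher = Pattern.compile(regex).matcher(output);
        return matcher.find()
                ? Optional.ofNullable(matcher.group(valueGroup))
                : Optional.empty();
    }

    public static void main(String[] args) {
        // Hypothetical STDOUT content from the external application.
        String stdout = "Processing file...\nDocNo:1234\nDone.";
        System.out.println(extract(stdout, "DocNo:(\\d+)", 1).orElse("(none)")); // prints 1234
        System.out.println(extract(stdout, "DocNo:(\\d+)", 0).orElse("(none)")); // prints DocNo:1234
    }
}
```

With group 1 only the digits are kept, which matches the "docnumber" value described above; group 0 would keep the "DocNo:" prefix as well.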
- Since:
- 2.7.0
- Author:
- Pascal Essiembre
- See Also:
ExternalHandler
-
-
Constructor Summary
Constructors
ExternalTransformer()
-
Method Summary
Modifier and Type / Method / Description
void addEnvironmentVariable(String name, String value)
    Adds an environment variable to the list of previously assigned variables (if any).
void addEnvironmentVariables(Map<String,String> environmentVariables)
    Adds the environment variables, keeping environment variables previously assigned.
void addMetadataExtractionPattern(String field, String pattern)
    Adds a metadata extraction pattern that will extract the whole text matched into the given field.
void addMetadataExtractionPattern(String field, String pattern, int valueGroup)
    Adds a metadata extraction pattern, which will extract the value from the specified group index upon matching.
void addMetadataExtractionPatterns(RegexFieldValueExtractor... patterns)
    Adds metadata extraction patterns that will extract matching field names/values.
boolean equals(Object other)
String getCommand()
    Gets the command to execute.
Map<String,String> getEnvironmentVariables()
    Gets environment variables.
List<RegexFieldValueExtractor> getMetadataExtractionPatterns()
    Gets metadata extraction patterns.
String getMetadataInputFormat()
    Gets the format of the metadata input file sent to the external application.
String getMetadataOutputFormat()
    Gets the format of the metadata output file from the external application.
PropertySetter getOnSet()
    Gets the property setter to use when a metadata value is set.
Path getTempDir()
    Gets directory where to store temporary files used for transformation.
int hashCode()
protected void loadHandlerFromXML(XML xml)
    Loads configuration settings specific to the implementing class.
protected void saveHandlerToXML(XML xml)
    Saves configuration settings specific to the implementing class.
void setCommand(String command)
    Sets the command to execute.
void setEnvironmentVariables(Map<String,String> environmentVariables)
    Sets the environment variables.
void setMetadataExtractionPatterns(RegexFieldValueExtractor... patterns)
    Sets metadata extraction patterns.
void setMetadataInputFormat(String metadataInputFormat)
    Sets the format of the metadata input file sent to the external application.
void setMetadataOutputFormat(String metadataOutputFormat)
    Sets the format of the metadata output file from the external application.
void setOnSet(PropertySetter onSet)
    Sets the property setter to use when a metadata value is set.
void setTempDir(Path tempDir)
    Sets directory where to store temporary files used for transformation.
String toString()
protected void transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState)
-
Methods inherited from class com.norconex.importer.handler.transformer.AbstractDocumentTransformer
transformDocument
-
Methods inherited from class com.norconex.importer.handler.AbstractImporterHandler
addRestriction, addRestriction, addRestrictions, clearRestrictions, detectCharsetIfBlank, getRestrictions, isApplicable, loadFromXML, removeRestriction, removeRestriction, saveToXML
-
-
-
-
Method Detail
-
getCommand
public String getCommand()
Gets the command to execute.
- Returns:
- the command
-
setCommand
public void setCommand(String command)
Sets the command to execute. Make sure to escape spaces in the executable path and its arguments, as well as other special command-line characters.
- Parameters:
command
- the command
-
getMetadataExtractionPatterns
public List<RegexFieldValueExtractor> getMetadataExtractionPatterns()
Gets metadata extraction patterns. See class documentation.
- Returns:
- list of metadata extraction patterns
-
addMetadataExtractionPattern
public void addMetadataExtractionPattern(String field, String pattern)
Adds a metadata extraction pattern that will extract the whole text matched into the given field.
- Parameters:
field
- target field in which to store the matched text
pattern
- the pattern
-
addMetadataExtractionPattern
public void addMetadataExtractionPattern(String field, String pattern, int valueGroup)
Adds a metadata extraction pattern, which will extract the value from the specified group index upon matching.
- Parameters:
field
- target field in which to store the matched value
pattern
- the pattern
valueGroup
- which pattern group to return.
-
addMetadataExtractionPatterns
public void addMetadataExtractionPatterns(RegexFieldValueExtractor... patterns)
Adds metadata extraction patterns that will extract matching field names/values.
- Parameters:
patterns
- extraction patterns
-
setMetadataExtractionPatterns
public void setMetadataExtractionPatterns(RegexFieldValueExtractor... patterns)
Sets metadata extraction patterns. Clears any previously assigned patterns.
- Parameters:
patterns
- extraction patterns
-
getEnvironmentVariables
public Map<String,String> getEnvironmentVariables()
Gets environment variables.
- Returns:
- environment variables, or null if using the current process environment variables
-
setEnvironmentVariables
public void setEnvironmentVariables(Map<String,String> environmentVariables)
Sets the environment variables, clearing any previously assigned environment variables. Set to null to use the current process environment variables (default).
- Parameters:
environmentVariables
- environment variables
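The null-versus-map distinction can be pictured with the standard ProcessBuilder API, whose environment starts out as a copy of the current process environment. The sketch below is only one plausible way to honor the documented behavior and is not taken from the library; the class name ProcessEnvironmentSketch and the configure helper are hypothetical, and the assumption that a non-null map fully replaces the inherited environment is an inference from the description above.

```java
import java.util.Map;

final class ProcessEnvironmentSketch {

    // Applies the given variables to the builder.
    // null: keep the environment inherited from the current process.
    // non-null: replace the inherited environment with the given variables (assumption).
    static ProcessBuilder configure(ProcessBuilder builder, Map<String, String> variables) {
        if (variables != null) {
            Map<String, String> env = builder.environment();
            env.clear();
            env.putAll(variables);
        }
        return builder;
    }

    public static void main(String[] args) {
        // The command is never started here; only the environment is inspected.
        ProcessBuilder custom = configure(new ProcessBuilder("myapp"), Map.of("APP_MODE", "batch"));
        System.out.println(custom.environment().keySet());

        ProcessBuilder inherited = configure(new ProcessBuilder("myapp"), null);
        System.out.println(inherited.environment().size() == System.getenv().size());
    }
}
```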
-
addEnvironmentVariables
public void addEnvironmentVariables(Map<String,String> environmentVariables)
Adds the environment variables, keeping environment variables previously assigned. Existing variables of the same name will be overwritten. To clear all previously assigned variables and use the current process environment variables, pass null to setEnvironmentVariables(Map).
- Parameters:
environmentVariables
- environment variables
-
addEnvironmentVariable
public void addEnvironmentVariable(String name, String value)
Adds an environment variable to the list of previously assigned variables (if any). Existing variables of the same name will be overwritten. Setting a variable with a null name has no effect, while null values are converted to empty strings.
- Parameters:
name
- environment variable name
value
- environment variable value
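The three rules stated above (same-name overwrite, null name ignored, null value stored as an empty string) can be modeled with a plain map. This is a standalone sketch of those semantics, not the library's implementation; the class EnvironmentVariablesSketch and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;

final class EnvironmentVariablesSketch {

    // Variables assigned so far.
    private final Map<String, String> variables = new HashMap<>();

    // Adds one variable following the documented rules.
    void add(String name, String value) {
        if (name == null) {
            return; // a null name has no effect
        }
        // null values become empty strings; an existing name is overwritten
        variables.put(name, value == null ? "" : value);
    }

    // Returns a copy of the variables assigned so far.
    Map<String, String> snapshot() {
        return new HashMap<>(variables);
    }

    public static void main(String[] args) {
        EnvironmentVariablesSketch env = new EnvironmentVariablesSketch();
        env.add("LANG", "C");
        env.add("LANG", "en_US.UTF-8"); // overwrites the previous value
        env.add("EMPTY", null);         // stored as ""
        env.add(null, "ignored");       // no effect
        System.out.println(env.snapshot().size()); // prints 2
    }
}
```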
-
getMetadataInputFormat
public String getMetadataInputFormat()
Gets the format of the metadata input file sent to the external application. One of "json" (default), "xml", or "properties" is expected. Only applicable when the ${INPUT} token is part of the command.
- Returns:
- metadata input format
-
setMetadataInputFormat
public void setMetadataInputFormat(String metadataInputFormat)
Sets the format of the metadata input file sent to the external application. One of "json" (default), "xml", or "properties" is expected. Only applicable when the ${INPUT} token is part of the command.
- Parameters:
metadataInputFormat
- format of the metadata input file
-
getMetadataOutputFormat
public String getMetadataOutputFormat()
Gets the format of the metadata output file from the external application. By default no format is set, and metadata extraction patterns are used to extract metadata information. One of "json", "xml", or "properties" is expected. Only applicable when the ${OUTPUT} token is part of the command.
- Returns:
- metadata output format
-
setMetadataOutputFormat
public void setMetadataOutputFormat(String metadataOutputFormat)
Sets the format of the metadata output file from the external application. One of "json", "xml", or "properties" is expected. Set to null (the default) to rely on metadata extraction patterns instead. Only applicable when the ${OUTPUT} token is part of the command.
- Parameters:
metadataOutputFormat
- format of the metadata output file
-
getOnSet
public PropertySetter getOnSet()
Gets the property setter to use when a metadata value is set.
- Returns:
- property setter
- Since:
- 3.0.0
-
setOnSet
public void setOnSet(PropertySetter onSet)
Sets the property setter to use when a metadata value is set.
- Parameters:
onSet
- property setter
- Since:
- 3.0.0
-
getTempDir
public Path getTempDir()
Gets the directory in which to store temporary files used for transformation.
- Returns:
- temporary directory
-
setTempDir
public void setTempDir(Path tempDir)
Sets the directory in which to store temporary files used for transformation.
- Parameters:
tempDir
- temporary directory
-
transformApplicableDocument
protected void transformApplicableDocument(HandlerDoc doc, InputStream input, OutputStream output, ParseState parseState) throws ImporterHandlerException
- Specified by:
transformApplicableDocument
in class AbstractDocumentTransformer
- Throws:
ImporterHandlerException
-
loadHandlerFromXML
protected void loadHandlerFromXML(XML xml)
Description copied from class: AbstractImporterHandler
Loads configuration settings specific to the implementing class.
- Specified by:
loadHandlerFromXML
in class AbstractImporterHandler
- Parameters:
xml
- XML configuration
-
saveHandlerToXML
protected void saveHandlerToXML(XML xml)
Description copied from class: AbstractImporterHandler
Saves configuration settings specific to the implementing class.
- Specified by:
saveHandlerToXML
in class AbstractImporterHandler
- Parameters:
xml
- the XML
-
equals
public boolean equals(Object other)
- Overrides:
equals
in class AbstractImporterHandler
-
hashCode
public int hashCode()
- Overrides:
hashCode
in class AbstractImporterHandler
-
toString
public String toString()
- Overrides:
toString
in class AbstractImporterHandler
-
-