public class RegexFieldValueExtractor extends Object implements IXMLConfigurable
Simplifies extraction of field/value pairs (or "key/value" pairs) from text using regular expressions. Match groups can be used to identify the fields and values. Field matching is optional; the field can be set explicitly instead. If both a "toField" and a "fieldGroup" are provided, the toField acts as a default when no field can be obtained from matching. At least one of "toField" or "fieldGroup" must be specified. If a fieldGroup is specified without a "toField" and no field is matched, the matched value is ignored. If no value group is provided, the entire regex match is used as the value. If more than one value is extracted for a given field, the values are available as a list.
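The rules above can be sketched with plain java.util.regex. This is an illustration of the described behavior only, not the implementation of this class: the class name FieldValueRulesSketch, the extract method, and the use of -1 to mean "group not specified" are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: mimics the documented rules with java.util.regex.
// It is NOT the RegexFieldValueExtractor implementation.
class FieldValueRulesSketch {

    // Assumption of this sketch: a negative group index means "not specified".
    static Map<String, List<String>> extract(
            Pattern pattern, String toField, int fieldGroup, int valueGroup,
            CharSequence text) {
        if (toField == null && fieldGroup < 0) {
            throw new IllegalArgumentException(
                    "At least one of toField or fieldGroup must be specified.");
        }
        Map<String, List<String>> result = new LinkedHashMap<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            // Field name: the matched group when available, else the toField default.
            String field = fieldGroup >= 0 ? m.group(fieldGroup) : null;
            if (field == null || field.isEmpty()) {
                field = toField;
            }
            if (field == null) {
                continue; // no field name could be resolved: the value is ignored
            }
            // Value: the value group when given, else the entire match.
            String value = valueGroup >= 0 ? m.group(valueGroup) : m.group();
            if (value == null) {
                continue;
            }
            // Several values for the same field accumulate as a list.
            result.computeIfAbsent(field, k -> new ArrayList<>()).add(value);
        }
        return result;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("(\\w+)=(\\d+)");
        // prints {a=[1, 3], b=[2]}
        System.out.println(extract(p, null, 1, 2, "a=1 b=2 a=3"));
    }
}
```

In the sketch, passing a non-null toField together with a fieldGroup makes the toField the fallback name for matches where the field group did not participate.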
When initialized with a "pattern" only, instead of passing or configuring a Regex instance, a default one is created, assuming case insensitivity and dots matching any character.
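To show what those two defaults change in practice, here is a small sketch using the standard java.util.regex.Pattern flags CASE_INSENSITIVE and DOTALL. How the Regex class sets its flags internally is not shown on this page, so the mapping to these two constants is an assumption, and the names DefaultFlagsSketch, withDocumentedDefaults, and firstGroup are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: compares a plain Pattern with one compiled using
// case-insensitive and dot-matches-all flags.
class DefaultFlagsSketch {

    // Assumed equivalent of the documented defaults, using standard JDK flags.
    static Pattern withDocumentedDefaults(String pattern) {
        return Pattern.compile(pattern, Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    }

    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(Pattern p, CharSequence text) {
        Matcher m = p.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "TITLE: first\nsecond";
        // No flags: the upper-case "TITLE" does not match, so this prints null.
        System.out.println(firstGroup(Pattern.compile("title: (.+)"), text));
        // With both flags: case is ignored and "." also consumes the line break,
        // so the captured value spans both lines.
        System.out.println(firstGroup(withDocumentedDefaults("title: (.+)"), text));
    }
}
```

Because dots match line terminators under these defaults, a greedy `.+` can capture more than a single line; a narrower expression may be preferable when values must stay on one line.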
toField="(toField name)"
fieldGroup="(toField name match group index)"
valueGroup="(value match group index)"
onSet="[append|prepend|replace|optional]"
ignoreCase="[false|true]"
ignoreDiacritic="[false|true]"
dotAll="[false|true]"
unixLines="[false|true]"
literal="[false|true]"
comments="[false|true]"
multiline="[false|true]"
canonEq="[false|true]"
unicodeCase="[false|true]"
unicodeCharacterClass="[false|true]"
The above are configurable attributes consuming classes can expect.
The actual regular expression is expected to be the tag content.
Many of the available attributes on XML configuration represent the regular expression flags as defined in Pattern.
<sampleConfig
fieldGroup="1"
valueGroup="2">
(DocNo):(\d+)
</sampleConfig>
The above is configured to extract "DocNo" as the field name (match group 1), and the numeric characters that follow the colon as the value (match group 2).
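As a quick check of which text each group captures, the sample expression can be run through java.util.regex directly. This sketch does not load the XML or use this class; SampleConfigSketch and firstPair are hypothetical names, and no flags are set because the sample configuration sets none.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration only: applies the sample expression with plain java.util.regex.
class SampleConfigSketch {

    // Same expression as the sample tag content, compiled without flags.
    private static final Pattern SAMPLE = Pattern.compile("(DocNo):(\\d+)");

    // Returns "field=value" for the first match, or null when nothing matches.
    static String firstPair(CharSequence text) {
        Matcher m = SAMPLE.matcher(text);
        if (!m.find()) {
            return null;
        }
        // Group 1 plays the role of fieldGroup, group 2 the role of valueGroup.
        return m.group(1) + "=" + m.group(2);
    }

    public static void main(String[] args) {
        // prints DocNo=12345
        System.out.println(firstPair("Ref. DocNo:12345, page 3"));
    }
}
```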
Modifier and Type | Field and Description |
---|---|
static RegexFieldValueExtractor[] | EMPTY_ARRAY |
Constructor and Description |
---|
RegexFieldValueExtractor() |
RegexFieldValueExtractor(Regex regex) |
RegexFieldValueExtractor(Regex regex, int fieldGroup, int valueGroup) |
RegexFieldValueExtractor(Regex regex, String field) |
RegexFieldValueExtractor(Regex regex, String field, int valueGroup) |
RegexFieldValueExtractor(String pattern) |
RegexFieldValueExtractor(String pattern, int fieldGroup, int valueGroup) |
RegexFieldValueExtractor(String pattern, String field) |
RegexFieldValueExtractor(String pattern, String field, int valueGroup) |
public static final RegexFieldValueExtractor[] EMPTY_ARRAY
public RegexFieldValueExtractor()
public RegexFieldValueExtractor(String pattern)
public RegexFieldValueExtractor(String pattern, String field, int valueGroup)
public RegexFieldValueExtractor(String pattern, int fieldGroup, int valueGroup)
public RegexFieldValueExtractor(Regex regex)
public RegexFieldValueExtractor(Regex regex, int fieldGroup, int valueGroup)
public Regex getRegex()
public RegexFieldValueExtractor setRegex(Regex regex)
public int getFieldGroup()
public RegexFieldValueExtractor setFieldGroup(int fieldGroup)
public int getValueGroup()
public RegexFieldValueExtractor setValueGroup(int valueGroup)
public String getToField()
public RegexFieldValueExtractor setToField(String field)
public PropertySetter getOnSet()
public RegexFieldValueExtractor setOnSet(PropertySetter onSet)
onSet - property setter
public void extractFieldValues(Properties dest, CharSequence text)
public Properties extractFieldValues(CharSequence text)
public static void extractFieldValues(Properties dest, CharSequence text, List<RegexFieldValueExtractor> extractors)
public static Properties extractFieldValues(CharSequence text, List<RegexFieldValueExtractor> extractors)
public static void extractFieldValues(Properties dest, CharSequence text, RegexFieldValueExtractor... extractors)
public static Properties extractFieldValues(CharSequence text, RegexFieldValueExtractor... extractors)
public void loadFromXML(XML xml)
loadFromXML in interface IXMLConfigurable
xml - the XML to load into this object
public void saveToXML(XML xml)
saveToXML in interface IXMLConfigurable
xml - the XML that will represent this object
Copyright © 2008–2023 Norconex Inc. All rights reserved.