Class Regex
- java.lang.Object
-
- com.norconex.commons.lang.text.Regex
-
- All Implemented Interfaces:
IXMLConfigurable
public class Regex extends Object implements IXMLConfigurable
Builder and utility methods making it easier to construct and use regular expressions. In addition, you can obtain a Matcher with support for empty or null values.
Empty and null values
Since 3.0.0, you can force null and empty strings to be considered a positive match, regardless of the specified pattern. To do so, set matchEmpty to true. To have blank values (containing white spaces only) considered as positive matches, also set trim to true. When matching empties, doing replacement on a null value behaves as if the value is an empty string.
XML configuration usage:
ignoreCase="[false|true]" ignoreDiacritic="[false|true]" dotAll="[false|true]" unixLines="[false|true]" literal="[false|true]" comments="[false|true]" multiline="[false|true]" canonEq="[false|true]" unicodeCase="[false|true]" unicodeCharacterClass="[false|true]" trim="[false|true]" matchEmpty="[false|true]"
The above are configurable attributes that consuming classes can expect. The actual regular expression is expected to be the tag content. Many of the attributes available in the XML configuration represent the regular expression flags as defined in Pattern.
XML usage example:
<sampleConfig ignoreCase="true" dotAll="true"> ^start.*end$ </sampleConfig>
The above will match any text that starts with "start" and ends with "end", regardless of whether there are new line characters in between.
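As a rough illustration of what the two attributes in the example do, the sketch below uses only the JDK's java.util.regex.Pattern with the corresponding CASE_INSENSITIVE and DOTALL flags. It does not go through Regex or its XML loading; the class name DotAllExample, its helper methods, and the sample text are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; uses plain java.util.regex, not the Regex class.
final class DotAllExample {

    // JDK counterparts of ignoreCase="true" and dotAll="true".
    static final Pattern WITH_DOTALL = Pattern.compile(
            "^start.*end$", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Same expression without DOTALL: "." does not cross line terminators.
    static final Pattern WITHOUT_DOTALL = Pattern.compile(
            "^start.*end$", Pattern.CASE_INSENSITIVE);

    static boolean matchesWithDotAll(String text) {
        return WITH_DOTALL.matcher(text).matches();
    }

    static boolean matchesWithoutDotAll(String text) {
        return WITHOUT_DOTALL.matcher(text).matches();
    }

    public static void main(String[] args) {
        String text = "Start of a text\nspanning two lines, up to the END";
        System.out.println(matchesWithDotAll(text));
        System.out.println(matchesWithoutDotAll(text));
    }
}
```

With DOTALL the multi-line sample matches; without it the same expression only matches text that has no line break between "start" and "end".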
- Since:
- 2.0.0
- Author:
- Pascal Essiembre
- See Also:
Pattern
-
-
Field Summary
Modifier and Type Field Description
static int
UNICODE_CASE_INSENSTIVE_FLAG
Convenience flag that combines Pattern.UNICODE_CASE and Pattern.CASE_INSENSITIVE.
static int
UNICODE_MARK_INSENSTIVE_FLAG
Flag that ignores diacritical marks when matching or replacing (e.g. accents).
-
Method Summary
Modifier and Type Method Description
protected boolean
canEqual(Object other)
Regex
canonEq()
Regex
comments()
Pattern
compile()
Compiles a previously set pattern.
Pattern
compile(String pattern)
Compiles the given pattern without assigning it to this object.
static Pattern
compileDotAll(String regex, boolean ignoreCase)
Compiles a "dotall" pattern (dots match all, including new lines) with optional case insensitivity.
RegexFieldValueExtractor
createKeyValueExtractor()
RegexFieldValueExtractor
createKeyValueExtractor(int keyGroup, int valueGroup)
RegexFieldValueExtractor
createKeyValueExtractor(String key)
RegexFieldValueExtractor
createKeyValueExtractor(String key, int valueGroup)
Regex
dotAll()
boolean
equals(Object o)
static String
escape(String pattern)
Escapes special characters with a backslash (\) in a regular expression.
Set<Integer>
getFlags()
String
getPattern()
int
hashCode()
Regex
ignoreCase()
Regex
ignoreDiacritic()
Ignores diacritical marks when matching or replacing (e.g. accents).
boolean
isCanonEq()
boolean
isComments()
boolean
isDotAll()
boolean
isIgnoreCase()
boolean
isIgnoreDiacritic()
boolean
isLiteral()
boolean
isMatchEmpty()
Gets whether null or empty strings should be considered a positive match.
boolean
isMultiline()
boolean
isTrim()
Gets whether values should be trimmed before being evaluated (as per String.trim()).
boolean
isUnicodeCase()
boolean
isUnicodeCharacterClass()
boolean
isUnixLines()
Regex
literal()
void
loadFromXML(XML xml)
Loads XML configuration values and initializes this object with them.
Regex
matchEmpty()
Sets that null or empty strings should be considered a positive match.
Matcher
matcher(CharSequence text)
Matches the previously set pattern against the given text.
Matcher
matcher(String pattern, CharSequence text)
Matches the given pattern against the given text without assigning the pattern to this object.
Regex
multiline()
void
saveToXML(XML xml)
Saves this object as XML.
Regex
setCanonEq(boolean canonEq)
Regex
setComments(boolean comments)
Regex
setDotAll(boolean dotAll)
Regex
setFlags(int... flags)
Regex
setIgnoreCase(boolean ignoreCase)
Regex
setIgnoreDiacritic(boolean ignoreDiacritic)
Regex
setLiteral(boolean literal)
Regex
setMatchEmpty(boolean matchEmpty)
Sets whether null or empty strings should be considered a positive match.
Regex
setMultiline(boolean multiline)
Regex
setPattern(String pattern)
Regex
setTrim(boolean trim)
Sets whether values should be trimmed before being evaluated (as per String.trim()).
Regex
setUnicodeCase(boolean unicode)
Regex
setUnicodeCharacterClass(boolean unicode)
Regex
setUnixLines(boolean unixLines)
String
toString()
Regex
trim()
Sets that values should be trimmed before being evaluated (as per String.trim()).
Regex
unicodeCase()
Regex
unicodeCharacterClass()
Regex
unixLines()
-
-
-
Field Detail
-
UNICODE_MARK_INSENSTIVE_FLAG
public static final int UNICODE_MARK_INSENSTIVE_FLAG
Flag that ignores diacritical marks when matching or replacing (e.g. accents). This flag is not supported by Java Pattern and only works when used with this class.
- See Also:
- Constant Field Values
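Since java.util.regex.Pattern has no flag of this kind, one common way to get a similar effect with the JDK alone is to decompose the text with java.text.Normalizer and drop the combining marks before matching. The sketch below shows that general technique only; it is an assumption about how such matching can be done, not a description of how Regex implements the flag, and the class and method names are hypothetical.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical sketch of diacritic-insensitive matching with the JDK only.
final class DiacriticSketch {

    // Unicode combining marks, which is what accents become after NFD.
    private static final Pattern MARKS = Pattern.compile("\\p{M}+");

    // Decomposes accented characters, then removes the combining marks.
    static String stripMarks(String value) {
        String decomposed = Normalizer.normalize(value, Normalizer.Form.NFD);
        return MARKS.matcher(decomposed).replaceAll("");
    }

    // Matches after removing marks from both the expression and the text.
    static boolean matchesIgnoringMarks(String regex, String text) {
        return Pattern.compile(stripMarks(regex))
                .matcher(stripMarks(text))
                .matches();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        System.out.println(stripMarks("caf\u00e9"));
        System.out.println(matchesIgnoringMarks("resume", "r\u00e9sum\u00e9"));
    }
}
```

A side effect of this approach is that replacements operate on the stripped text, so offsets no longer line up with the original accented input.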
-
UNICODE_CASE_INSENSTIVE_FLAG
public static final int UNICODE_CASE_INSENSTIVE_FLAG
Convenience flag that combines Pattern.UNICODE_CASE and Pattern.CASE_INSENSITIVE.
- See Also:
- Constant Field Values
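The reason for combining the two JDK flags is that Pattern.CASE_INSENSITIVE on its own only folds US-ASCII letters; Pattern.UNICODE_CASE extends case folding to other characters. A minimal sketch with plain java.util.regex (the class and method names are illustrative, not part of this library):

```java
import java.util.regex.Pattern;

// Hypothetical demo of the two JDK flags this convenience flag combines.
final class UnicodeCaseExample {

    // CASE_INSENSITIVE alone: case folding is limited to US-ASCII.
    static boolean asciiOnlyIgnoreCase(String regex, String text) {
        return Pattern.compile(regex, Pattern.CASE_INSENSITIVE)
                .matcher(text)
                .matches();
    }

    // Both flags: case folding also applies to non-ASCII letters.
    static boolean unicodeIgnoreCase(String regex, String text) {
        return Pattern.compile(
                regex, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE)
                .matcher(text)
                .matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9t\u00e9"; // lowercase, with e-acute
        String upper = "\u00c9T\u00c9"; // uppercase, with E-acute
        System.out.println(asciiOnlyIgnoreCase(lower, upper));
        System.out.println(unicodeIgnoreCase(lower, upper));
    }
}
```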
-
-
Method Detail
-
dotAll
public Regex dotAll()
-
setDotAll
public Regex setDotAll(boolean dotAll)
-
isDotAll
public boolean isDotAll()
-
ignoreCase
public Regex ignoreCase()
-
setIgnoreCase
public Regex setIgnoreCase(boolean ignoreCase)
-
isIgnoreCase
public boolean isIgnoreCase()
-
unixLines
public Regex unixLines()
-
setUnixLines
public Regex setUnixLines(boolean unixLines)
-
isUnixLines
public boolean isUnixLines()
-
literal
public Regex literal()
-
setLiteral
public Regex setLiteral(boolean literal)
-
isLiteral
public boolean isLiteral()
-
comments
public Regex comments()
-
setComments
public Regex setComments(boolean comments)
-
isComments
public boolean isComments()
-
multiline
public Regex multiline()
-
setMultiline
public Regex setMultiline(boolean multiline)
-
isMultiline
public boolean isMultiline()
-
canonEq
public Regex canonEq()
-
setCanonEq
public Regex setCanonEq(boolean canonEq)
-
isCanonEq
public boolean isCanonEq()
-
unicodeCase
public Regex unicodeCase()
-
setUnicodeCase
public Regex setUnicodeCase(boolean unicode)
-
isUnicodeCase
public boolean isUnicodeCase()
-
unicodeCharacterClass
public Regex unicodeCharacterClass()
-
setUnicodeCharacterClass
public Regex setUnicodeCharacterClass(boolean unicode)
-
isUnicodeCharacterClass
public boolean isUnicodeCharacterClass()
-
ignoreDiacritic
public Regex ignoreDiacritic()
Ignores diacritical marks when matching or replacing (e.g. accents).
- Returns:
- this instance
-
setIgnoreDiacritic
public Regex setIgnoreDiacritic(boolean ignoreDiacritic)
-
isIgnoreDiacritic
public boolean isIgnoreDiacritic()
-
isMatchEmpty
public boolean isMatchEmpty()
Gets whether null or empty strings should be considered a positive match.
- Returns:
- true if null and empty strings are considered a match
- Since:
- 3.0.0
-
setMatchEmpty
public Regex setMatchEmpty(boolean matchEmpty)
Sets whether null or empty strings should be considered a positive match. To also consider blank values as positive matches, use setTrim(boolean).
- Parameters:
- matchEmpty - true to have null and empty strings considered a match
- Returns:
- this instance
- Since:
- 3.0.0
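The documented interaction between matchEmpty and trim can be summarized in a few lines of plain Java. This is a sketch of the behavior as described on this page (null or empty text matches only when matchEmpty is set, and trim turns blank text into empty text first); it is not the library's code, and MatchEmptySketch and its method are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of the documented matchEmpty/trim semantics.
final class MatchEmptySketch {

    static boolean matches(
            Pattern pattern, String value, boolean matchEmpty, boolean trim) {
        String text = value;
        // Trimming first is what lets blank values count as empty.
        if (trim && text != null) {
            text = text.trim();
        }
        // null and empty text never reach the pattern: the outcome is
        // decided by matchEmpty alone.
        if (text == null || text.isEmpty()) {
            return matchEmpty;
        }
        return pattern.matcher(text).matches();
    }

    public static void main(String[] args) {
        Pattern digits = Pattern.compile("\\d+");
        System.out.println(matches(digits, null, true, false));
        System.out.println(matches(digits, "   ", true, false));
        System.out.println(matches(digits, "   ", true, true));
    }
}
```

In this sketch a blank value is only treated as empty when both options are on, which mirrors the advice to combine setMatchEmpty(boolean) with setTrim(boolean).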
-
matchEmpty
public Regex matchEmpty()
Sets that null or empty strings should be considered a positive match. Same as invoking setMatchEmpty(boolean) with true.
- Returns:
- this instance
- Since:
- 3.0.0
-
isTrim
public boolean isTrim()
Gets whether values should be trimmed before being evaluated (as per String.trim()).
- Returns:
- true if values are trimmed before evaluation
- Since:
- 3.0.0
-
setTrim
public Regex setTrim(boolean trim)
Sets whether values should be trimmed before being evaluated (as per String.trim()).
- Parameters:
- trim - true to trim values before evaluation
- Returns:
- this instance
- Since:
- 3.0.0
-
trim
public Regex trim()
Sets that values should be trimmed before being evaluated (as per String.trim()). Same as invoking setTrim(boolean) with true.
- Returns:
- this instance
- Since:
- 3.0.0
-
setFlags
public Regex setFlags(int... flags)
-
getPattern
public String getPattern()
-
compile
public Pattern compile()
Compiles a previously set pattern.
For text-matching with diacritical mark insensitivity support enabled, or for trim and matchEmpty support, use matcher(CharSequence) instead.
- Returns:
- compiled pattern
-
compile
public Pattern compile(String pattern)
Compiles the given pattern without assigning it to this object.
For text-matching with diacritical mark insensitivity support enabled, or for trim and matchEmpty support, use matcher(String, CharSequence) instead.
- Parameters:
- pattern - the pattern to compile
- Returns:
- compiled pattern
- Throws:
- IllegalArgumentException - if pattern is null
-
compileDotAll
public static Pattern compileDotAll(String regex, boolean ignoreCase)
Compiles a "dotall" pattern (dots match all, including new lines) with optional case insensitivity.
- Parameters:
- regex - regular expression
- ignoreCase - true to ignore character case
- Returns:
- compiled pattern
-
escape
public static String escape(String pattern)
Escapes special characters with a backslash (\) in a regular expression. This is an alternative to Pattern.quote(String) for when you do not want the string to be treated as a literal.
- Parameters:
- pattern - the pattern to escape
- Returns:
- escaped pattern
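To make the contrast with Pattern.quote(String) concrete: quote wraps the whole input in \Q...\E, whereas backslash escaping yields an ordinary expression fragment that can be combined with other regex syntax. The sketch below is a hand-written approximation of backslash escaping; the exact set of characters escaped by escape(String) is not stated on this page, so the character class used here is an assumption, and EscapeSketch is a hypothetical name.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of backslash escaping; the escaped character
// set is an assumption, not taken from the library.
final class EscapeSketch {

    // Prefixes common regex metacharacters with a backslash.
    static String escapeSpecials(String value) {
        return value.replaceAll("([\\\\.\\[\\]{}()*+\\-?^$|])", "\\\\$1");
    }

    public static void main(String[] args) {
        // Backslash-escaped fragment versus a \Q...\E quoted literal.
        System.out.println(escapeSpecials("a.b*c"));
        System.out.println(Pattern.quote("a.b*c"));

        // The escaped fragment is still a regular expression, so it can
        // be extended, here with an optional trailing "!".
        Pattern p = Pattern.compile(escapeSpecials("1+1=2") + "!?");
        System.out.println(p.matcher("1+1=2!").matches());
    }
}
```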
-
matcher
public Matcher matcher(CharSequence text)
Matches the previously set pattern against the given text.
- Parameters:
- text - the text to match
- Returns:
- matcher
-
matcher
public Matcher matcher(String pattern, CharSequence text)
Matches the given pattern against the given text without assigning the pattern to this object. Since 3.0.0, null or empty text will generate no match unless isMatchEmpty() is true, in which case it will match positively.
- Parameters:
- pattern - the pattern to match
- text - the text to match
- Returns:
- matcher
-
createKeyValueExtractor
public RegexFieldValueExtractor createKeyValueExtractor()
-
createKeyValueExtractor
public RegexFieldValueExtractor createKeyValueExtractor(String key)
-
createKeyValueExtractor
public RegexFieldValueExtractor createKeyValueExtractor(String key, int valueGroup)
-
createKeyValueExtractor
public RegexFieldValueExtractor createKeyValueExtractor(int keyGroup, int valueGroup)
-
loadFromXML
public void loadFromXML(XML xml)
Description copied from interface: IXMLConfigurable
Loads XML configuration values and initializes this object with them.
- Specified by:
- loadFromXML in interface IXMLConfigurable
- Parameters:
- xml - the XML to load into this object
-
saveToXML
public void saveToXML(XML xml)
Description copied from interface: IXMLConfigurable
Saves this object as XML.
- Specified by:
- saveToXML in interface IXMLConfigurable
- Parameters:
- xml - the XML that will represent this object
-
canEqual
protected boolean canEqual(Object other)
-
-