public class SegmentCountURLFilter extends Object implements IOnMatchFilter, IReferenceFilter, IDocumentFilter, IMetadataFilter, IXMLConfigurable
Filters URLs based on the number of URL segments. A URL with a number of segments equal to or greater than the specified count will be either included or excluded, as specified.
By default, segments are obtained by breaking the URL text at each forward slash (/), starting after the host name. You can define different or additional segment separator characters.
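To illustrate the splitting described above, here is a minimal standalone sketch using only the JDK. It is not the filter's actual implementation: the class `SegmentSplitSketch` and the helper `segments` are hypothetical names, and it assumes that only the URL path (the part after the host name, without the query string) is split and that empty segments are discarded.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

class SegmentSplitSketch {

    // Hypothetical helper, not part of the Norconex API: returns the
    // non-empty segments of the URL path, split on the given regex.
    public static List<String> segments(String url, String separatorRegex) {
        // Assumption: only the path after the host name is considered.
        String path = URI.create(url).getRawPath();
        List<String> result = new ArrayList<>();
        if (path == null) {
            return result;
        }
        for (String part : path.split(separatorRegex)) {
            // Skip empty pieces (e.g. the one before the leading slash).
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Default-style separator: a forward slash.
        System.out.println(segments("http://example.com/a/b/c.html", "/"));
        // A custom separator treating both "/" and "-" as segment breaks.
        System.out.println(
                segments("http://example.com/news/2023-01/item.html", "[/-]"));
    }
}
```

With the slash separator, the first URL yields three segments (`a`, `b`, `c.html`); with the hypothetical `[/-]` separator, the second yields four (`news`, `2023`, `01`, `item.html`).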
When duplicate is true, the filter counts the maximum number of duplicate segments found.
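One way to picture the duplicate mode is as a tally of how often each segment occurs, keeping the highest tally. The sketch below is an assumption about that behavior, not the library's code; `DuplicateSegmentSketch` and `maxDuplicateCount` are hypothetical names, and the input is a list of segments that has already been split.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class DuplicateSegmentSketch {

    // Hypothetical helper: returns the highest number of times any single
    // segment occurs in the list (0 for an empty list).
    public static int maxDuplicateCount(List<String> segments) {
        Map<String, Integer> occurrences = new HashMap<>();
        int max = 0;
        for (String segment : segments) {
            // merge() adds 1 to the existing tally and returns the new value.
            int seen = occurrences.merge(segment, 1, Integer::sum);
            max = Math.max(max, seen);
        }
        return max;
    }

    public static void main(String[] args) {
        // Segments of a path such as /a/b/a/b/a, a typical crawler trap.
        System.out.println(maxDuplicateCount(List.of("a", "b", "a", "b", "a"))); // 3
    }
}
```

Under this reading, a URL whose path keeps repeating the same directory names reaches the configured count quickly even if its distinct segments are few.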
<filter
class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter"
onMatch="[include|exclude]"
count="(numeric value)"
duplicate="[false|true]"
separator="(a regex identifying segment separator)"/>
<filter
class="SegmentCountURLFilter"
onMatch="exclude"
count="5"/>
The above example will reject URLs with 5 or more segments after the host name, using the default forward slash separator.
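The accept/reject decision for such a configuration can be sketched as follows. This is an illustration only, based on the "equal or more than the specified count" rule stated in the class description; the `Match` enum is a stand-in for the library's `OnMatch`, the `accept` helper is hypothetical, and the assumption that non-matching URLs are rejected when `onMatch` is `include` is not confirmed by this page.

```java
class SegmentCountDecisionSketch {

    // Stand-in for the library's OnMatch type (assumption for this sketch).
    enum Match { INCLUDE, EXCLUDE }

    // Hypothetical helper: decides whether a URL with the given number of
    // segments is accepted for a configured threshold and match behavior.
    public static boolean accept(int segmentCount, int threshold, Match onMatch) {
        // The filter "matches" when the count is reached or exceeded.
        boolean matched = segmentCount >= threshold;
        // INCLUDE keeps matching URLs; EXCLUDE keeps the non-matching ones.
        return onMatch == Match.INCLUDE ? matched : !matched;
    }

    public static void main(String[] args) {
        // Mirrors the XML example: count="5", onMatch="exclude".
        System.out.println(accept(4, 5, Match.EXCLUDE)); // true: below the count
        System.out.println(accept(5, 5, Match.EXCLUDE)); // false: count reached
        System.out.println(accept(7, 5, Match.EXCLUDE)); // false: count exceeded
    }
}
```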
See Also: Pattern
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_SEGMENT_COUNT: Default segment count. |
static String | DEFAULT_SEGMENT_SEPARATOR_PATTERN: Default segment separator pattern. |
Constructor and Description |
---|
SegmentCountURLFilter(): Constructor. |
SegmentCountURLFilter(int count): Constructor. |
SegmentCountURLFilter(int count, OnMatch onMatch): Constructor. |
SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate): Constructor. |
Modifier and Type | Method and Description |
---|---|
boolean | acceptDocument(Doc document) |
boolean | acceptMetadata(String reference, Properties metadata) |
boolean | acceptReference(String url) |
boolean | equals(Object other) |
int | getCount() |
OnMatch | getOnMatch() |
String | getSeparator(): Gets the segment separator pattern. |
int | hashCode() |
boolean | isDuplicate() |
void | loadFromXML(XML xml) |
void | saveToXML(XML xml) |
void | setCount(int count) |
void | setDuplicate(boolean duplicate) |
void | setOnMatch(OnMatch onMatch) |
void | setSeparator(String separator) |
String | toString() |
public static final String DEFAULT_SEGMENT_SEPARATOR_PATTERN
public static final int DEFAULT_SEGMENT_COUNT
public SegmentCountURLFilter()
public SegmentCountURLFilter(int count)
Parameters:
count - how many segments
public SegmentCountURLFilter(int count, OnMatch onMatch)
Parameters:
count - how many segments
onMatch - what to do on match
public SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate)
Parameters:
count - how many segments
onMatch - what to do on match
duplicate - whether to handle duplicates
public String getSeparator()
public final void setSeparator(String separator)
public int getCount()
public final void setCount(int count)
public boolean isDuplicate()
public final void setDuplicate(boolean duplicate)
public OnMatch getOnMatch()
Specified by: getOnMatch in interface IOnMatchFilter
public void setOnMatch(OnMatch onMatch)
public boolean acceptDocument(Doc document)
Specified by: acceptDocument in interface IDocumentFilter
public boolean acceptMetadata(String reference, Properties metadata)
Specified by: acceptMetadata in interface IMetadataFilter
public boolean acceptReference(String url)
Specified by: acceptReference in interface IReferenceFilter
public void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable
Copyright © 2009–2023 Norconex Inc. All rights reserved.