public class SegmentCountURLFilter extends AbstractOnMatchFilter implements IReferenceFilter, IDocumentFilter, IMetadataFilter, IXMLConfigurable
Filters URLs based on the number of URL segments. A URL whose number of segments is equal to or greater than the specified count is either included or excluded, depending on the onMatch setting.
By default, segments are obtained by splitting the URL text at each forward slash (/), starting after the host name. You can define different or additional segment separator characters through the separator attribute, which is a regular expression.
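The splitting rule described above can be approximated with the JDK alone. This is a minimal sketch and not the filter's actual implementation: the class name `SegmentCountSketch`, the helper `segments`, and the example URL are hypothetical, and the sketch assumes that only the URL path is considered and that empty segments are ignored.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class SegmentCountSketch {

    // Assumed default separator: a single forward slash.
    static final Pattern DEFAULT_SEPARATOR = Pattern.compile("/");

    // Returns the non-empty segments of the URL path (the part after the host).
    // Assumption: the query string and fragment are not counted.
    static List<String> segments(String url, Pattern separator) {
        List<String> result = new ArrayList<>();
        String path = URI.create(url).getRawPath();
        if (path == null) {
            return result;
        }
        for (String part : separator.split(path)) {
            if (!part.isEmpty()) {
                result.add(part);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String url = "http://example.com/a/b/c/d/e.html";
        // Default separator: a, b, c, d, e.html
        System.out.println(segments(url, DEFAULT_SEPARATOR).size()); // prints 5
        // Custom separator treating both "/" and "." as boundaries
        System.out.println(segments(url, Pattern.compile("[/.]")).size()); // prints 6
    }
}
```

With a separator regex such as `[/.]`, the file extension becomes its own segment, which is why the second count is higher.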
When duplicate is true, it will count the maximum number of duplicate segments found.
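One way to read "maximum number of duplicate segments" is the highest number of times any single segment value occurs in the URL. The sketch below illustrates that reading only; `DuplicateSegmentSketch` and `maxDuplicates` are hypothetical names, and case-sensitive comparison is an assumption, not documented behavior.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class DuplicateSegmentSketch {

    // Returns the highest number of occurrences of any single segment.
    // Assumption: segments are compared case-sensitively.
    static int maxDuplicates(List<String> segments) {
        Map<String, Integer> counts = new HashMap<>();
        int max = 0;
        for (String segment : segments) {
            int occurrences = counts.merge(segment, 1, Integer::sum);
            max = Math.max(max, occurrences);
        }
        return max;
    }

    public static void main(String[] args) {
        // Segments of a hypothetical path /a/b/a/c/a
        List<String> segments = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(maxDuplicates(segments)); // prints 3
    }
}
```

Under this reading, a crawler trap such as a path that keeps repeating the same directory name would reach the configured count even when the total number of segments is modest.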
<filter class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter" onMatch="[include|exclude]" count="(numeric value)" duplicate="[false|true]" separator="(a regex identifying segment separator)" />
The following will reject URLs that have 5 or more segments (separated by forward slashes) after the host name.
<filter class="com.norconex.collector.http.filter.impl.SegmentCountURLFilter" onMatch="exclude" count="5" />
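The effect of combining count with onMatch can be sketched as a small decision function. This is an illustration under stated assumptions rather than the library's code: the `Action` enum and `accept` method are hypothetical stand-ins, and the sketch assumes that with onMatch set to include, a URL that does not reach the count is rejected by this filter.

```java
public class OnMatchSketch {

    // Hypothetical stand-in for the filter's include/exclude setting.
    enum Action { INCLUDE, EXCLUDE }

    // A URL "matches" when its segment count reaches the threshold.
    // Matching URLs are accepted on INCLUDE and rejected on EXCLUDE;
    // non-matching URLs get the opposite outcome (an assumption).
    static boolean accept(int segmentCount, int threshold, Action onMatch) {
        boolean matches = segmentCount >= threshold;
        return (onMatch == Action.INCLUDE) == matches;
    }

    public static void main(String[] args) {
        System.out.println(accept(4, 5, Action.EXCLUDE)); // prints true
        System.out.println(accept(5, 5, Action.EXCLUDE)); // prints false
        System.out.println(accept(7, 5, Action.EXCLUDE)); // prints false
    }
}
```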
See Also: Pattern
Modifier and Type | Field and Description |
---|---|
static int | DEFAULT_SEGMENT_COUNT: Default segment count. |
static String | DEFAULT_SEGMENT_SEPARATOR_PATTERN: Default segment separator pattern. |
Constructor and Description |
---|
SegmentCountURLFilter(): Constructor. |
SegmentCountURLFilter(int count): Constructor. |
SegmentCountURLFilter(int count, OnMatch onMatch): Constructor. |
SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate): Constructor. |
Modifier and Type | Method and Description |
---|---|
boolean | acceptDocument(ImporterDocument document) |
boolean | acceptMetadata(String reference, Properties metadata) |
boolean | acceptReference(String url) |
boolean | equals(Object obj) |
int | getCount() |
String | getSeparator(): Gets the segment separator pattern. |
int | hashCode() |
boolean | isDuplicate() |
void | loadFromXML(Reader in) |
void | saveToXML(Writer out) |
void | setCount(int count) |
void | setDuplicate(boolean duplicate) |
void | setSeparator(String separator) |
String | toString() |
Methods inherited from class AbstractOnMatchFilter: getOnMatch, loadFromXML, saveToXML, setOnMatch
public static final String DEFAULT_SEGMENT_SEPARATOR_PATTERN
public static final int DEFAULT_SEGMENT_COUNT
public SegmentCountURLFilter()
public SegmentCountURLFilter(int count)
Parameters:
count - how many segments
public SegmentCountURLFilter(int count, OnMatch onMatch)
Parameters:
count - how many segments
onMatch - what to do on match
public SegmentCountURLFilter(int count, OnMatch onMatch, boolean duplicate)
Parameters:
count - how many segments
onMatch - what to do on match
duplicate - whether to handle duplicates
public String getSeparator()
public final void setSeparator(String separator)
public int getCount()
public final void setCount(int count)
public boolean isDuplicate()
public final void setDuplicate(boolean duplicate)
public boolean acceptDocument(ImporterDocument document)
Specified by: acceptDocument in interface IDocumentFilter
public boolean acceptMetadata(String reference, Properties metadata)
Specified by: acceptMetadata in interface IMetadataFilter
public boolean acceptReference(String url)
Specified by: acceptReference in interface IReferenceFilter
public void loadFromXML(Reader in)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(Writer out) throws IOException
Specified by: saveToXML in interface IXMLConfigurable
Throws: IOException
public String toString()
Overrides: toString in class AbstractOnMatchFilter
public int hashCode()
Overrides: hashCode in class AbstractOnMatchFilter
public boolean equals(Object obj)
Overrides: equals in class AbstractOnMatchFilter
Copyright © 2009–2021 Norconex Inc. All rights reserved.