public class ReferenceDelayResolver extends AbstractDelayResolver
Introduces different delays between document downloads based on matching document reference (URL) patterns. There are a few ways the actual delay value can be defined (in order):
One of the following scopes (crawler, site, or thread) dictates how the delay is applied; they are listed in order from the best behaved to the least.
As of 2.7.0, XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
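To make the equivalence concrete, the human-readable value "5 minutes and 30 seconds" corresponds to a plain millisecond count. The following minimal sketch computes that count with the JDK's `java.time.Duration` only; it does not use or reproduce DurationParser, and the class and method names are hypothetical.

```java
import java.time.Duration;

// Hypothetical helper, not part of the Norconex API.
public class DurationMillisSketch {

    // Millisecond equivalent of "5 minutes and 30 seconds" (or "5m30s").
    static long fiveMinutesThirtySecondsMillis() {
        return Duration.ofMinutes(5).plusSeconds(30).toMillis();
    }

    public static void main(String[] args) {
        // (5 * 60 + 30) seconds * 1000 = 330000 milliseconds
        System.out.println(fiveMinutesThirtySecondsMillis());
    }
}
```

Under that reading, `default="330000"` and `default="5 minutes and 30 seconds"` would describe the same delay.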
<delay
    class="com.norconex.collector.http.delay.impl.ReferenceDelayResolver"
    default="(milliseconds)"
    ignoreRobotsCrawlDelay="[false|true]"
    scope="[crawler|site|thread]">
  <pattern
      delay="(delay in milliseconds)">
    (regular expression applied against document reference)
  </pattern>
  (... repeat pattern tag as needed ...)
</delay>
<delay
    class="ReferenceDelayResolver"
    default="3 seconds">
  <pattern
      delay="10 seconds">
    .*\.pdf
  </pattern>
</delay>
The above example increases the delay from the default of 3 seconds to 10 seconds when encountering PDFs.
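The pattern-based lookup described above can be sketched in plain Java. This is an illustration only, not the Norconex implementation: the class `ReferenceDelaySketch` and its methods are hypothetical, and it assumes that patterns are tried in declaration order, that a pattern must match the whole reference, and that the default applies when nothing matches. Robots.txt crawl delays and scopes are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of reference-pattern delay resolution.
public class ReferenceDelaySketch {

    // One (regex, delay) pair, mirroring a <pattern delay="..."> entry.
    private static final class Entry {
        final Pattern pattern;
        final long delayMs;

        Entry(Pattern pattern, long delayMs) {
            this.pattern = pattern;
            this.delayMs = delayMs;
        }
    }

    private final long defaultDelayMs;
    private final List<Entry> entries = new ArrayList<>();

    public ReferenceDelaySketch(long defaultDelayMs) {
        this.defaultDelayMs = defaultDelayMs;
    }

    public ReferenceDelaySketch addPattern(String regex, long delayMs) {
        entries.add(new Entry(Pattern.compile(regex), delayMs));
        return this;
    }

    // Returns the delay of the first pattern matching the entire reference
    // (assumption), or the default delay when no pattern matches.
    public long resolve(String reference) {
        for (Entry entry : entries) {
            if (entry.pattern.matcher(reference).matches()) {
                return entry.delayMs;
            }
        }
        return defaultDelayMs;
    }

    public static void main(String[] args) {
        ReferenceDelaySketch resolver = new ReferenceDelaySketch(3000L)
                .addPattern(".*\\.pdf", 10000L);
        // Example URLs are made up for illustration.
        System.out.println(resolver.resolve("https://example.com/docs/manual.pdf")); // 10000
        System.out.println(resolver.resolve("https://example.com/index.html"));      // 3000
    }
}
```

With whole-reference matching as assumed here, a URL such as `manual.pdf?download=1` would not match `.*\.pdf` and would fall back to the default delay.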
Modifier and Type | Class and Description |
---|---|
static class | ReferenceDelayResolver.DelayReferencePattern |
Fields inherited from class AbstractDelayResolver: DEFAULT_DELAY, SCOPE_CRAWLER, SCOPE_SITE, SCOPE_THREAD
Constructor and Description |
---|
ReferenceDelayResolver() |
Modifier and Type | Method and Description |
---|---|
boolean | equals(Object other) |
List<ReferenceDelayResolver.DelayReferencePattern> | getDelayReferencePatterns() |
int | hashCode() |
protected void | loadDelaysFromXML(XML xml) Loads explicit configuration of delays from XML. |
protected long | resolveExplicitDelay(String url) Resolves explicitly specified delay, in milliseconds. |
protected void | saveDelaysToXML(XML xml) Saves explicit configuration of delays to XML. |
void | setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern> delayPatterns) |
String | toString() |
Methods inherited from class AbstractDelayResolver: delay, getDefaultDelay, getScope, isIgnoreRobotsCrawlDelay, loadFromXML, saveToXML, setDefaultDelay, setIgnoreRobotsCrawlDelay, setScope
public List<ReferenceDelayResolver.DelayReferencePattern> getDelayReferencePatterns()

public void setDelayReferencePatterns(List<ReferenceDelayResolver.DelayReferencePattern> delayPatterns)

protected long resolveExplicitDelay(String url)
Description copied from class: AbstractDelayResolver
Resolves explicitly specified delay, in milliseconds.
resolveExplicitDelay in class AbstractDelayResolver
Parameters: url - URL for which to resolve delay

protected void loadDelaysFromXML(XML xml)
Description copied from class: AbstractDelayResolver
Loads explicit configuration of delays from XML.
loadDelaysFromXML in class AbstractDelayResolver
Parameters: xml - configuration

protected void saveDelaysToXML(XML xml)
Description copied from class: AbstractDelayResolver
Saves explicit configuration of delays to XML.
saveDelaysToXML in class AbstractDelayResolver
Parameters: xml - XML

public boolean equals(Object other)
equals in class AbstractDelayResolver

public int hashCode()
hashCode in class AbstractDelayResolver

public String toString()
toString in class AbstractDelayResolver
Copyright © 2009–2023 Norconex Inc.. All rights reserved.