Class PhantomJSDocumentFetcher
- java.lang.Object
-
- com.norconex.collector.http.fetch.AbstractHttpFetcher
-
- com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
-
- All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
@Deprecated public class PhantomJSDocumentFetcher extends AbstractHttpFetcher
Deprecated. Since 3.0.0, use WebDriverHttpFetcher.
Deprecation notice
The PhantomJS headless browser is no longer maintained by its owner. As such, starting with version 3.0.0, use of PhantomJSDocumentFetcher is strongly discouraged and HttpClientProxy support for it has been dropped. With more popular browsers (e.g., Chrome) now supporting headless mode, more stable options are available. Please consider using WebDriverHttpFetcher instead when attempting to crawl a JavaScript-driven website.
An alternative to the GenericHttpFetcher which relies on an external PhantomJS installation to fetch web pages. While less efficient, this implementation is meant to provide a way to crawl sites that make heavy use of JavaScript to render their pages. This class tells the PhantomJS headless browser to wait a certain amount of time for the page to load extra content via Ajax requests before grabbing all loaded HTML.
Considerations
Relying on external software to fetch pages is slower, less scalable, and may be less stable. The use of GenericHttpFetcher should be preferred whenever possible. Use at your own risk. Use PhantomJS 2.1 (or possibly higher).
Handling of non-HTML Pages
It is usually only useful to use PhantomJS for HTML pages with JavaScript. Other types of documents are fetched using an instance of GenericHttpFetcher. To find out whether it is dealing with an HTML document, this fetcher needs to know the content type first. By default, the content type of a document is not known before a physical copy is obtained. This means PhantomJS has to download the document first and, if it turns out not to be an HTML document, it is downloaded a second time with the generic document fetcher. By default, these content types are considered HTML: text/html, application/xhtml+xml, application/vnd.wap.xhtml+xml, application/x-asp
Those can be overwritten with setContentTypePattern(String).
Avoid double-downloads
To avoid downloading the document twice as described above, you can configure a metadata fetcher (such as GenericHttpFetcher). This will attempt to get the content type by first making an HTTP HEAD request.
Alternatively, if you have a URL pattern that identifies your HTML pages (and only HTML pages), you can specify it using setReferencePattern(String). Only URLs matching the provided regular expression will be fetched by PhantomJS. By default, there is no pattern for discriminating on URL references.
Taking screenshots of pages
Thanks to PhantomJS, one can save images of pages being crawled, including those rendered with JavaScript!
Since 2.8.0, you have to explicitly enable screenshots with setScreenshotEnabled(boolean). Also, screenshots now share the same size by default. In addition, you can now control how screenshots are resized and how they are stored. Storage options:
- inline: Stores a Base64 string of the scaled image, in the format specified, in a collector.phantomjs-screenshot-inline field. The string is ready to be used inline, in a <img src="..."> tag.
- disk: Stores the scaled image on the file system, in the format and directory specified. A reference to the file on disk is stored in a collector.phantomjs-screenshot-path field.
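The "inline" storage option can be pictured with the standard java.util.Base64 encoder. The following is a minimal sketch, not the fetcher's own code: the class and method names are hypothetical, and whether the stored field value already includes a "data:" prefix is an assumption, not something stated above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/** Sketch only: turns image bytes into a string usable in an img tag. */
final class InlineScreenshotSketch {

    /** Encodes raw image bytes as a data URI (assumed layout, see lead-in). */
    static String toDataUri(byte[] imageBytes, String format) {
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "data:image/" + format + ";base64," + base64;
    }

    /** Wraps a data URI in an HTML img tag. */
    static String toImgTag(String dataUri) {
        return "<img src=\"" + dataUri + "\">";
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real screenshot would hold PNG or JPEG data.
        byte[] fakeImage = "abc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toImgTag(toDataUri(fakeImage, "png")));
    }
}
```

Inline storage avoids managing files on disk, at the cost of larger metadata values, since Base64 output is roughly a third bigger than the binary image.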
Since 2.8.0, it is possible to specify a resource timeout so that slow individual page resources do not cause PhantomJS to hang for a long time.
PhantomJS exit values
Since 2.9.1, it is possible to specify which PhantomJS exit values are to be considered "valid". Provide a comma-separated list of integers, or use the setValidExitCodes(int...) method. By default, only zero is considered valid.
XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
XML configuration usage:
<documentFetcher
    class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
    detectContentType="[false|true]"
    detectCharset="[false|true]"
    screenshotEnabled="[false|true]">
  <exePath>(path to PhantomJS executable)</exePath>
  <scriptPath>
    (Optional path to a PhantomJS script. Defaults to scripts/phantom.js)
  </scriptPath>
  <renderWaitTime>
    (Milliseconds to wait for the entire page to load.
    Defaults to 3000, i.e., 3 seconds.)
  </renderWaitTime>
  <resourceTimeout>
    (Optional milliseconds to wait for a page resource to load.
    Default is unspecified.)
  </resourceTimeout>
  <options>
    <opt>(optional extra PhantomJS command-line option)</opt>
    <!-- You can have multiple opt tags -->
  </options>
  <referencePattern>
    (Regular expression matching URLs for which to use the
    PhantomJS browser. Non-matching URLs will fall back to using
    GenericDocumentFetcher.)
  </referencePattern>
  <contentTypePattern>
    (Regular expression matching content types for which to use
    the PhantomJS browser. Non-matching content types will use
    the GenericDocumentFetcher.)
  </contentTypePattern>
  <validExitCodes>(defaults to 0)</validExitCodes>
  <validStatusCodes>(defaults to 200)</validStatusCodes>
  <notFoundStatusCodes>(defaults to 404)</notFoundStatusCodes>
  <headersPrefix>(string to prefix headers)</headersPrefix>
  <!-- Only applicable when screenshotEnabled is true: -->
  <screenshotDimensions>
    (Pixel size of the browser page area to capture: [width]x[height].
    E.g., 800x600. Only used when a screenshot path is specified.
    Default is undefined. It will try to load all it can and may
    produce vertically long images.)
  </screenshotDimensions>
  <screenshotZoomFactor>
    (A decimal value to scale the screenshot image.
    E.g., 0.25 will make the image 25% of its regular size,
    which is 25% of the above dimension if specified.
    Default is 1, i.e., 100%)
  </screenshotZoomFactor>
  <screenshotScaleDimensions>
    (Target pixel size the main image should be scaled to.
    Default is 300.)
  </screenshotScaleDimensions>
  <screenshotScaleStretch>
    [false|true]
    (Whether to stretch to match scale size. Default keeps aspect ratio.)
  </screenshotScaleStretch>
  <screenshotScaleQuality>
    [auto|low|medium|high|max]
    (Default is "auto", which tries the best balance between quality
    and speed based on image size. The lower the quality the faster
    it is to scale images.)
  </screenshotScaleQuality>
  <screenshotImageFormat>
    (Target format of stored image. E.g., "jpg", "png", "gif", "bmp", ...
    Default is "png")
  </screenshotImageFormat>
  <screenshotStorage>
    [disk|inline]
    (One or both, comma-separated. Default is "disk".)
  </screenshotStorage>
  <!-- Only applicable for "disk" storage: -->
  <screenshotStorageDiskDir structure="[url2path|date|datetime]">
    (Path where to save screenshots.)
  </screenshotStorageDiskDir>
  <screenshotStorageDiskField>
    (Overwrite default field where to store the screenshot path.)
  </screenshotStorageDiskField>
  <!-- Only applicable for "inline" storage: -->
  <screenshotStorageInlineField>
    (Overwrite default field where to store the inline screenshot.)
  </screenshotStorageInlineField>
</documentFetcher>
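Elements such as renderWaitTime and resourceTimeout ultimately hold a number of milliseconds. The sketch below only illustrates that arithmetic with the standard java.time.Duration class. It does not use or reproduce DurationParser, and the class and method names are hypothetical.

```java
import java.time.Duration;

/** Sketch only: millisecond arithmetic behind duration values. */
final class MillisSketch {

    /** Converts minutes plus seconds to milliseconds. */
    static long toMillis(long minutes, long seconds) {
        return Duration.ofMinutes(minutes).plusSeconds(seconds).toMillis();
    }

    /** Converts an ISO-8601 duration such as "PT5M30S" to milliseconds. */
    static long fromIso(String iso) {
        return Duration.parse(iso).toMillis();
    }

    public static void main(String[] args) {
        // "5m30s" corresponds to 5 minutes and 30 seconds.
        System.out.println(toMillis(5, 30));
        // The documented renderWaitTime default of 3 seconds.
        System.out.println(fromIso("PT3S"));
    }
}
```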
When specifying an image size, the format is [width]x[height] or a single value. When a single value is used, that value represents both the width and height (i.e., a square).
The "validStatusCodes" and "notFoundStatusCodes" elements expect a comma-separated list of HTTP response codes. If a code is added in both elements, the valid list takes precedence.
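The image size format described above can be read as follows. This is a minimal sketch with a hypothetical helper class, not the parser the fetcher actually uses; in particular, tolerating whitespace and an uppercase "X" is an assumption.

```java
import java.awt.Dimension;

/** Sketch only: reads "[width]x[height]" or a single square value. */
final class ImageSizeSketch {

    static Dimension parse(String value) {
        String v = value.trim().toLowerCase();
        int x = v.indexOf('x');
        if (x < 0) {
            // A single value stands for both width and height.
            int side = Integer.parseInt(v);
            return new Dimension(side, side);
        }
        int width = Integer.parseInt(v.substring(0, x).trim());
        int height = Integer.parseInt(v.substring(x + 1).trim());
        return new Dimension(width, height);
    }

    public static void main(String[] args) {
        System.out.println(parse("800x600"));
        System.out.println(parse("300"));
    }
}
```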
Usage example:
The following configures HTTP Collector to use PhantomJS, optionally with the HttpClient proxy, only for URLs ending with ".html".
<httpcollector id="MyHttpCollector">
  ...
  <crawlers>
    <crawler id="MyCrawler">
      ...
      <documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher">
        <exePath>/path/to/phantomjs.exe</exePath>
        <renderWaitTime>5000</renderWaitTime>
        <referencePattern>^.*\.html$</referencePattern>
      </documentFetcher>
      ...
    </crawler>
  </crawlers>
  ...
  <!-- Only if you need to use the HttpClient proxy (see documentation): -->
  <collectorListeners>
    <listener class="com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener" />
  </collectorListeners>
</httpcollector>
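To see which URLs the referencePattern from this example would select, the same regular expression can be tried with java.util.regex. This is a sketch under the assumption that the whole URL must match the pattern. The class name and sample URLs are hypothetical.

```java
import java.util.regex.Pattern;

/** Sketch only: tests URLs against the example reference pattern. */
final class ReferencePatternSketch {

    // Same expression as in the XML example, escaped for a Java string.
    private static final Pattern HTML_URL = Pattern.compile("^.*\\.html$");

    /** True when the URL would be handed to PhantomJS under this pattern. */
    static boolean usePhantomJS(String url) {
        return HTML_URL.matcher(url).matches();
    }

    public static void main(String[] args) {
        System.out.println(usePhantomJS("https://example.com/page.html"));
        System.out.println(usePhantomJS("https://example.com/doc.pdf"));
        // A query string after ".html" means the URL no longer ends with it.
        System.out.println(usePhantomJS("https://example.com/page.html?x=1"));
    }
}
```

Note that a pattern anchored on the ".html" ending does not match URLs that carry a query string or fragment, so such pages would go to the generic fetcher instead.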
- Since:
- 2.7.0
- Author:
- Pascal Essiembre
-
-
Nested Class Summary
Nested Classes
Modifier and Type Class
Description
static class PhantomJSDocumentFetcher.Quality
Deprecated.
static class PhantomJSDocumentFetcher.Storage
Deprecated.
static class PhantomJSDocumentFetcher.StorageDiskStructure
Deprecated.
-
Field Summary
Fields
Modifier and Type Field
Description
static String COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
Deprecated.
static String COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
Deprecated.
static String DEFAULT_CONTENT_TYPE_PATTERN
Deprecated.
static int DEFAULT_RENDER_WAIT_TIME
Deprecated.
static String DEFAULT_SCREENSHOT_IMAGE_FORMAT
Deprecated.
static Dimension DEFAULT_SCREENSHOT_SCALE_SIZE
Deprecated.
static PhantomJSDocumentFetcher.Storage DEFAULT_SCREENSHOT_STORAGE
Deprecated.
static String DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
Deprecated.
static float DEFAULT_SCREENSHOT_ZOOM_FACTOR
Deprecated.
static String DEFAULT_SCRIPT_PATH
Deprecated.
-
Constructor Summary
Constructors
Constructor
Description
PhantomJSDocumentFetcher()
Deprecated.
PhantomJSDocumentFetcher(int[] validStatusCodes)
Deprecated.
-
Method Summary
Modifier and Type Method
Description
protected boolean accept(HttpMethod httpMethod)
Deprecated. Whether the supplied HttpMethod is supported by this fetcher.
boolean accept(Doc doc, HttpMethod httpMethod)
Deprecated.
boolean equals(Object other)
Deprecated.
IHttpFetchResponse fetch(CrawlDoc doc, HttpMethod httpMethod)
Deprecated. Performs an HTTP request for the supplied document reference and HTTP method.
String getContentTypePattern()
Deprecated.
String getExePath()
Deprecated.
String getHeadersPrefix()
Deprecated.
List<Integer> getNotFoundStatusCodes()
Deprecated. Gets HTTP status codes to be considered as "Not found" state.
List<String> getOptions()
Deprecated.
String getReferencePattern()
Deprecated.
int getRenderWaitTime()
Deprecated.
int getResourceTimeout()
Deprecated. Gets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
Dimension getScreenshotDimensions()
Deprecated.
String getScreenshotImageFormat()
Deprecated. Gets the screenshot image format (jpg, png, gif, bmp, etc.).
Dimension getScreenshotScaleDimensions()
Deprecated. Gets the pixel dimensions we want the stored screenshot to have.
PhantomJSDocumentFetcher.Quality getScreenshotScaleQuality()
Deprecated. Gets the screenshot scaling quality to use when storage is "disk" or "inline".
List<PhantomJSDocumentFetcher.Storage> getScreenshotStorage()
Deprecated. Gets the screenshot storage mechanisms.
String getScreenshotStorageDiskDir()
Deprecated. Gets the directory where screenshots are saved when storage is "disk".
String getScreenshotStorageDiskField()
Deprecated. Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
PhantomJSDocumentFetcher.StorageDiskStructure getScreenshotStorageDiskStructure()
Deprecated. Gets the screenshot directory structure to create when storage is "disk".
String getScreenshotStorageInlineField()
Deprecated. Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
float getScreenshotZoomFactor()
Deprecated.
String getScriptPath()
Deprecated.
String getUserAgent()
Deprecated.
List<Integer> getValidExitCodes()
Deprecated. Gets valid PhantomJS exit values (defaults to 0).
List<Integer> getValidStatusCodes()
Deprecated.
int hashCode()
Deprecated.
boolean isDetectCharset()
Deprecated.
boolean isDetectContentType()
Deprecated.
boolean isScreenshotEnabled()
Deprecated. Gets whether to enable taking screenshots of crawled web pages.
boolean isScreenshotScaleStretch()
Deprecated. Gets whether the screenshot should be stretched to fill all the scale dimensions.
protected void loadHttpFetcherFromXML(XML xml)
Deprecated.
protected void saveHttpFetcherToXML(XML xml)
Deprecated.
void setContentTypePattern(String contentTypePattern)
Deprecated.
void setDetectCharset(boolean detectCharset)
Deprecated.
void setDetectContentType(boolean detectContentType)
Deprecated.
void setExePath(String exePath)
Deprecated.
void setHeadersPrefix(String headersPrefix)
Deprecated.
void setNotFoundStatusCodes(int... notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
void setOptions(String... options)
Deprecated. Sets optional extra PhantomJS command-line options.
void setOptions(List<String> options)
Deprecated. Sets optional extra PhantomJS command-line options.
void setReferencePattern(String referencePattern)
Deprecated.
void setRenderWaitTime(int renderWaitTime)
Deprecated.
void setResourceTimeout(int resourceTimeout)
Deprecated. Sets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
void setScreenshotDimensions(int width, int height)
Deprecated.
void setScreenshotDimensions(Dimension screenshotDimensions)
Deprecated.
void setScreenshotEnabled(boolean screenshotEnabled)
Deprecated. Sets whether to enable taking screenshots of crawled web pages.
void setScreenshotImageFormat(String screenshotImageFormat)
Deprecated. Sets the screenshot image format (jpg, png, gif, bmp, etc.).
void setScreenshotScaleDimensions(int width, int height)
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
void setScreenshotScaleDimensions(Dimension screenshotScaleDimensions)
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
void setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality)
Deprecated. Sets the screenshot scaling quality to use when storage is "disk" or "inline".
void setScreenshotScaleStretch(boolean screenshotScaleStretch)
Deprecated. Sets whether the screenshot should be stretched to fill all the scale dimensions.
void setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
void setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage> screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
void setScreenshotStorageDiskDir(String screenshotStorageDiskDir)
Deprecated. Sets the directory where screenshots are saved when storage is "disk".
void setScreenshotStorageDiskField(String screenshotStorageDiskField)
Deprecated. Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
void setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure)
Deprecated. Sets the screenshot directory structure to create when storage is "disk".
void setScreenshotStorageInlineField(String screenshotStorageInlineField)
Deprecated. Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
void setScreenshotZoomFactor(float screenshotZoomFactor)
Deprecated.
void setScriptPath(String scriptPath)
Deprecated.
void setValidExitCodes(int... validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
void setValidExitCodes(List<Integer> validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
void setValidStatusCodes(int... validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
void setValidStatusCodes(List<Integer> validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
String toString()
Deprecated.
-
Methods inherited from class com.norconex.collector.http.fetch.AbstractHttpFetcher
accept, fetcherShutdown, fetcherStartup, fetcherThreadBegin, fetcherThreadEnd, getReferenceFilters, loadFromXML, saveToXML, setReferenceFilters, setReferenceFilters
-
-
-
-
Field Detail
-
DEFAULT_SCRIPT_PATH
public static final String DEFAULT_SCRIPT_PATH
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_RENDER_WAIT_TIME
public static final int DEFAULT_RENDER_WAIT_TIME
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_ZOOM_FACTOR
public static final float DEFAULT_SCREENSHOT_ZOOM_FACTOR
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_CONTENT_TYPE_PATTERN
public static final String DEFAULT_CONTENT_TYPE_PATTERN
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
public static final String DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_STORAGE
public static final PhantomJSDocumentFetcher.Storage DEFAULT_SCREENSHOT_STORAGE
Deprecated.
-
DEFAULT_SCREENSHOT_IMAGE_FORMAT
public static final String DEFAULT_SCREENSHOT_IMAGE_FORMAT
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_SCALE_SIZE
public static final Dimension DEFAULT_SCREENSHOT_SCALE_SIZE
Deprecated.
-
COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
Deprecated.
- See Also:
- Constant Field Values
-
COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
Deprecated.
- See Also:
- Constant Field Values
-
-
Method Detail
-
getExePath
public String getExePath()
Deprecated.
-
setExePath
public void setExePath(String exePath)
Deprecated.
-
getScriptPath
public String getScriptPath()
Deprecated.
-
setScriptPath
public void setScriptPath(String scriptPath)
Deprecated.
-
getRenderWaitTime
public int getRenderWaitTime()
Deprecated.
-
setRenderWaitTime
public void setRenderWaitTime(int renderWaitTime)
Deprecated.
-
setOptions
public void setOptions(List<String> options)
Deprecated. Sets optional extra PhantomJS command-line options.
- Parameters:
options - extra command-line arguments
- Since:
- 3.0.0
-
setOptions
public void setOptions(String... options)
Deprecated. Sets optional extra PhantomJS command-line options.
- Parameters:
options - extra command-line arguments
-
getScreenshotStorageDiskDir
public String getScreenshotStorageDiskDir()
Deprecated. Gets the directory where screenshots are saved when storage is "disk". Default is "./screenshots".
- Returns:
- directory
- Since:
- 2.8.0
-
setScreenshotStorageDiskDir
public void setScreenshotStorageDiskDir(String screenshotStorageDiskDir)
Deprecated. Sets the directory where screenshots are saved when storage is "disk". Use this method to overwrite the default ("./screenshots").
- Parameters:
screenshotStorageDiskDir - directory
- Since:
- 2.8.0
-
getScreenshotStorageDiskField
public String getScreenshotStorageDiskField()
Deprecated. Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk". Default is "collector.phantomjs-screenshot-path".
- Returns:
- field name
- Since:
- 2.8.0
-
setScreenshotStorageDiskField
public void setScreenshotStorageDiskField(String screenshotStorageDiskField)
Deprecated. Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk". Use this method to overwrite the default ("collector.phantomjs-screenshot-path").
- Parameters:
screenshotStorageDiskField - field name
- Since:
- 2.8.0
-
getScreenshotStorageInlineField
public String getScreenshotStorageInlineField()
Deprecated. Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline". Default is "collector.phantomjs-screenshot-inline".
- Returns:
- field name
- Since:
- 2.8.0
-
setScreenshotStorageInlineField
public void setScreenshotStorageInlineField(String screenshotStorageInlineField)
Deprecated. Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline". Use this method to overwrite the default ("collector.phantomjs-screenshot-inline").
- Parameters:
screenshotStorageInlineField - field name
- Since:
- 2.8.0
-
isScreenshotEnabled
public boolean isScreenshotEnabled()
Deprecated. Gets whether to enable taking screenshots of crawled web pages.
- Returns:
- true if enabled
- Since:
- 2.8.0
-
setScreenshotEnabled
public void setScreenshotEnabled(boolean screenshotEnabled)
Deprecated. Sets whether to enable taking screenshots of crawled web pages.
- Parameters:
screenshotEnabled - true if enabled
- Since:
- 2.8.0
-
getScreenshotDimensions
public Dimension getScreenshotDimensions()
Deprecated.
-
setScreenshotDimensions
public void setScreenshotDimensions(int width, int height)
Deprecated.
-
setScreenshotDimensions
public void setScreenshotDimensions(Dimension screenshotDimensions)
Deprecated.
-
getScreenshotZoomFactor
public float getScreenshotZoomFactor()
Deprecated.
-
setScreenshotZoomFactor
public void setScreenshotZoomFactor(float screenshotZoomFactor)
Deprecated.
-
getValidExitCodes
public List<Integer> getValidExitCodes()
Deprecated. Gets valid PhantomJS exit values (defaults to 0).
- Returns:
- valid exit codes
- Since:
- 2.9.1
-
setValidExitCodes
public void setValidExitCodes(List<Integer> validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- Parameters:
validExitCodes - valid exit codes
- Since:
- 2.9.1
-
setValidExitCodes
public void setValidExitCodes(int... validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- Parameters:
validExitCodes - valid exit codes
- Since:
- 2.9.1
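The idea of a list of valid exit values can be sketched with plain Java. The helper below is hypothetical and is not how the fetcher is implemented. It only shows a comma-separated list being parsed and a process exit value being checked against it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Sketch only: checks a process exit value against accepted values. */
final class ExitCodeSketch {

    /** Parses a comma-separated list such as "0, 1,2" into integers. */
    static List<Integer> parseCodes(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .map(Integer::valueOf)
                .collect(Collectors.toList());
    }

    /** True when the exit value is in the list of valid codes. */
    static boolean isValidExit(int exitCode, List<Integer> validCodes) {
        return validCodes.contains(exitCode);
    }

    public static void main(String[] args) {
        List<Integer> valid = parseCodes("0");
        System.out.println(isValidExit(0, valid));
        System.out.println(isValidExit(1, valid));
    }
}
```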
-
setValidStatusCodes
public void setValidStatusCodes(List<Integer> validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
- Since:
- 3.0.0
-
setValidStatusCodes
public void setValidStatusCodes(int... validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
getNotFoundStatusCodes
public List<Integer> getNotFoundStatusCodes()
Deprecated. Gets HTTP status codes to be considered as "Not found" state. Default is 404.
- Returns:
- "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
- Since:
- 3.0.0
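The class description states that when a code appears in both the valid and the "Not found" lists, the valid list takes precedence. A minimal sketch of that rule follows. The class, enum, and method names are hypothetical and do not come from the library.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only: valid status codes win over "Not found" codes. */
final class StatusCodeSketch {

    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    static Outcome classify(
            int status, List<Integer> validCodes, List<Integer> notFoundCodes) {
        // Check the valid list first so it takes precedence.
        if (validCodes.contains(status)) {
            return Outcome.VALID;
        }
        if (notFoundCodes.contains(status)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        List<Integer> valid = Arrays.asList(200, 404);
        List<Integer> notFound = Arrays.asList(404);
        // 404 is in both lists, so it is classified as valid.
        System.out.println(classify(404, valid, notFound));
    }
}
```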
-
getHeadersPrefix
public String getHeadersPrefix()
Deprecated.
-
setHeadersPrefix
public void setHeadersPrefix(String headersPrefix)
Deprecated.
-
isDetectContentType
public boolean isDetectContentType()
Deprecated.
-
setDetectContentType
public void setDetectContentType(boolean detectContentType)
Deprecated.
-
isDetectCharset
public boolean isDetectCharset()
Deprecated.
-
setDetectCharset
public void setDetectCharset(boolean detectCharset)
Deprecated.
-
getContentTypePattern
public String getContentTypePattern()
Deprecated.
-
setContentTypePattern
public void setContentTypePattern(String contentTypePattern)
Deprecated.
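As an illustration of what a content type pattern might look like, the sketch below matches the four content types that the class description lists as HTML by default. The regular expression is an assumption written for this example (the actual value of DEFAULT_CONTENT_TYPE_PATTERN is not shown here), and the class name is hypothetical.

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Sketch only: decides whether a content type counts as HTML. */
final class ContentTypePatternSketch {

    // Assumed pattern covering the four documented default content types.
    static final Pattern HTML_TYPES = Pattern.compile(
            "^(text/html|application/xhtml\\+xml"
            + "|application/vnd\\.wap\\.xhtml\\+xml|application/x-asp)$");

    static boolean isHtml(String contentType) {
        if (contentType == null) {
            return false;
        }
        // Drop parameters such as "; charset=UTF-8" before matching.
        String bare = contentType.split(";", 2)[0]
                .trim().toLowerCase(Locale.ROOT);
        return HTML_TYPES.matcher(bare).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHtml("text/html; charset=UTF-8"));
        System.out.println(isHtml("application/pdf"));
    }
}
```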
-
getReferencePattern
public String getReferencePattern()
Deprecated.
-
setReferencePattern
public void setReferencePattern(String referencePattern)
Deprecated.
-
getResourceTimeout
public int getResourceTimeout()
Deprecated. Gets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
- Returns:
- the timeout value, or -1 if undefined
- Since:
- 2.8.0
-
setResourceTimeout
public void setResourceTimeout(int resourceTimeout)
Deprecated. Sets the timeout, in milliseconds, after which any requested resource will stop trying and proceed with other parts of the page.
- Parameters:
resourceTimeout - the timeout value, or -1 for undefined
- Since:
- 2.8.0
-
getScreenshotScaleDimensions
public Dimension getScreenshotScaleDimensions()
Deprecated. Gets the pixel dimensions we want the stored screenshot to have.
- Returns:
- dimension
- Since:
- 2.8.0
-
setScreenshotScaleDimensions
public void setScreenshotScaleDimensions(Dimension screenshotScaleDimensions)
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
- Parameters:
screenshotScaleDimensions - dimension
- Since:
- 2.8.0
-
setScreenshotScaleDimensions
public void setScreenshotScaleDimensions(int width, int height)
Deprecated. Sets the pixel dimensions we want the stored screenshot to have.
- Parameters:
width - image width
height - image height
- Since:
- 2.8.0
-
isScreenshotScaleStretch
public boolean isScreenshotScaleStretch()
Deprecated. Gets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
- Returns:
- true to stretch
- Since:
- 2.8.0
-
setScreenshotScaleStretch
public void setScreenshotScaleStretch(boolean screenshotScaleStretch)
Deprecated. Sets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
- Parameters:
screenshotScaleStretch - true to stretch
- Since:
- 2.8.0
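The difference between stretching and keeping the aspect ratio can be shown with a little arithmetic. The sketch below uses a common "fit inside the target box" calculation. It is an assumption about what keeping the aspect ratio means here, not the library's scaling code, and the class name is hypothetical.

```java
import java.awt.Dimension;

/** Sketch only: computes a scaled size with or without stretching. */
final class ScaleSketch {

    static Dimension scale(Dimension source, Dimension target, boolean stretch) {
        if (stretch) {
            // Stretching fills the target box and ignores proportions.
            return new Dimension(target);
        }
        // Otherwise use the largest ratio that still fits both sides.
        double ratio = Math.min(
                (double) target.width / source.width,
                (double) target.height / source.height);
        int width = Math.max(1, (int) Math.round(source.width * ratio));
        int height = Math.max(1, (int) Math.round(source.height * ratio));
        return new Dimension(width, height);
    }

    public static void main(String[] args) {
        Dimension source = new Dimension(800, 600);
        Dimension target = new Dimension(300, 300);
        System.out.println(scale(source, target, false));
        System.out.println(scale(source, target, true));
    }
}
```

With an 800x600 capture and a 300x300 target, this calculation gives 300x225 when the aspect ratio is kept and 300x300 when stretching.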
-
getScreenshotImageFormat
public String getScreenshotImageFormat()
Deprecated. Gets the screenshot image format (jpg, png, gif, bmp, etc.).
- Returns:
- image format
- Since:
- 2.8.0
-
setScreenshotImageFormat
public void setScreenshotImageFormat(String screenshotImageFormat)
Deprecated. Sets the screenshot image format (jpg, png, gif, bmp, etc.).
- Parameters:
screenshotImageFormat - image format
- Since:
- 2.8.0
-
getScreenshotStorage
public List<PhantomJSDocumentFetcher.Storage> getScreenshotStorage()
Deprecated. Gets the screenshot storage mechanisms.
- Returns:
- storage mechanisms (never null)
- Since:
- 2.8.0
-
setScreenshotStorage
public void setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage> screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
- Parameters:
screenshotStorage - storage mechanisms
- Since:
- 3.0.0
-
setScreenshotStorage
public void setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
- Parameters:
screenshotStorage - storage mechanisms
- Since:
- 2.8.0
-
getScreenshotStorageDiskStructure
public PhantomJSDocumentFetcher.StorageDiskStructure getScreenshotStorageDiskStructure()
Deprecated. Gets the screenshot directory structure to create when storage is "disk".
- Returns:
- directory structure
- Since:
- 2.8.0
-
setScreenshotStorageDiskStructure
public void setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure)
Deprecated. Sets the screenshot directory structure to create when storage is "disk".
- Parameters:
screenshotStorageDiskStructure - directory structure
- Since:
- 2.8.0
-
getScreenshotScaleQuality
public PhantomJSDocumentFetcher.Quality getScreenshotScaleQuality()
Deprecated. Gets the screenshot scaling quality to use when storage is "disk" or "inline". Default is PhantomJSDocumentFetcher.Quality.AUTO.
- Returns:
- quality
- Since:
- 2.8.0
-
setScreenshotScaleQuality
public void setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality)
Deprecated. Sets the screenshot scaling quality to use when storage is "disk" or "inline".
- Parameters:
screenshotScaleQuality - quality
- Since:
- 2.8.0
-
getUserAgent
public String getUserAgent()
Deprecated.
-
accept
public boolean accept(Doc doc, HttpMethod httpMethod)
Deprecated.
- Specified by:
accept in interface IHttpFetcher
- Overrides:
accept in class AbstractHttpFetcher
-
accept
protected boolean accept(HttpMethod httpMethod)
Deprecated. Description copied from class: AbstractHttpFetcher
Whether the supplied HttpMethod is supported by this fetcher.
- Specified by:
accept in class AbstractHttpFetcher
- Parameters:
httpMethod - the HTTP method
- Returns:
- true if supported
-
fetch
public IHttpFetchResponse fetch(CrawlDoc doc, HttpMethod httpMethod) throws HttpFetchException
Deprecated. Description copied from interface: IHttpFetcher
Performs an HTTP request for the supplied document reference and HTTP method.
For each HTTP method supported, implementors should do their best to populate the document and its CrawlDocInfo with as much information as they can.
Unsupported HTTP methods should return an HTTP response with the CrawlState.UNSUPPORTED state. To prevent users from having to configure multiple HTTP clients, implementors should try to support both the GET and HEAD methods. POST is only used in special cases and is often not used during a crawl session.
A null method is treated as a GET.
- Parameters:
doc - document to fetch or to use to make the request.
httpMethod - HTTP method
- Returns:
- an HTTP response
- Throws:
HttpFetchException - problem when fetching the document
- See Also:
HttpFetchResponseBuilder.unsupported()
-
loadHttpFetcherFromXML
protected void loadHttpFetcherFromXML(XML xml)
Deprecated.
- Specified by:
loadHttpFetcherFromXML in class AbstractHttpFetcher
-
saveHttpFetcherToXML
protected void saveHttpFetcherToXML(XML xml)
Deprecated.
- Specified by:
saveHttpFetcherToXML in class AbstractHttpFetcher
-
equals
public boolean equals(Object other)
Deprecated.
- Overrides:
equals in class AbstractHttpFetcher
-
hashCode
public int hashCode()
Deprecated.
- Overrides:
hashCode in class AbstractHttpFetcher
-
toString
public String toString()
Deprecated.
- Overrides:
toString in class AbstractHttpFetcher
-
-