Class PhantomJSDocumentFetcher
- java.lang.Object
  - com.norconex.collector.http.fetch.AbstractHttpFetcher
    - com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher
- All Implemented Interfaces:
IHttpFetcher, IEventListener<Event>, IXMLConfigurable, EventListener, Consumer<Event>
@Deprecated public class PhantomJSDocumentFetcher extends AbstractHttpFetcher
Deprecated. Since 3.0.0, use WebDriverHttpFetcher.
Deprecation notice
The PhantomJS headless browser is no longer maintained by its owner. As such, starting with version 3.0.0, use of PhantomJSDocumentFetcher is strongly discouraged and HttpClientProxy support for it has been dropped. Now that more popular browsers (e.g., Chrome) support running in headless mode, more stable options are available. Please consider using WebDriverHttpFetcher instead when crawling a JavaScript-driven website.
An alternative to the GenericHttpFetcher that relies on an external PhantomJS installation to fetch web pages. While less efficient, this implementation provides a way to crawl sites that rely heavily on JavaScript to render their pages. This class tells the PhantomJS headless browser to wait a certain amount of time for the page to load extra content via Ajax requests before grabbing all loaded HTML.
Considerations
Relying on external software to fetch pages is slower, less scalable, and potentially less stable. GenericHttpFetcher should be preferred whenever possible. Use at your own risk. Use PhantomJS 2.1 (or possibly higher).
Handling of non-HTML Pages
PhantomJS is usually only useful for HTML pages with JavaScript. Other types of documents are fetched using an instance of GenericHttpFetcher. To find out whether it is dealing with an HTML document, this fetcher first needs to know the content type. By default, the content type of a document is not known before a physical copy is obtained. This means PhantomJS must first download the document and, if it turns out not to be HTML, the document is downloaded again with the generic fetcher. By default, these content types are considered HTML: text/html, application/xhtml+xml, application/vnd.wap.xhtml+xml, and application/x-asp.
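As a rough illustration of this routing decision, the following sketch tests an already-known content type against a regular expression assembled from the default types listed above. The regex and the class name ContentTypeRoutingSketch are assumptions for illustration only; the actual value of DEFAULT_CONTENT_TYPE_PATTERN may differ.

```java
import java.util.Locale;
import java.util.regex.Pattern;

class ContentTypeRoutingSketch {

    // Assumed regex built from the documented default HTML content types
    // (not necessarily the literal DEFAULT_CONTENT_TYPE_PATTERN value).
    static final Pattern HTML_CONTENT_TYPES = Pattern.compile(
            "text/html|application/(xhtml\\+xml|vnd\\.wap\\.xhtml\\+xml|x-asp)");

    /** Returns true when a document with this content type should go to PhantomJS. */
    static boolean isHtml(String contentType) {
        if (contentType == null) {
            // Type not known yet: the caller must first obtain it
            // (full download or HEAD request) before routing.
            return false;
        }
        // Strip parameters such as "; charset=UTF-8" and normalize case.
        String mimeType = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return HTML_CONTENT_TYPES.matcher(mimeType).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHtml("text/html; charset=UTF-8")); // true
        System.out.println(isHtml("application/pdf"));          // false
    }
}
```

Anything that does not match would, per the description above, be handed to the generic fetcher instead.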
These defaults can be overridden with setContentTypePattern(String).
Avoid double-downloads
To avoid downloading the document twice as described above, you can configure a metadata fetcher (such as GenericHttpFetcher). It will attempt to get the content type first by making an HTTP HEAD request. Alternatively, if you have a URL pattern that identifies your HTML pages (and only HTML pages), you can specify it using setReferencePattern(String). Only URLs matching the provided regular expression will be fetched by PhantomJS. By default, there is no pattern for discriminating on URL references.
Taking screenshots of pages
Thanks to PhantomJS, one can save images of pages being crawled, including those rendered with JavaScript!
Since 2.8.0, you have to explicitly enable screenshots with setScreenshotEnabled(boolean). Screenshots now also share the same size by default. In addition, you can control how screenshots are resized and how they are stored. Storage options:
- inline: Stores a Base64 string of the scaled image, in the format specified, in a collector.featured-image-inline field. The string is ready to be used inline, in an <img src="..."> tag.
- disk: Stores the scaled image on the file system, in the format and directory specified. A reference to the file on disk is stored in a collector.featured-image-path field.
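A minimal sketch of producing an inline value, assuming it takes the form of a data URI so that it can be dropped straight into an <img src="..."> attribute. The class and method names are hypothetical, and the exact string the fetcher stores is not specified here.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Locale;

class InlineScreenshotSketch {

    /** Encodes image bytes as a data URI usable in an <img src="..."> tag. */
    static String toDataUri(byte[] imageBytes, String format) {
        String subtype = format.toLowerCase(Locale.ROOT);
        if (subtype.equals("jpg")) {
            subtype = "jpeg"; // the registered MIME subtype is image/jpeg
        }
        String base64 = Base64.getEncoder().encodeToString(imageBytes);
        return "data:image/" + subtype + ";base64," + base64;
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for a real scaled screenshot.
        byte[] fakeImage = "PNG".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUri(fakeImage, "png")); // data:image/png;base64,UE5H
    }
}
```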
Since 2.8.0, it is possible to specify a resource timeout so that slow individual page resources do not cause PhantomJS to hang for a long time.
PhantomJS exit values
Since 2.9.1, it is possible to specify which PhantomJS exit values are to be considered "valid". Provide them as a comma-separated list of integers (validExitCodes in XML) or with the setValidExitCodes(int...) method. By default, only zero is considered valid.
XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
XML configuration usage:
<documentFetcher
    class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
    detectContentType="[false|true]"
    detectCharset="[false|true]"
    screenshotEnabled="[false|true]">
  <exePath>(path to PhantomJS executable)</exePath>
  <scriptPath>
    (Optional path to a PhantomJS script. Defaults to scripts/phantom.js)
  </scriptPath>
  <renderWaitTime>
    (Milliseconds to wait for the entire page to load.
     Defaults to 3000, i.e., 3 seconds.)
  </renderWaitTime>
  <resourceTimeout>
    (Optional milliseconds to wait for a page resource to load.
     Default is unspecified.)
  </resourceTimeout>
  <options>
    <opt>(optional extra PhantomJS command-line option)</opt>
    <!-- You can have multiple opt tags -->
  </options>
  <referencePattern>
    (Regular expression matching URLs for which to use the PhantomJS
     browser. Non-matching URLs will fall back to using GenericHttpFetcher.)
  </referencePattern>
  <contentTypePattern>
    (Regular expression matching content types for which to use the
     PhantomJS browser. Non-matching content types will use the
     GenericHttpFetcher.)
  </contentTypePattern>
  <validExitCodes>(defaults to 0)</validExitCodes>
  <validStatusCodes>(defaults to 200)</validStatusCodes>
  <notFoundStatusCodes>(defaults to 404)</notFoundStatusCodes>
  <headersPrefix>(string to prefix headers)</headersPrefix>

  <!-- Only applicable when screenshotEnabled is true: -->
  <screenshotDimensions>
    (Pixel size of the browser page area to capture: [width]x[height].
     E.g., 800x600. Only used when a screenshot path is specified.
     Default is undefined. It will try to load all it can and may
     produce vertically long images.)
  </screenshotDimensions>
  <screenshotZoomFactor>
    (A decimal value to scale the screenshot image. E.g., 0.25 will make
     the image 25% of its regular size, which is 25% of the above
     dimension if specified. Default is 1, i.e., 100%.)
  </screenshotZoomFactor>
  <screenshotScaleDimensions>
    (Target pixel size the main image should be scaled to. Default is 300.)
  </screenshotScaleDimensions>
  <screenshotScaleStretch>
    [false|true]
    (Whether to stretch to match scale size. Default keeps aspect ratio.)
  </screenshotScaleStretch>
  <screenshotScaleQuality>
    [auto|low|medium|high|max]
    (Default is "auto", which tries the best balance between quality and
     speed based on image size. The lower the quality, the faster it is
     to scale images.)
  </screenshotScaleQuality>
  <screenshotImageFormat>
    (Target format of stored image. E.g., "jpg", "png", "gif", "bmp", ...
     Default is "png")
  </screenshotImageFormat>
  <screenshotStorage>
    [disk|inline]
    (One or both, comma-separated. Default is "disk".)
  </screenshotStorage>

  <!-- Only applicable for "disk" storage: -->
  <screenshotStorageDiskDir structure="[url2path|date|datetime]">
    (Path where to save screenshots.)
  </screenshotStorageDiskDir>
  <screenshotStorageDiskField>
    (Overrides the default field where to store the screenshot path.)
  </screenshotStorageDiskField>

  <!-- Only applicable for "inline" storage: -->
  <screenshotStorageInlineField>
    (Overrides the default field where to store the inline screenshot.)
  </screenshotStorageInlineField>
</documentFetcher>
When specifying an image size, the format is [width]x[height] or a single value. When a single value is used, that value represents both the width and height (i.e., a square).
The "validStatusCodes" and "notFoundStatusCodes" elements expect a comma-separated list of HTTP response codes. If a code is added to both elements, the valid list takes precedence.
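The image size format described above can be parsed as in this sketch. The Size record and the parse method are hypothetical helpers, not part of the collector API; they only illustrate the [width]x[height] versus single-value (square) rule.

```java
import java.util.Locale;

class DimensionParseSketch {

    /** Hypothetical width/height pair. */
    record Size(int width, int height) {}

    /** Parses "[width]x[height]", or a single value meaning a square. */
    static Size parse(String value) {
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("x", -1);
        if (parts.length == 1) {
            int side = Integer.parseInt(parts[0].trim());
            return new Size(side, side); // single value: same width and height
        }
        if (parts.length == 2) {
            return new Size(Integer.parseInt(parts[0].trim()),
                    Integer.parseInt(parts[1].trim()));
        }
        throw new IllegalArgumentException(
                "Expected [width]x[height] or a single value: " + value);
    }

    public static void main(String[] args) {
        System.out.println(parse("800x600")); // Size[width=800, height=600]
        System.out.println(parse("300"));     // Size[width=300, height=300]
    }
}
```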
Usage example:
The following configures HTTP Collector to use PhantomJS through an HttpClient proxy, only for URLs ending with ".html".
<httpcollector id="MyHttpCollector">
  ...
  <crawlers>
    <crawler id="MyCrawler">
      ...
      <documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher">
        <exePath>/path/to/phantomjs.exe</exePath>
        <renderWaitTime>5000</renderWaitTime>
        <referencePattern>^.*\.html$</referencePattern>
      </documentFetcher>
      ...
    </crawler>
  </crawlers>
  ...
  <!-- Only if you need to use the HttpClient proxy (see documentation): -->
  <collectorListeners>
    <listener class="com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener" />
  </collectorListeners>
</httpcollector>
- Since:
- 2.7.0
- Author:
- Pascal Essiembre
Nested Class Summary
- static class PhantomJSDocumentFetcher.Quality: Deprecated.
- static class PhantomJSDocumentFetcher.Storage: Deprecated.
- static class PhantomJSDocumentFetcher.StorageDiskStructure: Deprecated.
-
Field Summary
- static String COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE: Deprecated.
- static String COLLECTOR_PHANTOMJS_SCREENSHOT_PATH: Deprecated.
- static String DEFAULT_CONTENT_TYPE_PATTERN: Deprecated.
- static int DEFAULT_RENDER_WAIT_TIME: Deprecated.
- static String DEFAULT_SCREENSHOT_IMAGE_FORMAT: Deprecated.
- static Dimension DEFAULT_SCREENSHOT_SCALE_SIZE: Deprecated.
- static PhantomJSDocumentFetcher.Storage DEFAULT_SCREENSHOT_STORAGE: Deprecated.
- static String DEFAULT_SCREENSHOT_STORAGE_DISK_DIR: Deprecated.
- static float DEFAULT_SCREENSHOT_ZOOM_FACTOR: Deprecated.
- static String DEFAULT_SCRIPT_PATH: Deprecated.
-
Constructor Summary
- PhantomJSDocumentFetcher(): Deprecated.
- PhantomJSDocumentFetcher(int[] validStatusCodes): Deprecated.
-
Method Summary
- protected boolean accept(HttpMethod httpMethod): Deprecated. Whether the supplied HttpMethod is supported by this fetcher.
- boolean accept(Doc doc, HttpMethod httpMethod): Deprecated.
- boolean equals(Object other): Deprecated.
- IHttpFetchResponse fetch(CrawlDoc doc, HttpMethod httpMethod): Deprecated. Performs an HTTP request for the supplied document reference and HTTP method.
- String getContentTypePattern(): Deprecated.
- String getExePath(): Deprecated.
- String getHeadersPrefix(): Deprecated.
- List<Integer> getNotFoundStatusCodes(): Deprecated. Gets HTTP status codes to be considered as "Not found" state.
- List<String> getOptions(): Deprecated.
- String getReferencePattern(): Deprecated.
- int getRenderWaitTime(): Deprecated.
- int getResourceTimeout(): Deprecated. Gets the timeout, in milliseconds, after which a pending resource request is abandoned so that the rest of the page can proceed.
- Dimension getScreenshotDimensions(): Deprecated.
- String getScreenshotImageFormat(): Deprecated. Gets the screenshot image format (jpg, png, gif, bmp, etc.).
- Dimension getScreenshotScaleDimensions(): Deprecated. Gets the pixel dimensions the stored screenshot should have.
- PhantomJSDocumentFetcher.Quality getScreenshotScaleQuality(): Deprecated. Gets the screenshot scaling quality to use when storage is "disk" or "inline".
- List<PhantomJSDocumentFetcher.Storage> getScreenshotStorage(): Deprecated. Gets the screenshot storage mechanisms.
- String getScreenshotStorageDiskDir(): Deprecated. Gets the directory where screenshots are saved when storage is "disk".
- String getScreenshotStorageDiskField(): Deprecated. Gets the target document metadata field where the path to the screenshot image file is stored when storage is "disk".
- PhantomJSDocumentFetcher.StorageDiskStructure getScreenshotStorageDiskStructure(): Deprecated. Gets the screenshot directory structure to create when storage is "disk".
- String getScreenshotStorageInlineField(): Deprecated. Gets the target document metadata field where the inline (Base64) screenshot image is stored when storage is "inline".
- float getScreenshotZoomFactor(): Deprecated.
- String getScriptPath(): Deprecated.
- String getUserAgent(): Deprecated.
- List<Integer> getValidExitCodes(): Deprecated. Gets valid PhantomJS exit values (defaults to 0).
- List<Integer> getValidStatusCodes(): Deprecated.
- int hashCode(): Deprecated.
- boolean isDetectCharset(): Deprecated.
- boolean isDetectContentType(): Deprecated.
- boolean isScreenshotEnabled(): Deprecated. Gets whether taking screenshots of crawled web pages is enabled.
- boolean isScreenshotScaleStretch(): Deprecated. Gets whether the screenshot should be stretched to fill all the scale dimensions.
- protected void loadHttpFetcherFromXML(XML xml): Deprecated.
- protected void saveHttpFetcherToXML(XML xml): Deprecated.
- void setContentTypePattern(String contentTypePattern): Deprecated.
- void setDetectCharset(boolean detectCharset): Deprecated.
- void setDetectContentType(boolean detectContentType): Deprecated.
- void setExePath(String exePath): Deprecated.
- void setHeadersPrefix(String headersPrefix): Deprecated.
- void setNotFoundStatusCodes(int... notFoundStatusCodes): Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes): Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- void setOptions(String... options): Deprecated. Sets optional extra PhantomJS command-line options.
- void setOptions(List<String> options): Deprecated. Sets optional extra PhantomJS command-line options.
- void setReferencePattern(String referencePattern): Deprecated.
- void setRenderWaitTime(int renderWaitTime): Deprecated.
- void setResourceTimeout(int resourceTimeout): Deprecated. Sets the timeout, in milliseconds, after which a pending resource request is abandoned so that the rest of the page can proceed.
- void setScreenshotDimensions(int width, int height): Deprecated.
- void setScreenshotDimensions(Dimension screenshotDimensions): Deprecated.
- void setScreenshotEnabled(boolean screenshotEnabled): Deprecated. Sets whether to enable taking screenshots of crawled web pages.
- void setScreenshotImageFormat(String screenshotImageFormat): Deprecated. Sets the screenshot image format (jpg, png, gif, bmp, etc.).
- void setScreenshotScaleDimensions(int width, int height): Deprecated. Sets the pixel dimensions the stored screenshot should have.
- void setScreenshotScaleDimensions(Dimension screenshotScaleDimensions): Deprecated. Sets the pixel dimensions the stored screenshot should have.
- void setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality): Deprecated. Sets the screenshot scaling quality to use when storage is "disk" or "inline".
- void setScreenshotScaleStretch(boolean screenshotScaleStretch): Deprecated. Sets whether the screenshot should be stretched to fill all the scale dimensions.
- void setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage): Deprecated. Sets the screenshot storage mechanisms.
- void setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage> screenshotStorage): Deprecated. Sets the screenshot storage mechanisms.
- void setScreenshotStorageDiskDir(String screenshotStorageDiskDir): Deprecated. Sets the directory where screenshots are saved when storage is "disk".
- void setScreenshotStorageDiskField(String screenshotStorageDiskField): Deprecated. Sets the target document metadata field where the path to the screenshot image file is stored when storage is "disk".
- void setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure): Deprecated. Sets the screenshot directory structure to create when storage is "disk".
- void setScreenshotStorageInlineField(String screenshotStorageInlineField): Deprecated. Sets the target document metadata field where the inline (Base64) screenshot image is stored when storage is "inline".
- void setScreenshotZoomFactor(float screenshotZoomFactor): Deprecated.
- void setScriptPath(String scriptPath): Deprecated.
- void setValidExitCodes(int... validExitCodes): Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- void setValidExitCodes(List<Integer> validExitCodes): Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- void setValidStatusCodes(int... validStatusCodes): Deprecated. Sets valid HTTP response status codes.
- void setValidStatusCodes(List<Integer> validStatusCodes): Deprecated. Sets valid HTTP response status codes.
- String toString(): Deprecated.
-
Methods inherited from class com.norconex.collector.http.fetch.AbstractHttpFetcher
accept, fetcherShutdown, fetcherStartup, fetcherThreadBegin, fetcherThreadEnd, getReferenceFilters, loadFromXML, saveToXML, setReferenceFilters, setReferenceFilters
-
Field Detail
-
DEFAULT_SCRIPT_PATH
public static final String DEFAULT_SCRIPT_PATH
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_RENDER_WAIT_TIME
public static final int DEFAULT_RENDER_WAIT_TIME
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_ZOOM_FACTOR
public static final float DEFAULT_SCREENSHOT_ZOOM_FACTOR
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_CONTENT_TYPE_PATTERN
public static final String DEFAULT_CONTENT_TYPE_PATTERN
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
public static final String DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_STORAGE
public static final PhantomJSDocumentFetcher.Storage DEFAULT_SCREENSHOT_STORAGE
Deprecated.
-
DEFAULT_SCREENSHOT_IMAGE_FORMAT
public static final String DEFAULT_SCREENSHOT_IMAGE_FORMAT
Deprecated.
- See Also:
- Constant Field Values
-
DEFAULT_SCREENSHOT_SCALE_SIZE
public static final Dimension DEFAULT_SCREENSHOT_SCALE_SIZE
Deprecated.
-
COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
Deprecated.
- See Also:
- Constant Field Values
-
COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
Deprecated.
- See Also:
- Constant Field Values
-
Method Detail
-
getExePath
public String getExePath()
Deprecated.
-
setExePath
public void setExePath(String exePath)
Deprecated.
-
getScriptPath
public String getScriptPath()
Deprecated.
-
setScriptPath
public void setScriptPath(String scriptPath)
Deprecated.
-
getRenderWaitTime
public int getRenderWaitTime()
Deprecated.
-
setRenderWaitTime
public void setRenderWaitTime(int renderWaitTime)
Deprecated.
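The render wait time (like the resource timeout) is expressed in milliseconds, but the XML configuration also accepts human-readable durations. The sketch below is a hypothetical, simplified parser for English forms such as "5m30s" or "5 minutes and 30 seconds"; it is not DurationParser and does not cover every format that class may support.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DurationSketch {

    // Longest unit alternatives first, so "minutes" is not consumed as just "m".
    // The lookahead rejects units immediately followed by more letters.
    private static final Pattern TOKEN = Pattern.compile(
            "(\\d+)\\s*(milliseconds?|ms|seconds?|secs?|s|minutes?|mins?|m|hours?|h|days?|d)(?![a-z])",
            Pattern.CASE_INSENSITIVE);

    /** Converts a bare number (milliseconds) or "number + unit" tokens to milliseconds. */
    static long toMillis(String text) {
        String value = text.trim();
        if (value.matches("\\d+")) {
            return Long.parseLong(value); // bare number: already milliseconds
        }
        Matcher m = TOKEN.matcher(value);
        long total = 0;
        boolean found = false;
        while (m.find()) {
            found = true;
            long amount = Long.parseLong(m.group(1));
            total += amount * unitMillis(m.group(2).toLowerCase(Locale.ROOT));
        }
        if (!found) {
            throw new IllegalArgumentException("Unrecognized duration: " + text);
        }
        return total;
    }

    private static long unitMillis(String unit) {
        if (unit.equals("ms") || unit.startsWith("milli")) return 1L;
        if (unit.startsWith("s")) return 1_000L;
        if (unit.startsWith("m")) return 60_000L;
        if (unit.startsWith("h")) return 3_600_000L;
        return 86_400_000L; // "d", "day", "days"
    }

    public static void main(String[] args) {
        System.out.println(toMillis("5m30s"));                    // 330000
        System.out.println(toMillis("5 minutes and 30 seconds")); // 330000
    }
}
```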
-
setOptions
public void setOptions(List<String> options)
Deprecated. Sets optional extra PhantomJS command-line options.
- Parameters:
options - extra command-line arguments
- Since:
- 3.0.0
-
setOptions
public void setOptions(String... options)
Deprecated. Sets optional extra PhantomJS command-line options.
- Parameters:
options - extra command-line arguments
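PhantomJS itself is invoked as phantomjs [options] [script] [arguments], so extra options have to appear before the script path. This sketch assembles such a command line; the script arguments (here, just a URL) are hypothetical, since the arguments the fetcher actually passes to its script are not documented here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PhantomCommandSketch {

    /** Assembles "exePath [options] scriptPath [scriptArgs]". */
    static List<String> buildCommand(String exePath, List<String> options,
            String scriptPath, String... scriptArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(exePath);
        cmd.addAll(options);                    // PhantomJS options precede the script
        cmd.add(scriptPath);
        cmd.addAll(Arrays.asList(scriptArgs));  // hypothetical script arguments
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = buildCommand("/path/to/phantomjs",
                List.of("--ignore-ssl-errors=true", "--load-images=false"),
                "scripts/phantom.js", "https://example.com/");
        // The list could then be handed to new ProcessBuilder(cmd).start().
        System.out.println(String.join(" ", cmd));
    }
}
```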
-
getScreenshotStorageDiskDir
public String getScreenshotStorageDiskDir()
Deprecated. Gets the directory where screenshots are saved when storage is "disk". Default is "./screenshots".
- Returns:
- directory
- Since:
- 2.8.0
-
setScreenshotStorageDiskDir
public void setScreenshotStorageDiskDir(String screenshotStorageDiskDir)
Deprecated. Sets the directory where screenshots are saved when storage is "disk". Use this method to override the default ("./screenshots").
- Parameters:
screenshotStorageDiskDir - directory
- Since:
- 2.8.0
-
getScreenshotStorageDiskField
public String getScreenshotStorageDiskField()
Deprecated. Gets the target document metadata field where the path to the screenshot image file is stored when storage is "disk". Default is "collector.phantomjs-screenshot-path".
- Returns:
- field name
- Since:
- 2.8.0
-
setScreenshotStorageDiskField
public void setScreenshotStorageDiskField(String screenshotStorageDiskField)
Deprecated. Sets the target document metadata field where the path to the screenshot image file is stored when storage is "disk". Use this method to override the default ("collector.phantomjs-screenshot-path").
- Parameters:
screenshotStorageDiskField - field name
- Since:
- 2.8.0
-
getScreenshotStorageInlineField
public String getScreenshotStorageInlineField()
Deprecated. Gets the target document metadata field where the inline (Base64) screenshot image is stored when storage is "inline". Default is "collector.phantomjs-screenshot-inline".
- Returns:
- field name
- Since:
- 2.8.0
-
setScreenshotStorageInlineField
public void setScreenshotStorageInlineField(String screenshotStorageInlineField)
Deprecated. Sets the target document metadata field where the inline (Base64) screenshot image is stored when storage is "inline". Use this method to override the default ("collector.phantomjs-screenshot-inline").
- Parameters:
screenshotStorageInlineField - field name
- Since:
- 2.8.0
-
isScreenshotEnabled
public boolean isScreenshotEnabled()
Deprecated. Gets whether taking screenshots of crawled web pages is enabled.
- Returns:
true if enabled
- Since:
- 2.8.0
-
setScreenshotEnabled
public void setScreenshotEnabled(boolean screenshotEnabled)
Deprecated. Sets whether to enable taking screenshots of crawled web pages.
- Parameters:
screenshotEnabled - true if enabled
- Since:
- 2.8.0
-
getScreenshotDimensions
public Dimension getScreenshotDimensions()
Deprecated.
-
setScreenshotDimensions
public void setScreenshotDimensions(int width, int height)
Deprecated.
-
setScreenshotDimensions
public void setScreenshotDimensions(Dimension screenshotDimensions)
Deprecated.
-
getScreenshotZoomFactor
public float getScreenshotZoomFactor()
Deprecated.
-
setScreenshotZoomFactor
public void setScreenshotZoomFactor(float screenshotZoomFactor)
Deprecated.
-
getValidExitCodes
public List<Integer> getValidExitCodes()
Deprecated. Gets valid PhantomJS exit values (defaults to 0).
- Returns:
- valid exit codes
- Since:
- 2.9.1
-
setValidExitCodes
public void setValidExitCodes(List<Integer> validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- Parameters:
validExitCodes - valid exit codes
- Since:
- 2.9.1
-
setValidExitCodes
public void setValidExitCodes(int... validExitCodes)
Deprecated. Sets valid PhantomJS exit values (defaults to 0).
- Parameters:
validExitCodes - valid exit codes
- Since:
- 2.9.1
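A sketch of how a comma-separated validExitCodes value might be turned into a list and used to judge a PhantomJS exit value, falling back to the documented default of 0. The helper names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExitCodeSketch {

    /** Parses "0, 1" style input; blank or missing input yields the default [0]. */
    static List<Integer> parseCodes(String csv) {
        List<Integer> codes = new ArrayList<>();
        if (csv == null || csv.isBlank()) {
            codes.add(0); // documented default: only zero is valid
            return codes;
        }
        for (String part : csv.split(",")) {
            codes.add(Integer.parseInt(part.trim()));
        }
        return codes;
    }

    /** True when the process exit value is one of the accepted codes. */
    static boolean isValidExit(int exitCode, List<Integer> validCodes) {
        return validCodes.contains(exitCode);
    }

    public static void main(String[] args) {
        List<Integer> valid = parseCodes("0, 1");
        System.out.println(isValidExit(1, valid)); // true
        System.out.println(isValidExit(2, valid)); // false
    }
}
```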
-
setValidStatusCodes
public void setValidStatusCodes(List<Integer> validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
- Since:
- 3.0.0
-
setValidStatusCodes
public void setValidStatusCodes(int... validStatusCodes)
Deprecated. Sets valid HTTP response status codes.
- Parameters:
validStatusCodes - valid status codes
-
getNotFoundStatusCodes
public List<Integer> getNotFoundStatusCodes()
Deprecated. Gets HTTP status codes to be considered as "Not found" state. Default is 404.
- Returns:
- "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
-
setNotFoundStatusCodes
public final void setNotFoundStatusCodes(List<Integer> notFoundStatusCodes)
Deprecated. Sets HTTP status codes to be considered as "Not found" state.
- Parameters:
notFoundStatusCodes - "Not found" codes
- Since:
- 3.0.0
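The precedence rule between valid and "Not found" status codes (the valid list wins when a code appears in both) can be sketched as follows. The Outcome enum is a hypothetical stand-in, not the collector's CrawlState, and codes matching neither list are simply reported as BAD_STATUS here.

```java
import java.util.List;

class StatusCodeSketch {

    /** Hypothetical outcome categories for illustration. */
    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    static Outcome classify(int status, List<Integer> valid, List<Integer> notFound) {
        if (valid.contains(status)) {
            return Outcome.VALID;     // checked first: valid list takes precedence
        }
        if (notFound.contains(status)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    public static void main(String[] args) {
        List<Integer> valid = List.of(200);
        List<Integer> notFound = List.of(404);
        System.out.println(classify(200, valid, notFound)); // VALID
        System.out.println(classify(404, valid, notFound)); // NOT_FOUND
        System.out.println(classify(500, valid, notFound)); // BAD_STATUS
    }
}
```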
-
getHeadersPrefix
public String getHeadersPrefix()
Deprecated.
-
setHeadersPrefix
public void setHeadersPrefix(String headersPrefix)
Deprecated.
-
isDetectContentType
public boolean isDetectContentType()
Deprecated.
-
setDetectContentType
public void setDetectContentType(boolean detectContentType)
Deprecated.
-
isDetectCharset
public boolean isDetectCharset()
Deprecated.
-
setDetectCharset
public void setDetectCharset(boolean detectCharset)
Deprecated.
-
getContentTypePattern
public String getContentTypePattern()
Deprecated.
-
setContentTypePattern
public void setContentTypePattern(String contentTypePattern)
Deprecated.
-
getReferencePattern
public String getReferencePattern()
Deprecated.
-
setReferencePattern
public void setReferencePattern(String referencePattern)
Deprecated.
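A sketch of the reference-pattern check, assuming the pattern must match the whole URL and that an unset pattern makes every URL eligible (consistent with the documented default of no pattern). Content-type filtering would still apply afterwards; the class name is hypothetical.

```java
import java.util.regex.Pattern;

class ReferencePatternSketch {

    /** True if PhantomJS may be used for this URL; no pattern means no restriction. */
    static boolean matchesReference(String url, String referencePattern) {
        if (referencePattern == null || referencePattern.isEmpty()) {
            return true;
        }
        // Pattern.matches requires the entire URL to match the expression.
        return Pattern.matches(referencePattern, url);
    }

    public static void main(String[] args) {
        String htmlOnly = "^.*\\.html$"; // same expression as the usage example
        System.out.println(matchesReference("https://example.com/a.html", htmlOnly)); // true
        System.out.println(matchesReference("https://example.com/a.pdf", htmlOnly));  // false
    }
}
```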
-
getResourceTimeout
public int getResourceTimeout()
Deprecated. Gets the timeout, in milliseconds, after which a pending resource request is abandoned so that the rest of the page can proceed.
- Returns:
- the timeout value, or -1 if undefined
- Since:
- 2.8.0
-
setResourceTimeout
public void setResourceTimeout(int resourceTimeout)
Deprecated. Sets the timeout, in milliseconds, after which a pending resource request is abandoned so that the rest of the page can proceed.
- Parameters:
resourceTimeout - the timeout value, or -1 for undefined
- Since:
- 2.8.0
-
getScreenshotScaleDimensions
public Dimension getScreenshotScaleDimensions()
Deprecated. Gets the pixel dimensions the stored screenshot should have.
- Returns:
- dimension
- Since:
- 2.8.0
-
setScreenshotScaleDimensions
public void setScreenshotScaleDimensions(Dimension screenshotScaleDimensions)
Deprecated. Sets the pixel dimensions the stored screenshot should have.
- Parameters:
screenshotScaleDimensions - dimension
- Since:
- 2.8.0
-
setScreenshotScaleDimensions
public void setScreenshotScaleDimensions(int width, int height)
Deprecated. Sets the pixel dimensions the stored screenshot should have.
- Parameters:
width - image width
height - image height
- Since:
- 2.8.0
-
isScreenshotScaleStretch
public boolean isScreenshotScaleStretch()
Deprecated. Gets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
- Returns:
true to stretch
- Since:
- 2.8.0
-
setScreenshotScaleStretch
public void setScreenshotScaleStretch(boolean screenshotScaleStretch)
Deprecated. Sets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
- Parameters:
screenshotScaleStretch - true to stretch
- Since:
- 2.8.0
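A minimal sketch of fitting a screenshot into the scale dimensions, either preserving the aspect ratio (the default) or stretching. The Size record and method names are hypothetical, and whether the real fetcher enlarges images smaller than the target is not stated here: this sketch does enlarge them.

```java
class ScaleSketch {

    /** Hypothetical width/height pair. */
    record Size(int width, int height) {}

    /** Scales src into target; keeps the aspect ratio unless stretch is true. */
    static Size scale(Size src, Size target, boolean stretch) {
        if (stretch) {
            return target; // fill both target dimensions, distorting if needed
        }
        // Use the smaller ratio so the whole image fits inside the target box.
        double ratio = Math.min(
                (double) target.width() / src.width(),
                (double) target.height() / src.height());
        return new Size(
                (int) Math.round(src.width() * ratio),
                (int) Math.round(src.height() * ratio));
    }

    public static void main(String[] args) {
        Size page = new Size(1024, 768);
        Size box = new Size(300, 300); // a single value of 300 means a square
        System.out.println(scale(page, box, false)); // Size[width=300, height=225]
        System.out.println(scale(page, box, true));  // Size[width=300, height=300]
    }
}
```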
-
getScreenshotImageFormat
public String getScreenshotImageFormat()
Deprecated. Gets the screenshot image format (jpg, png, gif, bmp, etc.).
- Returns:
- image format
- Since:
- 2.8.0
-
setScreenshotImageFormat
public void setScreenshotImageFormat(String screenshotImageFormat)
Deprecated. Sets the screenshot image format (jpg, png, gif, bmp, etc.).
- Parameters:
screenshotImageFormat - image format
- Since:
- 2.8.0
-
getScreenshotStorage
public List<PhantomJSDocumentFetcher.Storage> getScreenshotStorage()
Deprecated. Gets the screenshot storage mechanisms.
- Returns:
- storage mechanisms (never null)
- Since:
- 2.8.0
-
setScreenshotStorage
public void setScreenshotStorage(List<PhantomJSDocumentFetcher.Storage> screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
- Parameters:
screenshotStorage - storage mechanisms
- Since:
- 3.0.0
-
setScreenshotStorage
public void setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage)
Deprecated. Sets the screenshot storage mechanisms.
- Parameters:
screenshotStorage - storage mechanisms
- Since:
- 2.8.0
-
getScreenshotStorageDiskStructure
public PhantomJSDocumentFetcher.StorageDiskStructure getScreenshotStorageDiskStructure()
Deprecated. Gets the screenshot directory structure to create when storage is "disk".
- Returns:
- directory structure
- Since:
- 2.8.0
-
setScreenshotStorageDiskStructure
public void setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure)
Deprecated. Sets the screenshot directory structure to create when storage is "disk".
- Parameters:
screenshotStorageDiskStructure - directory structure
- Since:
- 2.8.0
-
getScreenshotScaleQuality
public PhantomJSDocumentFetcher.Quality getScreenshotScaleQuality()
Deprecated. Gets the screenshot scaling quality to use when storage is "disk" or "inline". Default is PhantomJSDocumentFetcher.Quality.AUTO.
- Returns:
- quality
- Since:
- 2.8.0
-
setScreenshotScaleQuality
public void setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality)
Deprecated. Sets the screenshot scaling quality to use when storage is "disk" or "inline".
- Parameters:
screenshotScaleQuality - quality
- Since:
- 2.8.0
-
getUserAgent
public String getUserAgent()
Deprecated.
-
accept
public boolean accept(Doc doc, HttpMethod httpMethod)
Deprecated.
- Specified by:
accept in interface IHttpFetcher
- Overrides:
accept in class AbstractHttpFetcher
-
accept
protected boolean accept(HttpMethod httpMethod)
Deprecated. Description copied from class: AbstractHttpFetcher
Whether the supplied HttpMethod is supported by this fetcher.
- Specified by:
accept in class AbstractHttpFetcher
- Parameters:
httpMethod - the HTTP method
- Returns:
true if supported
-
fetch
public IHttpFetchResponse fetch(CrawlDoc doc, HttpMethod httpMethod) throws HttpFetchException
Deprecated. Description copied from interface: IHttpFetcher
Performs an HTTP request for the supplied document reference and HTTP method.
For each HTTP method supported, implementors should do their best to populate the document and its CrawlDocInfo with as much information as they can.
Unsupported HTTP methods should return an HTTP response with the CrawlState.UNSUPPORTED state. To prevent users from having to configure multiple HTTP clients, implementors should try to support both the GET and HEAD methods. POST is only used in special cases and is often not used during a crawl session.
A null method is treated as a GET.
- Parameters:
doc - document to fetch or to use to make the request.
httpMethod - HTTP method
- Returns:
- an HTTP response
- Throws:
HttpFetchException - problem when fetching the document
- See Also:
HttpFetchResponseBuilder.unsupported()
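The null-handling and GET/HEAD recommendations above can be sketched as follows. The Method enum is a hypothetical stand-in for HttpMethod, and the supported set reflects the general recommendation rather than the exact set of methods accepted by this fetcher.

```java
import java.util.EnumSet;
import java.util.Set;

class MethodSupportSketch {

    /** Hypothetical stand-in for the collector's HttpMethod. */
    enum Method { GET, HEAD, POST }

    // Assumed supported set, following the "support both GET and HEAD" advice.
    static final Set<Method> SUPPORTED = EnumSet.of(Method.GET, Method.HEAD);

    /** A null method is treated as GET. */
    static Method normalize(Method method) {
        return method == null ? Method.GET : method;
    }

    /** Unsupported methods would lead to an "unsupported" response instead. */
    static boolean isSupported(Method method) {
        return SUPPORTED.contains(normalize(method));
    }

    public static void main(String[] args) {
        System.out.println(isSupported(null));        // true (treated as GET)
        System.out.println(isSupported(Method.POST)); // false
    }
}
```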
-
loadHttpFetcherFromXML
protected void loadHttpFetcherFromXML(XML xml)
Deprecated.
- Specified by:
loadHttpFetcherFromXML in class AbstractHttpFetcher
-
saveHttpFetcherToXML
protected void saveHttpFetcherToXML(XML xml)
Deprecated.
- Specified by:
saveHttpFetcherToXML in class AbstractHttpFetcher
-
equals
public boolean equals(Object other)
Deprecated.
- Overrides:
equals in class AbstractHttpFetcher
-
hashCode
public int hashCode()
Deprecated.
- Overrides:
hashCode in class AbstractHttpFetcher
-
toString
public String toString()
Deprecated.
- Overrides:
toString in class AbstractHttpFetcher