PhantomJSDocumentFetcher (Norconex HTTP Collector 2.9.1 API)

java.lang.Object
- com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher

All Implemented Interfaces:

IHttpDocumentFetcher, IXMLConfigurable
```
public class PhantomJSDocumentFetcher
extends Object
implements IHttpDocumentFetcher, IXMLConfigurable
```
An alternative to the GenericDocumentFetcher which relies on an external PhantomJS installation to fetch web pages. While less efficient, this implementation is meant to provide some way to crawl sites making heavy use of JavaScript to render their pages. This class tells the PhantomJS headless browser to wait a certain amount of time for the page to load extra content via Ajax requests before grabbing all loaded HTML.

Considerations

Relying on external software to fetch pages is slower, less scalable, and possibly less stable. Prefer GenericDocumentFetcher whenever possible. Use at your own risk. Use PhantomJS 2.1 (or possibly higher).

Handling of non-HTML Pages

PhantomJS is usually only useful for HTML pages that rely on JavaScript. Other types of documents are fetched using an instance of GenericDocumentFetcher. To find out whether it is dealing with an HTML document, this fetcher needs to know the content type first. By default, the content type of a document is not known before a physical copy is obtained. This means PhantomJS has to download the document first and, if it turns out not to be an HTML document, it is downloaded a second time with the generic document fetcher. By default, these content types are considered HTML:
```
 text/html, application/xhtml+xml, application/vnd.wap.xhtml+xml, application/x-asp
 
```
These can be overridden with setContentTypePattern(String).
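
As a rough illustration of how a content-type pattern separates HTML from other documents, the sketch below matches a content type against a regular expression built from the defaults listed above. The class name, the regular expression, and the stripping of parameters such as "; charset=UTF-8" are assumptions made for this example; the actual value of DEFAULT_CONTENT_TYPE_PATTERN and the fetcher's matching logic may differ.

```java
import java.util.Locale;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Norconex API.
class ContentTypeCheck {

    // Illustrative regex covering the default HTML content types listed above.
    // It is NOT copied from DEFAULT_CONTENT_TYPE_PATTERN.
    static final Pattern HTML_TYPES = Pattern.compile(
            "^(text/html|application/xhtml\\+xml"
          + "|application/vnd\\.wap\\.xhtml\\+xml|application/x-asp)$");

    /** Returns true when the content type matches the given pattern. */
    static boolean isHtml(String contentType, Pattern pattern) {
        if (contentType == null) {
            return false;
        }
        // Assumption: drop parameters such as "; charset=UTF-8" before matching.
        String bare = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return pattern.matcher(bare).matches();
    }
}
```

A pattern passed to setContentTypePattern(String) could be tried out this way before it is used in a crawl.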

Avoid double-downloads

To avoid downloading the document twice as described above, you can configure a metadata fetcher (such as GenericMetadataFetcher). This will attempt to get the content type by first making an HTTP HEAD request.

Alternatively, if you have a URL pattern that identifies your HTML pages (and only HTML pages), you can specify it using setReferencePattern(String). Only URLs matching the provided regular expression will be fetched by PhantomJS. By default there is no pattern for discriminating on URL references.
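
The URL-based choice described above can be pictured with the following sketch. It is not the fetcher's actual code: the class, the enum, and the use of a full match (rather than a partial "find") are assumptions for illustration. With no pattern set, the URL alone does not rule PhantomJS out, so the sketch returns PHANTOMJS.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Norconex API.
class FetcherRouting {

    enum Fetcher { PHANTOMJS, GENERIC }

    /**
     * Picks a fetcher from the URL alone. A null pattern means there is no
     * URL-based discrimination (the documented default).
     */
    static Fetcher choose(String url, String referencePattern) {
        if (referencePattern == null) {
            return Fetcher.PHANTOMJS;
        }
        // Assumption: the whole URL must match the regular expression.
        return Pattern.matches(referencePattern, url)
                ? Fetcher.PHANTOMJS
                : Fetcher.GENERIC;
    }
}
```

For instance, with the pattern "^.*\.html$", a URL ending in ".html" is routed to PhantomJS while a URL ending in ".pdf" goes to the generic fetcher.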

How to maintain HTTP sessions

Normally, the HTTP crawler is meant to be used with Apache HttpClient, which is usually configured using GenericHttpClientFactory. Doing so ensures HTTP sessions are maintained between URL invocations. This is necessary for web sites expecting cookies or session information to be carried over from request to request as part of the HTTP headers. Unfortunately, PhantomJS is invoked separately for each URL, so session information is not maintained between requests. Apache HttpClient is not used at all, and configuring IHttpClientFactory has no effect on fetching documents. As a result, you may have trouble with specific web sites.

If that's the case, you may want to try adding the HttpClientProxyCollectorListener to your collector configuration. This will start an HTTP proxy and force PhantomJS to use it. That proxy will use HttpClient to fetch documents as you would normally expect, and you can take full advantage of GenericHttpClientFactory (or your own implementation of IHttpClientFactory). Using a proxy with secure (https) requests may not always give the expected results either (e.g., screenshots may be broken). If you run into issues with a given site, try both approaches and pick the one that works best for you.

Taking screenshots of pages

Thanks to PhantomJS, one can save images of pages being crawled, including those rendered with JavaScript!

Since 2.8.0, you have to explicitly enable screenshots with setScreenshotEnabled(boolean). Also, screenshots now share the same size by default. In addition, you can now control how screenshots are resized and how they are stored. Storage options:
- inline: Stores a Base64 string of the scaled image, in the format specified, in a collector.phantomjs-screenshot-inline field. The string is ready to be used inline, in a <img src="..."> tag.
- disk: Stores the scaled image on the file system, in the format and directory specified. A reference to the file on disk is stored in a collector.phantomjs-screenshot-path field.

Since 2.8.0, it is possible to specify a resource timeout so that slow individual page resources do not cause PhantomJS to hang for a long time.
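
To make the "inline" storage option more concrete, the sketch below encodes an image as a Base64 string that can be placed in an <img src="..."> attribute. It uses only JDK classes. The class name and the exact "data:" prefix are assumptions; the precise string the fetcher stores in the inline field may be laid out differently.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import java.util.Locale;
import javax.imageio.ImageIO;

// Hypothetical helper, not part of the Norconex API.
class InlineImage {

    /** Encodes an image as a data URI usable in an img "src" attribute. */
    static String toDataUri(BufferedImage image, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no writer supports the format.
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException("No writer for: " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The MIME subtype for "jpg" files is "jpeg".
        String subtype = "jpg".equalsIgnoreCase(format)
                ? "jpeg" : format.toLowerCase(Locale.ROOT);
        return "data:image/" + subtype + ";base64,"
                + Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

Inline storage avoids managing files on disk, at the cost of carrying the whole encoded image in a document metadata field.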

PhantomJS exit values

Since 2.9.1, it is possible to specify which PhantomJS exit values are to be considered "valid". Provide a comma-separated list of integers in the XML configuration, or use the setValidExitCodes(int...) method. By default, only zero is considered valid.
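
The idea of a list of valid exit values can be sketched as follows. This is a simplified illustration with hypothetical names, not the fetcher's implementation: it parses a comma-separated list and checks a process exit value against it.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Norconex API.
class ExitCodes {

    /** Parses a comma-separated list such as "0, 1" into an int array. */
    static int[] parse(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Returns true when the exit value is among the valid ones. */
    static boolean isValid(int exitCode, int... validExitCodes) {
        return Arrays.stream(validExitCodes).anyMatch(c -> c == exitCode);
    }
}
```

With the documented default (only zero is valid), isValid(1, 0) is false; once 1 is added to the list, the same exit value is accepted.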

XML configuration entries expecting millisecond durations can be provided in human-readable format (English only), as per DurationParser (e.g., "5 minutes and 30 seconds" or "5m30s").
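
For reference, the millisecond value that corresponds to the example duration above can be worked out with the JDK's java.time.Duration. This only illustrates the arithmetic; it does not use or reproduce DurationParser, and the class name is hypothetical.

```java
import java.time.Duration;

// Hypothetical helper, not part of the Norconex API.
class DurationMillis {

    /** 5 minutes and 30 seconds, in milliseconds: (5 * 60 + 30) * 1000. */
    static long fiveMinutesThirtySeconds() {
        return Duration.ofMinutes(5).plusSeconds(30).toMillis();
    }

    /** The documented renderWaitTime default of 3 seconds, in milliseconds. */
    static long threeSeconds() {
        return Duration.ofSeconds(3).toMillis();
    }
}
```

The first method yields 330000 and the second yields 3000, the values one would otherwise write out in full in the XML configuration.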

XML configuration usage:
```
  <documentFetcher
      class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher"
      detectContentType="[false|true]" detectCharset="[false|true]"
      screenshotEnabled="[false|true]">
      <exePath>(path to PhantomJS executable)</exePath>
      <scriptPath>
          (Optional path to a PhantomJS script. Defaults to scripts/phantom.js)
      </scriptPath>
      <renderWaitTime>
          (Milliseconds to wait for the entire page to load.
           Defaults to 3000, i.e., 3 seconds.)
      </renderWaitTime>
      <resourceTimeout>
          (Optional milliseconds to wait for a page resource to load.
           Default is unspecified.)
      </resourceTimeout>
      <options>
        <opt>(optional extra PhantomJS command-line option)</opt>
        
      </options>
      <referencePattern>
          (Regular expression matching URLs for which to use the
           PhantomJS browser. Non-matching URLs will fall back
           to using GenericDocumentFetcher.)
      </referencePattern>
      <contentTypePattern>
          (Regular expression matching content types for which to use
           the PhantomJS browser. Non-matching content types will use
           the GenericDocumentFetcher.)
      </contentTypePattern>
      <validExitCodes>(defaults to 0)</validExitCodes>
      <validStatusCodes>(defaults to 200)</validStatusCodes>
      <notFoundStatusCodes>(defaults to 404)</notFoundStatusCodes>
      <headersPrefix>(string to prefix headers)</headersPrefix>

      
      <screenshotDimensions>
          (Pixel size of the browser page area to capture: [width]x[height].
           E.g., 800x600.  Only used when a screenshot path is specified.
           Default is undefined. It will try to load all it can and may
           produce vertically long images.)
      </screenshotDimensions>
      <screenshotZoomFactor>
          (A decimal value to scale the screenshot image.
           E.g., 0.25  will make the image 25% its regular size,
           which is 25% of the above dimension if specified.
           Default is 1, i.e., 100%)
      </screenshotZoomFactor>
      <screenshotScaleDimensions>
         (Target pixel size the main image should be scaled to.
          Default is 300.)
      </screenshotScaleDimensions>
      <screenshotScaleStretch>
         [false|true]
         (Whether to stretch to match scale size. Default keeps aspect ratio.)
      </screenshotScaleStretch>
      <screenshotScaleQuality>
          [auto|low|medium|high|max]
          (Default is "auto", which tries the best balance between quality
           and speed based on image size. The lower the quality the faster
           it is to scale images.)
      </screenshotScaleQuality>
      <screenshotImageFormat>
         (Target format of stored image. E.g., "jpg", "png", "gif", "bmp", ...
          Default is "png")
      </screenshotImageFormat>
      <screenshotStorage>
         [disk|inline]
         (One or both, comma-separated. Default is "disk".)
      </screenshotStorage>

      
      <screenshotStorageDiskDir structure="[url2path|date|datetime]">
          (Path where to save screenshots.)
      </screenshotStorageDiskDir>
      <screenshotStorageDiskField>
          (Overwrite default field where to store the screenshot path.)
      </screenshotStorageDiskField>

      
      <screenshotStorageInlineField>
          (Overwrite default field where to store the inline screenshot.)
      </screenshotStorageInlineField>

  </documentFetcher>
 
```
When specifying an image size, the format is [width]x[height] or a single value. When a single value is used, that value represents both the width and height (i.e., a square).
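
The size format just described can be illustrated with a small parser. This is a sketch under stated assumptions (hypothetical class name, lowercase or uppercase "x" accepted, no handling of negative values); the library's own parsing may behave differently.

```java
import java.awt.Dimension;
import java.util.Locale;

// Hypothetical helper, not part of the Norconex API.
class ImageSize {

    /** Parses "[width]x[height]" or a single value meaning a square. */
    static Dimension parse(String value) {
        // The -1 limit keeps trailing empty parts so "800x" is rejected.
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("x", -1);
        if (parts.length == 1) {
            int side = Integer.parseInt(parts[0].trim());
            return new Dimension(side, side);
        }
        if (parts.length == 2) {
            return new Dimension(
                    Integer.parseInt(parts[0].trim()),
                    Integer.parseInt(parts[1].trim()));
        }
        throw new IllegalArgumentException("Invalid image size: " + value);
    }
}
```

Here "800x600" gives a width of 800 and a height of 600, while "300" gives a 300 by 300 square.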

The "validStatusCodes" and "notFoundStatusCodes" elements expect a comma-separated list of HTTP response codes. If a code is added to both elements, the valid list takes precedence.
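
The precedence rule can be sketched as a simple classification. The class, the enum, and the BAD_STATUS outcome for codes found in neither list are assumptions for illustration, not the fetcher's actual types.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the Norconex API.
class StatusClassifier {

    enum Outcome { VALID, NOT_FOUND, BAD_STATUS }

    /** Classifies an HTTP status code; the valid list is checked first. */
    static Outcome classify(int status, int[] valid, int[] notFound) {
        if (contains(valid, status)) {
            return Outcome.VALID; // wins even if also listed as "not found"
        }
        if (contains(notFound, status)) {
            return Outcome.NOT_FOUND;
        }
        return Outcome.BAD_STATUS;
    }

    private static boolean contains(int[] codes, int status) {
        return Arrays.stream(codes).anyMatch(c -> c == status);
    }
}
```

With the defaults (200 valid, 404 not found), a 404 is reported as not found; if 404 is also added to the valid list, it is treated as valid.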

Usage example:

The following configures HTTP Collector to use PhantomJS with a proxy to use HttpClient, only for URLs ending with ".html".
```
  <httpcollector id="MyHttpCollector">
    ...
    <crawlers>
      <crawler id="MyCrawler">
        ...
        <documentFetcher class="com.norconex.collector.http.fetch.impl.PhantomJSDocumentFetcher">
          <exePath>/path/to/phantomjs.exe</exePath>
          <renderWaitTime>5000</renderWaitTime>
          <referencePattern>^.*\.html$</referencePattern>
        </documentFetcher>
        ...
      </crawler>
    </crawlers>
    ...
    
    <collectorListeners>
      <listener class="com.norconex.collector.http.fetch.impl.HttpClientProxyCollectorListener" />
    </collectorListeners>
  </httpcollector>
 
```
Since:

2.7.0

Author:

Pascal Essiembre

See Also:

HttpClientProxyCollectorListener

Nested Class Summary

Nested Classes
Modifier and Type	Class and Description
`static class`	`PhantomJSDocumentFetcher.Quality`
`static class`	`PhantomJSDocumentFetcher.Storage`
`static class`	`PhantomJSDocumentFetcher.StorageDiskStructure`

Field Summary

Fields
Modifier and Type	Field and Description
`static String`	`COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE`
`static String`	`COLLECTOR_PHANTOMJS_SCREENSHOT_PATH`
`static String`	`DEFAULT_CONTENT_TYPE_PATTERN`
`static int`	`DEFAULT_RENDER_WAIT_TIME`
`static String`	`DEFAULT_SCREENSHOT_IMAGE_FORMAT`
`static Dimension`	`DEFAULT_SCREENSHOT_SCALE_SIZE`
`static PhantomJSDocumentFetcher.Storage`	`DEFAULT_SCREENSHOT_STORAGE`
`static String`	`DEFAULT_SCREENSHOT_STORAGE_DISK_DIR`
`static float`	`DEFAULT_SCREENSHOT_ZOOM_FACTOR`
`static String`	`DEFAULT_SCRIPT_PATH`

Constructor Summary

Constructors
Constructor and Description
`PhantomJSDocumentFetcher()`
`PhantomJSDocumentFetcher(int[] validStatusCodes)`

Method Summary

Modifier and Type	Method and Description
`boolean`	`equals(Object other)`
`HttpFetchResponse`	`fetchDocument(org.apache.http.client.HttpClient httpClient, HttpDocument doc)` Fetches HTTP document and saves it to a local file
`String`	`getContentTypePattern()`
`String`	`getExePath()`
`String`	`getHeadersPrefix()`
`int[]`	`getNotFoundStatusCodes()`
`String[]`	`getOptions()`
`String`	`getReferencePattern()`
`int`	`getRenderWaitTime()`
`int`	`getResourceTimeout()` Gets the milliseconds timeout after which any resource requested will stop trying and proceed with other parts of the page.
`Dimension`	`getScreenshotDimensions()`
`String`	`getScreenshotDir()` Deprecated. Since 2.8.0, use `getScreenshotStorageDiskDir()`
`String`	`getScreenshotImageFormat()` Gets the screenshot image format (jpg, png, gif, bmp, etc.).
`Dimension`	`getScreenshotScaleDimensions()` Gets the pixel dimensions we want the stored screenshot to have.
`PhantomJSDocumentFetcher.Quality`	`getScreenshotScaleQuality()` Gets the screenshot scaling quality to use when storage is "disk" or "inline".
`PhantomJSDocumentFetcher.Storage[]`	`getScreenshotStorage()` Gets the screenshot storage mechanisms.
`String`	`getScreenshotStorageDiskDir()` Gets the directory where screenshots are saved when storage is "disk".
`String`	`getScreenshotStorageDiskField()` Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
`PhantomJSDocumentFetcher.StorageDiskStructure`	`getScreenshotStorageDiskStructure()` Gets the screenshot directory structure to create when storage is "disk".
`String`	`getScreenshotStorageInlineField()` Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
`float`	`getScreenshotZoomFactor()`
`String`	`getScriptPath()`
`int[]`	`getValidExitCodes()`
`int[]`	`getValidStatusCodes()`
`int`	`hashCode()`
`boolean`	`isDetectCharset()`
`boolean`	`isDetectContentType()`
`boolean`	`isScreenshotEnabled()` Gets whether to enable taking screenshots of crawled web pages.
`boolean`	`isScreenshotScaleStretch()` Gets whether the screenshot should be stretched to fill all the scale dimensions.
`void`	`loadFromXML(Reader in)`
`void`	`saveToXML(Writer out)`
`void`	`setContentTypePattern(String contentTypePattern)`
`void`	`setDetectCharset(boolean detectCharset)`
`void`	`setDetectContentType(boolean detectContentType)`
`void`	`setExePath(String exePath)`
`void`	`setHeadersPrefix(String headersPrefix)`
`void`	`setNotFoundStatusCodes(int... notFoundStatusCodes)`
`void`	`setOptions(String... options)`
`void`	`setReferencePattern(String referencePattern)`
`void`	`setRenderWaitTime(int renderWaitTime)`
`void`	`setResourceTimeout(int resourceTimeout)` Sets the milliseconds timeout after which any resource requested will stop trying and proceed with other parts of the page.
`void`	`setScreenshotDimensions(Dimension screenshotDimensions)`
`void`	`setScreenshotDimensions(int width, int height)`
`void`	`setScreenshotDir(String screenshotDir)` Deprecated. Since 2.8.0, use `setScreenshotStorageDiskDir(String)`
`void`	`setScreenshotEnabled(boolean screenshotEnabled)` Sets whether to enable taking screenshots of crawled web pages.
`void`	`setScreenshotImageFormat(String screenshotImageFormat)` Sets the screenshot image format (jpg, png, gif, bmp, etc.).
`void`	`setScreenshotScaleDimensions(Dimension screenshotScaleDimensions)` Sets the pixel dimensions we want the stored screenshot to have.
`void`	`setScreenshotScaleDimensions(int width, int height)` Sets the pixel dimensions we want the stored screenshot to have.
`void`	`setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality)` Sets the screenshot scaling quality to use when storage is "disk" or "inline".
`void`	`setScreenshotScaleStretch(boolean screenshotScaleStretch)` Sets whether the screenshot should be stretched to fill all the scale dimensions.
`void`	`setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage)` Sets the screenshot storage mechanisms.
`void`	`setScreenshotStorageDiskDir(String screenshotStorageDiskDir)` Sets the directory where screenshots are saved when storage is "disk".
`void`	`setScreenshotStorageDiskField(String screenshotStorageDiskField)` Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk".
`void`	`setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure)` Sets the screenshot directory structure to create when storage is "disk".
`void`	`setScreenshotStorageInlineField(String screenshotStorageInlineField)` Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline".
`void`	`setScreenshotZoomFactor(float screenshotZoomFactor)`
`void`	`setScriptPath(String scriptPath)`
`void`	`setValidExitCodes(int... validExitCodes)`
`void`	`setValidStatusCodes(int... validStatusCodes)`
`String`	`toString()`

Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait

- Field Detail
  - DEFAULT_SCRIPT_PATH
```
public static final String DEFAULT_SCRIPT_PATH
```
    See Also:
    
    Constant Field Values
  - DEFAULT_RENDER_WAIT_TIME
```
public static final int DEFAULT_RENDER_WAIT_TIME
```
    See Also:
    
    Constant Field Values
  - DEFAULT_SCREENSHOT_ZOOM_FACTOR
```
public static final float DEFAULT_SCREENSHOT_ZOOM_FACTOR
```
    See Also:
    
    Constant Field Values
  - DEFAULT_CONTENT_TYPE_PATTERN
```
public static final String DEFAULT_CONTENT_TYPE_PATTERN
```
    See Also:
    
    Constant Field Values
  - DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
```
public static final String DEFAULT_SCREENSHOT_STORAGE_DISK_DIR
```
    See Also:
    
    Constant Field Values
  - DEFAULT_SCREENSHOT_STORAGE
```
public static final PhantomJSDocumentFetcher.Storage DEFAULT_SCREENSHOT_STORAGE
```
  - DEFAULT_SCREENSHOT_IMAGE_FORMAT
```
public static final String DEFAULT_SCREENSHOT_IMAGE_FORMAT
```
    See Also:
    
    Constant Field Values
  - DEFAULT_SCREENSHOT_SCALE_SIZE
```
public static final Dimension DEFAULT_SCREENSHOT_SCALE_SIZE
```
  - COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
```
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_PATH
```
    See Also:
    
    Constant Field Values
  - COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
```
public static final String COLLECTOR_PHANTOMJS_SCREENSHOT_INLINE
```
    See Also:
    
    Constant Field Values
- Constructor Detail
  - PhantomJSDocumentFetcher
```
public PhantomJSDocumentFetcher()
```
  - PhantomJSDocumentFetcher
```
public PhantomJSDocumentFetcher(int[] validStatusCodes)
```
- Method Detail
  - getExePath
```
public String getExePath()
```
  - setExePath
```
public void setExePath(String exePath)
```
  - getScriptPath
```
public String getScriptPath()
```
  - setScriptPath
```
public void setScriptPath(String scriptPath)
```
  - getRenderWaitTime
```
public int getRenderWaitTime()
```
  - setRenderWaitTime
```
public void setRenderWaitTime(int renderWaitTime)
```
  - getOptions
```
public String[] getOptions()
```
  - setOptions
```
public void setOptions(String... options)
```
  - getScreenshotDir
```
@Deprecated
public String getScreenshotDir()
```
    Deprecated. Since 2.8.0, use getScreenshotStorageDiskDir()
    
    Gets the screenshot directory when storage is "disk".
    
    Returns:
    
    screenshot directory
  - setScreenshotDir
```
@Deprecated
public void setScreenshotDir(String screenshotDir)
```
    Deprecated. Since 2.8.0, use setScreenshotStorageDiskDir(String)
    
    Sets the screenshot directory when storage is "disk".
    
    Parameters:
    
    screenshotDir - screenshot directory
  - getScreenshotStorageDiskDir
```
public String getScreenshotStorageDiskDir()
```
    Gets the directory where screenshots are saved when storage is "disk". Default is "./screenshots".
    
    Returns:
    
    directory
    
    Since:
    
    2.8.0
  - setScreenshotStorageDiskDir
```
public void setScreenshotStorageDiskDir(String screenshotStorageDiskDir)
```
    Sets the directory where screenshots are saved when storage is "disk". Use this method to overwrite the default ("./screenshots").
    
    Parameters:
    
    screenshotStorageDiskDir - directory
    
    Since:
    
    2.8.0
  - getScreenshotStorageDiskField
```
public String getScreenshotStorageDiskField()
```
    Gets the target document metadata field where to store the path to the screenshot image file when storage is "disk". Default is "collector.phantomjs-screenshot-path".
    
    Returns:
    
    field name
    
    Since:
    
    2.8.0
  - setScreenshotStorageDiskField
```
public void setScreenshotStorageDiskField(String screenshotStorageDiskField)
```
    Sets the target document metadata field where to store the path to the screenshot image file when storage is "disk". Use this method to overwrite the default ("collector.phantomjs-screenshot-path").
    
    Parameters:
    
    screenshotStorageDiskField - field name
    
    Since:
    
    2.8.0
  - getScreenshotStorageInlineField
```
public String getScreenshotStorageInlineField()
```
    Gets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline". Default is "collector.phantomjs-screenshot-inline".
    
    Returns:
    
    field name
    
    Since:
    
    2.8.0
  - setScreenshotStorageInlineField
```
public void setScreenshotStorageInlineField(String screenshotStorageInlineField)
```
    Sets the target document metadata field where to store the inline (Base64) screenshot image when storage is "inline". Use this method to overwrite the default ("collector.phantomjs-screenshot-inline").
    
    Parameters:
    
    screenshotStorageInlineField - field name
    
    Since:
    
    2.8.0
  - isScreenshotEnabled
```
public boolean isScreenshotEnabled()
```
    Gets whether to enable taking screenshots of crawled web pages.
    
    Returns:
    
    true if enabled
    
    Since:
    
    2.8.0
  - setScreenshotEnabled
```
public void setScreenshotEnabled(boolean screenshotEnabled)
```
    Sets whether to enable taking screenshots of crawled web pages.
    
    Parameters:
    
    screenshotEnabled - true if enabled
    
    Since:
    
    2.8.0
  - getScreenshotDimensions
```
public Dimension getScreenshotDimensions()
```
  - setScreenshotDimensions
```
public void setScreenshotDimensions(int width,
                                    int height)
```
  - setScreenshotDimensions
```
public void setScreenshotDimensions(Dimension screenshotDimensions)
```
  - getScreenshotZoomFactor
```
public float getScreenshotZoomFactor()
```
  - setScreenshotZoomFactor
```
public void setScreenshotZoomFactor(float screenshotZoomFactor)
```
  - getValidExitCodes
```
public int[] getValidExitCodes()
```
  - setValidExitCodes
```
public final void setValidExitCodes(int... validExitCodes)
```
  - getValidStatusCodes
```
public int[] getValidStatusCodes()
```
  - setValidStatusCodes
```
public final void setValidStatusCodes(int... validStatusCodes)
```
  - getNotFoundStatusCodes
```
public int[] getNotFoundStatusCodes()
```
  - setNotFoundStatusCodes
```
public final void setNotFoundStatusCodes(int... notFoundStatusCodes)
```
  - getHeadersPrefix
```
public String getHeadersPrefix()
```
  - setHeadersPrefix
```
public void setHeadersPrefix(String headersPrefix)
```
  - isDetectContentType
```
public boolean isDetectContentType()
```
  - setDetectContentType
```
public void setDetectContentType(boolean detectContentType)
```
  - isDetectCharset
```
public boolean isDetectCharset()
```
  - setDetectCharset
```
public void setDetectCharset(boolean detectCharset)
```
  - getContentTypePattern
```
public String getContentTypePattern()
```
  - setContentTypePattern
```
public void setContentTypePattern(String contentTypePattern)
```
  - getReferencePattern
```
public String getReferencePattern()
```
  - setReferencePattern
```
public void setReferencePattern(String referencePattern)
```
  - getResourceTimeout
```
public int getResourceTimeout()
```
    Gets the milliseconds timeout after which any resource requested will stop trying and proceed with other parts of the page.
    
    Returns:
    
    the timeout value, or -1 if undefined
    
    Since:
    
    2.8.0
  - setResourceTimeout
```
public void setResourceTimeout(int resourceTimeout)
```
    Sets the milliseconds timeout after which any resource requested will stop trying and proceed with other parts of the page.
    
    Parameters:
    
    resourceTimeout - the timeout value, or -1 for undefined
    
    Since:
    
    2.8.0
  - getScreenshotScaleDimensions
```
public Dimension getScreenshotScaleDimensions()
```
    Gets the pixel dimensions we want the stored screenshot to have.
    
    Returns:
    
    dimension
    
    Since:
    
    2.8.0
  - setScreenshotScaleDimensions
```
public void setScreenshotScaleDimensions(Dimension screenshotScaleDimensions)
```
    Sets the pixel dimensions we want the stored screenshot to have.
    
    Parameters:
    
    screenshotScaleDimensions - dimension
    
    Since:
    
    2.8.0
  - setScreenshotScaleDimensions
```
public void setScreenshotScaleDimensions(int width,
                                         int height)
```
    Sets the pixel dimensions we want the stored screenshot to have.
    
    Parameters:
    
    width - image width
    
    height - image height
    
    Since:
    
    2.8.0
  - isScreenshotScaleStretch
```
public boolean isScreenshotScaleStretch()
```
    Gets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
    
    Returns:
    
    true to stretch
    
    Since:
    
    2.8.0
  - setScreenshotScaleStretch
```
public void setScreenshotScaleStretch(boolean screenshotScaleStretch)
```
    Sets whether the screenshot should be stretched to fill all the scale dimensions. Default keeps aspect ratio.
    
    Parameters:
    
    screenshotScaleStretch - true to stretch
    
    Since:
    
    2.8.0
  - getScreenshotImageFormat
```
public String getScreenshotImageFormat()
```
    Gets the screenshot image format (jpg, png, gif, bmp, etc.).
    
    Returns:
    
    image format
    
    Since:
    
    2.8.0
  - setScreenshotImageFormat
```
public void setScreenshotImageFormat(String screenshotImageFormat)
```
    Sets the screenshot image format (jpg, png, gif, bmp, etc.).
    
    Parameters:
    
    screenshotImageFormat - image format
    
    Since:
    
    2.8.0
  - getScreenshotStorage
```
public PhantomJSDocumentFetcher.Storage[] getScreenshotStorage()
```
    Gets the screenshot storage mechanisms.
    
    Returns:
    
    storage mechanisms
    
    Since:
    
    2.8.0
  - setScreenshotStorage
```
public void setScreenshotStorage(PhantomJSDocumentFetcher.Storage... screenshotStorage)
```
    Sets the screenshot storage mechanisms.
    
    Parameters:
    
    screenshotStorage - storage mechanisms
    
    Since:
    
    2.8.0
  - getScreenshotStorageDiskStructure
```
public PhantomJSDocumentFetcher.StorageDiskStructure getScreenshotStorageDiskStructure()
```
    Gets the screenshot directory structure to create when storage is "disk".
    
    Returns:
    
    directory structure
    
    Since:
    
    2.8.0
  - setScreenshotStorageDiskStructure
```
public void setScreenshotStorageDiskStructure(PhantomJSDocumentFetcher.StorageDiskStructure screenshotStorageDiskStructure)
```
    Sets the screenshot directory structure to create when storage is "disk".
    
    Parameters:
    
    screenshotStorageDiskStructure - directory structure
    
    Since:
    
    2.8.0
  - getScreenshotScaleQuality
```
public PhantomJSDocumentFetcher.Quality getScreenshotScaleQuality()
```
    Gets the screenshot scaling quality to use when storage is "disk" or "inline". Default is PhantomJSDocumentFetcher.Quality.AUTO.
    
    Returns:
    
    quality
    
    Since:
    
    2.8.0
  - setScreenshotScaleQuality
```
public void setScreenshotScaleQuality(PhantomJSDocumentFetcher.Quality screenshotScaleQuality)
```
    Sets the screenshot scaling quality to use when storage is "disk" or "inline".
    
    Parameters:
    
    screenshotScaleQuality - quality
    
    Since:
    
    2.8.0
  - fetchDocument
```
public HttpFetchResponse fetchDocument(org.apache.http.client.HttpClient httpClient,
                                       HttpDocument doc)
```
    Description copied from interface: IHttpDocumentFetcher
    
    Fetches HTTP document and saves it to a local file
    
    Specified by:
    
    fetchDocument in interface IHttpDocumentFetcher
    
    Parameters:
    
    httpClient - the HTTP client
    
    doc - the document to fetch and save
    
    Returns:
    
    fetch response
  - loadFromXML
```
public void loadFromXML(Reader in)
```
    Specified by:
    
    loadFromXML in interface IXMLConfigurable
  - saveToXML
```
public void saveToXML(Writer out)
               throws IOException
```
    Specified by:
    
    saveToXML in interface IXMLConfigurable
    
    Throws:
    
    IOException
  - equals
```
public boolean equals(Object other)
```
    Overrides:
    
    equals in class Object
  - hashCode
```
public int hashCode()
```
    Overrides:
    
    hashCode in class Object
  - toString
```
public String toString()
```
    Overrides:
    
    toString in class Object
