public class FeaturedImageProcessor extends Object implements IHttpDocumentProcessor, IXMLConfigurable
Document processor that extracts the "main" image from HTML pages. Since HTML is expected, this class should only be used as a pre-import processor. It is possible for this processor to not find any image.
By default, this class picks the first image (<img>) that meets the minimum size. You can instead request the largest of all matching images. In addition, if you know your images are defined in a special way (e.g., all share the same CSS class), you can use "domSelector" to limit the candidates to one or a few images. See JSoup selector-syntax for how to build the "domSelector".
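The difference between "first" and "largest" selection can be illustrated with a minimal sketch. It is not the processor's actual implementation: the class `FeaturedImageSelectionSketch` and its methods are hypothetical, and it assumes an image "matches" when both its width and height are at least the configured minimum, and that "largest" means largest pixel area.

```java
import java.awt.Dimension;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class FeaturedImageSelectionSketch {

    // Assumption: an image matches when both sides meet the minimum.
    static boolean meetsMinimum(Dimension image, Dimension min) {
        return image.width >= min.width && image.height >= min.height;
    }

    // Returns the first matching image in document order, or the matching
    // image with the largest area when "largest" is true. Empty when no
    // image meets the minimum size.
    static Optional<Dimension> select(
            List<Dimension> imagesInDocumentOrder,
            Dimension min,
            boolean largest) {
        Stream<Dimension> matching = imagesInDocumentOrder.stream()
                .filter(img -> meetsMinimum(img, min));
        if (largest) {
            return matching.max(Comparator.comparingLong(
                    (Dimension img) -> (long) img.width * img.height));
        }
        return matching.findFirst();
    }
}
```

With candidates of 100x100, 450x500 and 800x600 and a 400x400 minimum, this sketch returns 450x500 by default and 800x600 when `largest` is true. An empty result corresponds to the case where no image is found.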
One or more storage methods can be specified. Here are the possible storage options:
url: The image URL is stored in a collector.featured-image-url field. When only this option is set, scaling options and image format have no effect.
inline: The scaled image is stored as a string in a collector.featured-image-inline field. The string is ready to be used inline, in a <img src="..."> tag.
disk: The scaled image is saved to disk and its path is stored in a collector.featured-image-path field.
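To make the inline option more concrete, here is a minimal sketch of how an image can be turned into a string usable in a <img src="..."> tag. It assumes the inline value is a Base64 "data:" URI, which this page does not confirm. The class `InlineImageSketch` and its method are hypothetical and use only the standard Java library.

```java
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class InlineImageSketch {

    // Encodes the image in the given format (e.g., "png", "jpg") and wraps
    // the bytes in a Base64 data URI. Assumption: this is the general shape
    // of an "inline" image value.
    static String toDataUri(BufferedImage image, String format) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // ImageIO.write returns false when no writer handles the format.
            if (!ImageIO.write(image, format, out)) {
                throw new IllegalArgumentException(
                        "No image writer for format: " + format);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // "jpg" is a file extension; the registered media type is image/jpeg.
        String subtype = "jpg".equalsIgnoreCase(format) ? "jpeg" : format;
        return "data:image/" + subtype + ";base64,"
                + Base64.getEncoder().encodeToString(out.toByteArray());
    }
}
```

The returned string can be placed directly in the src attribute of an <img> tag. Inline values grow with image size, which is one reason to keep the scaled dimensions small for this storage method.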
<processor
class="com.norconex.collector.http.processor.impl.FeaturedImageProcessor">
<pageContentTypePattern>
(Optional regex to overwrite default matching of HTML pages)
</pageContentTypePattern>
<domSelector>
(Optional CSS-like path matching one or more image elements)
</domSelector>
<minDimensions>
(Minimum pixel size for an image to be considered.
Default is 400x400).
</minDimensions>
<largest>[false|true]</largest>
<imageCacheSize>
(Maximum number of images to cache for faster processing.
Set to 0 to disable caching.)
</imageCacheSize>
<imageCacheDir>(Directory where to create the image cache)</imageCacheDir>
<storage>
[url|inline|disk]
(One or more, comma-separated. Default is "url".)
</storage>
<!-- Only applicable for "inline" and "disk" storage: -->
<scaleDimensions>
(Target pixel size the featured image should be scaled to.
Default is 150x150.)
</scaleDimensions>
<scaleStretch>
[false|true]
(Whether to stretch to match scale size. Default keeps aspect ratio.)
</scaleStretch>
<scaleQuality>
[auto|low|medium|high|max]
(Default is "auto", which tries the best balance between quality
and speed based on image size. The lower the quality the faster
it is to scale images.)
</scaleQuality>
<imageFormat>
(Target format of stored image. E.g., "jpg", "png", "gif", "bmp", ...
Default is "png")
</imageFormat>
<!-- Only applicable for "disk" storage: -->
<storageDiskDir
structure="[url2path|date|datetime]">
(Path to directory where to store images on disk.)
</storageDiskDir>
<storageDiskField>
(Overwrite default field where to store the image path.
Default is "collector.featured-image-path".)
</storageDiskField>
<!-- Only applicable for "inline" storage: -->
<storageInlineField>
(Overwrite default field where to store the inline image.
Default is "collector.featured-image-inline".)
</storageInlineField>
<!-- Only applicable for "url" storage: -->
<storageUrlField>
(Overwrite default field where to store the image URL.
Default is "collector.featured-image-url".)
</storageUrlField>
</processor>
When specifying an image size, the format is [width]x[height]
or a single value. When a single value is used, that value represents both
the width and height (i.e., a square).
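The size format described above can be parsed as in the following sketch. The class `DimensionParserSketch` is hypothetical and only illustrates the "[width]x[height] or single value" rule. It does not reproduce how the processor itself reads its configuration.

```java
import java.awt.Dimension;
import java.util.Locale;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class DimensionParserSketch {

    // Parses "[width]x[height]" or a single value meaning a square.
    // Throws IllegalArgumentException (NumberFormatException for
    // non-numeric parts) when the value does not follow the format.
    static Dimension parse(String value) {
        // The -1 limit keeps trailing empty parts, so "300x" is rejected
        // instead of being silently read as a square.
        String[] parts = value.trim().toLowerCase(Locale.ROOT).split("x", -1);
        if (parts.length == 1) {
            int side = Integer.parseInt(parts[0].trim());
            return new Dimension(side, side);
        }
        if (parts.length == 2) {
            return new Dimension(
                    Integer.parseInt(parts[0].trim()),
                    Integer.parseInt(parts[1].trim()));
        }
        throw new IllegalArgumentException("Invalid size: " + value);
    }
}
```

For example, "300x400" yields a width of 300 and a height of 400, while "50" yields a 50 by 50 square.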
<preImportProcessors>
<processor
class="FeaturedImageProcessor">
<minDimensions>300x400</minDimensions>
<scaleDimensions>50</scaleDimensions>
<imageFormat>jpg</imageFormat>
<scaleQuality>max</scaleQuality>
<storage>inline</storage>
</processor>
</preImportProcessors>
The above example extracts the first image that is 300x400 or larger, scales it down to fit within 50x50 while preserving its aspect ratio, and stores it as an inline JPEG in a document field, using the best quality possible.
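The effect of preserving the aspect ratio on the final size can be worked out with a short sketch. The class `ScaleSizeSketch` is hypothetical, and the rounding it applies is an assumption; the scaling library used by the processor may round differently.

```java
import java.awt.Dimension;

// Hypothetical helper for illustration only; not part of the Norconex API.
final class ScaleSizeSketch {

    // Computes the output size for a source image scaled toward a target.
    // With stretch, the target size is used as-is. Otherwise the image is
    // scaled by a single factor so that it fits inside the target box.
    static Dimension scaledSize(
            Dimension source, Dimension target, boolean stretch) {
        if (stretch) {
            return new Dimension(target);
        }
        double ratio = Math.min(
                (double) target.width / source.width,
                (double) target.height / source.height);
        // Assumption: round to the nearest pixel, never below 1.
        int width = Math.max(1, (int) Math.round(source.width * ratio));
        int height = Math.max(1, (int) Math.round(source.height * ratio));
        return new Dimension(width, height);
    }
}
```

Under these assumptions, a 300x400 source scaled toward 50x50 without stretching comes out as 38x50, since the limiting factor is the height (50 / 400 = 0.125). With stretching enabled, the same source would come out as exactly 50x50.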
Modifier and Type | Class and Description |
---|---|
static class | FeaturedImageProcessor.Quality |
static class | FeaturedImageProcessor.Storage |
static class | FeaturedImageProcessor.StorageDiskStructure |
Modifier and Type | Field and Description |
---|---|
static String | COLLECTOR_FEATURED_IMAGE_INLINE |
static String | COLLECTOR_FEATURED_IMAGE_PATH |
static String | COLLECTOR_FEATURED_IMAGE_URL |
static String | DEFAULT_IMAGE_CACHE_DIR |
static int | DEFAULT_IMAGE_CACHE_SIZE |
static String | DEFAULT_IMAGE_FORMAT |
static Dimension | DEFAULT_MIN_SIZE |
static String | DEFAULT_PAGE_CONTENT_TYPE_PATTERN |
static Dimension | DEFAULT_SCALE_SIZE |
static FeaturedImageProcessor.Storage | DEFAULT_STORAGE |
static String | DEFAULT_STORAGE_DISK_DIR |
static FeaturedImageProcessor.StorageDiskStructure | DEFAULT_STORAGE_DISK_STRUCTURE |
Constructor and Description |
---|
FeaturedImageProcessor() |
public static final String COLLECTOR_FEATURED_IMAGE_URL
public static final String COLLECTOR_FEATURED_IMAGE_PATH
public static final String COLLECTOR_FEATURED_IMAGE_INLINE
public static final String DEFAULT_PAGE_CONTENT_TYPE_PATTERN
public static final int DEFAULT_IMAGE_CACHE_SIZE
public static final String DEFAULT_IMAGE_CACHE_DIR
public static final String DEFAULT_STORAGE_DISK_DIR
public static final String DEFAULT_IMAGE_FORMAT
public static final Dimension DEFAULT_MIN_SIZE
public static final Dimension DEFAULT_SCALE_SIZE
public static final FeaturedImageProcessor.Storage DEFAULT_STORAGE
public static final FeaturedImageProcessor.StorageDiskStructure DEFAULT_STORAGE_DISK_STRUCTURE
public String getPageContentTypePattern()
public void setPageContentTypePattern(String pageContentTypePattern)
public String getDomSelector()
public void setDomSelector(String domSelector)
public Dimension getMinDimensions()
public void setMinDimensions(int width, int height)
public void setMinDimensions(Dimension minDimensions)
public Dimension getScaleDimensions()
public void setScaleDimensions(int width, int height)
public void setScaleDimensions(Dimension scaleDimensions)
public boolean isScaleStretch()
public void setScaleStretch(boolean scaleStretch)
public String getImageFormat()
public void setImageFormat(String imageFormat)
public int getImageCacheSize()
public void setImageCacheSize(int imageCacheSize)
public Path getImageCacheDir()
public void setImageCacheDir(Path imageCacheDir)
public boolean isLargest()
public void setLargest(boolean largest)
public List<FeaturedImageProcessor.Storage> getStorage()
public void setStorage(FeaturedImageProcessor.Storage... storage)
storage - storage mechanisms
public void setStorage(List<FeaturedImageProcessor.Storage> storage)
storage - storage mechanisms
public String getStorageDiskDir()
public void setStorageDiskDir(String storageDiskDir)
public FeaturedImageProcessor.StorageDiskStructure getStorageDiskStructure()
public void setStorageDiskStructure(FeaturedImageProcessor.StorageDiskStructure storageDiskStructure)
public String getStorageDiskField()
public void setStorageDiskField(String storageDiskField)
public String getStorageInlineField()
public void setStorageInlineField(String storageInlineField)
public String getStorageUrlField()
public void setStorageUrlField(String storageUrlField)
public FeaturedImageProcessor.Quality getScaleQuality()
public void setScaleQuality(FeaturedImageProcessor.Quality scaleQuality)
public void processDocument(HttpFetchClient fetcher, Doc doc)
Specified by: processDocument in interface IHttpDocumentProcessor
fetcher - HTTP fetch client
doc - the document
public void loadFromXML(XML xml)
Specified by: loadFromXML in interface IXMLConfigurable
public void saveToXML(XML xml)
Specified by: saveToXML in interface IXMLConfigurable
Copyright © 2009–2023 Norconex Inc. All rights reserved.