java.lang.Object
- com.norconex.importer.util.CharsetUtil

```
public final class CharsetUtil
extends Object
```
Character set utility methods.

Since:

2.5.0

Author:

Pascal Essiembre

Method Summary

| Modifier and Type | Method | Description |
| --- | --- | --- |
| `static void` | `convertCharset(InputStream input, String inputCharset, OutputStream output, String outputCharset)` | Converts the character encoding of the supplied input. |
| `static String` | `convertCharset(String input, String inputCharset, String outputCharset)` | Converts the character encoding of the supplied input value. |
| `static String` | `detectCharset(InputStream input)` | Detects the character encoding of an input stream. |
| `static String` | `detectCharset(InputStream input, String declaredEncoding)` | Detects the character encoding of an input stream. |
| `static String` | `detectCharset(String input)` | Detects the character encoding of a string. |
| `static String` | `detectCharset(String input, String declaredEncoding)` | Detects the character encoding of a string. |
| `static String` | `detectCharsetIfBlank(String charset, Doc doc)` | Detects a document character encoding if the supplied `charset` is blank. |
| `static String` | `detectCharsetIfBlank(String charset, InputStream is)` | Detects a document character encoding if the supplied `charset` is blank. |
| `static String` | `detectsCharset(Doc doc)` | Detects a document character encoding. |
| `static String` | `firstNonBlankOrUTF8(ParseState parseState, String... charsets)` | Returns the first non-blank character encoding, or returns UTF-8 if they are all blank or in post-parse state. |
| `static String` | `firstNonBlankOrUTF8(String... charsets)` | Returns the first non-blank character encoding, or returns UTF-8 if they are all blank. |

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

- Method Detail
  - convertCharset
```
public static String convertCharset(String input,
                                    String inputCharset,
                                    String outputCharset)
                             throws IOException
```
    Converts the character encoding of the supplied input value.
    
    Parameters:
    
    input - input value to convert
    
    inputCharset - character set of the input value
    
    outputCharset - desired character set of the output value
    
    Returns:
    
    the converted value
    
    Throws:
    
    IOException - problem converting character set
  - convertCharset
```
public static void convertCharset(InputStream input,
                                  String inputCharset,
                                  OutputStream output,
                                  String outputCharset)
                           throws IOException
```
    Converts the character encoding of the supplied input.
    
    Parameters:
    
    input - input stream to convert
    
    inputCharset - character set of the input stream
    
    output - where converted stream will be stored
    
    outputCharset - desired character set of the output stream
    
    Throws:
    
    IOException - problem converting character set
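
    To illustrate what a stream-to-stream conversion of this kind involves, here is a minimal JDK-only sketch. It is not the `CharsetUtil` implementation: the class `TranscodeSketch` and its `transcode` methods are hypothetical names used only for illustration, and the sketch assumes the caller remains responsible for closing both streams.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical JDK-only helper; NOT the CharsetUtil implementation.
final class TranscodeSketch {

    // Decodes the bytes of 'input' using inputCharset and re-encodes the
    // resulting characters to 'output' using outputCharset.
    static void transcode(InputStream input, String inputCharset,
            OutputStream output, String outputCharset) throws IOException {
        Reader reader = new InputStreamReader(input, inputCharset);
        Writer writer = new OutputStreamWriter(output, outputCharset);
        char[] buffer = new char[4096];
        int read;
        while ((read = reader.read(buffer)) != -1) {
            writer.write(buffer, 0, read);
        }
        // Flush so every encoded byte reaches 'output'.
        // The caller still owns (and closes) both streams.
        writer.flush();
    }

    // Convenience wrapper over byte arrays, used by the demo below.
    static byte[] transcode(byte[] bytes, String from, String to)
            throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        transcode(new ByteArrayInputStream(bytes), from, out, to);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" contains one non-ASCII character (e with acute accent).
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = transcode(latin1, "ISO-8859-1", "UTF-8");
        System.out.println(latin1.length + " bytes in, "
                + utf8.length + " bytes out");
    }
}
```

    An unsupported charset name surfaces as an `IOException` subclass (`UnsupportedEncodingException`) in this sketch, which matches the shape of the `throws` clause documented above.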
  - detectCharset
```
public static String detectCharset(String input)
                            throws IOException
```
    Detects the character encoding of a string.
    
    Parameters:
    
    input - the input to detect encoding on
    
    Returns:
    
    the official name of the character encoding, or null if the input is null or blank
    
    Throws:
    
    IOException - if there is a problem finding the character encoding
  - detectCharset
```
public static String detectCharset(String input,
                                   String declaredEncoding)
```
    Detects the character encoding of a string. If the string has a declared character encoding, specifying it will influence the detection result.
    
    Parameters:
    
    input - the input to detect encoding on
    
    declaredEncoding - declared input encoding, if known
    
    Returns:
    
    the official name of the character encoding, or null if the input is null or blank
  - detectCharset
```
public static String detectCharset(InputStream input)
                            throws IOException
```
    Detects the character encoding of an input stream. InputStream.markSupported() must return true; otherwise, no decoding will be attempted.
    
    Parameters:
    
    input - the input to detect encoding on
    
    Returns:
    
    the official name of the character encoding, or null if the input is null
    
    Throws:
    
    IOException - if there is a problem finding the character encoding
  - detectCharset
```
public static String detectCharset(InputStream input,
                                   String declaredEncoding)
                            throws IOException
```
    Detects the character encoding of an input stream. If the stream has a declared character encoding, specifying it will influence the detection result. InputStream.markSupported() must return true; otherwise, no decoding will be attempted.
    
    Parameters:
    
    input - the input to detect encoding on
    
    declaredEncoding - declared input encoding, if known
    
    Returns:
    
    the official name of the character encoding, or null if the input is null
    
    Throws:
    
    IOException - if there is a problem finding the character encoding
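
    Because the stream variants require `InputStream.markSupported()` to return true, a caller may need to wrap a raw stream before passing it in. The sketch below shows one way to do that with the JDK alone, and how peeking at leading bytes with `mark`/`reset` leaves the stream readable from the start. That detection peeks this way is an assumption; the class `MarkSupportSketch` and its methods are hypothetical and not part of `CharsetUtil`.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;

// Hypothetical JDK-only helper; NOT part of CharsetUtil.
final class MarkSupportSketch {

    // Returns a stream that supports mark/reset, wrapping only when needed.
    static InputStream ensureMarkSupported(InputStream in) {
        return in.markSupported() ? in : new BufferedInputStream(in);
    }

    // Reads up to 'limit' leading bytes, then rewinds the stream so that
    // a later consumer still sees the content from the beginning.
    static byte[] peek(InputStream in, int limit) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("mark/reset not supported");
        }
        in.mark(limit);
        byte[] buf = new byte[limit];
        int total = 0;
        int n;
        while (total < limit
                && (n = in.read(buf, total, limit - total)) != -1) {
            total += n;
        }
        in.reset();
        return Arrays.copyOf(buf, total);
    }
}
```

    With a helper like this, a call would look like `CharsetUtil.detectCharset(MarkSupportSketch.ensureMarkSupported(rawStream))`, where `rawStream` is whatever stream the caller already holds.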
  - detectsCharset
```
public static String detectsCharset(Doc doc)
                             throws IOException
```
    Detects a document character encoding. It first checks whether the encoding is defined in the document's DocInfo.getContentEncoding(). If not, it attempts to detect it from the document input stream. This method will NOT set the detected encoding on the DocInfo. If detection fails, UTF-8 is assumed.
    
    Parameters:
    
    doc - document to detect encoding on
    
    Returns:
    
    string representation of character encoding
    
    Throws:
    
    IOException - problem detecting charset
    
    Since:
    
    3.0.0
  - detectCharsetIfBlank
```
public static String detectCharsetIfBlank(String charset,
                                          Doc doc)
                                   throws IOException
```
    Detects a document character encoding if the supplied charset is blank. When blank, it checks whether the encoding is defined in the document's DocInfo.getContentEncoding(). If not, it attempts to detect it from the document input stream. This method will NOT set the detected encoding on the DocInfo. If detection fails, UTF-8 is assumed.
    
    Parameters:
    
    charset - character encoding to use if not blank
    
    doc - document to detect encoding on
    
    Returns:
    
    supplied charset if not blank, or the detected charset
    
    Throws:
    
    IOException - problem detecting charset
    
    Since:
    
    3.0.0
  - detectCharsetIfBlank
```
public static String detectCharsetIfBlank(String charset,
                                          InputStream is)
                                   throws IOException
```
    Detects a document character encoding if the supplied charset is blank. When blank, it will attempt to detect it from the input stream. If unable to detect, UTF-8 is assumed.
    
    Parameters:
    
    charset - character encoding to use if not blank
    
    is - input stream
    
    Returns:
    
    supplied charset if not blank, or the detected charset
    
    Throws:
    
    IOException - problem detecting charset
    
    Since:
    
    3.0.0
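
    The fallback order described above (supplied charset, then detection, then UTF-8) can be sketched as follows. This is an illustration of the documented behavior only, not the library code: `DetectIfBlankSketch`, its `Detector` interface, and the blank check are hypothetical stand-ins.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration of the documented fallback order;
// NOT the CharsetUtil implementation.
final class DetectIfBlankSketch {

    // Hypothetical stand-in for whatever detection routine is used.
    interface Detector {
        String detect(InputStream is) throws IOException;
    }

    // Assumption: "blank" means null, empty, or whitespace only.
    static boolean isBlank(String s) {
        return s == null || s.trim().isEmpty();
    }

    static String detectIfBlank(String charset, InputStream is,
            Detector detector) throws IOException {
        if (!isBlank(charset)) {
            // A supplied charset wins; this sketch does not read the stream.
            return charset;
        }
        String detected = detector.detect(is);
        // Fall back to UTF-8 when detection yields nothing usable.
        return isBlank(detected) ? "UTF-8" : detected;
    }
}
```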
  - firstNonBlankOrUTF8
```
public static String firstNonBlankOrUTF8(String... charsets)
```
    Returns the first non-blank character encoding, or returns UTF-8 if they are all blank.
    
    Parameters:
    
    charsets - character encodings to test
    
    Returns:
    
    first non-blank, or UTF-8
    
    Since:
    
    3.0.0
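
    As a rough illustration of the documented contract, the following sketch returns the first non-blank candidate and otherwise UTF-8. `FirstNonBlankSketch` is a hypothetical name, and treating whitespace-only values as blank is an assumption; the real method may differ in such details.

```java
// Hypothetical re-creation of the documented behavior;
// NOT the CharsetUtil implementation.
final class FirstNonBlankSketch {

    static String firstNonBlankOrUtf8(String... charsets) {
        if (charsets != null) {
            for (String cs : charsets) {
                // Assumption: null, empty, and whitespace-only are all blank.
                if (cs != null && !cs.trim().isEmpty()) {
                    return cs;
                }
            }
        }
        return "UTF-8";
    }
}
```

    Candidates are tested in argument order, so a caller would list the most trusted source first, for example a configured charset ahead of a detected one.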
  - firstNonBlankOrUTF8
```
public static String firstNonBlankOrUTF8(ParseState parseState,
                                         String... charsets)
```
    Returns the first non-blank character encoding, or returns UTF-8 if they are all blank or in post-parse state. That is, UTF-8 is always returned if parsing has already occurred (since parsing converts content encoding to UTF-8).
    
    Parameters:
    
    parseState - document parsing state
    
    charsets - character encodings to test
    
    Returns:
    
    first non-blank, or UTF-8
    
    Since:
    
    3.0.0
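
    The parse-state rule can be sketched in the same spirit: once parsing has occurred, the candidates are ignored and UTF-8 is returned. The nested `ParseState` enum below is a hypothetical stand-in for the library's type (its constant names are assumed), and `ParseAwareCharsetSketch` is not part of `CharsetUtil`.

```java
// Hypothetical re-creation of the documented behavior;
// NOT the CharsetUtil implementation.
final class ParseAwareCharsetSketch {

    // Hypothetical stand-in for the library's ParseState type.
    enum ParseState { PRE, POST }

    static String firstNonBlankOrUtf8(ParseState state, String... charsets) {
        // After parsing, content is described as already converted to
        // UTF-8, so the candidates are not consulted.
        if (state == ParseState.POST) {
            return "UTF-8";
        }
        if (charsets != null) {
            for (String cs : charsets) {
                if (cs != null && !cs.trim().isEmpty()) {
                    return cs;
                }
            }
        }
        return "UTF-8";
    }
}
```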
