public final class CharsetUtil extends Object
| Modifier and Type | Method | Description |
|---|---|---|
| static void | convertCharset(InputStream input, String inputCharset, OutputStream output, String outputCharset) | Converts the character encoding of the supplied input. |
| static String | convertCharset(String input, String inputCharset, String outputCharset) | Converts the character encoding of the supplied input value. |
| static String | detectCharset(InputStream input) | Detects the character encoding of an input stream. |
| static String | detectCharset(InputStream input, String declaredEncoding) | Detects the character encoding of an input stream. |
| static String | detectCharset(String input) | Detects the character encoding of a string. |
| static String | detectCharset(String input, String declaredEncoding) | Detects the character encoding of a string. |
| static String | detectCharsetIfBlank(String charset, Doc doc) | Detects a document character encoding if the supplied charset is blank. |
| static String | detectCharsetIfBlank(String charset, InputStream is) | Detects a document character encoding if the supplied charset is blank. |
| static String | detectsCharset(Doc doc) | Detects a document character encoding. |
| static String | firstNonBlankOrUTF8(ParseState parseState, String... charsets) | Returns the first non-blank character encoding, or UTF-8 if they are all blank or in post-parse state. |
| static String | firstNonBlankOrUTF8(String... charsets) | Returns the first non-blank character encoding, or UTF-8 if they are all blank. |

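
To make the conversion and fallback behavior summarized above more concrete, here is a minimal JDK-only sketch. It does not call CharsetUtil and is not the library's implementation: the class name CharsetSketch and the methods firstNonBlankOrUtf8 and convert are hypothetical stand-ins, and the exact blank-handling rules (for example, whether whitespace-only values count as blank) are an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of the documented API.
final class CharsetSketch {

    // Returns the first candidate that is neither null nor whitespace-only,
    // falling back to "UTF-8" (assumed definition of "blank").
    public static String firstNonBlankOrUtf8(String... charsets) {
        if (charsets != null) {
            for (String cs : charsets) {
                if (cs != null && !cs.trim().isEmpty()) {
                    return cs;
                }
            }
        }
        return StandardCharsets.UTF_8.name();
    }

    // Decodes bytes from 'in' using inCharset and re-encodes them to 'out'
    // using outCharset. The streams are left open for the caller to close.
    public static void convert(InputStream in, String inCharset,
            OutputStream out, String outCharset) throws IOException {
        Reader reader = new InputStreamReader(in, inCharset);
        Writer writer = new OutputStreamWriter(out, outCharset);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush();
    }

    public static void main(String[] args) throws IOException {
        // "caf\u00e9" is 4 bytes in ISO-8859-1 and 5 bytes in UTF-8.
        byte[] latin1 = "caf\u00e9".getBytes(StandardCharsets.ISO_8859_1);
        // No usable target charset was declared, so fall back to UTF-8.
        String target = firstNonBlankOrUtf8(null, "  ");
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        convert(new ByteArrayInputStream(latin1), "ISO-8859-1", out, target);
        System.out.println(latin1.length + " bytes in, "
                + out.size() + " bytes out as " + target);
    }
}
```

With the real class, the equivalent calls would presumably be firstNonBlankOrUTF8(String...) followed by the stream-based convertCharset, both of which appear in the summary table; their exact edge-case behavior should be checked against the library itself.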

public static String convertCharset(String input, String inputCharset, String outputCharset) throws IOException

Converts the character encoding of the supplied input value.

Parameters:
input - input value to apply conversion to
inputCharset - character set of the input value
outputCharset - desired character set of the output value

Throws:
IOException - problem converting character set

public static void convertCharset(InputStream input, String inputCharset, OutputStream output, String outputCharset) throws IOException

Converts the character encoding of the supplied input.

Parameters:
input - input stream to apply conversion to
inputCharset - character set of the input stream
output - where the converted stream will be stored
outputCharset - desired character set of the output stream

Throws:
IOException - problem converting character set

public static String detectCharset(String input) throws IOException
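
Detects the character encoding of a string.

Parameters:
input - the input to detect encoding on

Returns:
the detected character encoding, or null if the input is null or blank

Throws:
IOException - if there is a problem finding the character encoding

public static String detectCharset(String input, String declaredEncoding)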
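
Detects the character encoding of a string.

Parameters:
input - the input to detect encoding on
declaredEncoding - declared input encoding, if known

Returns:
the detected character encoding, or null if the input is null or blank

public static String detectCharset(InputStream input) throws IOException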

Detects the character encoding of an input stream. InputStream.markSupported() must return true, otherwise no decoding will be attempted.

Parameters:
input - the input to detect encoding on

Returns:
the detected character encoding, or null if input is null

Throws:
IOException - if there is a problem finding the character encoding

public static String detectCharset(InputStream input, String declaredEncoding) throws IOException

Detects the character encoding of an input stream. InputStream.markSupported() must return true, otherwise no decoding will be attempted.

Parameters:
input - the input to detect encoding on
declaredEncoding - declared input encoding, if known

Returns:
the detected character encoding, or null if input is null

Throws:
IOException - if there is a problem finding the character encoding

public static String detectsCharset(Doc doc) throws IOException

Detects a document character encoding. It first checks whether the encoding is defined in the document DocInfo.getContentEncoding(). If not, it will attempt to detect it from the document input stream. This method will NOT set the detected encoding on the DocInfo. If unable to detect, UTF-8 is assumed.

Parameters:
doc - document to detect encoding on

Throws:
IOException - problem detecting charset

public static String detectCharsetIfBlank(String charset, Doc doc) throws IOException

Detects a document character encoding if the supplied charset is blank. When blank, it checks whether the encoding is defined in the document DocInfo.getContentEncoding(). If not, it will attempt to detect it from the document input stream. This method will NOT set the detected encoding on the DocInfo. If unable to detect, UTF-8 is assumed.

Parameters:
charset - character encoding to use if not blank
doc - document to detect encoding on

Throws:
IOException - problem detecting charset

public static String detectCharsetIfBlank(String charset, InputStream is) throws IOException

Detects a document character encoding if the supplied charset is blank. When blank, it will attempt to detect it from the input stream. If unable to detect, UTF-8 is assumed.

Parameters:
charset - character encoding to use if not blank
is - input stream

Throws:
IOException - problem detecting charset

public static String firstNonBlankOrUTF8(String... charsets)

Returns the first non-blank character encoding, or UTF-8 if they are all blank.

Parameters:
charsets - character encodings to test

public static String firstNonBlankOrUTF8(ParseState parseState, String... charsets)
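
Returns the first non-blank character encoding, or UTF-8 if they are all blank or in post-parse state.

Parameters:
parseState - document parsing state
charsets - character encodings to test

Copyright © 2009–2023 Norconex Inc. All rights reserved.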