Class URLNormalizer

    • Constructor Detail

      • URLNormalizer

        public URLNormalizer(URL url)
        Create a new URLNormalizer instance.
        Parameters:
        url - the url to normalize
      • URLNormalizer

        public URLNormalizer(String url)

        Create a new URLNormalizer instance.

        Since 1.8.0, spaces in URLs are no longer converted to + automatically. To encode them, invoke encodeNonURICharacters() or encodeSpaces().

        Parameters:
        url - the url to normalize
    • Method Detail

      • lowerCase

        public URLNormalizer lowerCase()

        Converts the entire URL to lower case, including the scheme, host name, path, and query string parameter names and values. To lower-case only specific parts of a URL, consider the less aggressive lower case methods instead.

        HTTP://www.Example.com/Path/Query?Param1=AAA&Param2=BBB → http://www.example.com/path/query?param1=aaa&param2=bbb

        Returns:
        this instance
        Since:
        1.15.1
      • lowerCaseSchemeHost

        public URLNormalizer lowerCaseSchemeHost()

        Converts the scheme and host to lower case.

        HTTP://www.Example.com/ → http://www.example.com/

        Returns:
        this instance
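
To illustrate the idea, here is a minimal, standalone sketch of lower-casing only the scheme and host while leaving the path, query, and fragment untouched. It is not the URLNormalizer implementation; the class name SchemeHostLowerCaser and its regular expression are assumptions made for this example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class SchemeHostLowerCaser {

    // group 1: scheme, group 2: authority, group 3: path/query/fragment
    private static final Pattern PARTS = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9+.-]*)://([^/?#]*)(.*)$", Pattern.DOTALL);

    static String lowerCaseSchemeHost(String url) {
        Matcher m = PARTS.matcher(url);
        if (!m.matches()) {
            return url; // no "scheme://authority" prefix: leave as-is
        }
        String authority = m.group(2);
        // Keep any "user:password@" part untouched; it may be case sensitive.
        int at = authority.lastIndexOf('@');
        String userInfo = authority.substring(0, at + 1);
        String hostPort = authority.substring(at + 1).toLowerCase(Locale.ROOT);
        return m.group(1).toLowerCase(Locale.ROOT) + "://"
                + userInfo + hostPort + m.group(3);
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/Path?Q=A"
        System.out.println(lowerCaseSchemeHost("HTTP://www.Example.com/Path?Q=A"));
    }
}
```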
      • lowerCasePath

        public URLNormalizer lowerCasePath()

        Converts the URL path to lower case.

        http://www.example.com/AAA/BBB → http://www.example.com/aaa/bbb

        Returns:
        this instance
        Since:
        1.15.1
      • lowerCaseQuery

        public URLNormalizer lowerCaseQuery()

        Converts the URL query string to lower case, which includes both the parameter names and values.

        http://www.example.com/query?Param1=AAA&Param2=BBB → http://www.example.com/query?param1=aaa&param2=bbb

        Returns:
        this instance
        Since:
        1.15.1
      • lowerCaseQueryParameterNames

        public URLNormalizer lowerCaseQueryParameterNames()

        Converts the URL query parameter names to lower case, leaving query parameter values intact.

        http://www.example.com/query?Param1=AAA&Param2=BBB → http://www.example.com/query?param1=AAA&param2=BBB

        Returns:
        this instance
        Since:
        1.15.1
      • lowerCaseQueryParameterValues

        public URLNormalizer lowerCaseQueryParameterValues()

        Converts the URL query parameter values to lower case, leaving query parameter names intact.

        http://www.example.com/query?Param1=AAA&Param2=BBB → http://www.example.com/query?Param1=aaa&Param2=bbb

        Returns:
        this instance
        Since:
        1.15.1
      • upperCaseEscapeSequence

        public URLNormalizer upperCaseEscapeSequence()
        Converts letters in URL-encoded escape sequences to upper case.

        http://www.example.com/a%c2%b1b → http://www.example.com/a%C2%B1b

        Returns:
        this instance
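
The transformation can be sketched with a single regular expression that finds each percent-encoded triplet and upper-cases it. This is an illustrative approach under assumed names (EscapeSequenceUpperCaser), not necessarily how the library does it.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class EscapeSequenceUpperCaser {

    // A percent sign followed by exactly two hexadecimal digits.
    private static final Pattern ESCAPE = Pattern.compile("%[0-9a-fA-F]{2}");

    static String upperCaseEscapes(String url) {
        Matcher m = ESCAPE.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String upper = m.group().toUpperCase(Locale.ROOT);
            m.appendReplacement(out, Matcher.quoteReplacement(upper));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/a%C2%B1b"
        System.out.println(upperCaseEscapes("http://www.example.com/a%c2%b1b"));
    }
}
```

Only the escape sequences change; the trailing "b" in the example stays lower case because it is not part of a triplet.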
      • decodeUnreservedCharacters

        public URLNormalizer decodeUnreservedCharacters()
        Decodes percent-encoded unreserved characters.

        http://www.example.com/%7Eusername/ → http://www.example.com/~username/

        Returns:
        this instance
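
RFC 3986 defines the unreserved characters as letters, digits, hyphen, period, underscore, and tilde. The sketch below decodes an escape sequence only when it stands for one of those characters. The class UnreservedDecoder is a hypothetical name for this example and is not the library's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class UnreservedDecoder {

    private static final Pattern ESCAPE = Pattern.compile("%([0-9a-fA-F]{2})");

    // Unreserved set from RFC 3986: ALPHA / DIGIT / "-" / "." / "_" / "~"
    static boolean isUnreserved(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                || (c >= '0' && c <= '9')
                || c == '-' || c == '.' || c == '_' || c == '~';
    }

    static String decodeUnreserved(String url) {
        Matcher m = ESCAPE.matcher(url);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char c = (char) Integer.parseInt(m.group(1), 16);
            // Reserved or non-ASCII bytes keep their escaped form.
            String replacement = isUnreserved(c) ? String.valueOf(c) : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/~username/"
        System.out.println(decodeUnreserved("http://www.example.com/%7Eusername/"));
    }
}
```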
      • encodeNonURICharacters

        public URLNormalizer encodeNonURICharacters()

        Encodes all characters that are not supported in a URI (not to be confused with a URL), as defined by the RFC 3986 standard. This includes all non-ASCII characters.

        Since this method also encodes spaces to the plus sign (+), there is no need to also invoke encodeSpaces().

        http://www.example.com/^a [b]/ → http://www.example.com/%5Ea+%5Bb%5D/
        Returns:
        this instance
        Since:
        1.8.0
      • encodeSpaces

        public URLNormalizer encodeSpaces()

        Encodes space characters into plus signs (+) if they are part of the query string. Spaces that are part of the URL path are percent-encoded to %20.

        To encode all characters not supported in a URI (including spaces and non-ASCII characters), use encodeNonURICharacters() instead.

        http://www.example.com/a b c → http://www.example.com/a+b+c
        Returns:
        this instance
        Since:
        1.8.0
      • removeDefaultPort

        public URLNormalizer removeDefaultPort()
        Removes the default port (80 for http, and 443 for https).

        http://www.example.com:80/bar.html → http://www.example.com/bar.html

        Returns:
        this instance
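
A rough way to picture this step is a pair of patterns that drop ":80" after an http host and ":443" after an https host. The sketch below is a simplification written for this example: it assumes no user information and no IPv6 literal in the authority, and the class name DefaultPortRemover is hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class DefaultPortRemover {

    // The lookahead ensures the port ends there (so ":8080" is not touched).
    private static final Pattern HTTP_80 = Pattern.compile(
            "^(http://[^/?#:]+):80(?=[/?#]|$)", Pattern.CASE_INSENSITIVE);
    private static final Pattern HTTPS_443 = Pattern.compile(
            "^(https://[^/?#:]+):443(?=[/?#]|$)", Pattern.CASE_INSENSITIVE);

    static String removeDefaultPort(String url) {
        String result = HTTP_80.matcher(url).replaceFirst("$1");
        return HTTPS_443.matcher(result).replaceFirst("$1");
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/bar.html"
        System.out.println(removeDefaultPort("http://www.example.com:80/bar.html"));
    }
}
```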
      • addDirectoryTrailingSlash

        public URLNormalizer addDirectoryTrailingSlash()

        Adds a trailing slash (/) to a URL ending with a directory. A URL is considered to end with a directory if the last path segment, before fragment (#) or query string (?), does not contain a dot, typically representing an extension.

        Please Note: URLs do not always denote a directory structure, and many URLs can qualify for this method without truly representing a directory. Adding a trailing slash to these URLs could potentially break their semantic equivalence.

        http://www.example.com/alice → http://www.example.com/alice/
        Returns:
        this instance
        Since:
        1.11.0 (renamed from "addTrailingSlash")
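
The "no dot in the last path segment" heuristic described above can be sketched as follows. This is an assumption-laden illustration (hypothetical class DirectoryTrailingSlash), not the library's implementation; it treats a URL with an empty path as out of scope, since that case belongs to addDomainTrailingSlash().

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class DirectoryTrailingSlash {

    // group 1: optional "scheme://authority", group 2: path,
    // group 3: query string and/or fragment (may be empty)
    private static final Pattern PARTS = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]*)?([^?#]*)(.*)$", Pattern.DOTALL);

    static String addDirectoryTrailingSlash(String url) {
        Matcher m = PARTS.matcher(url);
        if (!m.matches()) {
            return url;
        }
        String prefix = m.group(1) == null ? "" : m.group(1);
        String path = m.group(2);
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        // An empty segment already ends with "/" (or there is no path);
        // a dot suggests a file extension.
        if (lastSegment.isEmpty() || lastSegment.contains(".")) {
            return url;
        }
        return prefix + path + "/" + m.group(3);
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/alice/?x=1"
        System.out.println(addDirectoryTrailingSlash("http://www.example.com/alice?x=1"));
    }
}
```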
      • addDomainTrailingSlash

        public URLNormalizer addDomainTrailingSlash()

        Adds a trailing slash (/) right after the domain for URLs with no path, before any fragment (#) or query string (?).

        Please Note: Adding a trailing slash to URLs could potentially break their semantic equivalence.

        http://www.example.com → http://www.example.com/
        Returns:
        this instance
        Since:
        1.12.0
      • addTrailingSlash

        @Deprecated
        public URLNormalizer addTrailingSlash()
        Deprecated.
        Since 1.11.0, use addDirectoryTrailingSlash()

        Adds a trailing slash (/) to a URL ending with a directory. A URL is considered to end with a directory if the last path segment, before fragment (#) or query string (?), does not contain a dot, typically representing an extension.

        Please Note: URLs do not always denote a directory structure, and many URLs can qualify for this method without truly representing a directory. Adding a trailing slash to these URLs could potentially break their semantic equivalence.

        http://www.example.com/alice → http://www.example.com/alice/
        Returns:
        this instance
      • removeTrailingSlash

        public URLNormalizer removeTrailingSlash()

        Removes any trailing slash (/) from a URL, before fragment (#) or query string (?).

        Please Note: Removing trailing slashes from URLs could potentially break their semantic equivalence.

        http://www.example.com/alice/ → http://www.example.com/alice
        Returns:
        this instance
        Since:
        1.11.0
      • removeDotSegments

        public URLNormalizer removeDotSegments()

        Removes the unnecessary "." and ".." segments from the URL path.

        As of 2.3.0, the algorithm used to remove the dot segments is the one prescribed by RFC 3986.

        http://www.example.com/../a/b/../c/./d.html → http://www.example.com/a/c/d.html

        Please Note: URLs do not always represent a clean hierarchical structure, and the dots and double dots may have a different meaning on some sites. Removing them from a URL could potentially break its semantic equivalence.

        Returns:
        this instance
        See Also:
        URI.normalize()
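
For reference, the "remove_dot_segments" procedure of RFC 3986 (section 5.2.4) can be sketched on a path string as below. This is a standalone rendering of the RFC steps for illustration only; the class name DotSegmentRemover is hypothetical, and the sketch works on the path alone rather than a full URL.

```java
// Hypothetical helper, not part of URLNormalizer.
final class DotSegmentRemover {

    static String removeDotSegments(String path) {
        String in = path;                       // input buffer
        StringBuilder out = new StringBuilder(); // output buffer
        while (!in.isEmpty()) {
            if (in.startsWith("../")) {          // step A
                in = in.substring(3);
            } else if (in.startsWith("./")) {    // step A
                in = in.substring(2);
            } else if (in.startsWith("/./")) {   // step B
                in = in.substring(2);
            } else if (in.equals("/.")) {        // step B
                in = "/";
            } else if (in.startsWith("/../")) {  // step C
                in = in.substring(3);
                removeLastSegment(out);
            } else if (in.equals("/..")) {       // step C
                in = "/";
                removeLastSegment(out);
            } else if (in.equals(".") || in.equals("..")) { // step D
                in = "";
            } else {                             // step E: move one segment
                int next = in.indexOf('/', 1);
                if (next < 0) {
                    next = in.length();
                }
                out.append(in, 0, next);
                in = in.substring(next);
            }
        }
        return out.toString();
    }

    // Drops the last segment and its preceding "/" from the output buffer.
    private static void removeLastSegment(StringBuilder out) {
        int lastSlash = out.lastIndexOf("/");
        out.setLength(lastSlash < 0 ? 0 : lastSlash);
    }

    public static void main(String[] args) {
        // prints "/a/c/d.html"
        System.out.println(removeDotSegments("/../a/b/../c/./d.html"));
    }
}
```

Note that a leading ".." that would climb above the root is simply discarded, which matches the example given for this method.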
      • removeDirectoryIndex

        public URLNormalizer removeDirectoryIndex()

        Removes directory index files. They are often not needed in URLs.

        http://www.example.com/a/index.html → http://www.example.com/a/

        Index files must be the last URL path segment to be considered. The following are considered index files:

        • index.html
        • index.htm
        • index.shtml
        • index.php
        • default.html
        • default.htm
        • home.html
        • home.htm
        • index.php5
        • index.php4
        • index.php3
        • index.cgi
        • placeholder.html
        • default.asp

        Please Note: There are no guarantees a URL without its index files will be semantically equivalent, or even be valid.

        Returns:
        this instance
      • removeFragment

        public URLNormalizer removeFragment()

        Removes the URL fragment (from the first "#" character encountered to the end of the URL).

        http://www.example.com/abc.html#section1 → http://www.example.com/abc.html
        http://www.example.com/abc#/def/ghi → http://www.example.com/abc
        http://www.example.com/abc#def/ghi#klm → http://www.example.com/abc
        Returns:
        this instance
      • removeTrailingFragment

        public URLNormalizer removeTrailingFragment()

        Removes the URL fragment like removeFragment(), but only if it is found after the last URL segment (/...).

        http://www.example.com/abc.html#section1 → http://www.example.com/abc.html
        http://www.example.com/abc#/def/ghi → http://www.example.com/abc#/def/ghi
        http://www.example.com/abc#def/ghi#klm → http://www.example.com/abc#def/ghi
        Returns:
        this instance
        Since:
        2.1.0
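
The difference between removeFragment() and removeTrailingFragment() can be pictured with two small string operations. The rule used for the trailing variant (cut at the last "#" only when no "/" follows it) is an interpretation inferred from the examples above, not confirmed library behavior; the class name FragmentRemoval is hypothetical.

```java
// Hypothetical helper, not part of URLNormalizer.
final class FragmentRemoval {

    // Cuts everything from the first '#' onward.
    static String removeFragment(String url) {
        int hash = url.indexOf('#');
        return hash < 0 ? url : url.substring(0, hash);
    }

    // Cuts from the last '#' onward, but only if no '/' follows it.
    static String removeTrailingFragment(String url) {
        int hash = url.lastIndexOf('#');
        if (hash < 0 || url.indexOf('/', hash) >= 0) {
            return url;
        }
        return url.substring(0, hash);
    }

    public static void main(String[] args) {
        String url = "http://www.example.com/abc#def/ghi#klm";
        System.out.println(removeFragment(url));         // prints "http://www.example.com/abc"
        System.out.println(removeTrailingFragment(url)); // prints "http://www.example.com/abc#def/ghi"
    }
}
```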
      • removeQueryString

        public URLNormalizer removeQueryString()

        Removes the URL query string (from the "?" character until the end or the first # character).

        http://www.example.com/query?param1=AAA7&param2=BBB#fragment → http://www.example.com/query#fragment
        Returns:
        this instance
        Since:
        1.15.1
      • replaceIPWithDomainName

        public URLNormalizer replaceIPWithDomainName()

        Replaces the IP address with its domain name. This is often unreliable because of virtual domain names, and it can be slow because it has to access the network.

        http://208.77.188.166/ → http://www.example.com/
        Returns:
        this instance
      • unsecureScheme

        public URLNormalizer unsecureScheme()

        Converts https scheme to http.

        https://www.example.com/ → http://www.example.com/
        Returns:
        this instance
      • secureScheme

        public URLNormalizer secureScheme()

        Converts http scheme to https.

        http://www.example.com/ → https://www.example.com/
        Returns:
        this instance
      • removeDuplicateSlashes

        public URLNormalizer removeDuplicateSlashes()

        Removes duplicate slashes. Two or more adjacent slash ("/") characters will be converted into one.

        http://www.example.com/foo//bar.html → http://www.example.com/foo/bar.html
        Returns:
        this instance
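
A naive "replace // with /" would also damage the "://" after the scheme, so a sketch of this step needs to isolate the path first. The version below collapses slashes in the path only and leaves the query string and fragment alone; whether the library also touches those parts is not stated here, so treat that as an assumption. The class name DuplicateSlashRemover is hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of URLNormalizer.
final class DuplicateSlashRemover {

    // group 1: optional "scheme://authority", group 2: path,
    // group 3: query string and/or fragment (may be empty)
    private static final Pattern PARTS = Pattern.compile(
            "^([A-Za-z][A-Za-z0-9+.-]*://[^/?#]*)?([^?#]*)(.*)$", Pattern.DOTALL);

    static String removeDuplicateSlashes(String url) {
        Matcher m = PARTS.matcher(url);
        if (!m.matches()) {
            return url;
        }
        String prefix = m.group(1) == null ? "" : m.group(1);
        String path = m.group(2).replaceAll("/{2,}", "/");
        return prefix + path + m.group(3);
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/foo/bar.html"
        System.out.println(removeDuplicateSlashes("http://www.example.com/foo//bar.html"));
    }
}
```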
      • removeWWW

        public URLNormalizer removeWWW()

        Removes "www." domain name prefix.

        http://www.example.com/ → http://example.com/
        Returns:
        this instance
      • addWWW

        public URLNormalizer addWWW()

        Adds "www." domain name prefix.

        http://example.com/ → http://www.example.com/
        Returns:
        this instance
      • sortQueryParameters

        public URLNormalizer sortQueryParameters()

        Sorts query parameters.

        http://www.example.com/?z=bb&y=cc&z=aa → http://www.example.com/?y=cc&z=bb&z=aa
        Returns:
        this instance
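
In the example above, z=bb stays ahead of z=aa, which is consistent with a stable sort on parameter names only. The sketch below reproduces that result under this assumption; it is not the library's code, and QueryParameterSorter is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of URLNormalizer.
final class QueryParameterSorter {

    // Name of a "name=value" pair; the whole token when there is no '='.
    static String name(String param) {
        int eq = param.indexOf('=');
        return eq < 0 ? param : param.substring(0, eq);
    }

    static String sortQueryParameters(String url) {
        // Set the fragment aside so a '?' inside it is not mistaken for a query.
        int hash = url.indexOf('#');
        String fragment = hash < 0 ? "" : url.substring(hash);
        String base = hash < 0 ? url : url.substring(0, hash);
        int q = base.indexOf('?');
        if (q < 0) {
            return url;
        }
        List<String> params =
                new ArrayList<>(Arrays.asList(base.substring(q + 1).split("&")));
        // List.sort is stable, so equal names keep their original order.
        params.sort(Comparator.comparing(QueryParameterSorter::name));
        return base.substring(0, q + 1) + String.join("&", params) + fragment;
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/?y=cc&z=bb&z=aa"
        System.out.println(sortQueryParameters("http://www.example.com/?z=bb&y=cc&z=aa"));
    }
}
```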
      • removeEmptyParameters

        public URLNormalizer removeEmptyParameters()

        Removes empty parameters.

        http://www.example.com/display?a=b&a=&c=d&e=&f=g → http://www.example.com/display?a=b&c=d&f=g
        Returns:
        this instance
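
As a sketch, this step amounts to filtering the "name=value" pairs of the query string. The version below assumes a parameter is empty when it has no "=" or nothing after it, and it keeps the "?" even if every parameter is dropped; both choices are assumptions for this example, and EmptyParameterRemover is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of URLNormalizer.
final class EmptyParameterRemover {

    static String removeEmptyParameters(String url) {
        // Set the fragment aside so a '?' inside it is not mistaken for a query.
        int hash = url.indexOf('#');
        String fragment = hash < 0 ? "" : url.substring(hash);
        String base = hash < 0 ? url : url.substring(0, hash);
        int q = base.indexOf('?');
        if (q < 0) {
            return url;
        }
        List<String> kept = new ArrayList<>();
        for (String param : base.substring(q + 1).split("&")) {
            int eq = param.indexOf('=');
            // Keep only pairs that have at least one character after '='.
            if (eq >= 0 && eq < param.length() - 1) {
                kept.add(param);
            }
        }
        return base.substring(0, q + 1) + String.join("&", kept) + fragment;
    }

    public static void main(String[] args) {
        // prints "http://www.example.com/display?a=b&c=d&f=g"
        System.out.println(
                removeEmptyParameters("http://www.example.com/display?a=b&a=&c=d&e=&f=g"));
    }
}
```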
      • removeTrailingQuestionMark

        public URLNormalizer removeTrailingQuestionMark()

        Removes trailing question mark ("?").

        http://www.example.com/display? → http://www.example.com/display
        Returns:
        this instance
      • removeSessionIds

        public URLNormalizer removeSessionIds()

        Removes a URL-based session id. It removes PHP (PHPSESSID), ASP (ASPSESSIONID), and Java EE (jsessionid) session ids.

        http://www.example.com/servlet;jsessionid=1E6FEC0D14D044541DD84D2D013D29ED?a=b → http://www.example.com/servlet?a=b

        Please Note: Removing session IDs from URLs is often a good way to have the URL return an error once invoked.

        Returns:
        this instance
      • removeTrailingHash

        public URLNormalizer removeTrailingHash()

        Removes trailing hash character ("#").

        http://www.example.com/path# → http://www.example.com/path

        This only removes the hash character if it is the last character. To remove an entire URL fragment, use removeFragment().

        Returns:
        this instance
        Since:
        1.13.0
      • toString

        public String toString()
        Returns the normalized URL as a string.
        Overrides:
        toString in class Object
        Returns:
        URL
      • toURI

        public URI toURI()
        Returns the normalized URL as a URI.
        Returns:
        URI
      • toURL

        public URL toURL()
        Returns the normalized URL as a URL.
        Returns:
        URL