Class DOMUtil


  • public final class DOMUtil
    extends Object
    Utility methods related to JSoup/DOM manipulation.
    Since:
    2.6.0
    • Method Detail

      • toJSoupParser

        public static org.jsoup.parser.Parser toJSoupParser(String parser)
        Gets the JSoup parser matching the given parser name. The string "xml" (case-insensitive) returns the XML parser; any other value returns the HTML parser.
        Parameters:
        parser - "html" or "xml"
        Returns:
        JSoup parser
        Since:
        2.8.0
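
        The selection rule described above can be sketched without a JSoup dependency. This is only an illustration of the documented behavior, not the library's implementation: the class ParserChoiceSketch, its Kind enum, and the choose method are hypothetical names, and treating null (or untrimmed input such as " xml ") as "anything else" is an assumption the page does not confirm.

```java
/** Hypothetical stand-in mirroring the documented rule; it does not call JSoup. */
final class ParserChoiceSketch {

    /** Placeholder for the two JSoup parser flavors named in the documentation. */
    enum Kind { HTML, XML }

    /**
     * "xml" in any letter case selects XML; every other value selects HTML.
     * Assumption: null is treated like any other non-"xml" value.
     */
    static Kind choose(String parser) {
        return "xml".equalsIgnoreCase(parser) ? Kind.XML : Kind.HTML;
    }

    public static void main(String[] args) {
        System.out.println(choose("XML"));   // XML
        System.out.println(choose("html"));  // HTML
        System.out.println(choose("json"));  // HTML
        System.out.println(choose(null));    // HTML
    }
}
```

        In real code the returned value would be an org.jsoup.parser.Parser, which JSoup provides through Parser.xmlParser() and Parser.htmlParser().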
      • getElementValue

        public static String getElementValue(org.jsoup.nodes.Element element,
                                             String extract)

        Gets an element value from the JSoup DOM. The "extract" argument controls exactly what is extracted. Possible values are:

        • text: Default option when extract is blank. Extracts the text of the element combined with the text of all its children.
        • html: Extracts the element's inner HTML (including children).
        • outerHtml: Extracts an element outer HTML (like "html", but includes the "current" tag).
        • ownText: Extracts the text owned by this element only; does not get the combined text of all children.
        • data: Extracts the combined data of a data-element (e.g. <script>).
        • id: Extracts the ID attribute of the element (if any).
        • tagName: Extracts the tag name of the element.
        • val: Extracts the value of a form element (input, textarea, etc.).
        • className: Extracts the literal value of the element's "class" attribute, which may include multiple space-separated class names.
        • cssSelector: Extracts a CSS selector that will uniquely select (identify) this element.
        • attr(attributeKey): Extracts the value of the element attribute matching your replacement for "attributeKey" (e.g. "attr(title)" will extract the "title" attribute).

        Typically, when specified as an attribute, implementors can use the following:

        
        extract="[text|html|outerHtml|ownText|data|id|tagName|val|className|cssSelector|attr(attributeKey)]"
        Parameters:
        element - the element to extract a value from
        extract - the type of extraction to perform
        Returns:
        the element value
        See Also:
        Element
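
        To make the "extract" syntax concrete, here is a minimal sketch of how such a string could be split into an option name and an optional attribute key. It is not the DOMUtil implementation: the class ExtractSpecSketch, its fields, and the parse method are hypothetical, and the trimming of surrounding whitespace is an assumption. Only the option names and the blank-means-"text" default come from the documentation above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical parser for the "extract" syntax; it does not call JSoup or DOMUtil. */
final class ExtractSpecSketch {

    // Matches "attr(someKey)" and captures the text between the parentheses.
    private static final Pattern ATTR = Pattern.compile("attr\\((.*)\\)");

    final String option;        // e.g. "text", "html", "ownText", "attr"
    final String attributeKey;  // set only when option is "attr", otherwise null

    private ExtractSpecSketch(String option, String attributeKey) {
        this.option = option;
        this.attributeKey = attributeKey;
    }

    static ExtractSpecSketch parse(String extract) {
        // Documented default: a blank value means "text".
        if (extract == null || extract.trim().isEmpty()) {
            return new ExtractSpecSketch("text", null);
        }
        String value = extract.trim(); // trimming is an assumption
        Matcher m = ATTR.matcher(value);
        if (m.matches()) {
            return new ExtractSpecSketch("attr", m.group(1).trim());
        }
        // Any other value is passed through as the option name, unvalidated.
        return new ExtractSpecSketch(value, null);
    }

    public static void main(String[] args) {
        ExtractSpecSketch title = parse("attr(title)");
        System.out.println(title.option + " / " + title.attributeKey);
        System.out.println(parse("").option);
        System.out.println(parse("outerHtml").option);
    }
}
```

        Most option names (text, html, outerHtml, ownText, data, id, tagName, val, className, cssSelector) coincide with methods of the same name on org.jsoup.nodes.Element, and attr(attributeKey) with Element.attr(String), which suggests a direct mapping, although this page does not state one.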