LTS

Extracting Complex XML Elements and Attributes Using XPath in Java

May 19, 2025

XPath (XML Path Language) is a query language for selecting elements, attributes, and values in an XML document. Java supports XPath out of the box through the javax.xml.xpath package in the standard library. This guide shows how to use it to extract nested elements and attributes from XML, with a complete example, an annotated diagram, and an explanation of each expression.
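To see the API in action before working through the full example, here is a minimal sketch that parses XML from a String instead of a file (an assumption made so the snippet runs without library.xml on disk) and evaluates a single expression. The class and method names are illustrative only.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class XPathQuickStart {

    // Parses an XML string and returns the text of the first <title> element.
    static String firstTitle(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        // The two-argument evaluate() returns the string value of the result,
        // which for a node-set is the text of the first matching node.
        return xpath.evaluate("//title", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library><book><title>Effective Java</title></book></library>";
        System.out.println(firstTitle(xml)); // prints: Effective Java
    }
}
```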

Use Case Example

Suppose the following XML, saved as library.xml, represents a library of books:

<library>
    <book id="101" genre="fiction">
        <title>Effective Java</title>
        <author>
            <firstName>Joshua</firstName>
            <lastName>Bloch</lastName>
        </author>
        <published year="2018" publisher="Addison-Wesley"/>
    </book>
    <book id="102" genre="programming">
        <title>Clean Code</title>
        <author>
            <firstName>Robert</firstName>
            <lastName>Martin</lastName>
        </author>
        <published year="2008" publisher="Prentice Hall"/>
    </book>
</library>

Goal

We want to use XPath in Java to extract:

Titles of all books.
Full names of all authors.
The publisher of the book with id='101'.
Titles of books published after 2010.
The genre attribute of books whose author’s last name is “Martin”.

Java Setup

Required Imports

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;

Full Java Example

public class XPathExample {

    public static void main(String[] args) throws Exception {
        // Load XML
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("library.xml");

        // Initialize XPath
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();

        // 1. Titles of all books
        NodeList titles = (NodeList) xpath.evaluate("//book/title", doc, XPathConstants.NODESET);
        System.out.println("Book Titles:");
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(" - " + titles.item(i).getTextContent());
        }

        // 2. Full names of all authors
        NodeList authors = (NodeList) xpath.evaluate("//book/author", doc, XPathConstants.NODESET);
        System.out.println("\nAuthors:");
        for (int i = 0; i < authors.getLength(); i++) {
            Node author = authors.item(i);
            String firstName = xpath.evaluate("firstName", author);
            String lastName = xpath.evaluate("lastName", author);
            System.out.println(" - " + firstName + " " + lastName);
        }

        // 3. Publisher of book with id='101'
        String publisher = xpath.evaluate("//book[@id='101']/published/@publisher", doc);
        System.out.println("\nPublisher of book ID 101: " + publisher);

        // 4. Books published after 2010
        NodeList recentBooks = (NodeList) xpath.evaluate("//book[published/@year > 2010]", doc, XPathConstants.NODESET);
        System.out.println("\nBooks published after 2010:");
        for (int i = 0; i < recentBooks.getLength(); i++) {
            String title = xpath.evaluate("title", recentBooks.item(i));
            System.out.println(" - " + title);
        }

        // 5. Genre of books where author's last name is Martin
        NodeList genres = (NodeList) xpath.evaluate("//book[author/lastName='Martin']/@genre", doc, XPathConstants.NODESET);
        System.out.println("\nGenres of books by Martin:");
        for (int i = 0; i < genres.getLength(); i++) {
            System.out.println(" - " + genres.item(i).getNodeValue());
        }
    }
}
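Each numbered step above repeats the same loop over a NodeList. One way to tidy this, sketched below under the assumption that you are happy to collect results into a java.util.List, is a small helper; texts and authorNames are hypothetical names, not part of javax.xml.xpath. The author lookup also shows that XPath 1.0's concat() function can build the full name inside the expression instead of in Java. The XML is inlined as a String and trimmed to the elements these queries need.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class XPathLists {

    // Parses XML held in a String (the article's main example reads library.xml instead).
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Hypothetical helper: evaluates an expression as a NODESET and collects each node's text.
    static List<String> texts(XPath xpath, String expression, Object context) throws Exception {
        NodeList nodes = (NodeList) xpath.evaluate(expression, context, XPathConstants.NODESET);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            result.add(nodes.item(i).getTextContent());
        }
        return result;
    }

    // Builds "First Last" for each <author>, letting XPath's concat() join the parts.
    static List<String> authorNames(XPath xpath, Document doc) throws Exception {
        NodeList authors = (NodeList) xpath.evaluate("//book/author", doc, XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < authors.getLength(); i++) {
            names.add(xpath.evaluate("concat(firstName, ' ', lastName)", authors.item(i)));
        }
        return names;
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<library>"
                + "<book><title>Effective Java</title>"
                + "<author><firstName>Joshua</firstName><lastName>Bloch</lastName></author></book>"
                + "<book><title>Clean Code</title>"
                + "<author><firstName>Robert</firstName><lastName>Martin</lastName></author></book>"
                + "</library>");
        XPath xpath = XPathFactory.newInstance().newXPath();
        System.out.println(texts(xpath, "//book/title", doc)); // [Effective Java, Clean Code]
        System.out.println(authorNames(xpath, doc));          // [Joshua Bloch, Robert Martin]
    }
}
```

Because texts accepts any context object, the same helper works against the whole Document or against a single Node.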

Explanation of XPath Expressions

Expression	Description
`//book/title`	Selects every `<title>` element that is a direct child of a `<book>` element
`//book/author`	Selects every `<author>` element that is a direct child of a `<book>` element
`//book[@id='101']/published/@publisher`	Returns the `publisher` attribute of `<published>` in the book whose `id` is 101
`//book[published/@year > 2010]`	Selects `<book>` elements whose `published/@year` is numerically greater than 2010
`//book[author/lastName='Martin']/@genre`	Selects the `genre` attribute of every `<book>` whose author’s `<lastName>` is Martin
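The expressions in the table are passed as strings, so XPath typically compiles them anew on every evaluate(...) call. When the same query runs repeatedly, you can compile it once with XPath.compile(...) and reuse the resulting XPathExpression. The sketch below does that and also shows the two return types the main example does not use, NUMBER and BOOLEAN, via XPath's count() and boolean() functions. The class name, method names, and inline XML are assumptions for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class CompiledXPath {

    private final XPathExpression recentCount;
    private final XPathExpression hasPublisher;

    CompiledXPath() throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        // Compile once, reuse for every document.
        // Note: XPathExpression is not thread-safe; use one instance per thread.
        recentCount = xpath.compile("count(//book[published/@year > 2010])");
        hasPublisher = xpath.compile("boolean(//published[@publisher='Prentice Hall'])");
    }

    // NUMBER results come back as java.lang.Double.
    int countRecent(Document doc) throws XPathExpressionException {
        Double n = (Double) recentCount.evaluate(doc, XPathConstants.NUMBER);
        return n.intValue();
    }

    // BOOLEAN results come back as java.lang.Boolean.
    boolean hasPrenticeHall(Document doc) throws XPathExpressionException {
        return (Boolean) hasPublisher.evaluate(doc, XPathConstants.BOOLEAN);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book id='101'><published year='2018' publisher='Addison-Wesley'/></book>"
                + "<book id='102'><published year='2008' publisher='Prentice Hall'/></book>"
                + "</library>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        CompiledXPath queries = new CompiledXPath();
        System.out.println("Books after 2010: " + queries.countRecent(doc));           // 1
        System.out.println("Any Prentice Hall book: " + queries.hasPrenticeHall(doc)); // true
    }
}
```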

Diagram

Below is a diagram of the XML structure with annotations showing the XPath targets:

<library>
│
├── <book id="101" genre="fiction">
│   ├── <title>Effective Java</title>       ←── //book/title
│   ├── <author>                            ←── //book/author
│   │   ├── <firstName>Joshua</firstName>
│   │   └── <lastName>Bloch</lastName>      ←── //book[author/lastName='...']
│   └── <published year="2018" publisher="Addison-Wesley"/> ←── //book[@id='101']/published/@publisher
│
└── <book id="102" genre="programming">
    ├── <title>Clean Code</title>
    ├── <author>
    │   ├── <firstName>Robert</firstName>
    │   └── <lastName>Martin</lastName>
    └── <published year="2008" publisher="Prentice Hall"/>
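The diagram mixes absolute paths (starting with //) with paths evaluated relative to a node, as in the author and recent-book loops of the main example. The distinction matters: in XPath 1.0 a leading // always searches from the document root, even when you pass a specific node as the context. The sketch below, which uses an inline copy of the sample data and a hypothetical ContextPaths class, evaluates three expressions against the second `<book>` to show the difference.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class ContextPaths {

    // Returns {relative, absolute, descendant} results evaluated against the book with id 102.
    static String[] lookups(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        XPath xpath = XPathFactory.newInstance().newXPath();
        Node book = (Node) xpath.evaluate("//book[@id='102']", doc, XPathConstants.NODE);
        if (book == null) {
            throw new IllegalArgumentException("no book with id 102");
        }

        // Relative path: starts at the context node, so it finds this book's own <title>.
        String relative = xpath.evaluate("title", book);
        // Leading "//": starts at the document root even though a context node is passed,
        // so it yields the first <title> in the whole document.
        String absolute = xpath.evaluate("//title", book);
        // ".//": searches only the descendants of the context node.
        String descendant = xpath.evaluate(".//lastName", book);
        return new String[] {relative, absolute, descendant};
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
                + "<book id='101'><title>Effective Java</title>"
                + "<author><firstName>Joshua</firstName><lastName>Bloch</lastName></author></book>"
                + "<book id='102'><title>Clean Code</title>"
                + "<author><firstName>Robert</firstName><lastName>Martin</lastName></author></book>"
                + "</library>";
        String[] r = lookups(xml);
        System.out.println("title       -> " + r[0]); // Clean Code
        System.out.println("//title     -> " + r[1]); // Effective Java
        System.out.println(".//lastName -> " + r[2]); // Martin
    }
}
```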

Common Pitfalls

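Namespaces: If your XML uses namespaces, call setNamespaceAware(true) on the DocumentBuilderFactory (it is off by default) and register a NamespaceContext with xpath.setNamespaceContext(...) so that the prefixes in your expressions resolve to namespace URIs. In XPath 1.0, elements in a default namespace still need a prefix in the expression.
Type Casting: The three-argument evaluate(...) returns Object; cast it to the type that matches the XPathConstants value you requested: NodeList for NODESET, Node for NODE, String for STRING, Double for NUMBER, and Boolean for BOOLEAN.
File Path: DocumentBuilder.parse(String) treats its argument as a URI, so in practice a bare name like library.xml is resolved against the current working directory. Run the program from the folder that contains the file, or pass a File or InputStream to parse(...) instead.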
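For the namespace pitfall, here is a minimal sketch assuming a document whose elements live in a default namespace. The URI http://example.com/library, the prefix lib, and the NamespacedXPath class are made up for this illustration; use whatever namespace your XML declares. The key points are that the DocumentBuilderFactory is namespace-aware and that every element step in the expression carries a prefix resolved through the NamespaceContext.

```java
import java.io.StringReader;
import java.util.Collections;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespacedXPath {

    // Hypothetical namespace URI used only for this illustration.
    static final String LIB_NS = "http://example.com/library";

    static String firstTitle(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // off by default; prefixed steps would never match without it
        Document doc = factory.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

        XPath xpath = XPathFactory.newInstance().newXPath();
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if (prefix == null) {
                    throw new IllegalArgumentException("prefix must not be null");
                }
                // Map the prefix used in expressions ("lib") to the document's namespace URI.
                return "lib".equals(prefix) ? LIB_NS : XMLConstants.NULL_NS_URI;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return LIB_NS.equals(namespaceURI) ? "lib" : null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return LIB_NS.equals(namespaceURI)
                        ? Collections.singletonList("lib").iterator()
                        : Collections.<String>emptyIterator();
            }
        });
        // Every element step needs the prefix, even though the document uses a default namespace.
        return xpath.evaluate("/lib:library/lib:book/lib:title", doc);
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library xmlns='" + LIB_NS + "'>"
                + "<book id='101'><title>Effective Java</title></book></library>";
        System.out.println(firstTitle(xml)); // prints: Effective Java
    }
}
```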

Conclusion

XPath is a concise way to navigate and extract data from XML in Java. With well-chosen expressions, you can reach deeply nested elements and individual attributes in a single call instead of walking the DOM tree by hand.
