XPath (XML Path Language) is a query language that lets you navigate and extract elements, attributes, and values from an XML document. In Java, you can use XPath through the standard library's javax.xml.xpath package. This guide walks you through using XPath to extract complex data from XML, with a detailed example, diagram, and explanation.
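As a first taste of the API, here is a minimal sketch that parses XML from an in-memory string and evaluates a single expression. The class name XPathQuickStart and the helper firstValue are hypothetical names chosen for this illustration; the two-argument xpath.evaluate(expression, item) call is the standard one that returns the result as a String.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathQuickStart {

    // Hypothetical helper: parses an XML string and returns the string value
    // of the first node matched by the XPath expression.
    static String firstValue(String xml, String expression) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            // The two-argument evaluate() converts the result to a String.
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            // Wrapped for brevity in this sketch; real code may prefer specific handling.
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library><book id=\"101\"><title>Effective Java</title></book></library>";
        System.out.println(firstValue(xml, "/library/book/title")); // prints "Effective Java"
        System.out.println(firstValue(xml, "/library/book/@id"));   // prints "101"
    }
}
```

Parsing from a string keeps the sketch independent of any file on disk, which makes it convenient for trying out expressions.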
Use Case Example
Suppose you have the following XML structure representing a library of books:
<library>
    <book id="101" genre="fiction">
        <title>Effective Java</title>
        <author>
            <firstName>Joshua</firstName>
            <lastName>Bloch</lastName>
        </author>
        <published year="2018" publisher="Addison-Wesley"/>
    </book>
    <book id="102" genre="programming">
        <title>Clean Code</title>
        <author>
            <firstName>Robert</firstName>
            <lastName>Martin</lastName>
        </author>
        <published year="2008" publisher="Prentice Hall"/>
    </book>
</library>
Goal
We want to use XPath in Java to extract:
- Titles of all books.
- Full names of all authors.
- The publisher of the book with id='101'.
- Books published after 2010.
- Genre attribute of books whose author’s last name is “Martin”.
Java Setup
Required Imports
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.*;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.w3c.dom.Node;
Full Java Example
public class XPathExample {
    public static void main(String[] args) throws Exception {
        // Load XML
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse("library.xml");

        // Initialize XPath
        XPathFactory xpathFactory = XPathFactory.newInstance();
        XPath xpath = xpathFactory.newXPath();

        // 1. Titles of all books
        NodeList titles = (NodeList) xpath.evaluate("//book/title", doc, XPathConstants.NODESET);
        System.out.println("Book Titles:");
        for (int i = 0; i < titles.getLength(); i++) {
            System.out.println(" - " + titles.item(i).getTextContent());
        }

        // 2. Full names of all authors
        NodeList authors = (NodeList) xpath.evaluate("//book/author", doc, XPathConstants.NODESET);
        System.out.println("\nAuthors:");
        for (int i = 0; i < authors.getLength(); i++) {
            Node author = authors.item(i);
            // Relative expressions are evaluated against the current <author> node
            String firstName = xpath.evaluate("firstName", author);
            String lastName = xpath.evaluate("lastName", author);
            System.out.println(" - " + firstName + " " + lastName);
        }

        // 3. Publisher of book with id='101'
        String publisher = xpath.evaluate("//book[@id='101']/published/@publisher", doc);
        System.out.println("\nPublisher of book ID 101: " + publisher);

        // 4. Books published after 2010
        NodeList recentBooks = (NodeList) xpath.evaluate("//book[published/@year > 2010]", doc, XPathConstants.NODESET);
        System.out.println("\nBooks published after 2010:");
        for (int i = 0; i < recentBooks.getLength(); i++) {
            String title = xpath.evaluate("title", recentBooks.item(i));
            System.out.println(" - " + title);
        }

        // 5. Genre of books where author's last name is Martin
        NodeList genres = (NodeList) xpath.evaluate("//book[author/lastName='Martin']/@genre", doc, XPathConstants.NODESET);
        System.out.println("\nGenres of books by Martin:");
        for (int i = 0; i < genres.getLength(); i++) {
            System.out.println(" - " + genres.item(i).getNodeValue());
        }
    }
}
Explanation of XPath Expressions
Expression | Description |
---|---|
//book/title | Selects all <title> elements inside <book> tags |
//book/author | Selects all <author> nodes under any <book> |
//book[@id='101']/published/@publisher | Gets the publisher attribute from <published> of the book with ID 101 |
//book[published/@year > 2010] | Selects <book> elements whose <published> year attribute is greater than 2010 |
//book[author/lastName='Martin']/@genre | Gets the genre attribute of each book whose author’s last name is Martin |
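The expressions in the table return node sets or strings, but the same evaluate call can also return numbers and booleans when you pass XPathConstants.NUMBER or XPathConstants.BOOLEAN. The sketch below illustrates this with the year filter; in XPath 1.0 the year attribute's text is converted to a number before the comparison. The class XPathReturnTypes, its helper methods, and the trimmed-down inline XML are assumptions made for this example and are not part of the guide's own program.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class XPathReturnTypes {

    // Reduced version of the library document, kept inline so the sketch is self-contained.
    static final String XML =
            "<library>"
          + "<book id=\"101\"><published year=\"2018\"/></book>"
          + "<book id=\"102\"><published year=\"2008\"/></book>"
          + "</library>";

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // XPathConstants.NUMBER yields a java.lang.Double.
    static int countBooksAfter2010(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double count = (Double) xpath.evaluate(
                    "count(//book[published/@year > 2010])", doc, XPathConstants.NUMBER);
            return count.intValue();
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    // XPathConstants.BOOLEAN yields a java.lang.Boolean; a non-empty node set converts to true.
    static boolean hasBookBefore2010(Document doc) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (Boolean) xpath.evaluate(
                    "//book[published/@year < 2010]", doc, XPathConstants.BOOLEAN);
        } catch (XPathExpressionException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = parse(XML);
        System.out.println(countBooksAfter2010(doc)); // prints 1
        System.out.println(hasBookBefore2010(doc));   // prints true
    }
}
```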
Diagram
Below is a diagram of the XML structure with annotations showing the XPath targets:
<library>
│
├── <book id="101" genre="fiction">
│   ├── <title>Effective Java</title>        ←── //book/title
│   ├── <author>                             ←── //book/author
│   │   ├── <firstName>Joshua</firstName>
│   │   └── <lastName>Bloch</lastName>       ←── //book[author/lastName='...']
│   └── <published year="2018" publisher="Addison-Wesley"/>   ←── //book[@id='101']/published/@publisher
│
└── <book id="102" genre="programming">
    ├── <title>Clean Code</title>
    ├── <author>
    │   ├── <firstName>Robert</firstName>
    │   └── <lastName>Martin</lastName>
    └── <published year="2008" publisher="Prentice Hall"/>
Common Pitfalls
- Namespaces: If your XML uses namespaces, you’ll need to make the parser namespace-aware and register a NamespaceContext with XPath.
- Type Casting: Always cast the result of xpath.evaluate(...) to the type that matches the requested return type (NodeList for NODESET, String for STRING, etc.).
- File Path: Ensure library.xml is in the correct location or use an InputStream.
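The namespace and InputStream points can be sketched together. The example below assumes a variant of the library document that declares a default namespace; the URI http://example.com/library, the prefix lib, and the class name NamespacedXPath are all hypothetical choices for this illustration. The prefix used in the XPath expression does not have to match any prefix in the document; only the namespace URI must agree.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

public class NamespacedXPath {

    // Hypothetical namespace URI, used only for this illustration.
    static final String LIB_NS = "http://example.com/library";

    // Maps the prefix "lib" to LIB_NS for use inside XPath expressions.
    static final NamespaceContext CONTEXT = new NamespaceContext() {
        @Override
        public String getNamespaceURI(String prefix) {
            return "lib".equals(prefix) ? LIB_NS : XMLConstants.NULL_NS_URI;
        }

        @Override
        public String getPrefix(String namespaceURI) {
            return LIB_NS.equals(namespaceURI) ? "lib" : null;
        }

        @Override
        public Iterator<String> getPrefixes(String namespaceURI) {
            List<String> prefixes = LIB_NS.equals(namespaceURI)
                    ? Collections.singletonList("lib")
                    : Collections.<String>emptyList();
            return prefixes.iterator();
        }
    };

    // Reads the document from an InputStream instead of a file name.
    static String firstTitle(InputStream in) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Without this, the prefixed steps below would not match anything.
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(in);

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(CONTEXT);
            return xpath.evaluate("/lib:library/lib:book/lib:title", doc);
        } catch (Exception e) {
            throw new IllegalStateException("XPath evaluation failed", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<library xmlns=\"http://example.com/library\">"
                   + "<book><title>Clean Code</title></book></library>";
        InputStream in = new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
        System.out.println(firstTitle(in)); // prints "Clean Code"
    }
}
```

In an application, the InputStream would more likely come from a file or a classpath resource; the in-memory stream here only keeps the sketch self-contained.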
Conclusion
XPath is a powerful way to navigate and extract data from XML in Java. With careful expression crafting, even deeply nested or attribute-specific elements can be accessed efficiently.