In XPath, the preceding-sibling and following-sibling axes select elements that share a parent with the current element, which lets you move sideways through the hierarchy of an XML or HTML document. This is valuable in web scraping and data mining, where the value you want is often easiest to reach from a neighbouring element that is simple to locate. If you want to take your scraping further, a web scraping API can complement XPath selections and make data extraction more robust across varied page structures.
preceding-sibling::span selects the span siblings that appear before the current element in document order:
<!-- we can find the tax rate by navigating from the price -->
<div>
  <article>
    <h1>title</h1>
    <p>paragraph</p>
    <span>(no tax)</span>
    <h3>2.99</h3>
  </article>
</div>
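To make the selection concrete, here is a minimal Python sketch of what preceding-sibling::span returns when the price (the h3) is the current element. It uses only the standard library, whose ElementTree module does not support the sibling axes in its limited XPath subset, so the axis is emulated by walking the parent's children. The helper name preceding_siblings is hypothetical and exists only for this illustration.

```python
import xml.etree.ElementTree as ET

HTML = """
<div>
  <article>
    <h1>title</h1>
    <p>paragraph</p>
    <span>(no tax)</span>
    <h3>2.99</h3>
  </article>
</div>
"""


def preceding_siblings(parent, node, tag):
    """Hypothetical helper: children of `parent` named `tag` that come before `node`."""
    children = list(parent)
    position = children.index(node)
    return [child for child in children[:position] if child.tag == tag]


root = ET.fromstring(HTML)
article = root.find("article")
price = article.find("h3")  # the current element

# Emulates: //h3/preceding-sibling::span
tax_notes = preceding_siblings(article, price, "span")
print([el.text for el in tax_notes])
```

With an XPath engine that implements the full axis set, the same result should come from the single expression //h3/preceding-sibling::span.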
following-sibling::span selects the span siblings that appear after the current element in document order:
<!-- we can find the currency by navigating from the price -->
<div>
  <article>
    <h1>title</h1>
    <p>paragraph</p>
    <p>paragraph with <a href="/foo.html">link</a></p>
    <span>(no tax)</span>
    <h3>2.99</h3>
    <span>USD</span>
  </article>
</div>
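The following-sibling case can be sketched the same way. This example again assumes only Python's standard library and emulates the axis by hand. It builds a child-to-parent map because ElementTree elements do not keep a reference to their parent. The names parent_of and following_siblings are hypothetical and chosen for illustration.

```python
import xml.etree.ElementTree as ET

HTML = """
<div>
  <article>
    <h1>title</h1>
    <p>paragraph</p>
    <p>paragraph with <a href="/foo.html">link</a></p>
    <span>(no tax)</span>
    <h3>2.99</h3>
    <span>USD</span>
  </article>
</div>
"""

root = ET.fromstring(HTML)

# ElementTree has no parent pointers, so map every child to its parent once.
parent_of = {child: parent for parent in root.iter() for child in parent}


def following_siblings(node, tag):
    """Hypothetical helper: siblings named `tag` that come after `node`."""
    siblings = list(parent_of[node])
    position = siblings.index(node)
    return [s for s in siblings[position + 1:] if s.tag == tag]


price = root.find(".//h3")  # the current element

# Emulates: //h3/following-sibling::span
currency = following_siblings(price, "span")
print([el.text for el in currency])
```

Only the span after the price is returned. The "(no tax)" span comes before the h3, so it belongs to the preceding-sibling axis instead.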
Note that the wildcard character can replace an explicit element name (e.g. following-sibling::*) to select siblings of any element name.
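As a rough illustration of the wildcard, the sketch below emulates both axes with an optional name test that defaults to "*". It assumes Python's standard library only, and sibling_axis is a hypothetical helper rather than part of any XPath library.

```python
import xml.etree.ElementTree as ET

HTML = """
<article>
  <h1>title</h1>
  <p>paragraph</p>
  <span>(no tax)</span>
  <h3>2.99</h3>
  <span>USD</span>
</article>
"""


def sibling_axis(parent, node, axis, name="*"):
    """Hypothetical helper emulating the two sibling axes with a name test."""
    children = list(parent)
    position = children.index(node)
    if axis == "following-sibling":
        candidates = children[position + 1:]
    elif axis == "preceding-sibling":
        candidates = children[:position]
    else:
        raise ValueError(f"unsupported axis: {axis}")
    # "*" matches any element name; otherwise the tag must match exactly.
    return [c for c in candidates if name == "*" or c.tag == name]


article = ET.fromstring(HTML)
price = article.find("h3")

# Emulates: //h3/preceding-sibling::* and //h3/following-sibling::*
before = [el.tag for el in sibling_axis(article, price, "preceding-sibling")]
after = [el.tag for el in sibling_axis(article, price, "following-sibling")]
print(before, after)
```

Passing an explicit name such as "span" narrows the result to siblings with that element name, which mirrors the difference between preceding-sibling::* and preceding-sibling::span.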
For a comprehensive understanding of XPath, consider reading our detailed introductory article.