Mastering XPath Selectors in NodeJS: Comprehensive Guide on How to Use Them

CSS selectors are the default choice in the NodeJS and JavaScript ecosystems. For web scraping, however, the more powerful features of XPath selectors, such as matching on text content or navigating to sibling and parent nodes, are sometimes required.
Several libraries add XPath support to NodeJS. A popular one in web scraping is the osmosis library:

const osmosis = require("osmosis");

const html = `
<a href="http://scrapenetwork.com/">link 1</a>
<a href="http://scrapenetwork.com/">link 2</a>
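`;

osmosis
    .parse(html)          // load the HTML string
    .find('//a/@href')    // XPath: the href attribute of every <a> element
    .log(console.log);    // send osmosis log messages to the console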

Another viable option is the xpath library, paired with @xmldom/xmldom to parse the document:

import xpath from 'xpath';
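import { DOMParser } from '@xmldom/xmldom';

// wrap the markup in a single root element so it parses as well-formed XML,
// and pass the MIME type explicitly
const tree = new DOMParser().parseFromString(`
<div>
    <h1>Page title</h1>
    <p>some paragraph</p>
    <a href="http://scrapenetwork.com/">some link</a>
</div>
`, 'text/xml');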

console.log({
    // we can extract the text of a node; text() returns a `Text` object with the content in `.data`:
    title: xpath.select('//h1/text()', tree)[0].data,
    // or a specific attribute; @href returns an `Attr` object with the content in `.value`:
    url: xpath.select('//a/@href', tree)[0].value,
});