
How to Find HTML Elements by Text with Cheerio: A Comprehensive Guide

In web scraping and data extraction, you often need to locate an HTML element by the text it displays rather than by its tag, class, or ID. Cheerio, the jQuery-style HTML parser for NodeJS, supports this with the :contains() pseudo-selector, which selects elements whose text content includes a given string. Note that :contains() performs a substring (partial) match and does not, by itself, enforce an exact match. This guide shows how to use :contains() and how to filter elements by their text when you need more control over the comparison.

```javascript
const cheerio = require('cheerio');

const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">link</a>
<a>ignore</a>
`);

// Select <a> elements whose text includes "link"
console.log(
    $('a:contains("link")').text()
);
// Output: link
```

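However, :contains() is case-sensitive: a:contains("link") will not match an element whose text is "Link" or "LINK", and scraped pages often capitalize the same text inconsistently. A more robust alternative is to filter the selection yourself and compare the lowercased text: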

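```javascript
const cheerio = require('cheerio');

const $ = cheerio.load(`
<a>ignore</a>
<a href="http://example.com">Link</a>
<a>ignore</a>
`);

// Keep only the <a> elements whose lowercased text includes "link"
console.log(
    $('a').filter(
        (i, element) => $(element).text().toLowerCase().includes('link')
    ).text()
);
// Output: Link (.text() returns the original, unmodified text)
```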