Mastering Puppeteer: Comprehensive Guide on How to Scroll to the Bottom

Web scraping with Puppeteer often involves pages that load additional content only when the user scrolls to the bottom, a common feature of infinite-scrolling pages. A web scraping API can also take over this work, since such services typically handle dynamic content and infinite scrolling for you.

To scroll in our Puppeteer browser, we can call the built-in JavaScript function window.scrollTo(x, y) through page.evaluate, which scrolls the page to the given coordinates. Combined with a scraping API, this approach makes it possible to navigate and extract data from complex websites.
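As a minimal sketch of that call, the helper below scrolls to a given vertical offset. The name scrollToY is hypothetical, and page is assumed to be a Puppeteer Page object; page.evaluate serializes the extra argument and passes it into the browser context.

```javascript
// Hypothetical helper: scroll the page to a vertical offset in pixels.
// `page` is assumed to be a Puppeteer Page.
async function scrollToY(page, y) {
    // The arrow function runs inside the browser; `y` arrives there as `top`.
    await page.evaluate((top) => window.scrollTo(0, top), y);
}
```

For instance, await scrollToY(page, 500) would scroll 500 pixels down from the top of the page.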

If we need to scroll to the absolute bottom of the page, a while loop can be employed to keep scrolling until the bottom is reached. Let’s examine an example by scraping web-scraping.dev/testimonials:

const puppeteer = require('puppeteer');

async function scrapeTestimonials() {
    const browser = await puppeteer.launch({headless: false});
    const page = await browser.newPage();

    await page.goto('https://web-scraping.dev/testimonials/');

    let prevHeight = -1;
    let maxScrolls = 100;
    let scrollCount = 0;

    while (scrollCount < maxScrolls) {
        // Scroll to the bottom of the page
        await page.evaluate('window.scrollTo(0, document.body.scrollHeight)');
        // Wait for new content to load (page.waitForTimeout was removed in recent Puppeteer versions)
        await new Promise(resolve => setTimeout(resolve, 1000));
        // Calculate new scroll height and compare
        let newHeight = await page.evaluate('document.body.scrollHeight');
        if (newHeight === prevHeight) {
            break;
        }
        prevHeight = newHeight;
        scrollCount += 1;
    }

    // Collect all loaded data
    let elements = await page.$$('.testimonial');
    let results = [];
    for(let element of elements) {
        let text = await element.$eval('.text', node => node.innerHTML);
        results.push(text);
    }

    console.log(`Scraped: ${results.length} results!`);

    await browser.close();
}

scrapeTestimonials();

In the example above, we scrape an endless-scroll page from the web-scraping.dev website. The while loop keeps scrolling to the bottom until the page's scroll height (document.body.scrollHeight) stops changing or the cap of 100 scrolls is reached. Once the bottom is reached, we can start parsing the content.
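The same loop can be packaged as a reusable helper. This is a sketch under assumptions rather than a definitive implementation: the function name scrollToBottom and the option names maxScrolls and delayMs are hypothetical, and page is assumed to be a Puppeteer Page object.

```javascript
// Hypothetical helper: keep scrolling until the page height stops growing.
// `page` is assumed to be a Puppeteer Page; option names are illustrative.
async function scrollToBottom(page, { maxScrolls = 100, delayMs = 1000 } = {}) {
    let prevHeight = -1;
    for (let i = 0; i < maxScrolls; i++) {
        // Jump to the current bottom of the document
        await page.evaluate(() => window.scrollTo(0, document.body.scrollHeight));
        // Give lazily loaded content time to arrive
        await new Promise((resolve) => setTimeout(resolve, delayMs));
        // Stop once the height is unchanged since the previous scroll
        const newHeight = await page.evaluate(() => document.body.scrollHeight);
        if (newHeight === prevHeight) break;
        prevHeight = newHeight;
    }
    // Last measured page height (or -1 if maxScrolls is 0)
    return prevHeight;
}
```

Calling await scrollToBottom(page, { delayMs: 2000 }) would allow slower pages more time between scrolls. A fixed delay is the simplest choice here; it trades speed for simplicity and may stop early if content takes longer than the delay to load.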