Step-by-Step Guide: How to Download File with Puppeteer & NodeJS

Puppeteer is a popular browser automation library for NodeJS, and handling file downloads is a common requirement in both automation and web scraping projects. For larger data collection jobs, a web scraping API can complement Puppeteer by handling retrieval at scale.

This guide covers the two primary ways to download a file with Puppeteer. The first calls the browser's fetch() function from inside the page and captures the file contents in a JavaScript variable. The second simulates a user clicking the download button, which saves the file to the browser's designated download directory. The fetch approach suits cases where you want the contents in memory and already know, or can locate, the file URL. The click approach suits downloads that are triggered by page scripts or that should simply land on disk.

The first method, using fetch():

// initialize puppeteer
const puppeteer = require('puppeteer');

// CommonJS has no top-level await, so run everything inside an async function
(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    // navigate to URL
    await page.goto("https://httpbin.dev/");

    // download file to a JavaScript variable:
    const csvFile = await page.evaluate(() => {
        // locate the URL:
        const url = document.querySelector('.download-button').getAttribute('href');
        // download using JavaScript fetch (runs in the page, so its cookies are sent):
        return fetch(url, {
            method: 'GET',
            credentials: 'include'
        }).then(r => r.text());
    });

    await browser.close();
})();
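
The fetch approach above returns the body with r.text(), which is fine for text formats such as CSV but can corrupt binary files (PDFs, images, archives). The following is a minimal sketch of one way to handle binary content, assuming a Node 18+ runtime: the page-side function encodes the bytes as base64 so they survive the transfer out of page.evaluate(), and the Node-side function decodes them and writes them to disk. The helper names fetchAsBase64 and saveBase64ToFile and the output path are hypothetical, chosen only for illustration.

```javascript
const fs = require('fs');
const path = require('path');

// Intended to run inside the page via page.evaluate(fetchAsBase64, url).
// Fetches the URL and returns the response body as a base64 string.
async function fetchAsBase64(url) {
    const response = await fetch(url, { method: 'GET', credentials: 'include' });
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    const bytes = new Uint8Array(await response.arrayBuffer());
    let binary = '';
    for (const b of bytes) binary += String.fromCharCode(b);
    return btoa(binary);
}

// Runs in Node: decodes the base64 payload, writes it to filePath,
// and returns the number of bytes written.
function saveBase64ToFile(base64, filePath) {
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    const buffer = Buffer.from(base64, 'base64');
    fs.writeFileSync(filePath, buffer);
    return buffer.length;
}

// Usage inside the async function of the script above (hypothetical file name):
// const url = await page.$eval('.download-button', a => a.href);
// const base64 = await page.evaluate(fetchAsBase64, url);
// saveBase64ToFile(base64, './downloads/file.bin');
```

Base64 adds roughly a third to the payload size, so for very large files the click-based method may be the more practical choice.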

Alternatively, click the download button with the page.click() method and let the browser save the file to a configured download directory:

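// initialize puppeteer
const puppeteer = require('puppeteer');
const path = require('path');

// CommonJS has no top-level await, so run everything inside an async function
(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    // set default download directory through a Chrome DevTools Protocol session.
    // page._client is a private API that changed in newer Puppeteer versions;
    // on older versions use page.target().createCDPSession() instead:
    const client = await page.createCDPSession();
    await client.send('Page.setDownloadBehavior', {
        behavior: 'allow',
        downloadPath: path.resolve('./downloads'),
    });

    // navigate to URL
    await page.goto("https://httpbin.dev/");
    // click on download link
    await page.click('.download-button');
    // keep the browser open until the download has finished before closing it
})();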
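
page.click() resolves as soon as the click is dispatched, not when the file has finished downloading, so closing the browser immediately afterwards can cut the download short. One possible workaround, sketched below, is to poll the download directory until a finished file appears. This assumes the directory starts out empty and relies on Chrome's usual behavior of giving in-progress downloads a ".crdownload" suffix. The helper names listCompletedDownloads and waitForDownload and the default timings are hypothetical.

```javascript
const fs = require('fs');
const path = require('path');

// Lists finished files in a download directory, skipping Chrome's
// in-progress ".crdownload" files. Returns [] if the directory is missing.
function listCompletedDownloads(dir) {
    if (!fs.existsSync(dir)) return [];
    return fs.readdirSync(dir)
        .filter((name) => !name.endsWith('.crdownload'))
        .sort();
}

// Polls the directory until a completed file appears, then returns its path.
// Throws if nothing completes before the timeout elapses.
async function waitForDownload(dir, timeoutMs = 30000, intervalMs = 250) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        const files = listCompletedDownloads(dir);
        if (files.length > 0) return path.join(dir, files[0]);
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
    }
    throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}

// Usage after the click in the script above:
// const filePath = await waitForDownload(path.resolve('./downloads'));
// await browser.close();
```

Polling the file system is a blunt instrument, but it avoids depending on protocol-level download events, whose availability varies between browser and Puppeteer versions.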