ScrapeNetwork

Step-by-Step Guide: How to Download a File with Puppeteer & Node.js

Puppeteer is a browser automation library for Node.js that is widely used for web scraping and for automating routine tasks. Both kinds of project often need to download files, and Puppeteer offers more than one way to do it. For larger data collection projects, a web scraping API can complement these techniques.

This guide covers the two primary methods for downloading files with Puppeteer:

1. Fetch the file inside the browser with JavaScript's fetch() and return its contents to a JavaScript variable in the Node.js script.
2. Simulate user interaction by clicking the download button, which saves the file to the browser's designated download directory.

The first method keeps the file in memory and works well when the contents are processed right away. The second behaves like a real user and suits downloads that are triggered by page scripts rather than by a plain link. Which one to choose depends on the project requirements and the kind of file being downloaded.

The first method looks like this:

// initialize puppeteer
const puppeteer = require('puppeteer');

// top-level await is not available in CommonJS scripts,
// so the steps are wrapped in an async function
(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    // navigate to URL
    await page.goto("https://httpbin.dev/");

    // download file to a JavaScript variable:
    const csvFile = await page.evaluate(() => {
        // locate the URL (adjust the selector to the target page):
        const url = document.querySelector('.download-button').getAttribute('href');
        // download using JavaScript fetch, sending the page's cookies along:
        return fetch(url, {
            method: 'GET',
            credentials: 'include'
        }).then(r => r.text());
    });

    await browser.close();
})();
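
The example above reads the response with r.text(), which suits text formats such as CSV but can corrupt binary files like PDFs or images. The following is a minimal sketch of one way to handle binary content, assuming the file URL is already known: the page encodes the bytes as base64 (values returned from page.evaluate() must be serializable), and Node.js decodes them and writes them to disk. The helper names fetchAsBase64 and saveBase64ToFile are hypothetical and not part of Puppeteer.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: fetch a URL inside the page and return its bytes as base64.
// `page` is a Puppeteer Page; `url` is assumed to be reachable from that page.
async function fetchAsBase64(page, url) {
    return page.evaluate(async (fileUrl) => {
        const response = await fetch(fileUrl, { method: 'GET', credentials: 'include' });
        if (!response.ok) {
            throw new Error(`Download failed with HTTP status ${response.status}`);
        }
        // Convert the raw bytes to a binary string, then to base64.
        const bytes = new Uint8Array(await response.arrayBuffer());
        let binary = '';
        for (const byte of bytes) {
            binary += String.fromCharCode(byte);
        }
        return btoa(binary);
    }, url);
}

// Hypothetical helper: decode base64 and write the bytes to filePath.
// Creates the parent directory if needed and returns the number of bytes written.
function saveBase64ToFile(base64, filePath) {
    fs.mkdirSync(path.dirname(filePath), { recursive: true });
    const buffer = Buffer.from(base64, 'base64');
    fs.writeFileSync(filePath, buffer);
    return buffer.length;
}
```

Inside the async function from the example above, this could be used as saveBase64ToFile(await fetchAsBase64(page, url), './downloads/file.pdf'), where both the URL and the output path are placeholders. Because the whole file passes through memory as a string, this approach is best kept to small and medium-sized files.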

Alternatively, click the download button with the page.click() method and let the browser save the file to a configured download directory:

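// initialize puppeteer
const puppeteer = require('puppeteer');
const path = require('path');

(async () => {
    const browser = await puppeteer.launch({ headless: false });
    const page = await browser.newPage();

    // set default download directory through a Chrome DevTools Protocol session
    // (page._client is a private API whose shape has changed between Puppeteer releases):
    const client = await page.target().createCDPSession();
    await client.send('Page.setDownloadBehavior', {
        behavior: 'allow',
        downloadPath: path.resolve('./downloads'),
    });

    // navigate to URL
    await page.goto("https://httpbin.dev/");
    // click on download link (adjust the selector to the target page)
    await page.click('.download-button');
    // the browser is left open here so that the download has time to finish
})();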
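
page.click() returns as soon as the click is dispatched, not when the file has finished downloading, so closing the browser right away can cut the download short. One possible workaround, sketched below, is to poll the download directory until a file appears and no Chrome temporary file (ending in .crdownload) remains. The helper name waitForDownload is hypothetical, and the sketch assumes the directory is empty before the click and that the browser is Chrome or Chromium.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: resolve with the finished file paths once the directory
// contains at least one file and no in-progress ".crdownload" files.
// Assumes `dir` was empty before the download started.
async function waitForDownload(dir, timeoutMs = 30000, pollMs = 250) {
    const deadline = Date.now() + timeoutMs;
    while (Date.now() < deadline) {
        const names = fs.existsSync(dir) ? fs.readdirSync(dir) : [];
        const inProgress = names.some((name) => name.endsWith('.crdownload'));
        const finished = names.filter((name) => !name.endsWith('.crdownload'));
        if (!inProgress && finished.length > 0) {
            return finished.map((name) => path.join(dir, name));
        }
        // Sleep briefly before checking the directory again.
        await new Promise((resolve) => setTimeout(resolve, pollMs));
    }
    throw new Error(`No completed download in ${dir} after ${timeoutMs} ms`);
}
```

After the click, calling await waitForDownload(path.resolve('./downloads')) followed by await browser.close() would keep the browser alive until the file is complete. Polling is simple but coarse; the timeout and polling interval shown here are arbitrary defaults.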
