Comprehensive Guide: How to Load Local Files in Puppeteer Easily

When testing Puppeteer web scrapers, we may prefer to load local files instead of public websites. Like any real web browser, Puppeteer can open local files through the file:// URL scheme, so scripts can be tested under controlled conditions without depending on external web resources. This is useful for unit testing, for developing offline, and whenever the test environment must be precisely controlled. For larger scraping projects, a web scraping API can complement Puppeteer by handling CAPTCHAs, managing proxies, and keeping scraping workloads scalable.

const puppeteer = require('puppeteer');
const path = require('path');
const { pathToFileURL } = require('url');

async function run() {
    // usual browser startup:
    const browser = await puppeteer.launch();
    const page = await browser.newPage();

    // we can use absolute paths; note the three slashes after "file:"
    // (uncomment the line that matches your OS; navigating to a missing file throws)
    // await page.goto("file:///home/user/projects/test.html");  // linux
    // await page.goto("file:///C:/Users/projects/test.html");  // windows

    // or we can build the URL from a path relative to this script:
    // below will select test.html that is in the same directory as the script
    await page.goto(pathToFileURL(path.join(__dirname, 'test.html')).href);

    console.log(await page.content());
    await browser.close();
}

run();

This approach also makes collaboration easier: scripts can be shared and tested together with their local HTML files, with no need for a live website. That saves time and supports a more efficient workflow.
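
For shared tests, one option is to generate the local HTML file from the test itself so that nothing depends on a collaborator's directory layout. The sketch below is an assumption about how that could look, using only Node's built-in modules. The helper `writeFixture`, the default file name `fixture.html`, and the sample markup are all hypothetical.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

// Hypothetical helper: write an HTML string into a fresh temporary directory
// and return both the file path and its file:// URL.
function writeFixture(html, fileName = 'fixture.html') {
  const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'puppeteer-fixture-'));
  const filePath = path.join(dir, fileName);
  fs.writeFileSync(filePath, html, 'utf8');
  return { filePath, url: pathToFileURL(filePath).href };
}

const fixture = writeFixture(
  '<html><body><h1 id="title">Hello</h1></body></html>'
);
console.log(fixture.url);

// In a Puppeteer test, the fixture would be loaded like any other page:
//   await page.goto(fixture.url);
```

Each call creates its own temporary directory, so tests running in parallel do not overwrite each other's files.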