ScrapeNetwork

Mastering Playwright: Comprehensive Guide on How to Save and Load Cookies

Web scraping often depends on session state such as login status, consent choices, and other preferences that sites store in cookies. Playwright, a browser automation library widely used for scraping, lets you export the cookies held by a browser context and load them into a new one. This guide shows how to save those cookies to a JSON file and restore them later, so you can pause and resume a scraping session without rebuilding it from scratch. Reusing an established session also keeps the scraped data consistent between runs and avoids repeated logins, which may lower the chance of being blocked by web servers.

import json
from pathlib import Path

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)

    # To save cookies to a file, we first extract them from the browser context:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies/set/mycookie/myvalue")
    cookies = context.cookies()
    Path("cookies.json").write_text(json.dumps(cookies))

    # Next, we can restore cookies from the file:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    context.add_cookies(json.loads(Path("cookies.json").read_text()))
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies")
    print(context.cookies())  # verify that the cookies were restored
    # the output will be:
    # [
    #     {
    #         "sameSite": "Lax",
    #         "name": "mycookie",
    #         "value": "myvalue",
    #         "domain": "httpbin.dev",
    #         "path": "/",
    #         "expires": -1,
    #         "httpOnly": False,
    #         "secure": False,
    #     }
    # ]

    browser.close()
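For repeated use, the save and restore steps above can be wrapped in two small helpers. The following is a minimal sketch, not part of Playwright itself: `save_cookies`, `load_cookies`, and the `now` parameter are hypothetical names introduced here for illustration. It assumes that `expires` is a Unix timestamp in seconds and that `-1` marks a session cookie, as in the output shown above. The loader skips cookies that have already expired and tolerates a missing file.

```python
import json
import time
from pathlib import Path


def save_cookies(context, path):
    """Write the context's cookies to a JSON file; return how many were saved."""
    cookies = context.cookies()
    Path(path).write_text(json.dumps(cookies, indent=2))
    return len(cookies)


def load_cookies(context, path, now=None):
    """Add unexpired cookies from a JSON file to the context; return how many were added."""
    file = Path(path)
    if not file.exists():
        # Nothing saved yet, e.g. on the very first run
        return 0
    now = time.time() if now is None else now
    cookies = [
        cookie
        for cookie in json.loads(file.read_text())
        # Assumption: expires == -1 marks a session cookie with no expiry time
        if cookie.get("expires", -1) == -1 or cookie["expires"] > now
    ]
    if cookies:
        context.add_cookies(cookies)
    return len(cookies)
```

With a Playwright browser context, this would be used as `save_cookies(context, "cookies.json")` after navigating, and `load_cookies(context, "cookies.json")` on a fresh context before the first `page.goto(...)`. The helpers only rely on the `cookies()` and `add_cookies()` methods already used above, so they do not import Playwright themselves.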
