Mastering Playwright: Comprehensive Guide on How to Save and Load Cookies

Efficient web scraping often depends on keeping session state between runs. Playwright is a browser automation library rather than a scraping API, but it is well suited to gathering data from websites because it drives a real browser. This guide shows how to use Playwright to save and load cookies, which is how session information is carried from one scraping run to the next. With these techniques you can pause a scraping session and resume it later with the same cookies, which keeps session state consistent and can reduce the likelihood of being blocked by web servers.

import json
from pathlib import Path

from playwright.sync_api import sync_playwright

with sync_playwright() as pw:
    browser = pw.chromium.launch(headless=False)

    # To save cookies to a file, we first extract them from the browser context:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies/set/mycookie/myvalue")
    cookies = context.cookies()
    Path("cookies.json").write_text(json.dumps(cookies))

    # Next, we can restore cookies from the file:
    context = browser.new_context(viewport={"width": 1920, "height": 1080})
    context.add_cookies(json.loads(Path("cookies.json").read_text()))
    page = context.new_page()
    page.goto("https://httpbin.dev/cookies")
    print(context.cookies())  # verify that the cookies were restored correctly
    # the output will be:
    # [
    #     {
    #         "sameSite": "Lax",
    #         "name": "mycookie",
    #         "value": "myvalue",
    #         "domain": "httpbin.dev",
    #         "path": "/",
    #         "expires": -1,
    #         "httpOnly": False,
    #         "secure": False,
    #     }
    # ]

    # Close the browser explicitly once we are done with both contexts.
    browser.close()
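
When a scraping session is resumed much later, some saved cookies may have expired, and the cookie file may not exist yet on a first run. The sketch below wraps the save and load steps in two helper functions that handle both cases. The names `save_cookies` and `load_cookies` are hypothetical and not part of Playwright. The sketch assumes the cookie dictionaries have the shape shown above, where `expires` is a Unix timestamp in seconds and `-1` marks a session cookie.

```python
import json
import time
from pathlib import Path


def save_cookies(cookies, path):
    """Write a list of Playwright-style cookie dicts to a JSON file.

    Hypothetical helper, not part of the Playwright API.
    """
    Path(path).write_text(json.dumps(cookies, indent=2))


def load_cookies(path, now=None):
    """Read cookies from a JSON file, skipping ones that have already expired.

    Hypothetical helper, not part of the Playwright API.
    Session cookies (expires == -1) carry no expiry time and are always kept.
    """
    file = Path(path)
    if not file.exists():
        return []  # nothing saved yet, so start with no cookies

    # Allow the caller to pass a fixed time, which makes the filter easy to test.
    now = time.time() if now is None else now
    cookies = json.loads(file.read_text())
    return [
        cookie
        for cookie in cookies
        if cookie.get("expires", -1) == -1 or cookie["expires"] > now
    ]
```

In the script above, these helpers would probably slot in as `save_cookies(context.cookies(), "cookies.json")` after the first page load, and `cookies = load_cookies("cookies.json")` followed by `context.add_cookies(cookies)` when the list is not empty. Filtering before `add_cookies` keeps stale entries out of the new context, so the site sees only cookies a normal browser would still send.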