Fix Python Requests Exception MissingSchema: Comprehensive Guide

The MissingSchema error occurs when the Python requests module is given a URL that lacks a scheme (the http:// or https:// part), which makes the URL invalid. It is a common mistake in web scraping projects, so it is worth making sure every URL is correctly formatted before it is requested. Integrating a web scraping API into your workflow can also help minimize errors like MissingSchema, since such APIs are designed to handle many of the nuances of web scraping. That lets developers avoid common pitfalls of manual scraping and focus on the data itself.

This typically happens when we mistakenly provide the scraper with relative URLs instead of absolute URLs:

import requests

requests.get("/product/25")  # relative URL: no scheme or host
# will raise:
# MissingSchema: Invalid URL '/product/25': No scheme supplied. Perhaps you meant http:///product/25?
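
If a scraper cannot guarantee clean input, the exception can also be caught so that one bad URL does not stop the whole run. The following is a minimal sketch, not a definitive pattern: the helper name fetch_or_none and the 10 second timeout are assumptions made for illustration.

```python
import requests
from requests.exceptions import MissingSchema


def fetch_or_none(url):
    """Hypothetical helper: return the response, or None if the URL has no scheme."""
    try:
        return requests.get(url, timeout=10)
    except MissingSchema as exc:
        # requests raises this while preparing the request,
        # before any network traffic is sent.
        print(f"Skipping invalid URL {url!r}: {exc}")
        return None


result = fetch_or_none("/product/25")  # relative URL, so the helper returns None
```

Catching the error this way only hides the symptom. Converting relative URLs to absolute ones is still the actual fix.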

When web scraping, it’s advisable to always ensure the scraped URLs are absolute by using the urljoin() function:

from urllib.parse import urljoin
import requests

response = requests.get("http://example.com")
urls = [  # let's assume we scraped this batch of relative product URLs (illustrative values):
    "/product/1",
    "/product/2",
]

for relative_url in urls:
    absolute_url = urljoin(response.url, relative_url)
    # for the first URL this results in: http://example.com/product/1
    item_response = requests.get(absolute_url)
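
Scraped pages often mix relative paths, absolute URLs, and non-web links such as mailto:, so it can help to check the joined result before requesting it. The sketch below uses only the standard library. The helper name to_absolute, the base URL, and the sample links are assumptions for illustration.

```python
from urllib.parse import urljoin, urlparse


def to_absolute(base_url, candidate):
    """Hypothetical helper: resolve candidate against base_url.

    Returns the absolute URL, or None if the result is not an http(s) URL
    with a host.
    """
    absolute = urljoin(base_url, candidate)
    parts = urlparse(absolute)
    if parts.scheme in ("http", "https") and parts.netloc:
        return absolute
    return None


base = "http://example.com/catalog/"  # assumed page the links were scraped from
links = [
    "/product/1",                  # root-relative path
    "product/2",                   # path relative to the current directory
    "https://other.example/item",  # already absolute, left unchanged
    "mailto:sales@example.com",    # not a web URL, rejected
]
resolved = [to_absolute(base, link) for link in links]
```

Any entry that comes back as None can be skipped or logged rather than passed to requests.get().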