
Intro to Python Requests Proxy: Comprehensive Guide for Web Scraping

Python’s requests package not only simplifies HTTP requests but also offers robust support for proxies, including both HTTP and SOCKS5 types. This is essential for web scraping, as it lets developers route requests through different servers, manage request rate limits, and bypass geo-restrictions or IP bans. Proxies can be set for individual requests or configured once for the entire script, improving both the anonymity and the reliability of a scraping operation.

To further optimize your scraping, consider pairing requests with a dedicated web scraping API. Such APIs are designed to streamline data extraction, offering features like automatic proxy rotation and parsing capabilities that can handle even complex web pages. Together, requests and a high-quality scraping API form a powerful toolkit for extracting data from the web with precision and speed.

import requests

# proxy pattern is:
# scheme://username:password@IP:PORT
# For example:
# no auth HTTP proxy:
my_proxy = "http://160.11.12.13:1020"
# or a SOCKS5 proxy (requires the socks extra: pip install "requests[socks]")
my_proxy = "socks5://160.11.12.13:1020"
# proxy with authentication
my_proxy = "http://my_username:my_password@160.11.12.13:1020"
# note: username and password must be URL-quoted if they contain URL-sensitive characters like "@":
from urllib.parse import quote
my_proxy = f"http://{quote('foo@bar.com')}:{quote('password@123')}@160.11.12.13:1020"
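To see what URL-quoting actually does to credentials, here's a short sketch (the IP and port are placeholder values):

```python
from urllib.parse import quote

# "@" in credentials would be ambiguous inside a proxy URL,
# so quote() percent-encodes it as %40
username = quote("foo@bar.com")    # -> "foo%40bar.com"
password = quote("password@123")   # -> "password%40123"
my_proxy = f"http://{username}:{password}@160.11.12.13:1020"
print(my_proxy)
```

Without quoting, requests would misparse everything after the first "@" as the proxy host.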


proxies = {
    # this proxy will be applied to all http:// urls
    'http': 'http://160.11.12.13:1020',
    # this proxy will be applied to all https:// urls (note the S)
    'https': 'http://160.11.12.13:1020',
    # we can also use proxy only for specific pages
    'https://httpbin.dev': 'http://160.11.12.13:1020',
}
requests.get("https://httpbin.dev/ip", proxies=proxies)
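To apply the same proxies to every request in a script, the proxies mapping can also be attached to a Session object. A minimal sketch (the proxy address is a placeholder):

```python
import requests

session = requests.Session()
# every request made through this session will use these proxies
session.proxies = {
    "http": "http://160.11.12.13:1020",
    "https": "http://160.11.12.13:1020",
}
# session.get("https://httpbin.dev/ip")  # routed through the proxy
```

This avoids repeating the proxies= argument on every call and also reuses the underlying connections.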

Note that proxies can also be set through the standard *_PROXY environment variables:

$ export HTTP_PROXY="http://160.11.12.13:1020"
$ export HTTPS_PROXY="http://160.11.12.13:1020"
$ export ALL_PROXY="socks5://160.11.12.13:1020"
$ python
import requests
# this will use the proxies we set
requests.get("https://httpbin.dev/ip")

Finally, when web scraping with proxies we should rotate them between requests, so that no single proxy IP accumulates enough traffic to get blocked. Check out our guide on rotating proxies for more information. For more on proxies in general, see our introduction to proxies in web scraping.
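A minimal rotation sketch picks a random proxy from a pool for each request (the proxy addresses below are placeholders):

```python
import random

import requests

# hypothetical pool of proxy servers to rotate through
proxy_pool = [
    "http://160.11.12.13:1020",
    "http://160.11.12.14:1020",
    "http://160.11.12.15:1020",
]

def random_proxies() -> dict:
    """Build a proxies mapping using a randomly chosen proxy from the pool."""
    proxy = random.choice(proxy_pool)
    return {"http": proxy, "https": proxy}

# each call may go out through a different proxy:
# requests.get("https://httpbin.dev/ip", proxies=random_proxies())
```

In production you would also want to track failing proxies and drop them from the pool; a real rotation service handles that for you.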