Comprehensive Guide: How to Get Page Source in Selenium Easily

Web scraping often starts with retrieving the full page source (the complete HTML of the web page), which is then parsed with tools like BeautifulSoup. In Python, Selenium exposes this through the driver.page_source attribute, which returns the HTML of the page currently loaded in the browser as a string. For larger or more complex scraping projects, a specialized web scraping API can take over tasks such as automating browser behavior, parsing data, and running large-scale scraping jobs, leaving developers and analysts free to focus on the data they collect.

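from selenium import webdriver

driver = webdriver.Chrome()  # start a Chrome session
driver.get("https://httpbin.dev/html")  # navigate to the page
print(driver.page_source)  # HTML of the current page, as a string
driver.quit()  # close the browser when done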
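
Once the HTML is available as a string, it can be handed to any parser. The sketch below is only an illustration: it uses Python's built-in html.parser module instead of BeautifulSoup so that it runs without extra packages, and the names HeadingCollector and extract_headings, as well as the sample markup, are made up for this example. With a live browser session, the string passed in would be driver.page_source.

```python
from html.parser import HTMLParser


class HeadingCollector(HTMLParser):
    """Collect the text of every <h1> element in an HTML document."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self._in_h1 = True
            self.headings.append("")  # start a new heading entry

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.headings[-1] += data  # accumulate text inside the <h1>


def extract_headings(html):
    """Return the stripped text of all <h1> elements found in `html`."""
    parser = HeadingCollector()
    parser.feed(html)
    parser.close()
    return [heading.strip() for heading in parser.headings]


# Hypothetical sample markup standing in for driver.page_source.
sample = "<html><body><h1>Example page</h1><p>Body text</p></body></html>"
print(extract_headings(sample))  # ['Example page']
```

If BeautifulSoup is installed, roughly the same result should be obtainable with BeautifulSoup(html, "html.parser").find_all("h1") and get_text() on each match.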

⚠ Be aware that driver.page_source might return the HTML before the page has fully loaded if it's a dynamic JavaScript page. For more information, see how to wait for a page to load in Selenium.
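
One way to guard against reading the source too early is to re-read it until some expected content shows up. Selenium ships WebDriverWait and expected_conditions for this kind of waiting. The sketch below instead uses a plain polling loop so the idea stays visible and no browser is needed to follow it. The helper name wait_for_source, the marker string, and the default timings are assumptions for illustration, not part of Selenium.

```python
import time


def wait_for_source(driver, marker, timeout=10.0, poll=0.5):
    """Poll driver.page_source until `marker` appears, then return the HTML.

    `driver` is any object exposing a `page_source` attribute, such as a
    Selenium WebDriver. Raises TimeoutError if the marker never appears.
    """
    deadline = time.monotonic() + timeout
    while True:
        html = driver.page_source  # re-read: the DOM may have changed
        if marker in html:
            return html
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{marker!r} not found within {timeout} s")
        time.sleep(poll)


# Possible usage with a real session (requires Selenium and Chrome):
#   driver.get("https://httpbin.dev/html")
#   html = wait_for_source(driver, "<h1")
```

The marker should be a piece of markup that only exists once the JavaScript-rendered content is in place. A tag that is already present in the initial HTML would make the function return immediately.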