When extracting data from dynamic web pages with Selenium, it’s crucial to let the page fully load before capturing the page source. Selenium’s WebDriverWait class lets us pause execution until a specific element that signals the page has finished loading is present in the DOM; only then do we capture the page source, ensuring that the collected data is accurate and comprehensive. For developers and data analysts looking to streamline their web scraping projects, a dedicated web scraping API can further improve the efficiency and reliability of data extraction by handling common scraping challenges, such as page load delays, for you.
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.common.by import By
driver = webdriver.Chrome()
driver.get("https://httpbin.dev/")
_timeout = 10 # ⚠ remember to set a reasonable timeout
WebDriverWait(driver, _timeout).until(
    expected_conditions.presence_of_element_located(
        # we can wait for any selector type, for example an element id:
        (By.ID, "operations-tag-Auth")
        # or by class name (By.CLASS_NAME takes a bare class name, no leading dot):
        # (By.CLASS_NAME, "price")
        # or by XPath:
        # (By.XPATH, "//h1[@class='price']")
        # or by CSS selector:
        # (By.CSS_SELECTOR, "h1.price")
    )
)
print(driver.page_source)
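If the awaited element never appears within the timeout, WebDriverWait raises a TimeoutException, so in a real scraper it’s worth wrapping the wait and always releasing the browser afterwards. Below is a minimal sketch of that pattern, reusing the URL and selector from the example above; the try/except/finally structure and the html variable are illustrative choices, not part of the original snippet.

from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
try:
    driver.get("https://httpbin.dev/")
    # wait up to 10 seconds for the element that signals a fully loaded page
    WebDriverWait(driver, 10).until(
        expected_conditions.presence_of_element_located((By.ID, "operations-tag-Auth"))
    )
    html = driver.page_source  # capture the source only after the wait succeeds
except TimeoutException:
    # the element never appeared in time - the page is still loading or has changed
    html = None
finally:
    driver.quit()  # always shut down the browser process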