Web scraping with Selenium often wastes bandwidth on image loading. Unless you are capturing screenshots, a scraper rarely needs images, and downloading them slows every page load and raises costs, especially at large volumes. Blocking image loading, either through Selenium's browser settings or by switching to a web scraping API, reduces the amount of data transferred and speeds up scraping without affecting the quality of the collected data.
There are two ways to block images in Selenium with Chrome: either pass the --blink-settings=imagesEnabled=false command-line flag or set the profile.managed_default_content_settings.images preference to 2 (block):
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# use a single Options object for all settings
options = Options()
# run headless; older Chrome versions use plain --headless instead
options.add_argument("--headless=new")
# this will disable image loading
options.add_argument("--blink-settings=imagesEnabled=false")
# or alternatively we can set a direct preference (2 = block):
options.add_experimental_option(
    "prefs", {"profile.managed_default_content_settings.images": 2}
)
driver = webdriver.Chrome(options=options)
driver.get("https://www.twitch.tv/directory/game/Art")
driver.quit()
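If several scrapers need the same setup, the two settings above can be wrapped in a small helper. The following is a minimal sketch, not a definitive implementation: the names image_blocking_config and make_driver are hypothetical, and the --headless=new argument assumes a recent Chrome build. Selenium is imported lazily so that the configuration part can be inspected without a browser installed.

```python
# Chrome content-setting values: 1 = allow, 2 = block
IMAGES_BLOCKED = 2


def image_blocking_config(headless=True):
    """Return (arguments, prefs) that disable image loading in Chrome.

    Hypothetical helper: it only collects the flag and the preference
    described in the text above.
    """
    arguments = ["--blink-settings=imagesEnabled=false"]
    if headless:
        # Assumption: a recent Chrome; older builds use plain "--headless".
        arguments.append("--headless=new")
    prefs = {"profile.managed_default_content_settings.images": IMAGES_BLOCKED}
    return arguments, prefs


def make_driver(headless=True):
    """Build a Chrome driver with image loading disabled (hypothetical helper)."""
    # Imported here so image_blocking_config works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    arguments, prefs = image_blocking_config(headless)
    options = Options()
    for argument in arguments:
        options.add_argument(argument)
    options.add_experimental_option("prefs", prefs)
    return webdriver.Chrome(options=options)
```

Calling make_driver() returns a driver that can be used exactly like the one in the example above. Applying both the flag and the preference is redundant but harmless; either one alone should be enough to stop images from loading.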
Alternatively, to avoid unnecessary bandwidth consumption, consider using web scraping APIs, such as those offered by Scrape Network.