Logo New Black

Comprehensive Guide: HTML Table to XLSX using Python BeautifulSoup

Python, in conjunction with BeautifulSoup4 and xlsxwriter, plus an HTTP client-like requests, can be employed to convert an HTML table into an Excel spreadsheet. This process becomes significantly more streamlined and efficient when utilizing a web scraping API. These APIs are designed to simplify data extraction, allowing developers to focus on parsing and manipulating data rather than dealing with the complexities of web scraping.

$ pip install bs4 xlsxwriter requests

Following this, we can scrape the webpage, locate table data using bs4, and write it to a .xlsx file using `xlsxwriter`:

from bs4 import BeautifulSoup
import requests 
import xlsxwriter

# 1. Retrieve HTML and create BeautifulSoup object
response = requests.get("https://www.w3schools.com/html/html_tables.asp")
soup = BeautifulSoup(response.text)
# 2. Find the table and extract headers and rows:
table = soup.find('table', {"id": "customers"})
header = []
rows = []
for i, row in enumerate(table.find_all('tr')):
    if i == 0:
        header = [el.text.strip() for el in row.find_all('th')]
    else:
        rows.append([el.text.strip() for el in row.find_all('td')])
# 3. save to it a XLSX file:
workbook = xlsxwriter.Workbook('output.xlsx')
worksheet = workbook.add_worksheet()
worksheet.write_row(0, 0, header)
for i, row in enumerate(rows):
    worksheet.write_row(i+1, 0, row)
workbook.close()

BeautifulSoup is an extremely potent HTML parser that provides us with complete control over parsing HTML tables. Unlike many automated scripts, it can be directed to extract HTML table values from any table structure.