PDF is one of the most widely used formats for sharing portable documents. You can open a PDF on virtually any device or operating system and access the information. Whether you are working with reports, charts, graphs, or other visual formats, a PDF report provides an engaging overview of the data.
Why would you want to export a DataFrame to PDF? The main reason is that it lets you share the data in a safe and presentable form. You cannot hand a raw DataFrame to your client; you have to present it as a structured table in a PDF document that your client can open. A PDF is also hard to modify by mistake, which is a good thing.
An easy and efficient way to export a Pandas DataFrame to PDF in Python is to use the “reportlab” library.
Here is the step-by-step guide:
Step 1: Installing the “reportlab” and “pandas” libraries
The easiest way to install the “reportlab” library is with pip:
pip install reportlab
Since we are dealing with a DataFrame, we also need the Pandas library. Install it if it is not installed already:
pip install pandas
Step 2: Import various modules
Import pandas, then the letter page size, the colors module, and the SimpleDocTemplate, Table, and TableStyle classes from reportlab to generate a proper table from the DataFrame.
import pandas as pd
from reportlab.lib.pagesizes import letter
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle
Step 3: Creating a DataFrame
You can create a DataFrame using the pd.DataFrame() constructor.
# Defining the data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['Rajkot', 'Ahmedabad', 'Surat', 'Jamnanagar'],
}

# Creating a DataFrame
df = pd.DataFrame(data)
Step 4: Converting DataFrame to a list of lists
The reportlab Table expects its data as a list of rows, where each row is itself a list of cell values. We therefore convert the DataFrame to a list of lists, with the column names as the first row.
# Converting DataFrame to list of lists
data_list = [df.columns.values.tolist()] + df.values.tolist()
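Real data often contains missing values, which would likely show up in the table as the text "nan". The following is a minimal sketch of one way to handle that during the conversion. The helper name dataframe_to_rows, its na_rep parameter, and the sample data are assumptions made for illustration; they are not part of pandas or reportlab.

```python
import pandas as pd


def dataframe_to_rows(df, na_rep=""):
    """Return [header] + body rows, with missing values replaced by na_rep."""
    header = [str(col) for col in df.columns]
    # Cast to object first so the placeholder can also sit in numeric columns.
    body = df.astype(object).where(df.notna(), na_rep).values.tolist()
    return [header] + body


# Hypothetical sample with one missing city
sample = pd.DataFrame({"Name": ["John", "Anna"], "City": ["Rajkot", None]})
rows = dataframe_to_rows(sample)
print(rows)
```

The result has the same shape as data_list, so it can be passed to Table() in the same way.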
Step 5: Creating a document object
Let’s define the path where the final PDF will be saved, and create a document object using the SimpleDocTemplate class. Note that the ./pdfs folder must already exist, because reportlab does not create missing directories.
# Creating a PDF document
pdf_path = './pdfs/export_df.pdf'
pdf = SimpleDocTemplate(pdf_path, pagesize=letter)
Step 6: Creating the Table
We want to display the data in tabular format in the PDF, so we first create a Table object from the data.
# Creating a table with the data
table = Table(data_list)
Step 7: Adding style
We can style the table using the TableStyle class, which accepts a list of styling commands such as BACKGROUND, TEXTCOLOR, ALIGN, FONTNAME, BOTTOMPADDING, and GRID.
# Adding style to the table
style = TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
    ('BACKGROUND', (0, 1), (-1, -1), colors.beige),
    ('GRID', (0, 0), (-1, -1), 1, colors.black),
])
table.setStyle(style)
Step 8: Finalizing the PDF file
Let’s create a list named elements containing only the table object.
elements = [table]
Then, build the PDF document using the pdf.build() method.
# Finalizing the PDF
pdf.build(elements)
Also, print the success message in the terminal so that the user knows what just happened!
# Printing success message
print("DataFrame exported to PDF successfully.")
Save the file. The complete code looks like this:
import pandas as pd
from reportlab.lib.pagesizes import letter
from reportlab.lib import colors
from reportlab.platypus import SimpleDocTemplate, Table, TableStyle

# Defining the data
data = {
    'Name': ['John', 'Anna', 'Peter', 'Linda'],
    'Age': [28, 24, 35, 32],
    'City': ['Rajkot', 'Ahmedabad', 'Surat', 'Jamnanagar'],
}

# Creating a DataFrame
df = pd.DataFrame(data)

# Converting DataFrame to list of lists
data_list = [df.columns.values.tolist()] + df.values.tolist()

# Creating a PDF document
pdf_path = './pdfs/export_df.pdf'
pdf = SimpleDocTemplate(pdf_path, pagesize=letter)

# Creating a table with the data
table = Table(data_list)

# Adding style to the table
style = TableStyle([
    ('BACKGROUND', (0, 0), (-1, 0), colors.grey),
    ('TEXTCOLOR', (0, 0), (-1, 0), colors.whitesmoke),
    ('ALIGN', (0, 0), (-1, -1), 'CENTER'),
    ('FONTNAME', (0, 0), (-1, 0), 'Helvetica-Bold'),
    ('BOTTOMPADDING', (0, 0), (-1, 0), 12),
    ('BACKGROUND', (0, 1), (-1, -1), colors.beige),
    ('GRID', (0, 0), (-1, -1), 1, colors.black),
])
table.setStyle(style)

# Creating the PDF
elements = [table]
pdf.build(elements)

# Printing success message
print("DataFrame exported to PDF successfully.")
Go to the terminal and run the file. You will see a new file called “export_df.pdf” in the pdfs folder that looks like the image below:
Why is “reportlab” the most helpful library?
- reportlab provides many customization options that control the styling and layout of the PDF document. It gives precise control over placement across multiple pages.
- If your DataFrame is big and complex, it helps you build correspondingly complex documents from large datasets.
- Unlike other methods, you don’t need to convert the DataFrame into an HTML page and then convert that into a PDF. You convert the DataFrame to PDF directly, without any intermediate format.
- It also supports advanced features like “encryption”.
- Even when you are dealing with large datasets and complex layouts, it manages memory efficiently.
That’s all!