If you are building software that involves process automation, batch processing, scheduled tasks, billing, or reporting, you often need to deliver the results as PDF files, and that is where creating PDFs with Python comes in.
Here are four popular packages that you can use:
- Using reportlab
- Using xhtml2pdf
- Using fpdf (fpdf2 or pyfpdf)
- Using pdfkit
In this tutorial, we will discuss each approach with its pros, cons, complexities, and specific usages, so that you as a developer can choose the method that fits your requirements.
In the practical examples, I will also embed an image, and for that I will be using the following image:
Method 1: Using reportlab
The reportlab library is one of the most capable and efficient Python libraries for creating a PDF from data, including plain text and images. It can easily embed an image in a PDF, which is helpful in many scenarios.
Install reportlab using pip:
pip install reportlab
Here is the complete code:
# importing reportlab
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

# Creating a custom function that accepts
# filename and image_path as arguments
def create_pdf_with_image(filename, image_path):
    c = canvas.Canvas(filename, pagesize=letter)
    width, height = letter

    # Add some text
    c.setFont("Helvetica", 12)
    c.drawString(100, height - 100, "PDF with an embedded image:")

    # Draw the image
    # Adjust position and size as needed
    c.drawImage(image_path, 100, height - 300, width=200, height=150)

    # Save the PDF
    c.save()

# Calling the custom function
if __name__ == "__main__":
    create_pdf_with_image("reportlab_image.pdf", "Avatar.png")
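In this code, we imported letter and canvas from “reportlab”, created a canvas, set the font family and size, wrote a line of text, drew an image, and saved the result as a PDF. Coordinates are measured in points from the bottom-left corner of the page, which is why the y positions are written as height minus an offset.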
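Because drawImage places the bottom-left corner of the image at (x, y), positioning an image a given distance from the top of the page takes a small subtraction. Here is a minimal sketch of that arithmetic. The helper name top_left_to_canvas is hypothetical and not part of reportlab, and the letter size is hard-coded as 612 x 792 points (the value of reportlab.lib.pagesizes.letter) so the snippet runs without the library.

```python
# Letter page size in points (1 point = 1/72 inch); same value as
# reportlab.lib.pagesizes.letter, hard-coded so this runs on its own.
LETTER_WIDTH, LETTER_HEIGHT = 612.0, 792.0


def top_left_to_canvas(x, top, box_height, page_height=LETTER_HEIGHT):
    """Convert a top-left based position to reportlab canvas coordinates.

    x          -- distance from the left edge, in points
    top        -- distance from the top edge to the top of the box, in points
    box_height -- height of the box (for example an image), in points

    Returns (x, y), where y is the bottom edge of the box measured from
    the bottom of the page, which is what canvas.drawImage expects.
    """
    return x, page_height - top - box_height


if __name__ == "__main__":
    # An image 150 points tall whose top edge sits 150 points below the
    # top of the page, matching the drawImage call above (y = height - 300).
    print(top_left_to_canvas(100, 150, 150))  # (100, 492.0)
```

The returned pair can be passed straight to c.drawImage(image_path, x, y, width=200, height=150).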
The output “reportlab_image.pdf” file looks like this:
You can see from the above screenshot that the PDF contains plain text as well as an image.
Complexities
- Time complexity: O(n), where n is the number of pages.
- Space complexity: O(n), where n is the size and complexity of the content.
Pros
- reportlab supports complex layouts with text, vector graphics, and images.
- It provides both low-level (Canvas) and high-level (Platypus) APIs to choose from based on your requirements.
- The reportlab library provides various functions that you can use to customize your final PDF document.
Cons
- reportlab itself has no direct HTML-to-PDF conversion, but you can combine it with other libraries to convert HTML to PDF.
- If you are looking for simple PDF generation, it might be overkill.
Specific usages
- If you are looking for precise layout control.
- If you are looking for dynamic PDF generation with custom formatting.
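The high-level Platypus API mentioned in the pros builds a document from a list of flowables instead of hand-placed coordinates. The following is a minimal sketch, not a definitive implementation. The function names records_to_table_data and build_report, the column names, and the data shape are all hypothetical, and reportlab is imported inside build_report only so that the pure-Python helper can be used without it.

```python
def records_to_table_data(records, columns):
    """Turn a list of dicts into a header row plus string rows for a Table."""
    header = list(columns)
    body = [[str(rec.get(col, "")) for col in columns] for rec in records]
    return [header] + body


def build_report(filename, title, records, columns):
    """Write a one-table report to filename using Platypus flowables."""
    # Imported here so records_to_table_data works without reportlab installed.
    from reportlab.lib import colors
    from reportlab.lib.pagesizes import letter
    from reportlab.lib.styles import getSampleStyleSheet
    from reportlab.platypus import (
        Paragraph, SimpleDocTemplate, Spacer, Table, TableStyle,
    )

    styles = getSampleStyleSheet()
    table = Table(records_to_table_data(records, columns))
    table.setStyle(TableStyle([
        ("BACKGROUND", (0, 0), (-1, 0), colors.lightgrey),  # shade header row
        ("GRID", (0, 0), (-1, -1), 0.5, colors.grey),       # thin cell borders
    ]))

    # Platypus lays the flowables out top to bottom and adds pages as needed.
    story = [Paragraph(title, styles["Title"]), Spacer(1, 12), table]
    SimpleDocTemplate(filename, pagesize=letter).build(story)
```

For example, build_report("sales_report.pdf", "Monthly Sales", [{"item": "Pen", "qty": 2}], ["item", "qty"]) would produce a titled page with a two-column table.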
Method 2: Using xhtml2pdf
As the name suggests, the xhtml2pdf library is used when you are working with an HTML-based layout. It will directly convert HTML/CSS into a PDF document.
The xhtml2pdf library supports HTML5, CSS 2.1, and some parts of CSS3. It is a pure Python solution.
You can install it using the command below:
pip install xhtml2pdf
Here is how you can use the package:
# Importing xhtml2pdf package
from xhtml2pdf import pisa

# Creating a custom function
# to convert html page to pdf
def create_pdf_with_image(html_content, filename):
    with open(filename, "wb") as pdf_file:
        pisa_status = pisa.CreatePDF(
            src=html_content,
            dest=pdf_file
        )
    if pisa_status.err:
        print("Error creating PDF")
    else:
        print(f"PDF created successfully: {filename}")

if __name__ == "__main__":
    html = """
    <html>
    <head>
        <style>
            body { font-family: Verdana, sans-serif; }
            h1 { color: red; }
        </style>
    </head>
    <body>
        <h1>PDF with an Embedded Image</h1>
        <img src="Avatar.png" alt="Embedded Image" width="300"/>
    </body>
    </html>
    """
    create_pdf_with_image(html, "xhtml2pdf_image.pdf")
Output
You can see from the above code that we embedded an image using the HTML <img> tag within the HTML content and then converted it to PDF.
Complexities
- Time complexity: O(n), where n is the number of pages; it also depends on the complexity of the HTML document.
- Space complexity: O(n), where n is proportional to the size and complexity of the HTML document.
Pros
- It is a pure Python implementation.
Cons
- Limited CSS3 support.
- It can render very complex layouts poorly.
Specific usages
- It specifically supports HTML documents including embedding images.
- It can quickly convert HTML pages to PDF without loading heavy dependencies.
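In reporting or billing jobs, the HTML usually comes from data and is not written by hand. The sketch below shows one way to do that, under stated assumptions: render_report_html and html_to_pdf are hypothetical helper names, the rows are assumed to be (name, amount) pairs, and the only xhtml2pdf call used is pisa.CreatePDF, as in the example above. Values are escaped with the standard library's html.escape so that characters such as < or & in the data do not break the markup.

```python
from html import escape


def render_report_html(title, rows):
    """Build a small HTML document with one table row per (name, amount) pair."""
    body_rows = "\n".join(
        f"<tr><td>{escape(str(name))}</td><td>{amount:.2f}</td></tr>"
        for name, amount in rows
    )
    return (
        "<html><head><style>"
        "body { font-family: Verdana, sans-serif; } "
        "td { padding: 4px; border: 1px solid #999999; }"
        "</style></head><body>"
        f"<h1>{escape(title)}</h1>"
        f"<table>\n{body_rows}\n</table>"
        "</body></html>"
    )


def html_to_pdf(html_content, filename):
    """Write html_content to filename as a PDF; return True on success."""
    # Imported here so render_report_html works without xhtml2pdf installed.
    from xhtml2pdf import pisa

    with open(filename, "wb") as pdf_file:
        status = pisa.CreatePDF(src=html_content, dest=pdf_file)
    return not status.err
```

A call such as html_to_pdf(render_report_html("Sales", [("Pens", 12.5)]), "sales.pdf") would then write the report, and the boolean result lets a batch job decide whether to retry or log the failure.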
Method 3: Using fpdf
If you have worked with PHP, you might know this library, because it is a Python port of PHP's FPDF. It was designed to create simple PDFs with basic text and graphics.
You can install the maintained fpdf2 package using the command below (the example that follows uses the text= keyword, which the older fpdf/pyfpdf package does not accept):
pip install fpdf2
You can use methods such as add_page(), image(), and output() on an FPDF object to build and save a PDF like this:
# importing fpdf (provided by the fpdf2 package)
from fpdf import FPDF

# Creating a custom function that will save the content into pdf file
def create_pdf_with_image(filename, image_path):
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)

    # Add some text, then move the cursor to the start of the next line
    # (new_x/new_y replace the deprecated ln=True in fpdf2)
    pdf.cell(200, 10, text="PDF with an embedded image:",
             new_x="LMARGIN", new_y="NEXT", align='C')

    # Insert image
    # Adjust position and size as needed
    pdf.image(image_path, x=10, y=20, w=100)

    # Save the PDF
    pdf.output(filename)

# Calling the function
if __name__ == "__main__":
    create_pdf_with_image("fpdf_image.pdf", "Avatar.png")
Output
As the above screenshot shows, we got the expected output.
Complexities
- Time complexity: O(n), where n is the number of pages.
- Space complexity: O(n), where n is proportional to the size of the content.
Pros
- It is a great choice for quick PDF generation tasks.
- It is straightforward to learn, and you can pick up its more advanced features incrementally.
Cons
- Not suitable for complex layouts.
- It provides very basic styling and formatting capabilities.
Specific usages
- You can go for the fpdf approach if you prioritize speed and simplicity over layout features.
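For a quick billing-style document, bordered cells are often enough to lay out a simple table. The following is a sketch under assumptions, not a definitive implementation: it assumes fpdf2 (a recent release that accepts the text= and new_x/new_y keywords), the function names invoice_lines and create_invoice_pdf are hypothetical, and items are assumed to be (description, quantity, unit_price) tuples.

```python
def invoice_lines(items):
    """Return rows of (description, qty, unit_price, line_total) and the total.

    items is a list of (description, quantity, unit_price) tuples.
    """
    rows = [(desc, qty, price, round(qty * price, 2)) for desc, qty, price in items]
    total = round(sum(row[3] for row in rows), 2)
    return rows, total


def create_invoice_pdf(filename, items):
    """Write a simple bordered invoice table to filename."""
    # Imported here so invoice_lines can be used without fpdf2 installed.
    from fpdf import FPDF

    rows, total = invoice_lines(items)
    pdf = FPDF()
    pdf.add_page()
    pdf.set_font("Helvetica", size=12)

    for desc, qty, price, line_total in rows:
        # By default the cursor moves right after each cell.
        pdf.cell(90, 10, text=desc, border=1)
        pdf.cell(30, 10, text=str(qty), border=1, align="R")
        pdf.cell(30, 10, text=f"{price:.2f}", border=1, align="R")
        # The last cell of a row sends the cursor to the start of the next line.
        pdf.cell(30, 10, text=f"{line_total:.2f}", border=1, align="R",
                 new_x="LMARGIN", new_y="NEXT")

    pdf.cell(150, 10, text="Total", border=1, align="R")
    pdf.cell(30, 10, text=f"{total:.2f}", border=1, align="R")
    pdf.output(filename)
```

The column widths (90 + 30 + 30 + 30 mm) are chosen to fit inside the default A4 page margins; adjust them for other page sizes.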
Method 4: Using pdfkit
Another approach is “pdfkit”, a wrapper around the wkhtmltopdf tool, which converts HTML to PDF using the WebKit rendering engine.
Before installing pdfkit, we first need to install the wkhtmltopdf tool if it is not installed already. I am using macOS, so I can install it using the command below:
brew install wkhtmltopdf
Now, we can install the pdfkit library using the command below:
pip install pdfkit
Here is how you can generate a PDF using pdfkit:
import pdfkit

def create_pdf_from_html(html_content, filename):
    # Configure path to wkhtmltopdf if not in PATH
    # Update the path as needed
    config = pdfkit.configuration(wkhtmltopdf='/usr/local/bin/wkhtmltopdf')
    pdfkit.from_string(html_content, filename, configuration=config)

if __name__ == "__main__":
    html = """
    <html>
    <head>
        <style>
            body { font-family: Times New Roman, serif; }
            h1 { color: green; }
        </style>
    </head>
    <body>
        <h1>Hello, PDFKit!</h1>
        <p>This PDF is generated using PDFKit and wkhtmltopdf.</p>
    </body>
    </html>
    """
    create_pdf_from_html(html, "pdfkit_example.pdf")
Complexities
- Time complexity: O(n), where n is the number of pages; it also depends on the processing time of wkhtmltopdf.
- Space complexity: O(n), where n is proportional to the size of the content.
Pros
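- It uses a browser rendering engine (WebKit), so the PDF closely matches how the page looks in a browser.
- It can execute JavaScript before rendering, although the WebKit build inside wkhtmltopdf is dated, so very modern JavaScript or CSS may not work.
- It handles complex HTML and CSS well.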
Cons
- It requires installing the external wkhtmltopdf binary, so it is not a pure Python solution.
- You will have less control over PDF-specific features outside of HTML/CSS.
Specific usages
- It converts complex, dynamic web pages into PDFs.
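The pdfkit example above does not embed the image used in the other methods, and it hard-codes the wkhtmltopdf path. The sketch below shows one way to handle both, under stated assumptions: the helper names image_uri and build_html are hypothetical, the binary is looked up with the standard library's shutil.which, and the enable-local-file-access option assumes a wkhtmltopdf release (0.12.6 or later) that blocks local files by default.

```python
import shutil
from pathlib import Path


def image_uri(image_path):
    """Return an absolute file:// URI so wkhtmltopdf can locate a local image."""
    return Path(image_path).resolve().as_uri()


def build_html(image_path):
    """Build a minimal page that references the image by absolute URI."""
    return (
        "<html><body>"
        "<h1>PDF with an Embedded Image</h1>"
        f'<img src="{image_uri(image_path)}" alt="Embedded Image" width="300"/>'
        "</body></html>"
    )


def create_pdf_with_image(filename, image_path):
    """Render the page to filename with pdfkit and wkhtmltopdf."""
    # Imported here so the helpers above work without pdfkit installed.
    import pdfkit

    # Look wkhtmltopdf up on PATH instead of hard-coding its location.
    binary = shutil.which("wkhtmltopdf")
    if binary is None:
        raise FileNotFoundError("wkhtmltopdf was not found on PATH")
    config = pdfkit.configuration(wkhtmltopdf=binary)

    # Assumption: wkhtmltopdf 0.12.6+, which needs this flag to read local files.
    options = {"enable-local-file-access": ""}
    pdfkit.from_string(build_html(image_path), filename,
                       configuration=config, options=options)
```

Using an absolute URI avoids depending on the directory the script happens to run from, since the HTML string passed to from_string has no base path of its own.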
Final words
I highly recommend the “reportlab” approach because it is highly customizable and supports images, graphics, and many other options.
If you are looking for a simple, easy, and lightweight solution, use the “fpdf” library.
If you are looking for an HTML-based solution, you can use the “pdfkit” approach because it relies on a browser engine that can also run JavaScript.