Skip to content
  • (+91) 9409548155
  • support@appdividend.com
  • Home
  • Pricing
  • Instructor
  • Tutorials
    • Laravel
    • Python
    • React
    • Javascript
    • Angular
  • Become A Tutor
  • About Us
  • Contact Us
Menu
  • Home
  • Pricing
  • Instructor
  • Tutorials
    • Laravel
    • Python
    • React
    • Javascript
    • Angular
  • Become A Tutor
  • About Us
  • Contact Us
  • Home
  • Pricing
  • Instructor
  • Tutorials
    • Laravel
    • Python
    • React
    • Javascript
    • Angular
  • Become A Tutor
  • About Us
  • Contact Us
Python

How to Convert DOCX to PDF in Python

  • 01 Oct, 2024
  • Com 0
How to Convert DOC or DOCX to PDF in Python

Here are three ways to convert DOCX to PDF in Python:

  1. Using docx2pdf
  2. Using unoconv with LibreOffice/OpenOffice
  3. Using python-docx and reportlab

The .pdf and .doc or .docx are the most popular file types for managing files in an application. You can convert Word documents (.doc or .docx) to PDF format using several methods, each with advantages and trade-offs.

For this practical, I will be using a sample.docx file that looks like this:

Sample docx file

Method 1: Using docx2pdf

The most popular package to convert a doc file (.doc or .docx) to a PDF file is to use the “docx2pdf” package. This package is also platform-independent, meaning it works on Windows.

It leverages Microsoft Word via the COM interface for conversion, ensuring high fidelity. On macOS, it uses JXA (JavaScript for Automation), and on Linux, it falls back to LibreOffice.

Flowchart of the docx2pdf process

Flowchart of docx2pdf process

You can install the package using pip:

pip install docx2pdf

Here is the complete code:

from docx2pdf import convert

# Convert a single file
convert("./sample.docx", "./document.pdf")

print("Conversion from DOCX to PDF completed successfully!")

Output

Converted PDF using docx2pdf

Method 2: Using unoconv with LibreOffice/OpenOffice

The unoconv is LibreOffice’s uno binding for the conversions. It supports a wide range of formats and preserves formatting well.

To use this approach, you need to install LibreOffice. Since I am using MacOS, I can install it using the command below:

# Install LibreOffice
brew install --cask libreoffice

# Install unoconv
brew install unoconv

Ensure that you have installed the Homebrew package manager for macOS.

Flowchart of the unoconv process

Flowchart of the unoconv process

Here is the complete code:

import subprocess


def convert_with_unoconv(input_file, output_file):
    subprocess.call(['unoconv', '-f', 'pdf', '-o', output_file, input_file])


# Usage
convert_with_unoconv('./sample.docx', './document.pdf')
print("DOC file is converted into PDF")

Output

Converted PDF using libreoffice

If everything is installed and set up correctly, then you should be able to convert the Docx file into a PDF file.

Method 3: Using python-docx and reportlab

The combination of python-docx and reportlab libraries can do the job by first reading and manipulating DOCX files using “python-docx” and the “reportlab” library to generate PDFs from that file.

Here is the step-by-step implementation:

  1. Read the .docx file using the python-docx library.
  2. Extract the text and images from the docx file.
  3. Generate a PDF using reportlab by adding the extracted content.

Flowchart of python-docx and reportlab process

Flowchart of python-docx and reportlab process

Install both libraries using the command below:

pip install python-docx reportlab

Here is the complete Python  code:

from docx import Document
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas


def convert_with_python_docx(input_file, output_file):
    # Read the .docx file
    doc = Document(input_file)
    full_text = []
    for para in doc.paragraphs:
        full_text.append(para.text)

    # Generate PDF
    pdf = canvas.Canvas(output_file, pagesize=letter)
    textobject = pdf.beginText(40, 750)
    for line in full_text:
        textobject.textLine(line)
        if textobject.getY() < 50:
            pdf.drawText(textobject)
            pdf.showPage()
            textobject = pdf.beginText(40, 750)
    pdf.drawText(textobject)
    pdf.save()


# Usage
convert_with_python_docx('./sample.docx', './document.pdf')

Output

Output PDF using python-docx and reportlab

Conversion time for comparison of different approaches

The conversion time for each method is mentioned in the following image:

Conversion time table for each method

We used the “time” module to check the processing time for each method, and I created a table based on my observation:

  1. The fastest method is “python-docx and reportlab”
  2. The second fastest is “docx2pdf with LibreOffice”
  3. The slowest method is “unoconv with LibreOffice”

Here is the chart based on the above observation:

Chart displaying time for each method

That’s it!

Post Views: 2,075
Share on:
Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.

How to Delete Pages from PDF File using Python
How to Implement Routing in Angular 18

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

Address: TwinStar, South Block – 1202, 150 Ft Ring Road, Nr. Nana Mauva Circle, Rajkot(360005), Gujarat, India

Call: (+91) 9409548155

Email: support@appdividend.com

Online Platform

  • Pricing
  • Instructors
  • FAQ
  • Refund Policy
  • Support

Links

  • About Us
  • Contact Us
  • Privacy Policy
  • Terms of services

Tutorials

  • Angular
  • React
  • Python
  • Laravel
  • Javascript
Copyright @2024 AppDividend. All Rights Reserved
Appdividend