Merging Multiple PDF Files into One in Python

  • 07 Dec, 2024

Many online tools can merge multiple PDFs into one file, but uploading documents to a third-party service raises privacy concerns. If your PDFs contain sensitive information, an online tool is neither secure nor practical. Merging the files locally with a short Python script avoids the problem.

Python has several third-party libraries for working with PDFs. This tutorial uses two of them:

  1. PyMuPDF
  2. PyPDF2

To merge multiple PDFs into a single file, you can use either PyMuPDF’s .insert_file() method or PyPDF2’s PdfMerger class.

Preparing PDFs

Before merging, we need some PDF files to work with. For this tutorial, I created a folder called “pdfs” in my current working directory and placed two PDFs inside it:

[Image: Multiple PDFs]

Here are the two files, which you can download and use to follow along:

  1. first.pdf
  2. second.pdf

Approach 1: PyMuPDF’s .insert_file() method

The PyMuPDF library provides functions to open, merge, write, and close PDF documents.

One big advantage of PyMuPDF is that it is cross-platform and performs well with large PDF files.

Here is the step-by-step guide:

Step 1: Install PyMuPDF

Install the PyMuPDF library using the command below:

pip install pymupdf

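Step 2: Import it into your file

Recent PyMuPDF releases are imported under the name pymupdf. If the import fails on an older installation, upgrade the package or import fitz, the library's legacy module name.

# Importing the pymupdf library
import pymupdf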

Step 3: Prepare a list of input PDF files

# List of input PDF files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']
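
If the folder contains many PDFs, typing every path by hand gets tedious. As a minimal sketch, the list could instead be built from the folder's contents using only the standard library. The helper name collect_pdfs and its exclude parameter are my own assumptions for illustration and are not part of PyMuPDF.

```python
from pathlib import Path


def collect_pdfs(folder, exclude=()):
    """Return the PDF paths found in `folder`, sorted by file name.

    `exclude` is a collection of file names to skip, for example the
    merged output of an earlier run that was saved in the same folder.
    """
    skip = set(exclude)
    return [
        str(path)
        for path in sorted(Path(folder).glob("*.pdf"))
        if path.is_file() and path.name not in skip
    ]
```

For example, files = collect_pdfs('pdfs', exclude=['pymupdf_merged.pdf']) would pick up the input files in alphabetical order while skipping a merged file left over from a previous run. Alphabetical order is only one possible choice; if the merge order matters, an explicit list is the safer option.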

Step 4: Creating an empty document

We can create an empty PDF document and add pages to it later.

# Creating an empty document object
pdf_doc = pymupdf.open()

Step 5: Appending input files

We have an empty PDF document, and now it is time to add the pages from the input files.

We loop over the files list and append each input file to the empty document using the .insert_file() method.

# Appending files to the empty document object
for filename in files:
    pdf_doc.insert_file(filename)
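
The .insert_file() method also accepts from_page and to_page arguments (0-based, inclusive), so it is possible to append only part of a file. The sketch below wraps that in a hypothetical helper, append_page_range, which is not part of PyMuPDF. It assumes pdf_doc is a document created with pymupdf.open(), as in Step 4.

```python
def append_page_range(pdf_doc, filename, first_page=0, last_page=-1):
    """Append pages first_page..last_page of `filename` to `pdf_doc`.

    Page numbers are 0-based and inclusive; last_page=-1 means
    "up to the final page of the file".
    """
    pdf_doc.insert_file(filename, from_page=first_page, to_page=last_page)


# Example usage (assumes pdf_doc = pymupdf.open() as in Step 4 and that
# first.pdf has at least two pages):
# append_page_range(pdf_doc, 'pdfs/first.pdf', 0, 1)  # first two pages only
# append_page_range(pdf_doc, 'pdfs/second.pdf')       # the whole file
```

Keeping the helper this thin means the loop from Step 5 stays unchanged for whole files, and a page range is only spelled out where it is needed.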

Step 6: Saving the Merged PDF

# Saving the merged PDF and closing the document object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()

This saves pymupdf_merged.pdf, the concatenation of the two input files, inside the “pdfs” folder. Here is the complete code:

# Importing pymupdf library
import pymupdf

# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']

# Creating an empty document object
pdf_doc = pymupdf.open()

# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)

# Saving merged pdf and close the file object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()

Here is the merged PDF:

[Image: Merged PDF using pymupdf]
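
The same steps can be wrapped in a reusable function. This is a sketch under a few assumptions rather than the definitive way to do it: the name merge_with_pymupdf is hypothetical, the garbage and deflate save options are optional extras that usually shrink the output, and the with block relies on PyMuPDF documents working as context managers.

```python
from pathlib import Path


def merge_with_pymupdf(files, output_path):
    """Merge `files` in order into `output_path`; return the page count."""
    # Fail early, before creating any output, if an input is missing.
    missing = [name for name in files if not Path(name).is_file()]
    if missing:
        raise FileNotFoundError(f"Input PDF(s) not found: {missing}")

    # Imported here so the check above runs even without PyMuPDF installed;
    # a regular top-level import works just as well.
    import pymupdf

    # The `with` block closes the document even if a step raises.
    with pymupdf.open() as pdf_doc:
        for name in files:
            pdf_doc.insert_file(name)
        page_count = pdf_doc.page_count
        # garbage=4 and deflate=True ask PyMuPDF to drop unused or
        # duplicate objects and to compress streams; both are optional.
        pdf_doc.save(output_path, garbage=4, deflate=True)
    return page_count
```

Calling merge_with_pymupdf(['pdfs/first.pdf', 'pdfs/second.pdf'], 'pdfs/pymupdf_merged.pdf') would produce the same merged file as the script above and return its total number of pages.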

Approach 2: PyPDF2’s PdfMerger class

PyPDF2 is another PDF manipulation library, and it provides a dedicated class for concatenating files. Note that PyPDF2 is no longer under active development; the same project continues under the name pypdf.

We will create a custom function that accepts the list of input files and the output file’s path.

PyPDF2 is straightforward and helpful for basic PDF operations. I recommend it if you are working with simple PDF files. However, if your PDFs are complex and require special attention, I suggest you use PyMuPDF.

Here is the step-by-step guide for this approach:

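Step 1: Install and import PyPDF2

If PyPDF2 is not installed yet, install it with pip install PyPDF2, then import it:

# Import the PyPDF2 library
import PyPDF2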

Step 2: Defining the custom function

Let’s create a custom function, merge_pdfs(), that accepts two arguments:

  1. input_files
  2. output_file

# Defining a custom function that merges multiple PDFs
def merge_pdfs(input_files, output_file):
    # The function body is built up in Steps 3 to 5

Step 3: Create an instance of the PdfMerger class

PyPDF2 provides a PdfMerger class for this purpose. Inside merge_pdfs(), we create an instance of it.

    # Creating an instance of the PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()

Step 4: Appending Input Files

Still inside the function, we append every input file to the pdf_merger instance.

    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)

Step 5: Writing an Output File

Use the .write() method of the pdf_merger instance to write the final merged PDF file.

    # Writing the merged output PDF file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()

We also closed the pdf_merger instance to free up the computing resources.
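
If appending or writing raises an exception, the .close() call at the end of the function is never reached. A minimal sketch of one way to guard against that uses try/finally. The name merge_pdfs_safely is hypothetical, and passing paths straight to .append() relies on my understanding that PdfMerger accepts a path string as well as a file object and keeps such files open until .close() is called.

```python
def merge_pdfs_safely(input_files, output_file):
    """Merge PDFs with PyPDF2, closing the merger even if a step fails."""
    # Imported here so this module can be loaded without PyPDF2;
    # a regular top-level import works just as well.
    import PyPDF2

    pdf_merger = PyPDF2.PdfMerger()
    try:
        for pdf_file in input_files:
            # A path string is passed instead of an open file object.
            pdf_merger.append(pdf_file)
        with open(output_file, 'wb') as file:
            pdf_merger.write(file)
    finally:
        # Runs on success and on failure alike.
        pdf_merger.close()
```

It is called the same way as merge_pdfs(): pass the list of input paths and the output path.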

Step 6: Calling the custom function

Our final step is to call the custom function with input_files and output_file:

# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling a custom function
merge_pdfs(input_files, output_file)

Here is the complete code:

# Import PyPDF2 library
import PyPDF2


# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):

    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()

    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)

    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()


# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling a custom function
merge_pdfs(input_files, output_file)

If you execute the above code, you will get this merged file:

[Image: Merged PDF using PyPDF2 Library]
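
To double-check the result without opening it, the page count of the merged file can be compared with the combined page count of the inputs. The sketch below does this with PyPDF2’s PdfReader; the helper names count_pages and pages_match are hypothetical.

```python
def count_pages(path):
    """Return the number of pages in the PDF at `path`."""
    # Imported here so this module can be loaded without PyPDF2;
    # a regular top-level import works just as well.
    from PyPDF2 import PdfReader

    return len(PdfReader(path).pages)


def pages_match(input_files, output_file):
    """Return True if the output has as many pages as all inputs combined."""
    expected = sum(count_pages(name) for name in input_files)
    return count_pages(output_file) == expected
```

A matching page count does not prove the content is intact, but a mismatch is a quick signal that one of the inputs was skipped or only partly merged.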

That’s all!

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.


Copyright @2024 AppDividend. All Rights Reserved