There are many online tools for merging multiple PDFs into one file, but they raise privacy concerns. If your PDFs contain sensitive information, uploading them to an online service is neither secure nor viable. That’s where a general-purpose programming language like Python comes into play.
Python provides multiple third-party libraries that you can use to handle PDF operations. Two such libraries are the following:
- PyMuPDF
- PyPDF2
To merge multiple PDFs into a single PDF, you can use either PyMuPDF’s .insert_file() method or PyPDF2’s PdfMerger class.
Preparing PDFs
Before merging anything, we need some PDF files to work with. For this tutorial, I created a folder called “pdfs” in my current working directory and placed two PDFs inside it: first.pdf and second.pdf.
Approach 1: PyMuPDF’s .insert_file() method
The PyMuPDF library provides functions to open, merge, write, and close PDF documents.
Two big advantages of PyMuPDF are that it performs well with large PDF files and that it is cross-platform.
Here is the step-by-step guide:
Step 1: Install pymupdf
Install the pymupdf library using the command below:
pip install pymupdf
Step 2: Import it into your file
# Importing pymupdf library
import pymupdf
Step 3: Prepare a list of input PDF files
# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']
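If the folder holds more than a couple of files, typing every path by hand gets tedious. As a minimal sketch, the list could instead be built from the “pdfs” folder with the standard library's pathlib module. The helper name list_pdfs is hypothetical and not part of either library; sorting is there only to keep the merge order predictable.

```python
from pathlib import Path


def list_pdfs(folder):
    """Return the paths of all .pdf files in `folder`, sorted by name.

    Hypothetical helper for illustration; not part of pymupdf or PyPDF2.
    """
    # Path.glob yields entries in arbitrary order, so sort explicitly.
    return sorted(str(path) for path in Path(folder).glob("*.pdf"))


# Assumes a "pdfs" folder in the current working directory.
files = list_pdfs("pdfs")
print(files)
```

Because the merged file in this tutorial is also saved inside “pdfs”, a second run of such a helper would pick it up as an input. Saving the output to a different folder, or filtering it out of the list, avoids that.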
Step 4: Creating an empty document
First, create an empty PDF document. The pages of the input files will be added to it in the next step.
# Creating an empty document object
pdf_doc = pymupdf.open()
Step 5: Appending input files
We already created an empty PDF document, and now it is time to add the pages from the input files to it.
We will loop over the files list and append each input file to the empty document using the .insert_file() method.
# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)
Step 6: Saving the Merged PDF
# Saving merged pdf and close the file object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()
The merged file, pymupdf_merged.pdf, is saved inside the “pdfs” folder and is the concatenation of the two input files. Here is the complete code:
# Importing pymupdf library
import pymupdf

# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']

# Creating an empty document object
pdf_doc = pymupdf.open()

# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)

# Saving merged pdf and close the file object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()
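Running the complete code creates pdfs/pymupdf_merged.pdf, which contains the pages of first.pdf followed by the pages of second.pdf.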
Approach 2: PyPDF2’s PdfMerger class
PyPDF2 is another PDF manipulation library. It provides a class, PdfMerger, for concatenating PDFs.
We will create a custom function that accepts the list of input files and the output file’s path.
PyPDF2 is simple and helpful for basic PDF operations. If you are working with simple PDF files, I would recommend this library, but if your PDFs are complex and require special attention, I would recommend PyMuPDF instead.
Here is the step-by-step guide for this approach:
Step 1: Import PyPDF2
# Import PyPDF2 library
import PyPDF2
Step 2: Defining the custom function
The function takes two parameters:
- input_files: the list of PDF paths to merge
- output_file: the path of the merged PDF
# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):
Step 3: Create an instance of the PdfMerger class
As already mentioned, the PyPDF2 library provides a PdfMerger class, so the first thing the function does is create an instance of it.
    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()
Step 4: Appending Input Files
We will loop over input_files and append each file to the pdf_merger instance.
    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)
Step 5: Writing an Output File
Use the .write() function on the pdf_merger instance to write a final merged pdf file.
    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()
We also closed the pdf_merger instance to free up the computing resources.
Step 6: Calling the custom function
Our final step is to call the custom function with input_files and output file:
# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling a custom function
merge_pdfs(input_files, output_file)
Here is the complete code:
# Import PyPDF2 library
import PyPDF2


# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):
    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()

    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)

    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)

    # Closing the merger instance
    pdf_merger.close()


# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'

# Calling a custom function
merge_pdfs(input_files, output_file)
If you execute the above code, you will get the merged file pdfs/pypdf2_merged.pdf.
That’s all!