Many online tools can merge multiple PDFs into one file, but they raise privacy concerns: uploading documents to a third-party server is not a safe option if your PDFs contain confidential information. That’s where a general-purpose programming language like Python comes into play, because the merge runs entirely on your own machine.
Python provides multiple third-party libraries that you can use to handle PDF operations. Two such libraries are the following:
- PyMuPDF
- PyPDF2
To merge multiple PDFs into a single PDF, you can use either PyMuPDF’s .insert_file() method or PyPDF2’s PdfMerger class.
Preparing PDFs
Before going into action, we have to prepare the PDF files for merging. For this walkthrough, I have created a folder called “pdfs” in my current working directory, and inside that folder I have two PDFs: first.pdf and second.pdf. Any two PDF files will do, as long as their names match the paths used in the code.
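If the folder holds more than a couple of files, typing each path by hand gets tedious. As a minimal sketch, the input list can be built with Python's standard pathlib module. The helper name collect_pdfs is my own and is not part of either library:

```python
from pathlib import Path


def collect_pdfs(folder):
    """Return the paths of all .pdf files in `folder`, sorted by file name."""
    folder = Path(folder)
    # Match the extension case-insensitively so "Report.PDF" is included too.
    pdf_paths = [
        path for path in folder.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    ]
    # Sorting makes the merge order predictable (alphabetical by file name).
    return [str(path) for path in sorted(pdf_paths, key=lambda p: p.name)]
```

Calling collect_pdfs("pdfs") on the folder described above should return first.pdf before second.pdf. Keep in mind that a merged file saved into the same folder would be picked up on the next run, so a separate output folder may be safer.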
Approach 1: PyMuPDF’s .insert_file() method
The PyMuPDF library provides methods to open, merge, write, and close PDF documents.
One big advantage of PyMuPDF is that it is a cross-platform library that performs well with large PDF files.
Here is the step-by-step guide:
Step 1: Install pymupdf
Install the pymupdf library using the command below:
pip install pymupdf
Step 2: Import it into your file
# Importing pymupdf library
import pymupdf
Step 3: Prepare a list of input PDF files
# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']
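Before merging, it can help to confirm that every path in the list points to a file that at least looks like a PDF. The sketch below uses only the standard library and checks for the "%PDF-" marker that well-formed PDF files start with. The function name find_invalid_pdfs is hypothetical, and this is only a quick sanity check, not proof that a file is a valid PDF:

```python
from pathlib import Path


def find_invalid_pdfs(files):
    """Return the entries of `files` that are missing or do not look like PDFs."""
    invalid = []
    for filename in files:
        path = Path(filename)
        if not path.is_file():
            # The path does not exist or is a directory.
            invalid.append(filename)
            continue
        # A well-formed PDF starts with the marker "%PDF-" (for example "%PDF-1.7").
        with path.open("rb") as handle:
            if handle.read(5) != b"%PDF-":
                invalid.append(filename)
    return invalid
```

An empty list from find_invalid_pdfs(files) means every input passed the check; otherwise the returned names tell you which paths to fix before merging.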
Step 4: Creating an empty document
We can create an empty PDF document and add pages to it later.
# Creating an empty document object
pdf_doc = pymupdf.open()
Step 5: Appending input files
We already created an empty PDF object, and now it is time to add pages from the input files.
We will loop through the files list and append each input file to the empty document using the .insert_file() method.
# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)
Step 6: Saving the Merged PDF
# Saving merged pdf and close the file object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()
Inside the “pdfs” folder, we are saving the pymupdf_merged.pdf file, which is the concatenation of two input files. Here is the complete code:
# Importing pymupdf library
import pymupdf
# List of input pdf files
files = ['pdfs/first.pdf', 'pdfs/second.pdf']
# Creating an empty document object
pdf_doc = pymupdf.open()
# Appending files to an empty document object
for filename in files:
    pdf_doc.insert_file(filename)
# Saving merged pdf and close the file object
pdf_doc.save("pdfs/pymupdf_merged.pdf")
pdf_doc.close()
After running the code, the merged PDF is available at pdfs/pymupdf_merged.pdf.
Approach 2: PyPDF2’s PdfMerger() class
PyPDF2 is another PDF manipulation library, and it provides the PdfMerger class for concatenating files.
We will create a custom function that accepts the input file list and the output file’s path.
PyPDF2 is straightforward and helpful for basic PDF operations. I recommend this library if you are working with simple PDF files. However, if your PDF is complex and requires special attention, I suggest you use PyMuPDF.
Here is the step-by-step guide for this approach (if PyPDF2 is not installed yet, install it first with pip install PyPDF2):
Step 1: Import PyPDF2
# Import PyPDF2 library
import PyPDF2
Step 2: Defining the custom function
The function takes two arguments:
- input_files: the list of PDF paths to merge
- output_file: the path of the merged PDF
# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):
Step 3: Create an instance of the PdfMerger class
As we already discussed, the PyPDF2 library provides a PdfMerger class, and we must create its instance.
    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()
Step 4: Appending Input Files
We will loop over the input files and append each one to the pdf_merger instance.
    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)
Step 5: Writing an Output File
Use the .write() method of the pdf_merger instance to write the final merged PDF file.
    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)
    # Closing the merger instance
    pdf_merger.close()
We also closed the pdf_merger instance to free up the computing resources.
Step 6: Calling the custom function
Our final step is to call the custom function with input_files and output_file:
# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'
# Calling a custom function
merge_pdfs(input_files, output_file)
Here is the complete code:
# Import PyPDF2 library
import PyPDF2
# Defining custom function that merges multiple pdfs
def merge_pdfs(input_files, output_file):
    # Creating an instance of PdfMerger class
    pdf_merger = PyPDF2.PdfMerger()
    # Appending input files
    for pdf_file in input_files:
        with open(pdf_file, 'rb') as file:
            pdf_merger.append(file)
    # Writing an output merged pdf file
    with open(output_file, 'wb') as file:
        pdf_merger.write(file)
    # Closing the merger instance
    pdf_merger.close()
# Defining input and output files
input_files = ['pdfs/first.pdf', 'pdfs/second.pdf']
output_file = 'pdfs/pypdf2_merged.pdf'
# Calling a custom function
merge_pdfs(input_files, output_file)
If you execute the above code, you will find the merged file at pdfs/pypdf2_merged.pdf.
That’s all!