Reading a binary file is much like reading a normal file with the open() function. We just need to open the file in binary mode by passing “rb” as the mode argument.
That syntax looks like: with open("data.bin", "rb") as f:
Let’s say we have a binary file data.bin that looks like the image below:
Throughout this tutorial, we will read this bin file in different ways.
Opening a file in binary mode
Using a context manager (the “with” statement), we can open a file in binary mode and read all of its bytes into memory with the .read() method.
# Open File in Binary Mode
with open('data.bin', 'rb') as f:
    data = f.read()

print(data)

# Output:
# b'\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00'
The binary data is printed to the console.
In the mode argument “rb”, r stands for read and b stands for binary.
There is one flaw in this approach: if the file is very large, reading it all at once can exhaust the RAM. So this approach is not efficient when the file is extremely large.
Read a binary file in chunks
To avoid the memory exhaustion issue, we can read the file chunk by chunk: pass a chunk size to f.read() and keep calling it until no data is left, printing (or otherwise processing) each chunk.
# Read in Chunks
chunk_size = 4096  # 4KB chunks

with open('data.bin', 'rb') as f:
    while chunk := f.read(chunk_size):
        print(chunk)

# Output:
# b'\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.
# Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00'
Chunk reading avoids loading the entire file at once, which keeps memory usage low.
Don’t pass a chunk_size ≤ 0: f.read(0) returns an empty bytes object and ends the loop immediately, while a negative size reads the whole file in one go. Always use a positive integer.
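In practice you would process each chunk as it arrives instead of printing it. As a minimal sketch (assuming the same data.bin file; hashlib.sha256 is just one possible consumer), here is how a checksum could be computed chunk by chunk without ever holding the whole file in memory:

# A sketch: hash a file chunk by chunk
import hashlib

chunk_size = 4096  # 4KB chunks, as above
digest = hashlib.sha256()

with open('data.bin', 'rb') as f:
    while chunk := f.read(chunk_size):
        digest.update(chunk)  # consume the chunk instead of printing it

print(digest.hexdigest())  # checksum of the whole file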
The methods readline() and readlines() are meant for line-oriented text files and are rarely useful for binary data.
Data Parsing with struct
The built-in struct module provides the struct.unpack() function, which unpacks binary data into Python values.
# Structured Data Parsing with struct
import struct

with open('data.bin', 'rb') as f:
    header = f.read(4)
    value = struct.unpack('<i', header)
    print(header)
    print(value)

# Output:
# b'\x00\x00\x00\x01'
# (16777216,)
In this format string, ‘<i’ means a little-endian (<) signed 4-byte integer (i); ‘>d’, for example, would mean a big-endian double. Use this approach when the file has a known, structured layout.
If header contains fewer than 4 bytes, struct.unpack() raises struct.error. Where possible, wrap the call in a try/except block so the program can handle short reads without crashing.
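As a small sketch of that defensive style (assuming the same 4-byte integer header as above), you could catch struct.error explicitly:

# Defensive struct parsing (sketch; assumes data.bin with a 4-byte header)
import struct

with open('data.bin', 'rb') as f:
    header = f.read(4)

try:
    (value,) = struct.unpack('<i', header)  # needs exactly 4 bytes
    print(value)
except struct.error as e:
    # Raised when the buffer is shorter (or longer) than the format requires
    print(f'Could not parse header: {e}')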
Memory‑Mapped Files (mmap)
For efficient processing of large files, you can use the mmap module, which maps the file into your process’s virtual memory. With mmap.mmap(), we can access arbitrary offsets in large files without loading all the data.
# Memory-Mapped Files (mmap)
import mmap

with open('data.bin', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        data = mm[10:20]  # Directly access bytes 10-19
        print(data)

# Output: b'\x00\xef\xbf\xbdASenso'
The OS handles the paging for you, which makes slicing and random access fast.
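As a small illustration of random access (the b'SensorData' marker below is only an assumption based on the sample output), you can search a memory-mapped file without reading it all into memory:

# Searching a memory-mapped file (sketch)
import mmap

with open('data.bin', 'rb') as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(b'SensorData')   # scans via the page cache, not a full read()
        if pos != -1:
            print(pos, mm[pos:pos + 10])
        else:
            print('marker not found')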
Zero-Copy Slicing with memoryview
A memoryview is a view onto a bytes or bytearray buffer that does not copy the data. If you are working in tight loops where copying large buffers hurts performance, memoryview is worth considering.
You can also combine it with the readinto() method to avoid intermediate allocations (see the sketch after the example below).
# Zero-Copy Slicing with memoryview
with open('data.bin', 'rb') as f:
    data = f.read(1024)

mv = memoryview(data)
chunk = mv[100:200]
print(chunk)

# Output: <memory at 0x102d099c0>
You can see that the output is a memoryview object, shown by its memory address, rather than a copy; it still references the original data.
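Here is a minimal sketch of the readinto() combination mentioned above (assuming the same data.bin and an arbitrary 1 KB buffer): readinto() fills a preallocated bytearray in place, and the memoryview slices it without copying.

# readinto() with memoryview (sketch; the buffer size is an arbitrary choice)
buf = bytearray(1024)        # preallocated, reusable buffer
mv = memoryview(buf)

with open('data.bin', 'rb') as f:
    n = f.readinto(mv)       # fills buf in place; no intermediate bytes object

chunk = mv[:n]               # zero-copy view of the bytes actually read
print(n, bytes(chunk[:16]))  # copy only the small piece we display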
Buffered Reading using io.BufferedReader
If you want more control over buffering, look into io.BufferedReader, which wraps a raw (unbuffered) file object and lets you choose the buffer size.
# io.BufferedReader
import io

raw = open('data.bin', 'rb', buffering=0)
buf = io.BufferedReader(raw, buffer_size=8192)

data = buf.read(4096)
print(data)

buf.close()  # closes the underlying raw file object as well

# Output:
# b'\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01
# .Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00\n\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdASensorData_XYZ\x00\x00'
This approach is unnecessary for simple files; reserve it for workloads where you actually need to control buffering behaviour.
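One concrete thing the wrapper gives you is peek(), which looks ahead in the buffer without moving the read position. A quick, self-contained sketch (buffer size matches the example above):

# Peeking with io.BufferedReader (sketch)
import io

with open('data.bin', 'rb', buffering=0) as raw:
    buf = io.BufferedReader(raw, buffer_size=8192)
    head = buf.peek(4)[:4]   # inspect upcoming bytes without consuming them
    first = buf.read(4)      # read() still returns those same bytes
    print(head, first)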
asyncio streams
If you are working on network applications where data arrives asynchronously, you should use asyncio streams.
We need the aiofiles module, which provides async support and allows await f.read(…).
Install the aiofiles module using the command below:
pip install aiofiles
Now, you can use it in your program.
Let’s read up to 16 bytes from a binary file.
import asyncio
import aiofiles

async def read_binary(path):
    # open the file in async mode
    async with aiofiles.open(path, 'rb') as f:
        header = await f.read(16)
        print(header)

if __name__ == '__main__':
    import sys
    path = sys.argv[1] if len(sys.argv) > 1 else 'data.bin'
    asyncio.run(read_binary(path))

# Output:
# b'\x00\x00\x00\x01.Ws\xef\xbf\xbd\x00\xef\xbf\xbdAS'
If the file pointer is at the end of the file, it will read 0 bytes and the header will be an empty bytes object (b''). It does not block the event loop, which is the biggest advantage of this approach.
If your application is synchronous, skip this approach; it only pays off for genuinely asynchronous workloads.
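If you need more than a fixed-size header, the same pattern extends to chunked reads. A minimal sketch (assuming aiofiles is installed and data.bin exists; the chunk size is an arbitrary choice):

# Async chunked reading (sketch)
import asyncio
import aiofiles

async def read_chunks(path, chunk_size=4096):
    total = 0
    async with aiofiles.open(path, 'rb') as f:
        while chunk := await f.read(chunk_size):
            total += len(chunk)  # process each chunk without blocking the event loop
    print(f'read {total} bytes from {path}')

if __name__ == '__main__':
    asyncio.run(read_chunks('data.bin'))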
Choose the method that best matches your file size, speed requirements, workflow (sync or async), and the type of application you are building.