How to Read and Write Binary Files in Python?

2024-06-08 | Python

Opening and Closing Files

To open a file, use open(). For binary files, pass "rb", "wb", "ab", or "xb" as the mode argument. To close the file, call close().

f = open("test.dat", "wb")
f.close()

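The modes differ as follows.

Mode | Description
"rb" | Reading. The file must already exist.
"wb" | Writing. An existing file is truncated; otherwise a new file is created.
"ab" | Appending. Every write goes to the end of the file.
"xb" | Exclusive creation. Writes to a new file; FileExistsError is raised if the file already exists.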

When you perform a series of read or write operations and want the file closed once they finish, the with statement is convenient. The file is closed automatically when the block ends, even if an exception is raised inside it.

with open("test.dat", "rb") as f:
    a = f.read()
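
As a rough illustration of how the modes differ, the sketch below opens the same file with "wb", "ab", "rb", and "xb" in turn. The file name demo.dat and the temporary directory are assumptions made for this example only.

```python
import os
import tempfile

# Work in a throwaway directory so no real file is touched (assumption for this demo).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dat")

    with open(path, "wb") as f:   # creates the file, or truncates an existing one
        f.write(b"abc")

    with open(path, "ab") as f:   # appends to the end of the file
        f.write(b"def")

    with open(path, "rb") as f:   # reads back everything written so far
        content = f.read()

    try:
        with open(path, "xb") as f:   # fails because the file already exists
            f.write(b"never written")
        exclusive_failed = False
    except FileExistsError:
        exclusive_failed = True

print(content, exclusive_failed)
```

Catching FileExistsError is one way to use "xb" as a guard against overwriting existing data.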

Random Access

Getting Current Position

Use tell() to get the current position in the file, measured in bytes from the beginning.

# f is file object
pos = f.tell()

Moving Pointer

Use seek() to move the file pointer. The first argument is an offset in bytes, and the optional second argument selects the reference point from which the offset is measured. The reference point values are defined in the os module. If the second argument is omitted, os.SEEK_SET is used.

Value | Reference Point | Offset
os.SEEK_SET | Head of file | Zero or positive
os.SEEK_CUR | Current position | Positive or negative
os.SEEK_END | End of file | Usually zero or negative

# import os module
import os

# f is file object
f.seek( 5, os.SEEK_SET)  # 5 bytes after the head of file
f.seek(-3, os.SEEK_CUR)  # 3 bytes before the current position
f.seek(-5, os.SEEK_END)  # 5 bytes before the end of file
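
To see how seek() and tell() interact, here is a minimal sketch that runs the three calls above against a 10-byte temporary file. The file contents are an assumption chosen so that each byte's value matches its position.

```python
import os
import tempfile

# TemporaryFile() opens in "w+b" mode by default, so it can be written and read back.
with tempfile.TemporaryFile() as f:
    f.write(b"0123456789")

    f.seek(5, os.SEEK_SET)     # absolute position 5
    pos_set = f.tell()
    f.seek(-3, os.SEEK_CUR)    # 3 bytes back from position 5
    pos_cur = f.tell()
    f.seek(-5, os.SEEK_END)    # 5 bytes before the end of the 10-byte file
    pos_end = f.tell()
    tail = f.read()            # reads from the current position to the end

print(pos_set, pos_cur, pos_end, tail)
```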

Reading and Writing

To read byte sequences from a file, use read(); to write them, use write(). Strings and numbers cannot be written directly: they must be converted to byte sequences (bytes objects) before writing and converted back after reading.

# f is file object
b = f.read(32)  # reading 32 bytes

b = b'123'
f.write(b)      # writing byte sequences
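
When the size of a file is not known in advance, read(n) can be called repeatedly: it returns at most n bytes, and an empty bytes object once the end of the file is reached. The sketch below shows that loop; io.BytesIO is used here only as a stand-in for a file opened in "rb" mode.

```python
import io

f = io.BytesIO(b"abcdefghij")   # stand-in for a file opened with "rb"

chunks = []
while True:
    chunk = f.read(4)           # at most 4 bytes per call
    if not chunk:               # b"" signals the end of the file
        break
    chunks.append(chunk)

print(chunks)
```

The last chunk is shorter than the requested size, so code that expects fixed-size records should check the length of what read() returns.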

Strings

To convert between strings and byte sequences, use decode() and encode(). You can also use the struct module, described later.

# f is file object
b = f.read(10)
s = b.decode()   # converting byte sequences to string (UTF-8 by default)

s = "12345"
b = s.encode()   # converting string to byte sequences (UTF-8 by default)
f.write(b)
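
Both encode() and decode() default to UTF-8, where a character may occupy more than one byte. As a small sketch of why this matters when reading a fixed number of bytes, the example below encodes a string containing a non-ASCII character; the sample text is an assumption for illustration.

```python
text = "héllo"
data = text.encode("utf-8")       # "é" takes two bytes in UTF-8
restored = data.decode("utf-8")   # decoding with the same encoding restores the string

print(len(text), len(data), restored)
```

Because the byte length can differ from the character count, size calculations for binary records should be based on the encoded bytes, not on the string.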

Numerics

Integers

To convert between integers and byte sequences, use int.from_bytes() and int.to_bytes(). You can also use the struct module.

# sys.byteorder is the native byte order of this machine: "little" or "big"
from sys import byteorder

# f is file object
b = f.read(4)
val = int.from_bytes(b, byteorder) # converting 4-byte sequences to integer

val = 12345
b = val.to_bytes(4, byteorder)     # converting integer to 4-byte sequences
f.write(b)
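
Using sys.byteorder ties the file to the machine that wrote it. If the file may be read on another platform, one option is to fix the byte order explicitly. The sketch below shows both orders and the signed argument, which is required for negative values; the sample values are assumptions for illustration.

```python
val = 12345                              # 0x3039
little = val.to_bytes(4, "little")       # least significant byte first
big = val.to_bytes(4, "big")             # most significant byte first

# Negative integers need signed=True in both directions.
negative = (-2).to_bytes(2, "little", signed=True)
as_signed = int.from_bytes(negative, "little", signed=True)
as_unsigned = int.from_bytes(negative, "little")   # same bytes read as unsigned

print(little, big, negative, as_signed, as_unsigned)
```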

Floating Point Values

To convert floating point values to and from byte sequences, use the struct module. unpack() always returns a tuple, even for a single value, so extract the element explicitly, for example with a trailing comma on the assignment target.

# requirement for using struct module
from struct import unpack, pack

# f is file object
b = f.read(8)
val, = unpack('d', b) # converting 8-byte sequences to floating point value

val = 12.345
b = pack('f', val)    # converting floating point value to 4-byte sequences
f.write(b)
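
The choice between 'f' and 'd' affects precision as well as size. As a minimal sketch, the example below packs the same value both ways and unpacks it again; the "<" prefix (little-endian) is an assumption added here so the result does not depend on the platform.

```python
from struct import pack, unpack

val = 12.345
as_double = pack("<d", val)   # 8 bytes
as_single = pack("<f", val)   # 4 bytes

back_double, = unpack("<d", as_double)   # round-trips exactly
back_single, = unpack("<f", as_single)   # rounded to single precision

print(len(as_double), len(as_single), back_double, back_single)
```

A Python float is a double, so only 'd' preserves it exactly; 'f' halves the storage at the cost of a small rounding error.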

The format characters that can be specified with unpack() and pack() include the following:

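Characters | Description
'f' | Single-precision floating point (4 bytes)
'd' | Double-precision floating point (8 bytes)
'q' | Signed integer (8 bytes)
'Q' | Unsigned integer (8 bytes)
'i', 'l' | Signed integer (4 bytes)
'I', 'L' | Unsigned integer (4 bytes)
'h' | Signed integer (2 bytes)
'H' | Unsigned integer (2 bytes)
'c' | Character (byte string of length 1)
'b' | Signed integer (1 byte)
'B' | Unsigned integer (1 byte)
's' | Fixed-length byte string (specified with length, e.g., '10s')
'p' | Pascal string: a length byte followed by the data, stored in a fixed-size field (e.g., '10p')

The sizes shown are the standard sizes. In the default native mode, 'l' and 'L' may be 8 bytes on some platforms.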
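
To check how many bytes a format character occupies on a given machine, calcsize() can be queried directly. The sketch below compares the sizes obtained with the "<" prefix, which selects standard sizes, against the native size of 'l'; the set of characters checked is an arbitrary selection for illustration.

```python
from struct import calcsize

# With a byte-order prefix such as "<", struct uses fixed standard sizes.
standard = {code: calcsize("<" + code) for code in "bhilqfd"}

# Without a prefix, sizes follow the platform's C compiler.
native_long = calcsize("l")   # 4 or 8 depending on the platform

print(standard, native_long)
```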

Multiple Data

To read and write multiple pieces of data at once, you can use the struct module.

# requirement for struct module
from struct import unpack, pack, calcsize

fmt = "15sl10s"
size = calcsize(fmt) # calculating the buffer size from a format string

# f is file object
b = f.read(size)
s1, val, s2 = unpack(fmt, b)
s1 = s1.rstrip(b'\x00').decode() # removing trailing null padding
s2 = s2.rstrip(b'\x00').decode() # removing trailing null padding

s1 = "test"
val = 123
s2 = "abcdefghij"
b = pack(fmt, s1.encode(), val, s2.encode())
f.write(b)
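
The same record can be written and read back in one round trip. The following is a sketch under a few assumptions: io.BytesIO stands in for a real file, the "<" prefix is added to get a fixed layout without padding, and struct.Struct is used to avoid repeating the format string.

```python
import io
from struct import Struct

record = Struct("<15sl10s")   # "<" = little-endian, standard sizes, no padding

f = io.BytesIO()              # stand-in for a file opened in binary mode
f.write(record.pack("test".encode(), 123, "abcdefghij".encode()))

f.seek(0)                     # back to the start before reading
raw_s1, val, raw_s2 = record.unpack(f.read(record.size))
s1 = raw_s1.rstrip(b"\x00").decode()   # 's' fields are padded with null bytes
s2 = raw_s2.rstrip(b"\x00").decode()

print(record.size, s1, val, s2)
```

With the prefix, the record is always 29 bytes (15 + 4 + 10). In native mode, padding may be inserted before the integer and the total can be larger.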

Posted by izadori