How to Convert String to Bytes in Python

Python string is a unicode sequence representing text, while bytes are immutable sequences of raw 8-bit values (0-255) representing binary data.

You can use either the str.encode() method or the bytes() constructor to convert a string to bytes. These are two main ways, and both accept an encoding (default: ‘utf-8’) and errors (default: ‘strict’) parameter.

Method 1: Using str.encode()

The most common method for converting a string to bytes is to use the str.encode() method and specify the desired encoding, with UTF-8 being the most common.

You can replace ‘utf-8’ with another encoding if necessary, but ‘utf-8’ is the standard as it supports a wide range of characters.

new_string = "GTA 5"

print(new_string)
# Output: GTA 5

print(type(new_string), '\n')
# Output: <class 'str'>

# String with encoding "UTF-8"
bytes_obj = new_string.encode("utf-8")

print(bytes_obj)
# Output: b'GTA 5'

print(type(bytes_obj), '\n')
# Output: <class 'bytes'>

# String with encoding "ASCII"
bytes_obj_2 = new_string.encode("ASCII")

print(bytes_obj_2)
# Output: b'GTA 5'

print(type(bytes_obj_2))
# Output: <class 'bytes'>

The b prefix suggests that the output is a bytes object.

The output bytes are immutable, and you can only perform modifications by creating new objects. If you modify the existing object, it will throw this error: TypeError: ‘bytes’ object does not support item assignment

# String with encoding "UTF-8"
bytes_obj = new_string.encode("utf-8")

print(bytes_obj)
# Output: b'GTA 5'

bytes_obj[0] = 71

# Output: TypeError: 'bytes' object does not support item assignment

“errors” argument

The “errors” argument controls how unencodable characters are handled. If it is ‘strict’ (default), it raises UnicodeEncodeError on failure.

new_string = "Café"

print(new_string)
# Output: Café

print(type(new_string))
# Output: <class 'str'>

try:
    # 'é' (U+00E9) can't be encoded in ASCII
    bytes_obj = new_string.encode('ascii')
except UnicodeEncodeError as e:
    print(e)

print(bytes_obj)
# 'ascii' codec can't encode character '\xe9' in position 3: ordinal not in range(128)

We handle the error using the try/except mechanism.

Empty string

If the input string is empty, the output bytes will be empty too, and you can verify it is empty by measuring its length using the built-in len() method.

empty_str = ""

byte_str = empty_str.encode('utf-8')

print(byte_str)
# Output: b''

print(len(byte_str))
# Output: 0

It won’t throw any error, and you can see that the byte object’s length is 0, which means it is empty.

Non-ASCII characters

The encode() method correctly handles non-ASCII characters and returns the output without any errors.

japanese_hello = "こんにちは"  # Japanese: Hello

bytes = japanese_hello.encode('utf-8')

print(bytes)

# Output: b'\xe3\x81\x93\xe3\x82\x93\xe3\x81\xab\xe3\x81\xa1\xe3\x81\xaf'

Method 2: Using the bytes() constructor

The bytes() constructor takes the string and the encoding as arguments and returns a bytes object.

new_string = "GTA 5"

print(new_string)
# Output: GTA 5

print(type(new_string))
# Output: <class 'str'>

# String with encoding "UTF-8"
bytes_obj = bytes(new_string, "utf-8")

print(bytes_obj)
# Output: b'GTA 5'

print(type(bytes_obj))
# Output: <class 'bytes'>

With other encoding

We can pass the ‘latin-1’ as an encoding argument to get the different byte outputs based on the requirement.

new_string = "Café"

print(new_string)
# Output: Café

print(type(new_string))
# Output: <class 'str'>

# String with encoding "UTF-8"
bytes_obj = bytes(new_string, 'latin-1')

print(bytes_obj)
# Output: b'Caf\xe9'

print(type(bytes_obj))
# Output: <class 'bytes'>

Method 3: Using bytearray() constructor

The bytearray() constructor accepts a string and converts it to a mutable sequence of bytes using the specified encoding, like “utf-8”.

Both bytes() and bytearray() represent binary data (a sequence of bytes). The bytearray() constructor wraps that data in a mutable container.

new_string = "Dam Capital"

print(new_string)
# Output: Dam Capital

print(type(new_string))
# Output: <class 'str'>

bytes = bytearray(new_string, "utf-8")

print(bytes)
# Output: bytearray(b'Dam Capital')

The main difference between bytes() and bytearray() is that bytearray() allows the byte data to be modified later, whereas bytes() is an immutable object.

Let’s change the byte data after the conversion.

new_string = "Dam Capital"

print(new_string)
# Output: Dam Capital

print(type(new_string))
# Output: <class 'str'>

bytes = bytearray(new_string, "utf-8")

print(bytes)
# Output: bytearray(b'Dam Capital')

bytes[0] = ord('H')  # Change 'D' to 'H'

print(bytes)
# Output: bytearray(b'Ham Capital')

print(bytes.decode())
# Output: Ham Capital

In this code, we changed the bytes from “b’Dam”‘s “D” to “H”, and hence the output will now be “b’Ham Capital”.

Since it has been successfully modified, we can prove that bytearray() can be modified at a later stage of the program.

Post Views: 13

Krunal Lathiya

With a career spanning over eight years in the field of Computer Science, Krunal’s expertise is rooted in a solid foundation of hands-on experience, complemented by a continuous pursuit of knowledge.