AppDividend

How to Convert Python Unicode to String

A Unicode string can be encoded into bytes using whichever encoding you choose. A Python Unicode character is an abstract object big enough to hold the character, analogous to Python’s long integers. In Python 3, every str is already a Unicode string, and the u prefix is accepted only for backward compatibility. In Python 2, if the string contains only ASCII characters, use the str() function to convert it into a plain string.

data = u"xyzw"
app = str(data)
print(app)

Output

xyzw
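A quick check makes the Python 3 behaviour visible. This is a minimal sketch that assumes Python 3; the variable names are only illustrative.

```python
# Assumes Python 3, where str is already a Unicode type.
data = u"xyzw"   # the u prefix is accepted but changes nothing
plain = "xyzw"

print(type(data))             # <class 'str'>
print(data == plain)          # True
print(isinstance(data, str))  # True
```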

If you have a Unicode string and need to write it to a file or another serialized form, you must first encode it into a particular representation that can be saved.

There are numerous common Unicode encodings, such as UTF-16 (which uses two bytes for most Unicode characters and four bytes for the rest) and UTF-8 (which uses 1 to 4 bytes per code point, depending on the character).

To convert that string into a particular encoding, you can use the following code.

data = u'£21'
app = data.encode('UTF-8')
print(app)

new = data.encode('UTF-16')
print(new)

Output

b'\xc2\xa321'
b'\xff\xfe\xa3\x002\x001\x00'

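So, we got the output in bytes. The UTF-16 result starts with b'\xff\xfe', the byte order mark that Python adds when you do not name an explicit byte order.

To convert bytes back to a string, use the bytes decode() method.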
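To see how the byte counts differ between the two encodings, the following sketch measures a few sample characters. The sample list is an assumption chosen for illustration, and 'utf-16-le' is used instead of 'UTF-16' so that the byte order mark is not counted.

```python
# Sample characters, written as escapes so the code is plain ASCII.
samples = ["A", "\u00a3", "\u20ac", "\U0001F600"]  # A, pound, euro, emoji

sizes = {}
for ch in samples:
    utf8_len = len(ch.encode("utf-8"))
    utf16_len = len(ch.encode("utf-16-le"))  # explicit byte order, no BOM
    sizes[ch] = (utf8_len, utf16_len)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

ASCII characters take one byte in UTF-8, while characters outside the Basic Multilingual Plane (such as the emoji) take four bytes in both encodings.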

data = u'£21'
app = data.encode('UTF-8')
print(app.decode())

new = data.encode('UTF-16')
print(new.decode('UTF-16'))

Output

£21
£21

You can see that we got our original strings.
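Writing a Unicode string to a file follows the same encode and decode round trip. The sketch below assumes Python 3 and uses a temporary path with a made-up file name (price.txt); passing encoding to open() lets Python do the encoding for you.

```python
import os
import tempfile

data = "\u00a321"  # the same text as u'£21' in the examples above

# A temporary location, used here only for illustration.
path = os.path.join(tempfile.mkdtemp(), "price.txt")

# Text mode with an explicit encoding: str goes in, UTF-8 bytes land on disk.
with open(path, "w", encoding="utf-8") as f:
    f.write(data)

# Binary mode shows the raw bytes that were written.
with open(path, "rb") as f:
    raw = f.read()

# Text mode with the same encoding decodes them back to str.
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

print(raw)           # b'\xc2\xa321'
print(text == data)  # True
```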

Convert Python Unicode to String

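To convert a Python Unicode string to an ASCII-only string, first decompose it with the unicodedata.normalize() function, then encode the result to ASCII and drop whatever cannot be represented. The Unicode standard defines various normalization forms of a Unicode string, based on canonical equivalence and compatibility equivalence.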

There are four normal forms in total. The two based on canonical equivalence are:

  1. normal form C
  2. normal form D

The normal form D (NFD) is also known as canonical decomposition and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again.

Syntax

unicodedata.normalize(form, unistr)
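As a small illustration of the two canonical forms, the sketch below decomposes a single accented character and composes it again. The variable names are illustrative.

```python
import unicodedata

composed = "\u00e9"  # e with acute accent as one code point

nfd = unicodedata.normalize("NFD", composed)  # base letter + combining accent
nfc = unicodedata.normalize("NFC", nfd)       # composed back into one code point

print(len(composed), len(nfd), len(nfc))  # 1 2 1
print([hex(ord(c)) for c in nfd])         # ['0x65', '0x301']
print(nfc == composed)                    # True
```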

The normal form KD (NFKD) applies the compatibility decomposition, that is, it replaces all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition.
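The difference between canonical and compatibility decomposition shows up with characters such as ligatures and superscripts. The two sample characters below are assumptions picked for illustration.

```python
import unicodedata

ligature = "\ufb01"       # LATIN SMALL LIGATURE FI
superscript = "x\u00b2"   # x followed by SUPERSCRIPT TWO

# Canonical decomposition leaves compatibility characters alone.
print(unicodedata.normalize("NFD", ligature) == ligature)  # True

# Compatibility forms replace them with their plain equivalents.
print(unicodedata.normalize("NFKD", ligature))      # fi
print(unicodedata.normalize("NFKC", superscript))   # x2
```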

Even if two Unicode strings look the same to a human reader, they may not compare equal if one has combining characters and the other doesn’t. Normalizing both to the same form before comparing avoids this.
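The comparison problem can be reproduced with two spellings of the same word. In this sketch, same_text is a hypothetical helper (not part of unicodedata) that normalizes both sides to one form before comparing.

```python
import unicodedata

def same_text(a, b, form="NFC"):
    """Hypothetical helper: compare two strings after normalizing both."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

single = "caf\u00e9"      # last letter is one precomposed code point
combined = "cafe\u0301"   # last letter is 'e' plus a combining acute accent

print(single == combined)           # False
print(same_text(single, combined))  # True
```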

Example

import unicodedata

title = u"André skräms inför på fédéral électoral Verhältnismäßigkeit"

nData = unicodedata.normalize('NFKD', title).encode('ASCII', 'ignore')
print(nData)

Output

b'Andre skrams infor pa federal electoral Verhaltnismaigkeit'

You can see that the output is an encoded bytes object. Note that ß has no decomposition, so the 'ignore' error handler drops it entirely. We can now decode the bytes to get a Python string using the bytes decode() method.

import unicodedata

title = u"André skräms inför på fédéral électoral Verhältnismäßigkeit"

nData = unicodedata.normalize('NFKD', title).encode('ASCII', 'ignore')
print(nData.decode())

Output

Andre skrams infor pa federal electoral Verhaltnismaigkeit
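The dropped ß in the output above is worth handling explicitly when it matters. The following is a minimal sketch, not a definitive approach: to_ascii is a hypothetical helper wrapping the same normalize and encode steps, and replacing ß with "ss" beforehand is one possible convention.

```python
import unicodedata

def to_ascii(text, errors="ignore"):
    """Hypothetical helper: strip accents via NFKD, then keep ASCII only."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", errors).decode("ascii")

word = "Verh\u00e4ltnism\u00e4\u00dfigkeit"  # contains two a-umlauts and a sharp s

# The sharp s has no decomposition, so 'ignore' silently removes it.
print(to_ascii(word))                           # Verhaltnismaigkeit

# Substituting it first keeps the word readable.
print(to_ascii(word.replace("\u00df", "ss")))   # Verhaltnismassigkeit
```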

As you can see, we now get a plain string in the output. So, we have transformed the Unicode text into an ASCII-only string in Python. That is it for this Python Unicode to String tutorial.
