Unicode() and encode() function in Python with examples

In this tutorial, we are going to learn about what is Unicode strings and how to use encode() function for error handling in Python and look after some examples.

Description of Unicode:-

In general computing, languages use ASCII values as a map for code points, which supports only 128 characters, But the Python string type uses the Unicode standard for mapping the code points. Here Unicode Stands for the Universal character set which supports a wide range of characters. Encoding in python is done under utf-8, where UTF means “Unicode Transmission Format” and “8” means 8-bit values are used for encoding.

Here Encoding is done by using the Unicode format. Let us know in detail about how encoding is done in python with examples below.

String encode() methods in Python :-

The main encoding you could consider is utilizing 32-bit integers as the code unit, and afterward utilizing the CPU’s portrayal of 32-digit numbers.

The method encode() is used for encoding the string in the specified encoding. If there is no encoding mention as we discussed above python uses “utf-8” for encoding,

Syntax:-

str.encode(encodings,errors)

Here encode() method uses two parameters; encodings that in which encoding it should be done and errors for error handling, Lets us see how encoding is done by some examples.

Input :-

inp_str = 'Codespeedy'
bytes_8bit_encoded = inp_str.encode("utf-8") # 8-bit value encoding
bytes_16bit_encoded = inp_str.encode("utf-16") $ 16-bit value encoding
print(bytes_8bit_encoded)
print(bytes_16bit_encoded)

Output :-

b'Codespeedy'
b'\xff\xfeC\x00o\x00d\x00e\x00s\x00p\x00e\x00e\x00d\x00y\x00'


Unicode encodes Error Handling:-

we have several errors while encoding. So We have several error handling implementation schemes, We will see them in detail below.

‘strict’:-

In this scheme, Python will raise a unicodeerror exception on the failure of execution. python default uses this scheme.

Example :-

Input :-

in_str = 'åCodespeedy'
output = in_str.encode(encoding='UTF-8',errors='strict')
print(output)

Output :-

b'\xc3\xa5Codespeedy'

‘ignore’ :-

In this scheme, Python will ignore the elements which can’t be encoded.

Example:-

Input :-

in_str = 'åååååååCodespeedy'
output = in_str.encode(encoding='ascii',errors='ignore') # The charecter å can't be encoded
print(output)

Output :-

b'Codespeedy'

‘backslashreplace’

In this scheme, the character which cannot be encoded will be replaced by backslash.

Input :-

in_str = 'åCodespeedyå'
output = in_str.encode(encoding='ascii',errors='backslashreplace') # The character is replaced by backslash
print(output)

Output :-

b'\\xe5Codespeedy\\xe5'

‘namereplace’

In this Scheme the character which cannot be encoded will be replaced by the name of the character.

Example :-

Input:-

in_str = 'åCodespeedyß'
output = in_str.encode(encoding='ascii',errors='namereplace') # Here the character is replaced by name
print(output)

Output :-

b'\\N{LATIN SMALL LETTER A WITH RING ABOVE}Codespeedy\\N{LATIN SMALL LETTER SHARP S}'

‘replace’

In this Scheme, the character which cannot be encoded will be replaced by the question mark.

Example :-

Input:-

in_str = 'åCodespeedyß'
output = in_str.encode(encoding='ascii',errors='replace')
print(output)

Output :-

b'?Codespeedy?'

‘xmlcharrefreplace’ :-

In this Scheme, the character which cannot be encoded will be replaced by the xml character.

Input:-

in_str = 'åCodespeedyß'
output = in_str.encode(encoding='ascii',errors='xmlcharrefreplace')
print(output)

Output :-

b'åCodespeedyß'

Leave a Reply

Your email address will not be published. Required fields are marked *