Att String Decode

## Python bytes.decode() Method

In Python, the `decode()` method converts a sequence of bytes (encoded data) back into a Unicode string. In **Python 3**, text is represented by the `str` type, which is a sequence of Unicode code points. Binary data is represented by the `bytes` type. Therefore, `decode()` is a method of the **`bytes`** class (and `bytearray`), not the `str` class.

To convert a string to bytes, use `encode()`. To convert bytes back to a string, use `decode()`.

---

## Syntax

```python
bytes.decode(encoding='utf-8', errors='strict')
```

### Parameters

* **`encoding`** *(Optional)*: A string naming the encoding used to decode the bytes (e.g., `'utf-8'`, `'ascii'`, `'gbk'`, `'latin-1'`). The default value is `'utf-8'`.
* **`errors`** *(Optional)*: A string specifying how decoding errors are handled. The default is `'strict'`. Common error handling schemes include:
  * `'strict'`: Raises a `UnicodeDecodeError` exception if a decoding error occurs.
  * `'ignore'`: Silently drops malformed bytes and continues decoding.
  * `'replace'`: Replaces malformed data with the Unicode replacement character `\ufffd` (`�`). The `?` marker is only used by `'replace'` when *encoding*, not when decoding.
  * `'backslashreplace'`: Replaces malformed bytes with backslashed escape sequences such as `\xe7`.
  * Any other custom error handler registered via `codecs.register_error()`.

### Return Value

* This method returns the decoded **string** (`str`) representation of the byte sequence.

---

## Code Examples

### Example 1: Basic UTF-8 Decoding

The following example encodes a standard string into bytes and then decodes it back into a string using UTF-8 (the default encoding).
```python
# Define a Unicode string
original_str = "Python Programming - Tutorial"

# Encode the string to bytes (UTF-8)
encoded_bytes = original_str.encode('utf-8')
print("Encoded Bytes: ", encoded_bytes)

# Decode the bytes back to a string
decoded_str = encoded_bytes.decode('utf-8')
print("Decoded String:", decoded_str)
```

**Output:**

```text
Encoded Bytes:  b'Python Programming - Tutorial'
Decoded String: Python Programming - Tutorial
```

---

### Example 2: Handling Non-ASCII Characters

When working with non-ASCII characters, you must decode with the same encoding that was used to encode the data. Otherwise you get an error or garbled text.

```python
# A string with Chinese characters
chinese_str = "编程"

# Encode using UTF-8 (each of these characters takes 3 bytes)
utf8_bytes = chinese_str.encode('utf-8')
print("UTF-8 Bytes: ", utf8_bytes)

# Decode back using UTF-8
print("Decoded UTF-8:", utf8_bytes.decode('utf-8'))
```

**Output:**

```text
UTF-8 Bytes:  b'\xe7\xbc\x96\xe7\xa8\x8b'
Decoded UTF-8: 编程
```

---

### Example 3: Error Handling Strategies

If you decode bytes with an encoding that cannot represent them, Python raises a `UnicodeDecodeError` by default. You can change this behavior with the `errors` parameter. Note that some wrong encodings, such as `'latin-1'`, accept every byte value and never raise; they silently produce garbled text instead.

```python
# UTF-8 encoded bytes for "编程"
utf8_bytes = b'\xe7\xbc\x96\xe7\xa8\x8b'

# 1. Default 'strict' behavior (raises UnicodeDecodeError)
try:
    # ASCII cannot decode these bytes (every byte is above 0x7F)
    utf8_bytes.decode('ascii', errors='strict')
except UnicodeDecodeError as e:
    print("Strict Error:", e)

# 2. 'ignore' behavior (skips invalid bytes)
ignored = utf8_bytes.decode('ascii', errors='ignore')
print("Ignored Result:", repr(ignored))

# 3. 'replace' behavior (inserts U+FFFD replacement characters)
replaced = utf8_bytes.decode('ascii', errors='replace')
print("Replaced Result:", replaced)
```

**Output:**

```text
Strict Error: 'ascii' codec can't decode byte 0xe7 in position 0: ordinal not in range(128)
Ignored Result: ''
Replaced Result: ������
```

With `'replace'`, each of the six undecodable bytes becomes one `�` (U+FFFD).

---

## Considerations

### Python 2 vs. Python 3 Difference

* **Python 2.x**: The default `str` type was byte-based, so `str.decode()` was a valid method used to convert a byte string into a `unicode` object. Python 2 also supported non-text codecs like `'base64'` directly through `encode()` and `decode()`.
* **Python 3.x**: `str` is always Unicode text. The `decode()` method exists **only** on `bytes` and `bytearray` objects. Non-text codecs (like `'base64'` or `'hex'`) cannot be used with `bytes.decode()`. Instead, use the `base64` or `binascii` standard library modules, or `codecs.decode()`.
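The sketch below shows the Python 3 replacements for the old `'base64'` and `'hex'` codecs. The sample values (`b'SGVsbG8='` and `b'48656c6c6f'`, both representing the text `"Hello"`) are chosen only for illustration. Note that these tools first turn the Base64 or hex data into raw bytes; you still call `decode()` afterwards if you want a `str`.

```python
import base64
import binascii
import codecs

# Base64 representation of the ASCII text "Hello" (sample value)
b64_data = b'SGVsbG8='

# In Python 3, 'base64' is not a text encoding, so bytes.decode() rejects it
try:
    b64_data.decode('base64')
except LookupError as e:
    print("LookupError:", e)

# base64 module: Base64 bytes -> raw bytes, then raw bytes -> str
raw = base64.b64decode(b64_data)
print(raw.decode('utf-8'))  # Hello

# binascii module: hex digits -> raw bytes, then raw bytes -> str
raw_hex = binascii.unhexlify(b'48656c6c6f')
print(raw_hex.decode('utf-8'))  # Hello

# codecs.decode() still accepts the bytes-to-bytes codec names
print(codecs.decode(b64_data, 'base64'))    # b'Hello'
print(codecs.decode(b'48656c6c6f', 'hex'))  # b'Hello'
```

For hex strings, `bytes.fromhex('48656c6c6f')` is another built-in option that returns the same raw bytes.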

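### Custom Error Handlers

The `errors` parameter also accepts the name of a handler registered with `codecs.register_error()`. The handler receives the `UnicodeDecodeError` and returns a tuple of (replacement text, position at which to resume decoding). Below is a minimal sketch. The handler name `hex_placeholder` and its `<XX>` output format are hypothetical choices made for this example.

```python
import codecs

def hex_placeholder(err):
    """Hypothetical handler: replace each undecodable byte with a <XX> marker."""
    if not isinstance(err, UnicodeDecodeError):
        raise err  # this sketch only handles decoding errors
    bad = err.object[err.start:err.end]  # the offending bytes
    replacement = ''.join(f'<{b:02X}>' for b in bad)
    # Resume decoding right after the offending bytes
    return replacement, err.end

# Register the handler under a (hypothetical) name usable in errors=...
codecs.register_error('hex_placeholder', hex_placeholder)

data = b'abc\xe7def'
print(data.decode('ascii', errors='hex_placeholder'))  # abc<E7>def
```

For comparison, the built-in `'backslashreplace'` handler would produce `abc\xe7def` for the same input.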
YouTip