
[Python 3] bytes-like strings decoding for

Authored by ghane on Mar 23 2022, 6:54 PM.



Bytes-like strings need to be decoded to text strings.

When debugging, the original code returned unicode strings under Python 2.7, but under Python 3 the same variables hold bytes-like strings.

optparse gets text strings from the shell.
configparser uses text strings internally.
From version 3.0, python-ldap uses text where appropriate. On Python 2, the bytes mode setting influences how text is handled.

The socket streams and the db code produce bytes-like strings; this diff decodes them to text for python-ldap operations.
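
The kind of normalisation described in the summary can be sketched as a small helper that runs on both Python 2.7 and Python 3. The name `ensure_text`, the UTF-8 default and the sample login are assumptions for illustration and are not taken from the diff.

```python
def ensure_text(value, encoding="utf-8"):
    """Return value as a text string, decoding it if it is a byte string.

    Hypothetical helper; the name and the UTF-8 default are assumptions.
    """
    if isinstance(value, bytes):
        # Python 3: bytes -> str; Python 2: str -> unicode.
        return value.decode(encoding)
    return value


# Data read from a socket or a database may arrive as bytes.
raw_login = b"john.doe@example.org"
login = ensure_text(raw_login)
```

Values that are already text pass through unchanged, so the helper can be applied at every boundary where bytes might appear.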

Diff Detail

rP pykolab
Lint Skipped
Tests Skipped

Event Timeline

ghane requested review of this revision. Mar 23 2022, 6:54 PM
ghane created this revision.
ghane edited the summary of this revision.
vanmeeuwen subscribed.

I don't understand the case or cases in which this change helps, where it would have otherwise failed.

This revision now requires changes to proceed. Mar 23 2022, 7:13 PM

This is for Python 3, which is stricter about operations that mix byte strings and text strings.

You can find the affected strings by looking for code like:


if len(login) == 4:
    realm = login[3]
elif len(login[0].split('@')) > 1:  # fails on Python 3: login[0] is bytes but '@' is text; on Python 2 both are str
    realm = login[0].split('@')[1]  # fails on Python 3 for the same reason
else:
    realm = conf.get('kolab', 'primary_domain')  # text string on Python 3; in auth/ this mixes with the bytes in login[0] and login[1]
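
A minimal reproduction of the failure described in these comments, assuming `login` holds byte strings as it might after being read from a socket. The sample values are made up.

```python
login = [b"john.doe@example.org", b"secret"]

parts = None
error = None
try:
    # On Python 3, bytes.split() needs a bytes separator, so a text '@' fails.
    parts = login[0].split("@")
except TypeError as exc:
    error = str(exc)

# Decoding first makes the split work on Python 3 (and on Python 2).
realm = login[0].decode("utf-8").split("@")[1]
```

On Python 2 the `try` block succeeds because `bytes` is an alias for `str` there, which is why the problem only shows up after the port.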

I tested the code against Python 2.7, 3.7 and 3.8 on Debian buster and Ubuntu focal.

@vanmeeuwen, how should we proceed here? This is an effort to get the PyKolab codebase into a state where it works with Python 3 without breaking existing systems that are still based on legacy Python 2. Given that background, the commit looks plausible to me.


I know it's already present in the original code, but the second argument to encode() looks strange to me. Isn't that argument supposed to be a string describing the error-handling scheme? The value 'latin1' wouldn't make any sense in that case.
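
For reference, the second positional parameter of `str.encode()` is `errors`, not a second encoding. A small sketch of what that means in practice on Python 3; the helper name `try_encode` and the sample text are made up for illustration.

```python
def try_encode(text, encoding, errors):
    """Encode text and report the exception type name if encoding fails."""
    try:
        return text.encode(encoding, errors)
    except (LookupError, UnicodeError) as exc:
        return type(exc).__name__


# 'replace' is a valid error handler: the unencodable character becomes '?'.
replaced = try_encode(u"caf\u00e9", "ascii", "replace")

# 'latin1' is a codec name, not an error handler, so it is rejected
# as soon as a character actually fails to encode.
rejected = try_encode(u"caf\u00e9", "ascii", "latin1")
```

When every character can be encoded, the error handler may never be looked up, which could explain why the existing call has not raised so far.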

streams:  Python 2 -> str (byte string)     | Python 3 -> bytes             : types differ
encode(): Python 2 -> str (byte string)     | Python 3 -> bytes             : types differ
decode(): Python 2 -> unicode (text string) | Python 3 -> str (text string) : both are text

LDAP needs a text string as the search filter to return a result; otherwise the search returns 0 results. That is an error, and this case is not caught.
On Python 2, LDAP gets a str either way, because byte strings are also of class str there (bytes is only an alias for str). Python 3 is more explicit: byte strings are of class bytes.
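
One way a byte string can lead to zero search results on Python 3, shown here without any LDAP connection: formatting bytes into a text filter does not raise, it embeds the `b'...'` repr. The attribute name `mail` and the sample address are assumptions for illustration, not taken from the pykolab code.

```python
mail = b"john.doe@example.org"

# On Python 3 this silently yields a filter containing the bytes repr.
broken_filter = "(mail=%s)" % mail

# Decoding first yields the filter that was intended.
search_filter = "(mail=%s)" % mail.decode("utf-8")
```

A filter such as `(mail=b'john.doe@example.org')` is syntactically valid, so the server simply matches nothing instead of reporting an error.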

Explicitly changing the columns of the "entries" table, "domain" from String to Unicode and "values, keys" from Text to UnicodeText, would make encode() and decode() unnecessary, because SQLAlchemy would then handle the encoding and decoding.
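If you need the OS locale encoding, encode().decode() would do the job, but the locale's encoding has to be passed explicitly: encode() and decode() default to UTF-8 on Python 3 (ASCII on Python 2), not to the OS locale.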
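
If the locale's encoding is what is wanted, it can be looked up and passed explicitly rather than relying on a default. A minimal sketch using only the standard library; the sample value is made up.

```python
import locale

# Encoding configured for the current locale, for example 'UTF-8'.
encoding = locale.getpreferredencoding(False)

value = u"example.org"
# Round trip through the locale encoding; ASCII text survives in any
# ASCII-compatible locale encoding.
stored = value.encode(encoding)
restored = stored.decode(encoding)
```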

encode() and decode() on the return value become obsolete, since SQLAlchemy should do the encoding and decoding.
See also:

class Entry
    def __init__

Lines 70-75 check for unicode.

Nice, this looks way better.

This revision was not accepted when it landed; it landed in state Needs Review. Jun 15 2022, 11:57 PM
This revision was automatically updated to reflect the committed changes.