[Buildbot-commits] [Buildbot] #2757: Use chardet on incoming bytestrings

Buildbot trac trac at buildbot.net
Tue Apr 15 14:07:54 UTC 2014


#2757: Use chardet on incoming bytestrings
------------------------+----------------------
Reporter:  dustin       |      Owner:
    Type:  enhancement  |     Status:  new
Priority:  major        |  Milestone:  0.9.+
 Version:  0.8.8        |   Keywords:  encoding
------------------------+----------------------
 There's a library, chardet, which can do a reasonable job of guessing the
 charset of a bytestring.

 There are a number of places in Buildbot where incoming data is a
 bytestring.  Most of those allow the user to specify an encoding, and
 default to UTF-8.  For example, change sources generally get bytestrings
 for commit comments, authors, and so on.

 In the default case, it may be more convenient for users if we dynamically
 detect the character encoding of these strings.  This would amount to
 "doing the right thing" when possible, while still letting users supply
 an explicit encoding when detection is not good enough.
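
 A minimal sketch of that idea, not Buildbot code: the helper name
 `decode_incoming` and its `encoding`/`fallback` parameters are hypothetical.
 It assumes chardet's `detect()` function, and falls back to UTF-8 if chardet
 is not installed or cannot make a guess.

```python
try:
    import chardet  # third-party library; treated as optional in this sketch
except ImportError:
    chardet = None


def decode_incoming(data, encoding=None, fallback="utf-8"):
    """Decode an incoming bytestring (hypothetical helper, for illustration).

    An explicit encoding always wins.  Otherwise chardet's guess is tried,
    and the fallback encoding is used as a last resort.
    """
    if isinstance(data, str):
        return data  # already text, nothing to do
    if encoding is not None:
        # The user asked for a specific encoding: honour it and let any
        # decoding error propagate so misconfiguration is visible.
        return data.decode(encoding)
    if chardet is not None:
        guess = chardet.detect(data).get("encoding")
        if guess:
            try:
                return data.decode(guess)
            except (UnicodeDecodeError, LookupError):
                pass  # bad guess or unknown codec name: use the fallback
    # Last resort: never raise, replace undecodable bytes instead.
    return data.decode(fallback, errors="replace")
```

 A change source could then call `decode_incoming(comment)` in the default
 case and `decode_incoming(comment, encoding="latin-1")` when the user has
 configured an encoding.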

 Chardet would also be useful in the `ascii2unicode` method, which
 currently accepts only ASCII bytestrings.  With detection, the unlikely
 worst case becomes a little mojibake rather than an exception.
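
 As a rough illustration only (the actual `ascii2unicode` implementation is
 not shown in this ticket), a lenient variant might look like the following.
 The name `ascii2unicode_lenient` is hypothetical, and chardet's `detect()`
 is assumed to be available, with UTF-8 used when it is not.

```python
try:
    import chardet  # third-party library; treated as optional in this sketch
except ImportError:
    chardet = None


def ascii2unicode_lenient(x):
    """Hypothetical lenient counterpart to an ASCII-only decoder."""
    if x is None or isinstance(x, str):
        return x
    try:
        # Common case: plain ASCII decodes exactly as before.
        return x.decode("ascii")
    except UnicodeDecodeError:
        # Non-ASCII input: guess the charset instead of raising.
        guess = chardet.detect(x).get("encoding") if chardet else None
        try:
            return x.decode(guess or "utf-8", errors="replace")
        except LookupError:
            # chardet named a codec Python does not know about.
            return x.decode("utf-8", errors="replace")
```

 With `errors="replace"`, a wrong guess yields replacement characters or
 mojibake, not a `UnicodeDecodeError`.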

-- 
Ticket URL: <http://trac.buildbot.net/ticket/2757>
Buildbot <http://buildbot.net/>
Buildbot: build/test automation

