Description
Consider the following module:
$ cat test.py
import helper
helper.discriminate(str(3))
helper.discriminate('d')
helper.discriminate(str(u'd'))
helper.discriminate('d' + str(3))
print((chr(123) + chr(125)).format('1'))
helper.discriminate('{}'.format(1))
helper.discriminate(chr(10))
(Don't worry about helper for now.)
I would expect the following:
- in Python 2, str(3) returns a non-unicode string, so following the transformations, that behavior will be preserved (i.e. in Python 3 I'll get either bytes or some kind of emulation of Python 2's str; in Python 2, I'll still get str).
- similarly, since there's no from __future__ import unicode_literals heading the file, the string literal 'd' will become the literal b'd', or some kind of emulation object.
- since I'm using str to explicitly coerce u'd' to str, this will continue to work.
- for the format calls, in Python 3 a bytes object obviously won't work, since it won't have format(), so, again, some kind of from past import str is what I'd expect.
None of this is what happens, though.
Here's helper.py, followed by the results of running test.py:
$ cat helper.py
def discriminate(o):
    if isinstance(o, unicode):
        print('barf')
    elif isinstance(o, str):
        print('ok good')
$ python2 test.py
ok good
ok good
ok good
ok good
1
ok good
ok good
Now let's try translating first test.py, and then also helper.py:
$ futurize -w -0 test.py 2>/dev/null
--- test.py (original)
+++ test.py (refactored)
@@ -1,3 +1,6 @@
+from __future__ import print_function
+from builtins import str
+from builtins import chr
 import helper
 helper.discriminate(str(3))
$ python test.py
barf
ok good
barf
barf
1
ok good
barf
$ futurize -w -0 helper.py 2>/dev/null
--- helper.py (original)
+++ helper.py (refactored)
@@ -1,5 +1,6 @@
+from __future__ import print_function
 def discriminate(o):
-    if isinstance(o, unicode):
+    if isinstance(o, str):
         print('barf')
     elif isinstance(o, str):
         print('ok good')
$ python test.py
barf
1
barf
So, calls to str and chr now produce unicode, but the string literal 'd' is still a Python 2 str. If we translate module-by-module, the behavior of the script obviously changes. Even using str to cast between types doesn't work anymore! Moreover, the isinstance checks in helper are completely wrong: both branches now test against str, so the elif branch can never run.
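One way to keep the checks in helper meaningful on both interpreters is to name the text type and the byte-string type explicitly instead of relying on str. The following is only a minimal sketch under that assumption: text_type, binary_type and the returned labels are illustrative names, not something futurize emits, and the function returns a label rather than printing so the result is easy to inspect.

```python
import sys

PY2 = sys.version_info[0] == 2

# Illustrative aliases (not produced by futurize). The conditional expression
# only evaluates the name `unicode` on Python 2, where it exists.
text_type = unicode if PY2 else str  # noqa: F821
binary_type = str if PY2 else bytes


def discriminate(o):
    """Classify o as text ('barf'), byte string ('ok good'), or neither."""
    if isinstance(o, text_type):
        return 'barf'
    elif isinstance(o, binary_type):
        return 'ok good'
    return 'neither'
```

With this version the two branches can no longer collapse into the same test. It does not settle what an unprefixed literal such as 'd' should mean: on Python 3 it is text and lands in the first branch, while on Python 2 it is a byte string and lands in the second.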
AFAICT the only way to actually mechanically translate code using strings without introducing semantic differences is just to use a compatibility layer, including one around every literal. I know you all have spent a lot more time on this than I have (and also that such a layer would be a PITA to get right), but the above complications bit me hard in trying futurize out on part of the codebase for a reasonably large project (and translating the entirety at once wouldn't fix it). It seems as if even with futurize, without such a compatibility layer, there's no alternative but to check almost every string :(.
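For illustration, a layer around literals might look roughly like the sketch below. py2_literal is a hypothetical helper, not something futurize or future provides, and the Latin-1 encoding is an assumption that only holds for literals whose characters fit in a single byte.

```python
import sys

PY2 = sys.version_info[0] == 2


def py2_literal(s):
    """Hypothetical wrapper giving an unprefixed literal its Python 2 meaning.

    On Python 2 the literal is already a byte string and is returned as-is.
    On Python 3 it is text, so it is encoded to bytes (Latin-1 is assumed).
    """
    if PY2:
        return s
    return s.encode('latin-1')


d = py2_literal('d')  # byte string on both interpreters
```

Even this leaves the format() case open: on Python 3 the wrapped value is a plain bytes object, which has no format() method, so a real layer would have to return something closer to an emulation of Python 2's str.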