strings should use an old_str / old_unicode type, similar to old_div #285

Open
@bwo

Consider the following module:

$ cat test.py
import helper

helper.discriminate(str(3))
helper.discriminate('d')
helper.discriminate(str(u'd'))
helper.discriminate('d' + str(3))


print((chr(123) + chr(125)).format('1'))
helper.discriminate('{}'.format(1))
helper.discriminate(chr(10))

(Don't worry about helper for now.)

I would expect the following:

  • In Python 2, str(3) returns a byte string rather than unicode, so I'd expect the transformation to preserve that behavior: in Python 3 I'd get either bytes or some kind of emulation of Python 2's str, and in Python 2 I'd still get str.
  • Similarly, since there's no from __future__ import unicode_literals at the top of the file, the string literal 'd' would become the literal b'd', or some kind of emulation object.
  • Since I'm using str to explicitly coerce u'd' to str, that coercion would continue to work.
  • For the format calls, a Python 3 bytes object obviously won't work, since it has no format() method, so again I'd expect some kind of from past import str.

None of this is what happens, though.

Here's helper.py, followed by the results of running test.py:

$ cat helper.py
def discriminate(o):
    if isinstance(o, unicode):
        print('barf')
    elif isinstance(o, str):
        print('ok good')
$ python2 test.py
ok good
ok good
ok good
ok good
1
ok good
ok good

Now let's try translating test.py first, and then helper.py as well:

$ futurize -w -0 test.py 2>/dev/null
--- test.py     (original)
+++ test.py     (refactored)
@@ -1,3 +1,6 @@
+from __future__ import print_function
+from builtins import str
+from builtins import chr
 import helper

 helper.discriminate(str(3))
$ python test.py
barf
ok good
barf
barf
1
ok good
barf
$ futurize -w -0 helper.py 2>/dev/null
--- helper.py   (original)
+++ helper.py   (refactored)
@@ -1,5 +1,6 @@
+from __future__ import print_function
 def discriminate(o):
-    if isinstance(o, unicode):
+    if isinstance(o, str):
         print('barf')
     elif isinstance(o, str):
         print('ok good')
$ python test.py
barf
1
barf

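So calls to str and chr now produce unicode, but the string literal 'd' is still a Python 2 str. If we translate module by module, the behavior of the script obviously changes. Even using str to cast between types doesn't work anymore! Moreover, the isinstance checks in the translated helper are simply wrong: both branches now test isinstance(o, str), so the second can never run, and the objects returned by the new str and chr evidently match neither branch, since four of the six discriminate calls print nothing.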
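For comparison, here is a rough sketch of a check that keeps the two branches distinct on both versions. It is not what futurize emits. The names text_type and binary_type are assumptions introduced for illustration, and the function returns its label instead of printing it so the result is easy to inspect.

```python
import sys

# Hypothetical aliases for illustration; not part of helper.py or futurize output.
if sys.version_info[0] >= 3:
    text_type = str
    binary_type = bytes
else:
    text_type = unicode  # noqa: F821 (this branch only runs on Python 2)
    binary_type = str


def discriminate(o):
    """Return 'barf' for text (unicode) objects and 'ok good' for byte strings."""
    if isinstance(o, text_type):
        return 'barf'
    elif isinstance(o, binary_type):
        return 'ok good'
    return None  # neither kind of string
```

This only fixes the check itself. It does nothing about callers that now pass text where they used to pass a byte string.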

AFAICT the only way to actually mechanically translate code using strings without introducing semantic differences is just to use a compatibility layer, including one around every literal. I know you all have spent a lot more time on this than I have (and also that such a layer would be a PITA to get right), but the above complications bit me hard in trying futurize out on part of the codebase for a reasonably large project (and translating the entirety at once wouldn't fix it). It seems as if even with futurize, without such a compatibility layer, there's no alternative but to check almost every string :(.
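To make the idea concrete, here is a minimal sketch of the kind of wrapper meant here. old_str is a hypothetical name, not an existing python-future API. The ASCII encoding is an assumption that mirrors Python 2's implicit default when coercing unicode to str.

```python
import sys

PY2 = sys.version_info[0] == 2


def old_str(obj=b''):
    """Hypothetical helper: mimic Python 2's str() by always returning a byte string."""
    if PY2:
        # The native Python 2 str is already a byte string.
        return str(obj)
    if isinstance(obj, bytes):
        return obj
    if isinstance(obj, str):
        # Assumption: encode with ASCII, as Python 2's str(u'...') does implicitly.
        return obj.encode('ascii')
    # Any other object: take its text form, then encode it the same way.
    return str(obj).encode('ascii')
```

With something like this, 'd' + str(3) would be written old_str('d') + old_str(3) and would stay a byte string on both versions. The cost is the one described above: every literal and every str call has to be wrapped.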
