strings should use an old_str / old_unicode type, similar to old_div

Consider the following module:

```
$ cat test.py
import helper

helper.discriminate(str(3))
helper.discriminate('d')
helper.discriminate(str(u'd'))
helper.discriminate('d' + str(3))


print((chr(123) + chr(125)).format('1'))
helper.discriminate('{}'.format(1))
helper.discriminate(chr(10))
```

(Don't worry about `helper` for now.)

I would expect the following:

 - in Python 2, `str(3)` returns a byte string (not `unicode`), so after the transformations that behavior will be preserved (i.e. in Python 3 I'll get either `bytes` or some kind of emulation of Python 2's `str`; in Python 2, I'll still get `str`).
 - similarly, since there's no `from __future__ import unicode_literals` heading the file, the string literal `'d'` will become the literal `b'd'`, or some kind of emulation object.
 - since I'm using `str` to explicitly coerce `u'd'` to `str`, this will continue to work.
 - for the `format` calls, in Python 3 a `bytes` object obviously won't work, since it has no `format()` method; so, again, some kind of `from past import str` is what I'd expect.

None of this is what happens, though.

Here's `helper.py`, followed by the results of running `test.py`:

```
$ cat helper.py
def discriminate(o):
    if isinstance(o, unicode):
        print('barf')
    elif isinstance(o, str):
        print('ok good')
$ python2 test.py
ok good
ok good
ok good
ok good
1
ok good
ok good
```

Now let's try translating first `test.py`, and then also `helper.py`:

```
$ futurize -w -0 test.py 2>/dev/null
--- test.py     (original)
+++ test.py     (refactored)
@@ -1,3 +1,6 @@
+from __future__ import print_function
+from builtins import str
+from builtins import chr
 import helper

 helper.discriminate(str(3))
$ python test.py
barf
ok good
barf
barf
1
ok good
barf
$ futurize -w -0 helper.py 2>/dev/null
--- helper.py   (original)
+++ helper.py   (refactored)
@@ -1,5 +1,6 @@
+from __future__ import print_function
 def discriminate(o):
-    if isinstance(o, unicode):
+    if isinstance(o, str):
         print('barf')
     elif isinstance(o, str):
         print('ok good')
$ python test.py
barf
1
barf
```

So, calls to `str` and `chr` now produce unicode, but the string literal `'d'` is still a Python 2 `str`. If we translate module-by-module, the behavior of the script obviously changes. Even using `str` to cast between types doesn't work anymore! Moreover, the `isinstance` checks in `helper` are now wrong: after translation both branches test against `str`, so the `'ok good'` branch can never be reached.
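To make the `isinstance` point concrete, here is a minimal sketch meant for plain Python 3 (no `future` installed). `discriminate` has the same shape as the translated `helper.discriminate`, except that it returns the label instead of printing it, which is a change made only for illustration. `discriminate_by_kind` is a hypothetical variant, not something `futurize` emits, showing one way the text/bytes distinction could be kept.

```
def discriminate(o):
    # Same shape as helper.discriminate after futurize; returns the label
    # instead of printing it so the result is easy to inspect.
    if isinstance(o, str):        # originally: isinstance(o, unicode)
        return 'barf'
    elif isinstance(o, str):      # identical test, so this is never reached
        return 'ok good'
    return None                   # anything that is not str falls through


def discriminate_by_kind(o):
    # Hypothetical variant that keeps the distinction on Python 3.
    if isinstance(o, str):        # text, i.e. what Python 2 called unicode
        return 'barf'
    elif isinstance(o, bytes):    # byte strings, i.e. Python 2's native str
        return 'ok good'
    return None


print(discriminate('d'))            # barf
print(discriminate(b'd'))           # None
print(discriminate_by_kind(b'd'))   # ok good
```

Note that the variant only behaves this way on Python 3; on Python 2 `bytes` is an alias for `str`, so both tests would match native strings there.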

AFAICT the only way to actually *mechanically translate* code using strings without introducing semantic differences is just to use a compatibility layer, including one around every literal. I know you all have spent a lot more time on this than I have (and also that such a layer would be a PITA to get right), but the above complications bit me hard in trying `futurize` out on part of the codebase for a reasonably large project (and translating the entirety at once wouldn't fix it). It seems as if even with `futurize`, without such a compatibility layer, there's no alternative but to check almost every string :(.
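As a rough illustration of what such a layer might look like, here is a sketch for Python 3 only. The name `old_str` is hypothetical (borrowed from this issue's title, not an existing API), and using the ASCII codec to stand in for Python 2's default encoding is an assumption. It covers only the three operations exercised in `test.py`: coercion, concatenation, and `format()`.

```
class old_str(bytes):
    """Hypothetical emulation of Python 2's native str (Python 3 only)."""

    def __new__(cls, value=b''):
        if isinstance(value, bytes):
            return super().__new__(cls, value)
        # Assumption: ASCII stands in for Python 2's default codec.
        return super().__new__(cls, str(value).encode('ascii'))

    def __add__(self, other):
        # Only concatenate with other byte strings and keep the wrapper type.
        if not isinstance(other, bytes):
            return NotImplemented
        return old_str(bytes(self) + bytes(other))

    def format(self, *args, **kwargs):
        # bytes has no format(), so round-trip through text.
        return old_str(self.decode('ascii').format(*args, **kwargs))


a = old_str(3)                     # stands in for str(3)
b = old_str('d') + old_str(3)      # stands in for 'd' + str(3)
c = old_str('{}').format(1)        # stands in for '{}'.format(1)

print(isinstance(a, str), isinstance(a, bytes))   # False True
print(b == b'd3', c == b'1')                      # True True
```

This is far from complete: it does not handle `%` formatting, `old_str` arguments passed to `format()`, or mixing with text, which is part of why such a layer would be painful to get right.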
