#### Problem description
Hi! I've noticed that when concat'ing DataFrames along `axis=1`, integer indexes are automatically sorted even when `sort=False`. This is not the case when `axis=1` and we have a string index. It also diverges from the behaviour when `axis=0`. This is a bit confusing, so I thought it might be a bug.
In the `_unique_indices` helper, integer indexes are combined using a union, `result = result.union(other)`, which sorts the values by default. So the resulting index is sorted regardless of whether `sort` is True or False.
I think this can be fixed by changing the code to
result = result.union(other, sort=sort)
or something along those lines.
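The effect of the `sort` argument can be seen on `Index.union` directly. This is a minimal sketch using only the public `Index.union` method and the integer labels from the examples in this report; it does not reproduce the internal `_unique_indices` code path, and the variable names are just illustrative.

```python
import pandas as pd

left = pd.Index([3, 1, 6])
right = pd.Index([1, 2, 3])

# Default (sort=None): the union is sorted when the values are comparable.
sorted_union = left.union(right)

# sort=False: keep the order of `left`, then append unseen values from `right`.
unsorted_union = left.union(right, sort=False)

print(list(sorted_union))    # [1, 2, 3, 6]
print(list(unsorted_union))  # [3, 1, 6, 2]
```

Passing `sort` through, as suggested above, would presumably let `concat` pick the second behaviour when `sort=False`.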
I've added examples below. Thank you :D
Example of difference between stringIndex and intIndex behaviour
# This is what happens with a string index (union of indexes, kept in order of appearance)
>>> p1 = pd.DataFrame(
... {"a": [1, 2, 3], "b": [4, 5, 6]}, index=["p", "q", "r"]
... )
>>> p2 = pd.DataFrame(
... {"c": [7, 8, 9], "d": [10, 11, 12]}, index=["r", "p", "z"]
... )
>>> pd.concat([p1,p2], axis=1)
     a    b    c     d
p  1.0  4.0  8.0  11.0
q  2.0  5.0  NaN   NaN
r  3.0  6.0  7.0  10.0
z  NaN  NaN  9.0  12.0
>>> pd.concat([p2,p1], axis=1)
     c     d    a    b
r  7.0  10.0  3.0  6.0
p  8.0  11.0  1.0  4.0
z  9.0  12.0  NaN  NaN
q  NaN   NaN  2.0  5.0
#This is what happens with an int index (sorted, regardless of order of input)
>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1,2,3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3,1,6])
>>> pd.concat([p_int1,p_int2], axis=1)
     a    b    c     d
1  1.0  4.0  8.0  11.0
2  2.0  5.0  NaN   NaN
3  3.0  6.0  7.0  10.0
6  NaN  NaN  9.0  12.0
>>> pd.concat([p_int2,p_int1], axis=1)
     c     d    a    b
1  8.0  11.0  1.0  4.0
2  NaN   NaN  2.0  5.0
3  7.0  10.0  3.0  6.0
6  9.0  12.0  NaN  NaN
Example of expected output
>>> p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1,2,3])
>>> p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3,1,6])
>>> pd.concat([p_int2,p_int1], axis=1, sort=False)
     c     d    a    b
3  7.0  10.0  3.0  6.0
1  8.0  11.0  1.0  4.0
6  9.0  12.0  NaN  NaN
2  NaN   NaN  2.0  5.0
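In the meantime, a possible workaround sketch, assuming the row labels are unique: build the order-of-appearance index with `Index.union(..., sort=False)` and reindex the concatenated frame with it. The name `expected_order` is hypothetical and only used for illustration.

```python
import pandas as pd

p_int1 = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]}, index=[1, 2, 3])
p_int2 = pd.DataFrame({"c": [7, 8, 9], "d": [10, 11, 12]}, index=[3, 1, 6])

# Row labels in order of first appearance: p_int2's labels, then new ones from p_int1.
expected_order = p_int2.index.union(p_int1.index, sort=False)

# Reindexing restores that order whether or not concat sorted the labels.
result = pd.concat([p_int2, p_int1], axis=1, sort=False).reindex(expected_order)

print(result)
```

This only reorders the rows after the fact, so it costs an extra reindex compared with `concat` honouring `sort=False` itself.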
#### Output of ``pd.show_versions()`` (pandas 1.1.4)
<details>
>>> pd.concat([p_int2,p_int1], axis=1)
c d a b
1 8.0 11.0 1.0 4.0
2 NaN NaN 2.0 5.0
3 7.0 10.0 3.0 6.0
6 9.0 12.0 NaN NaN
</details>