
API/REGR: construction of Series with scalar-like / len-1 lists #20391

Closed
@jorisvandenbossche


In geopandas, some tests started failing with pandas master:

In [8]: from geopandas import GeoSeries
   ...: from shapely.geometry import Point

In [9]: p = Point(1, 2)

In [10]: GeoSeries(p, index=['a', 'b', 'c', 'd'])
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-10-9a0cacdb2179> in <module>()
----> 1 GeoSeries(p, index=['a', 'b', 'c', 'd'])

/home/joris/scipy/geopandas/geopandas/geoseries.py in __new__(cls, data, index, crs, **kwargs)
     96                 name = kwargs.get('name', None)
     97             else:
---> 98                 s = pd.Series(data, index=index, **kwargs)
     99                 # prevent trying to convert non-geometry objects
    100                 if s.dtype != object and not s.empty:

/home/joris/scipy/pandas/pandas/core/series.py in __init__(self, data, index, dtype, name, copy, fastpath)
    253                             'Length of passed values is {val}, '
    254                             'index implies {ind}'
--> 255                             .format(val=len(data), ind=len(index)))
    256                 except TypeError:
    257                     pass

ValueError: Length of passed values is 1, index implies 4

Previously, this replicated the single point multiple times, just as pd.Series(1, index=['a', 'b', 'c', 'd']) gives a Series with four 1's.

This is related to #19714, which removed the broadcasting of 1-length lists in the Series constructor (pd.Series([1], index=['a', 'b', 'c', 'd'])).
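A small pure-Python model can make the two rules concrete: scalar broadcasting, which still applies, and the broadcasting of 1-length lists that #19714 removed. This is a hypothetical sketch, not pandas' constructor code; the names align_to_index and broadcast_len1_lists are invented for illustration.

```python
def align_to_index(data, index, broadcast_len1_lists):
    """Hypothetical model of how a Series constructor matches data to an index.

    Not pandas' implementation; it only contrasts the old and new rules.
    """
    n = len(index)
    if not isinstance(data, (list, tuple)):
        # Scalars are broadcast: a single value is repeated for every label.
        return [data] * n
    if len(data) == n:
        return list(data)
    if len(data) == 1 and broadcast_len1_lists:
        # Old behaviour (before #19714): a 1-length list was broadcast too.
        return list(data) * n
    # New behaviour: any other length mismatch is an error.
    raise ValueError(
        "Length of passed values is {val}, index implies {ind}".format(
            val=len(data), ind=n))


index = ['a', 'b', 'c', 'd']
print(align_to_index(1, index, broadcast_len1_lists=False))   # [1, 1, 1, 1]
print(align_to_index([1], index, broadcast_len1_lists=True))  # [1, 1, 1, 1]
try:
    align_to_index([1], index, broadcast_len1_lists=False)
except ValueError as exc:
    print(exc)  # Length of passed values is 1, index implies 4
```

With broadcast_len1_lists=False the model raises the same message as the traceback above, which is the regression geopandas hit.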

The reason geopandas converted the geometry to a single-element list is that geometries are convertible to arrays (and some are also iterable), and hence are not seen as scalars by pandas (this was added 4 years ago in geopandas/geopandas#70).
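Why a geometry is not treated as a scalar, and why wrapping it produces a 1-length list, can be sketched as follows. Everything here is hypothetical: FakePoint is a stand-in for a shapely Point (array convertibility is modelled only through iteration over its coordinates), and looks_scalar and wrap_geometry are invented helpers, not pandas or geopandas functions.

```python
from collections.abc import Iterable


class FakePoint:
    """Hypothetical stand-in for a shapely Point (not the real class).

    Iterating yields its coordinates, mimicking a geometry that can be
    converted to an array of coordinates.
    """

    def __init__(self, x, y):
        self.coords = (x, y)

    def __iter__(self):
        return iter(self.coords)


def looks_scalar(obj):
    """Hypothetical scalar check: strings and non-iterables count as scalars."""
    return isinstance(obj, (str, bytes)) or not isinstance(obj, Iterable)


def wrap_geometry(data):
    """Hypothetical version of the geopandas approach: wrap a single
    geometry in a list so it is stored as one element, not unpacked."""
    if isinstance(data, FakePoint):
        return [data]
    return data


p = FakePoint(1, 2)
print(looks_scalar(1))        # True
print(looks_scalar(p))        # False: it looks like a sequence of coordinates
print(len(wrap_geometry(p)))  # 1: a single-element list
```

The wrapped value is exactly a 1-length list, the kind of input whose broadcasting was removed in #19714.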

It still works when you do not pass an index:

In [36]: GeoSeries(p)
Out[36]:
0    POINT (1 2)
dtype: object

Note that there is also some inconsistency within pandas itself:

In [39]: pd.Series(p)
Out[39]: 
0    POINT (1 2)
dtype: object

In [40]: pd.Series(p, index=['a', 'b', 'c', 'd'])
...
ValueError: Wrong number of items passed 2, placement implies 4

(The first case works because, when no index is specified, p is converted to [p] before it is passed to _sanitize_array. In the second case, _sanitize_array converts the point p to np.array([1, 2]), the array of its coordinates, so 2 items are passed while the index implies 4.)
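The two code paths can be reproduced with a simplified model. All names here are hypothetical: FakePoint stands in for a shapely Point, and looks_list_like, sanitize and build_series_values only approximate the logic described in the parenthetical above; they are not pandas' real _sanitize_array.

```python
from collections.abc import Iterable


class FakePoint:
    """Hypothetical stand-in for a shapely Point; iterating yields its coordinates."""

    def __init__(self, x, y):
        self.coords = (x, y)

    def __iter__(self):
        return iter(self.coords)


def looks_list_like(obj):
    """Hypothetical check: non-string iterables are treated as list-like."""
    return isinstance(obj, Iterable) and not isinstance(obj, (str, bytes))


def sanitize(data):
    """Hypothetical simplification of _sanitize_array: unpack list-likes."""
    return list(data) if looks_list_like(data) else [data]


def build_series_values(data, index=None):
    """Hypothetical model of the two constructor paths, not pandas' real code."""
    if index is None:
        # No index: data is wrapped as [data] first, so the point stays whole.
        return sanitize([data])
    if not looks_list_like(data):
        # True scalars are broadcast to the length of the index.
        return [data] * len(index)
    # With an index, the point itself is unpacked into its two coordinates.
    values = sanitize(data)
    if len(values) != len(index):
        raise ValueError(
            "Wrong number of items passed {}, placement implies {}".format(
                len(values), len(index)))
    return values


p = FakePoint(1, 2)
print(len(build_series_values(p)))  # 1: the point is kept as one element
try:
    build_series_values(p, index=['a', 'b', 'c', 'd'])
except ValueError as exc:
    print(exc)  # Wrong number of items passed 2, placement implies 4
```

In this model, passing an explicit list such as [p] * len(index) gives one point per index label, since the list already has the right length.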
