BUG: StringArray non-extensible due to inconsistent assertion #34309

Open
@sbrugman

Code Sample, a copy-pastable example

from typing import Type

import pandas as pd
from pandas import StringDtype
from pandas.core.arrays import StringArray
from pandas.core.dtypes.dtypes import register_extension_dtype

# Register a StringDtype subclass under its own name.
@register_extension_dtype
class MyExtensionDtype(StringDtype):
    name = 'my_extension'

    def __repr__(self) -> str:
        return "MyExtensionDtype"

    @classmethod
    def construct_array_type(cls) -> "Type[MyExtensionStringArray]":
        return MyExtensionStringArray

# Array type backing the custom dtype; it reuses StringArray's storage.
class MyExtensionStringArray(StringArray):
    def __init__(self, values, copy=False):
        super().__init__(values, copy)
        self._dtype = MyExtensionDtype()

# Constructing the Series raises the AssertionError, so the final assert is never reached.
series = pd.Series(["test", "test2"], dtype="my_extension")
assert series.dtype == 'my_extension'

Results in

assert dtype == "string"
AssertionError

Problem description

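It should be possible to extend StringDtype/StringArray so that users can design efficient subtypes. I believe the AssertionError is a bug rather than intended behaviour: pandas aims to support extensible dtypes, which is the purpose of ExtensionDtype, yet the assertion only accepts the dtype name "string" and therefore rejects any subclass registered under a different name.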
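To make the failure mode concrete without depending on a particular pandas version, here is a minimal pandas-free sketch. It assumes the assertion shown above is a hard-coded comparison against the parent dtype's name somewhere on the array construction path. All class and function names below (BaseDtype, SubDtype, StrictArray, RelaxedArray, strict_rejects_subclass) are hypothetical stand-ins, not pandas APIs, and the relaxed check is only one possible direction, not necessarily what the PR does.

```python
class BaseDtype:
    """Toy stand-in for a dtype that compares equal to its own name."""

    name = "string"

    def __eq__(self, other):
        if isinstance(other, str):
            return other == self.name
        return isinstance(other, type(self))

    def __hash__(self):
        return hash(self.name)


class SubDtype(BaseDtype):
    """Subclass registered under a different name."""

    name = "my_extension"


class StrictArray:
    """Constructor path that hard-codes the parent dtype name."""

    def __init__(self, values):
        self.values = values

    @classmethod
    def from_sequence(cls, values, dtype=None):
        if dtype:
            # Fails for SubDtype, because "my_extension" != "string".
            assert dtype == "string"
        return cls(list(values))


class RelaxedArray(StrictArray):
    """Hypothetical alternative: accept the parent dtype or any subclass."""

    @classmethod
    def from_sequence(cls, values, dtype=None):
        if dtype is not None:
            assert isinstance(dtype, BaseDtype)
        return cls(list(values))


def strict_rejects_subclass():
    """Return True if the strict check raises for a subclassed dtype."""
    try:
        StrictArray.from_sequence(["test", "test2"], dtype=SubDtype())
    except AssertionError:
        return True
    return False
```

In this toy model the strict constructor still works for the parent dtype, so the problem only surfaces once someone subclasses it, which matches the behaviour reported here.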

Expected Output

The code above should pass without errors.

PR with fix on its way.

Output of pd.show_versions()

pandas v1.0.3

Labels: Bug, ExtensionArray, Strings, Subclassing
