Description
Assume you have a Series with a certain dtype. If that dtype can be one of several variants of the same logical dtype (for example, a string dtype backed by Python objects or by pyarrow), how do you check for the "logical" equality of such dtypes?
To check the logical dtype of a single Series, you can compare its dtype with the generic string alias (which returns True for any variant of it), or check the dtype with isinstance or one of the is_..._dtype functions (although we have deprecated some of those). Using the string dtype as the example:
ser.dtype == "string"
# or
isinstance(ser.dtype, pd.StringDtype)
# or
pd.api.types.is_string_dtype(ser.dtype)
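For reference, a minimal runnable version of the three checks above. This is only a sketch: the sample values and variable names are made up for illustration, and it assumes a pandas version in which StringDtype accepts a storage argument. The Python-backed variant is used so that pyarrow is not required.

```python
import pandas as pd

# Assumption: a Series using the Python-backed variant of the string dtype.
ser = pd.Series(["a", "b"], dtype=pd.StringDtype("python"))

# 1. Compare with the generic string alias.
matches_alias = ser.dtype == "string"

# 2. Check the dtype class.
is_string_instance = isinstance(ser.dtype, pd.StringDtype)

# 3. Use the is_..._dtype helper.
passes_type_check = pd.api.types.is_string_dtype(ser.dtype)

print(matches_alias, is_string_instance, passes_type_check)
```

All three checks should pass for this Series.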
When you want to check whether two Series have the same dtype, == checks for exact equality (in the string dtype example, the comparison below can evaluate to False even if both dtypes are a StringDtype, because they have a different storage):
ser1.dtype == ser2.dtype
So how do you check this logical equality for two dtypes? In the example, how do you know that both dtypes represent the same logical dtype (i.e. both are a StringDtype instance) without checking for the exact type? The user doesn't necessarily know that they are string dtypes, and just wants to check whether they are logically the same.
# this might work?
type(ser1.dtype) == type(ser2.dtype)
Do we want some other API here that is a bit more user friendly? (Just brainstorming: something like dtype1.is_same_type(dtype2), or a function, ...)
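To make the function variant concrete, here is a rough sketch of what such a helper could look like if it is built on the type() comparison. The name is_same_logical_dtype is hypothetical and not an existing pandas API, and the sketch assumes that "same logical dtype" can be approximated by "same dtype class". The pyarrow-backed comparison is guarded because constructing that variant requires pyarrow to be installed.

```python
import pandas as pd


def is_same_logical_dtype(left, right) -> bool:
    """Hypothetical helper (not a pandas API): True if both dtypes share a class."""
    return type(left) is type(right)


python_strings = pd.Series(["a", "b"], dtype=pd.StringDtype("python"))
integers = pd.Series([1, 2], dtype="Int64")

# Same class and same storage.
same_as_itself = is_same_logical_dtype(python_strings.dtype, pd.StringDtype("python"))
# Different classes (StringDtype vs Int64Dtype).
string_vs_int = is_same_logical_dtype(python_strings.dtype, integers.dtype)
print(same_as_itself, string_vs_int)

# Same class, different storage. Guarded: this variant needs pyarrow installed.
try:
    arrow_dtype = pd.StringDtype("pyarrow")
except ImportError:
    arrow_dtype = None

if arrow_dtype is not None:
    # Exact equality versus the class-based check.
    print(python_strings.dtype == arrow_dtype)
    print(is_same_logical_dtype(python_strings.dtype, arrow_dtype))
```

A class-based check like this has clear limits: parametrized dtypes of the same class (for example, two DatetimeTZDtype instances with different time zones) would also be reported as the same, and a string column stored as pd.ArrowDtype would not match a StringDtype. A dedicated API could decide those cases explicitly.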
This is important in the discussion around logical dtypes (#58455), but it is already an issue for the new string dtype in pandas 3.0 as well.
cc @WillAyd @Dr-Irv @jbrockmendel (tagging some people that were most active in the logical dtypes discussion)