-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
CLN: unify NumpyBlock, ObjectBlock, and NumericBlock #52817
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Its complicated. We've landed on calling pandas.core.internals.api "pseudo-public" and acknowledging that some packages (pyarrow) access them even though they're not really public. Last time we removed a Block subclass there were complaints and we ended up reverting it and changing the removal to a deprecation (which was enforced in 2.0, so now it is gone). |
OK, I’ll go ahead and add deprecated aliases. Thanks for the context and the prior art pointer. Will hopefully be pushing an update today that fixes the tests and addresses the comments. |
b8c3fa0
to
7aa66d5
Compare
4e52afb
to
5642417
Compare
Is there anything I can do on my end to move this forward? |
9dffc8a
to
7819691
Compare
648cc6c
to
f140bbd
Compare
Rebased to fix a conflict. |
@@ -387,7 +384,7 @@ def test_constructor_no_pandas_array(self): | |||
result = DataFrame({"A": arr}) | |||
expected = DataFrame({"A": [1, 2, 3]}) | |||
tm.assert_frame_equal(result, expected) | |||
assert isinstance(result._mgr.blocks[0], NumericBlock) | |||
assert isinstance(result._mgr.blocks[0], NumpyBlock) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check is_numeric here?
|
||
@cache_readonly | ||
def is_object(self) -> bool: # type: ignore[override] | ||
return self.values.dtype.kind == "O" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there dtypes other than object-dtype that have kind=="O"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope. FWIW, the letter codes are defined as an enum here: https://github.com/numpy/numpy/blob/f45f692abac62265b760c0a249e8b417707c47d1/numpy/core/include/numpy/ndarraytypes.h#L94-L140
f140bbd
to
e6f9b3f
Compare
Would appreciate another round of review. The test failure is unrelated. |
LGTM @ngoldbaum can you merge main one more time to be on the safe side |
Nice work @ngoldbaum |
* CLN: unify NumpyBlock, ObjectBlock, and NumericBlock * CLN: respond to review comments * CLN: deprecate ObjectBlock and NumericBlock * CLN: appease mypy * CLN: remove out-of-date reference to maybe_split * CLN: respond to review comments * CLN: remove NumpyBlock from the semi-public API * CLN: test for is_numeric in block internals tests
* CLN: unify NumpyBlock, ObjectBlock, and NumericBlock * CLN: respond to review comments * CLN: deprecate ObjectBlock and NumericBlock * CLN: appease mypy * CLN: remove out-of-date reference to maybe_split * CLN: respond to review comments * CLN: remove NumpyBlock from the semi-public API * CLN: test for is_numeric in block internals tests
* CLN: unify NumpyBlock, ObjectBlock, and NumericBlock * CLN: respond to review comments * CLN: deprecate ObjectBlock and NumericBlock * CLN: appease mypy * CLN: remove out-of-date reference to maybe_split * CLN: respond to review comments * CLN: remove NumpyBlock from the semi-public API * CLN: test for is_numeric in block internals tests
This PR unifies
NumpyBlock
,ObjectBlock
, andNumericBlock
intoNumpyBlock
.xref #52413
This is part of my effort to make it easier for pandas to support new-style dtypes like the variable width string dtype I'm working on that will hopefully replace usages of object string arrays in pandas (#47884). New-style dtypes have
dtype.kind
set to a null character, so usingdtype.kind
to discriminate between numpy dtypes aspandas.core.internals.blocks.get_block_type
currently does makes it more difficult to support new-style dtypes, as I'd need to add additional checks there that will hurt performance in a hot code path. @jbrockmendel suggested this approach instead.It wasn't clear to me what the backward compatibility guarantees are for
pandas.core.internals
. I think adding anObjectBlock
andNumericBlock
that are (deprecated?) aliases toNumpyBlock
will be sufficient to maintain back compat if that's desired.As a bonus, it's no longer necessary to check
dtype.kind
twice in the block creation fast path - I moved the second check into the newNumpyBlock.is_numeric
cached property - so I think this may be a tiny bit faster for benchmarks sensitive to block creation overhead.