-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
…e-constructors
Hello @jbrockmendel! Thanks for updating the PR.
Comment last updated on November 04, 2018 at 18:58 Hours UTC |
@@ -199,10 +198,11 @@ def _join_i8_wrapper(joinf, **kwargs): | |||
|
|||
_engine_type = libindex.DatetimeEngine | |||
|
|||
tz = None | |||
_tz = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A tz
property is defined below
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed? (just curious to understand)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am fairly certain that the version currently here is a mistake, since it is overriden by the property definition of tz
below. _freq and _tz are set in _simple_new, so I think the idea of also defining them on the class is to make introspecting what attributes exist easy.
_freq = None | ||
_comparables = ['name', 'freqstr', 'tz'] | ||
_attributes = ['name', 'freq', 'tz'] | ||
timetuple = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is another we'll-need-this-later (when moving from inheritance to composition)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, what is this? Is it intended to be public? It's not present in the public API for 0.23.4
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The stdlib datetime.datetime comparison methods check if this attribute exists and if so return NotImplemented
, otherwise raise TypeError for non-datetime objects
if self._has_same_tz(value): | ||
return _to_m8(value) | ||
raise ValueError('Passed item and index have different timezone') | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just moving this out of the Constructors section for readability
assert expected.freq == dti.freq | ||
assert expected.tz == dti.tz | ||
|
||
# broken until ABCDatetimeArray and isna is fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will be done in the forthcoming PR mentioned in the OP.
|
||
# NB: Among other things not yet ported from the DatetimeIndex | ||
# constructor, this does not call _deepcopy_if_needed | ||
return result | ||
|
||
@classmethod | ||
def _from_sequence(cls, scalars, dtype=None, copy=False): | ||
# list, tuple, or object-dtype ndarray/Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why do you need to turn into an object array here? to_datetime handles all of these cases
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right we could make do without it. I like doing this explicitly because to_datetime is already overloaded and circular.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is horribly inefficient and unnecessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't do it here, to_datetime is going to do this. It may be unnecessary, but it is not horribly inefficient. What is a code smell is the circularity involved in calling to_datetime.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then just call array_to_datetime and don’t force the conversion to array
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.__new__
) that to_datetime
/ to_timedelta
returns an Index instead of an EA?
Could we have the public to_datetime
just be a simple
array = _to_datetime(...)
return DatetimeIndex(array)
so the internal _to_datetime
returns the array?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.new) that to_datetime / to_timedelta returns an Index instead of an EA?
It's not the fact that it's an Index so much as that it is a circular dependency. I think I can resolve this in an upcoming commit.
Looking through to_datetime and _convert_listlike_datetimes, I don't see a conversion to ndarray[object].
_convert_listlike_datetimes calls ensure_object
.
Sorry, to_datetime has in intermediate datetime64[ns] -> object -> datetime64[ns] conversion? That seems unnecessary.
Not sure what you're referring to. As implemented _from_sequence is specifically for list, tuple, or object-dtype NDArray/Index. datetime64-dtype goes through a different path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_convert_listlike_datetimes calls ensure_object.
That's after an
# these are shortcutable
if is_datetime64tz_dtype(arg):
if not isinstance(arg, DatetimeIndex):
return DatetimeIndex(arg, tz=tz, name=name)
if tz == 'utc':
arg = arg.tz_convert(None).tz_localize(tz)
return arg
elif is_datetime64_ns_dtype(arg):
if box and not isinstance(arg, DatetimeIndex):
try:
return DatetimeIndex(arg, tz=tz, name=name)
except ValueError:
pass
return arg
So those both avoid conversion to object.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ExtensionArray._from_sequence
is for any sequence of scalar objects, including a ndarray with a specialized type like datetime64[ns]
. It'll be used, for example, in factorize(ExtensionArray)
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TomAugspurger thank you for clarifying; I was under the mistaken impression that it was specifically list/tuple/object-dtype.
Are there any restrictions on kwargs that can be added to it? In particular I'm thinking of freq
and tz
@classmethod | ||
def _from_sequence(cls, scalars, dtype=_TD_DTYPE, copy=False): | ||
# list, tuple, or object-dtype ndarray/Index | ||
values = np.array(scalars, dtype=np.object_, copy=copy) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't call to_timedelta, so this does require that we pass an object array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then it should
let’s not reinvent the wheel
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, it shouldn't. to_timedelta
will just end up calling array_to_timedelta64
like this does, but only after doing a bunch of unecessary dtype checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides this is what TimedeltaIndex.__new__
currently calls
------ | ||
ValueError | ||
""" | ||
if dtype != _TD_DTYPE: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AssertionError no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When called from _simple_new this is internal so AssertionError would make sense, but it is also called from __new__
so is in principle user-facing.
Either way I need to add tests for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
well this should never happen all conversions should be before this
so it should assert
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype
is part of the signature of TimedeltaArray.__new__
, which is/will be user-facing. If the user passes the wrong dtype, its a ValueError
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no my point is there should be and i think currently there is already conversion
if it’s wrong at this point it’s not a user error but an incorrect path taken
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My point is that this check function is called two times, one of which is the very first thing in TimedeltaArray.__new__
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from the discussion above, is it worth having a 15 line function (including docstrings :-)), for a 2-liner used in two places?
I would maybe simply leave it in place how it was, I think reading something like assert dtype == _TD_DTYPE
in TimedeltaArray._simple_new
is clearer than calling into a helper function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reasonable. But hey, its a nice docstring.
Good comments, thanks. Also some linting mistakes I need to fix. I'll make another pass and comment when this is ready to be looked at. |
Updated with (most) requested edits, basic tests for TimedeltaArray (and fixes to make them pass, particularly implementation of is_monotonic_increasing etc) |
Pushed commits fixing tests, also implemented |
assert isinstance(subarr, np.ndarray), type(subarr) | ||
assert subarr.dtype == 'M8[ns]', subarr.dtype | ||
|
||
subarr = cls._simple_new(subarr, name=name, freq=freq, tz=tz) | ||
if dtype is not None: | ||
if not is_dtype_equal(subarr.dtype, dtype): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This check is made unnecessary by the tz = dtl.validate_tz_from_dtype(dtype, tz)
above (implemented a couple PRs ago)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this isn't quite right. Will revert/fix.
It should handle any sequence where the scalar types are instances of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jbrockmendel could you summarize where this is in the overall TDA/DTA refactor, and how it gets us closer to the goal (primarily disentangling inheritance -> composition? Anything else?)
def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False): | ||
if isinstance(values, (list, tuple)) or is_object_dtype(values): | ||
values = cls._from_sequence(values, copy=copy) | ||
# TODO: Can we set copy=False here to avoid re-coping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, then yes you're OK setting copy=False
here. By definition, the conversion to datetime64[ns]
will involve a copy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further question: it is not (yet) possible to simply remove this case? (eventually we should not call the DatetimeArray constructor with an array-like of scalars)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is not (yet) possible to simply remove this case?
Not if we want to share the extant arithmetic tests (which we do)
(eventually we should not call the DatetimeArray constructor with an array-like of scalars)
I don't share this opinion, would prefer to delay this discussion until it is absolutely necessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't share this opinion,
Then please raise this in the appropriate issue, as we have been discussing this before (I think it is #23212, although there is probably some more scattered discussion on other related PRs)
would prefer to delay this discussion until it is absolutely necessary.
It is here that you are redesigning the constructors for the array refactor, IIUC, so if there is a time we should discuss it, it is now I think?
Not if we want to share the extant arithmetic tests (which we do)
Can you clarify this a little bit? At what point do the arithmetic tests need to deal with array of objects?
Eg for boxing the constructed values into Series/Index/Array, there a properly dtyped array can be used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you clarify this a little bit? At what point do the arithmetic tests need to deal with array of objects?
The pertinent word here is "extant". Many of the tests in tests/arithmetic pass a list into tm.box_expected
or klass
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring the tests for a moment, I thought we were all on board with the goal of the DatetimelikeArray.__init__
being no inference and no copy.
Back to the tests, it looks like you can you add an entry to box_expected
for DatetimeArray
to return expected = DatetimeArray._from_sequence(expected)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ignoring the tests for a moment, I thought we were all on board with the goal of the DatetimelikeArray.init being no inference and no copy.
My comment to Joris below about mothballing this conversation applies. But short answer is no: I did not get on board with that.
# TODO: Try to do this in just one place | ||
tz = values.dt.tz | ||
values = np.array(values.view('i8')) | ||
elif isinstance(values, DatetimeArrayMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeArrayMixin
-> cls
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you don't need to get the tz
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DatetimeArrayMixin -> cls?
No. For the moment we are still using inheritance, so this would mess up for DatetimeIndex == DatetimeArray. When we change to composition this check will have to become isinstance(values, (DatetimeArray, ABCDatetimeIndex))
|
||
# NB: Among other things not yet ported from the DatetimeIndex | ||
# constructor, this does not call _deepcopy_if_needed | ||
return result | ||
|
||
@classmethod | ||
def _from_sequence(cls, scalars, dtype=None, copy=False): | ||
# list, tuple, or object-dtype ndarray/Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking through to_datetime
and _convert_listlike_datetimes
, I don't see a conversion to ndarray[object].
|
||
# NB: Among other things not yet ported from the DatetimeIndex | ||
# constructor, this does not call _deepcopy_if_needed | ||
return result | ||
|
||
@classmethod | ||
def _from_sequence(cls, scalars, dtype=None, copy=False): | ||
# list, tuple, or object-dtype ndarray/Index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.__new__
) that to_datetime
/ to_timedelta
returns an Index instead of an EA?
Could we have the public to_datetime
just be a simple
array = _to_datetime(...)
return DatetimeIndex(array)
so the internal _to_datetime
returns the array?
@@ -180,6 +203,23 @@ def _generate_range(cls, start, end, periods, freq, closed=None): | |||
|
|||
return cls._simple_new(index, freq=freq) | |||
|
|||
# ---------------------------------------------------------------- | |||
# Array-Like Methods | |||
# NB: these are appreciably less efficient than the TimedeltaIndex versions |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because of (lack of) caching? This comment makes it seems like it's slower in general, when (if it's caching) it's just slower on repeated use).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW (as mentioned elsewhere), I am not sure we should add them as public methods. If we do so, we should add them to all our EAs, or actually even to the EA interface, and not only to TimedeltaArray (or datetimelike arrays).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do so, we should add them to all our EAs, or actually even to the EA interface
I'm not necessarily opposed to this, but this isn't obvious to me.
Because of (lack of) caching? This comment makes it seems like it's slower in general, when (if it's caching) it's just slower on repeated use).
Because the Index version defines monotonic_increasing, monotonic_decreasing, and is_unique in a single call via _engine.
_freq = None | ||
_comparables = ['name', 'freqstr', 'tz'] | ||
_attributes = ['name', 'freq', 'tz'] | ||
timetuple = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious, what is this? Is it intended to be public? It's not present in the public API for 0.23.4
if not isinstance(data, (np.ndarray, Index, ABCSeries, | ||
DatetimeArrayMixin)): | ||
if is_scalar(data): | ||
raise ValueError('DatetimeIndex() must be called with a ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, this is kinda an API change (raising a TypeError instead of a ValueError).
Seems fine from a consistency P.O.V., but deserves a release note in the API breaking changes section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think currently all public cases raise ValueError, so keeping it on that would not give an API change?
(I agree that TypeError is slightly more appropriate though)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference would have been to keep it a ValueError and change to a TypeError in a separate PR, but here we are... will add a note in Breaking Changes.
tz = values.tz | ||
|
||
# TODO: what about if freq == 'infer'? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we should also get the freq
from the values as a "cheap" inference? Or would there be cases were an inferred frequency can be different than the actual frequency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
then we should also get the freq from the values as a "cheap" inference?
That's what I'm thinking, yah
def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False): | ||
if isinstance(values, (list, tuple)) or is_object_dtype(values): | ||
values = cls._from_sequence(values, copy=copy) | ||
# TODO: Can we set copy=False here to avoid re-coping? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Further question: it is not (yet) possible to simply remove this case? (eventually we should not call the DatetimeArray constructor with an array-like of scalars)
if isinstance(values, DatetimeArrayMixin): | ||
if lib.is_scalar(values): | ||
raise TypeError(dtl.scalar_data_error(values, cls)) | ||
elif isinstance(values, ABCSeries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would get out the _values
, and then treat that the same as directly passing a DatetimeIndex/DatetimeArray ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll see if there is a graceful way to do this in the next pass (if I ever manage to catch up with all these comments!)
# TODO: Try to do this in just one place | ||
tz = values.dt.tz | ||
values = np.array(values.view('i8')) | ||
elif isinstance(values, DatetimeArrayMixin): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And you don't need to get the tz
in this case?
if not isinstance(data, (np.ndarray, Index, ABCSeries, | ||
DatetimeArrayMixin)): | ||
if is_scalar(data): | ||
raise ValueError('DatetimeIndex() must be called with a ' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think currently all public cases raise ValueError, so keeping it on that would not give an API change?
(I agree that TypeError is slightly more appropriate though)
@@ -199,10 +198,11 @@ def _join_i8_wrapper(joinf, **kwargs): | |||
|
|||
_engine_type = libindex.DatetimeEngine | |||
|
|||
tz = None | |||
_tz = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this change needed? (just curious to understand)
data = data.astype(_TD_DTYPE) | ||
else: | ||
data = ensure_int64(data).view(_TD_DTYPE) | ||
arr = TimedeltaArrayMixin(data, freq=freq) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we should replace the handling of object dtype with TimedeltaArray
constructor (this should not be able to handle object dtype eventually), but if needed that is fine to leave for a later PR that actually does the split / cleans up the TimedeltaArray constructor
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with the "leave for later" part of this
with pytest.raises(TypeError): | ||
pd.DatetimeIndex(pd.Timestamp.now()) | ||
|
||
def test_from_sequence_requires_1dim(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be a test in the EA base tests.
Although, thinking about it now, I am not sure we should require implementors to handle this case, as the method should never be called with 2D data to begin with.
What is the reason you added handling of this to that method?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason you added handling of this to that method?
I'm not sure I understand the question. Is there a reason not to?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there ever a case where call _from_sequence
internally with 2D data? And if so, what case?
I would think that at the point we internally call _from_sequence
, we are sure it is a 1D array (eg as the result of some operation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're far more competent than I am and can be counted on not to make this particular mistake; I cannot.
If we're not allowing object-dtype/lists in __init__/__new__
then we are basically forcing users to use the (private!) _from_sequence
; validation need to happen somewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The _from_sequence
method has a specific purpose in the EA interface: converting an array-like of scalars back to a proper Array (and should never be called by a user). Eg this method is used when doing an astype(EAdtype)
, but I was thinking that it might well be that in those cases that we call it, it already has been checked the data is 1D.
So I am simply honestly wondering if there is a specific case that you encountered now where this check was needed.
But I suppose the answer is that you are using _from_sequence
to validate generic user input, and there the user can pass 2D data and we should invalidate it.
However, it is not only you that do it, we actually already do it when doing Series(..., dtype=EAdtype)
(where also user input is processed in _from_sequence
), so indeed, we should probably have this check in general.
So long story short: it indeed might make sense to add this type check. But then to repeat my initial comment: this should then be a test in the base EA tests to ensure all EAs do this properly?
dti = pd.date_range('2016-01-1', freq='MS', periods=9, tz=tz) | ||
|
||
# Fails because np.array(dti, dtype=object) incorrectly returns Longs | ||
result = DatetimeArray(np.array(dti, dtype=object), freq='infer') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I mentioned above, but I don't think DatetimeArray
constructor should handle this case?
That is what we discussed in #23212 (although, @jbrockmendel , you didn't really react there)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully I've answered this enough times in this thread. I see no reason not to handle object dtype in the DatetimeArray constructor. Series and Index and DataFrame all handle object-dtype and lists; I find it counter-intuitive that we would have a small subset of classes that don't.
But in the short-run, we need to handle these cases if we want to share the extant tests (without significant overhaul) (which we do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #23493 (comment) for my answer to a previous related comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Elsewhere in this thread you've said you're fine with bikeshedding this after we have a working implementation. Has this changed in the last few minutes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose you are referring to #23493 (comment) ? Indeed, I was somewhat inconsistent here, but I think the other comments are quite clear in that I am at least asking why you are adding or keeping handling object dtypes?
If the answer is: because I think that is the way it should be, then let's discuss that and try to get to a consensus (#23212). If the answer is: because of practical reasons for now, then let's discuss the practical reasons and see if that is OK to defer to a follow-up, or if there is an easy way to overcome the practical reason.
@jorisvandenbossche Your attention has been specifically requested in #23514, whereas I am finding it increasingly frustrating. I propose we step back from this conversation so I can spend some time addressing the subset of comments on which there is consensus. |
Maybe we can spend a bit of time building consensus on a direction forward? I'll try to build my own thoughts here on a proposal, as a response / concurrence to FWIW, everything on the TimedeltaArray / DatetimeArray is on my critical path, so I'm going to prioritize reviewing your PRs over everything else. |
@jbrockmendel you have to see my comments in the same light. I was assuming we had this discussion and agreed on the design of the constructors, so I was wondering why we couldn't already follow that decision in this PR instead of deferring to a follow-up.
Just as for Tom, this PR (and all issues related to the EA refactor) is high on my priority list, so I will still put time on this topic (which is more than writing comments here). If you have specific feedback on how I can make my comments less frustrating apart from the above, I am honestly all ears. But indeed, let's first try to build consensus on the fundamental design questions. I would propose to not do that here on this PR but on the general issue #23185 and the constructor split-off issue #23212, so we can keep the discussions in this PR on technical implementation details of things we in general already agree on. I would personally propose to have a chat about this, as a more effective way to discuss things. But @jbrockmendel, I think that only makes sense if you can fix the audio issues we were having last time. If that is not possible right now, I would propose to do it on chat at a given time to at least have it a bit more interactive than on github. |
I appreciate the consideration. At this point I think the short-term solution is for me to take a half-step back and disengage. The request that you do the same was primarily so you don't feel that I'm becoming non-responsive. Medium-run, I very much think the priority should be getting enough of this in place that we can get the tests working. Later if you want to implement I'll be going AFK (at least on this thread) for a few hours. |
Good news, for a certain value of "good". In the next pass of de-duplication I found some subtle bugs in the TimedeltaIndex construction, including some that we are specifically testing. So there will be at least one fixing-things PR before this comes back to the forefront. In the interim, we can make incremental progress on orthogonal DTA/TDA code in #23415 and #23502. |
I know I am not at all in the position to force things, so I will just give my opinion (and take it as that :-)): I think we should rather focus our effort on moving forward this PR (eg start trying to reach consensus on the discussion points above).
I thought you mentioned #23415 was kind of dormant for now? On #23502 I added some additional comments. |
You thought correctly, but it may be worthwhile to un-dormant-ize it so as to maintain forward momentum while bugs get sorted out.
Disagree on both counts. Without the bugs being fixed, we can't institute meaningful tests for the array classes. On the other hand, whether we rename |
Closing. I’ll salvage any useful tests in the PR that comes after #23539. |
Big push on the constructors for DatetimeArray/TimedeltaArray, some progress de-duplicating code from their Index counterparts.
As discussed elsewhere, adds
dtype
toPeriodArray.__init__
@TomAugspurger can you confirm that the
_from_sequence
s implemented here handle the right cases? It wasn't obvious if object-dtyped array/indexes were supposed to be handled there, but combining those cases was too clean to overlook.Small fix in DatetimeArray comparison methods, just enough to make the tests work. A separate PR forthcoming PR will do a more thorough fix/improvements of those.
One non-obvious point on which input would be especially welcome is how/when to use the
copy
kwarg in such a way as to copy at-most-once. (related: deep_copy_if_needed and #21907)#23491 found during this process, will be addressed in a follow-up.