REF/ENH: Constructors for DatetimeArray/TimedeltaArray #23493

jbrockmendel · 2018-11-04T18:52:37Z

Big push on the constructors for DatetimeArray/TimedeltaArray, some progress de-duplicating code from their Index counterparts.

As discussed elsewhere, adds dtype to PeriodArray.__init__

@TomAugspurger can you confirm that the _from_sequences implemented here handle the right cases? It wasn't obvious if object-dtyped array/indexes were supposed to be handled there, but combining those cases was too clean to overlook.

Small fix in DatetimeArray comparison methods, just enough to make the tests work. A separate, forthcoming PR will make more thorough fixes/improvements to those.

One non-obvious point on which input would be especially welcome is how/when to use the copy kwarg in such a way as to copy at-most-once. (related: deep_copy_if_needed and #21907)

#23491 found during this process, will be addressed in a follow-up.


pep8speaks · 2018-11-04T18:52:47Z

Hello @jbrockmendel! Thanks for updating the PR.

There are no PEP8 issues in the file pandas/core/arrays/datetimelike.py !
There are no PEP8 issues in the file pandas/core/arrays/datetimes.py !
There are no PEP8 issues in the file pandas/core/arrays/period.py !
There are no PEP8 issues in the file pandas/core/arrays/timedeltas.py !
There are no PEP8 issues in the file pandas/core/indexes/datetimes.py !
There are no PEP8 issues in the file pandas/core/indexes/timedeltas.py !
There are no PEP8 issues in the file pandas/tests/arrays/test_datetimes.py !

Comment last updated on November 04, 2018 at 18:58 Hours UTC


jbrockmendel · 2018-11-04T18:55:04Z

pandas/core/indexes/datetimes.py

@@ -199,10 +198,11 @@ def _join_i8_wrapper(joinf, **kwargs):

    _engine_type = libindex.DatetimeEngine

-    tz = None
+    _tz = None


A tz property is defined below

Why is this change needed? (just curious to understand)

I am fairly certain that the version currently here is a mistake, since it is overridden by the property definition of tz below. _freq and _tz are set in _simple_new, so I think the idea of also defining them on the class is to make it easy to introspect which attributes exist.
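A toy illustration (a hypothetical class, not pandas code) of why a class-level `tz = None` is dead once a `tz` property is defined later in the same class body: the later `def` rebinds the name in the class namespace, so only the property survives, while `_tz = None` still gives the property a default to read.

```python
class Example:
    # A plain class attribute named ``tz``...
    tz = None
    # ...and the private slot that a _simple_new-style constructor would set.
    _tz = None

    @property
    def tz(self):
        # This later definition rebinds ``tz`` in the class namespace,
        # so the ``tz = None`` above never takes effect.
        return self._tz


print(type(Example.__dict__["tz"]).__name__)  # property

obj = Example()
obj._tz = "UTC"  # simulate what the constructor would set
print(obj.tz)  # UTC
```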

jbrockmendel · 2018-11-04T18:55:33Z

pandas/core/indexes/datetimes.py

    _freq = None
    _comparables = ['name', 'freqstr', 'tz']
    _attributes = ['name', 'freq', 'tz']
+    timetuple = None


This is another we'll-need-this-later (when moving from inheritance to composition)

I'm curious, what is this? Is it intended to be public? It's not present in the public API for 0.23.4

The stdlib datetime.datetime comparison methods check whether this attribute exists: if it does, they return NotImplemented; otherwise they raise TypeError for non-datetime objects.

jbrockmendel · 2018-11-04T18:55:59Z

pandas/core/indexes/datetimes.py

-        if self._has_same_tz(value):
-            return _to_m8(value)
-        raise ValueError('Passed item and index have different timezone')
-


Just moving this out of the Constructors section for readability

jbrockmendel · 2018-11-04T18:56:22Z

pandas/tests/arrays/test_datetimes.py

+        assert expected.freq == dti.freq
+        assert expected.tz == dti.tz
+
+        # broken until ABCDatetimeArray and isna is fixed


This will be done in the forthcoming PR mentioned in the OP.


jreback · 2018-11-04T20:18:12Z

pandas/core/arrays/datetimes.py


        # NB: Among other things not yet ported from the DatetimeIndex
        # constructor, this does not call _deepcopy_if_needed
        return result

+    @classmethod
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
+        # list, tuple, or object-dtype ndarray/Index


why do you need to turn into an object array here? to_datetime handles all of these cases

You're right, we could make do without it. I like doing this explicitly because to_datetime is already overloaded and circular.

this is horribly inefficient and unnecessary

If we don't do it here, to_datetime is going to do this. It may be unnecessary, but it is not horribly inefficient. What is a code smell is the circularity involved in calling to_datetime.

then just call array_to_datetime and don’t force the conversion to array

So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.__new__) that to_datetime / to_timedelta returns an Index instead of an EA?

Could we have the public to_datetime just be a simple

    array = _to_datetime(...)
    return DatetimeIndex(array)

so the internal _to_datetime returns the array?

So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.__new__) that to_datetime / to_timedelta returns an Index instead of an EA?

It's not the fact that it's an Index so much as that it is a circular dependency. I think I can resolve this in an upcoming commit.

Looking through to_datetime and _convert_listlike_datetimes, I don't see a conversion to ndarray[object].

_convert_listlike_datetimes calls ensure_object.

Sorry, to_datetime has an intermediate datetime64[ns] -> object -> datetime64[ns] conversion? That seems unnecessary.

Not sure what you're referring to. As implemented _from_sequence is specifically for list, tuple, or object-dtype NDArray/Index. datetime64-dtype goes through a different path.

_convert_listlike_datetimes calls ensure_object.

That's after an

    # these are shortcutable
    if is_datetime64tz_dtype(arg):
        if not isinstance(arg, DatetimeIndex):
            return DatetimeIndex(arg, tz=tz, name=name)
        if tz == 'utc':
            arg = arg.tz_convert(None).tz_localize(tz)
        return arg
    elif is_datetime64_ns_dtype(arg):
        if box and not isinstance(arg, DatetimeIndex):
            try:
                return DatetimeIndex(arg, tz=tz, name=name)
            except ValueError:
                pass
        return arg

So those both avoid conversion to object.

ExtensionArray._from_sequence is for any sequence of scalar objects, including a ndarray with a specialized type like datetime64[ns]. It'll be used, for example, in factorize(ExtensionArray).

@TomAugspurger thank you for clarifying; I was under the mistaken impression that it was specifically list/tuple/object-dtype.

Are there any restrictions on kwargs that can be added to it? In particular I'm thinking of freq and tz

jreback · 2018-11-04T20:19:38Z

pandas/core/arrays/timedeltas.py

+    @classmethod
+    def _from_sequence(cls, scalars, dtype=_TD_DTYPE, copy=False):
+        # list, tuple, or object-dtype ndarray/Index
+        values = np.array(scalars, dtype=np.object_, copy=copy)


This doesn't call to_timedelta, so this does require that we pass an object array.

then it should
let’s not reinvent the wheel

No, it shouldn't. to_timedelta will just end up calling array_to_timedelta64 like this does, but only after doing a bunch of unnecessary dtype checks.

Besides, this is what TimedeltaIndex.__new__ currently calls.

jreback · 2018-11-04T20:19:53Z

pandas/core/arrays/timedeltas.py

+    ------
+    ValueError
+    """
+    if dtype != _TD_DTYPE:


AssertionError no?

When called from _simple_new this is internal so AssertionError would make sense, but it is also called from __new__ so is in principle user-facing.

Either way I need to add tests for this.

Well, this should never happen; all conversions should happen before this.

So it should assert.

dtype is part of the signature of TimedeltaArray.__new__, which is/will be user-facing. If the user passes the wrong dtype, it's a ValueError.

No, my point is that there should be, and I think currently already is, a conversion before this.

If it's wrong at this point, it's not a user error but an incorrect code path.

My point is that this check function is called two times, one of which is the very first thing in TimedeltaArray.__new__.

Apart from the discussion above, is it worth having a 15 line function (including docstrings :-)), for a 2-liner used in two places?
I would probably just leave it in place as it was; I think reading something like assert dtype == _TD_DTYPE in TimedeltaArray._simple_new is clearer than calling into a helper function.

Reasonable. But hey, it's a nice docstring.
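For illustration, a toy class (hypothetical, not pandas' TimedeltaArray) showing the split being argued over: the user-facing constructor raises ValueError for a bad `dtype` argument, while the internal `_simple_new` path only asserts, since a mismatch there would mean an earlier conversion step was skipped.

```python
import numpy as np

_TD_DTYPE = np.dtype("m8[ns]")


class ToyTimedeltaArray:
    """Hypothetical toy class, not pandas' TimedeltaArray."""

    def __init__(self, values, dtype=_TD_DTYPE):
        # User-facing: a bad ``dtype`` argument is the caller's mistake.
        if np.dtype(dtype) != _TD_DTYPE:
            raise ValueError(
                "Only timedelta64[ns] dtype is supported, got {}".format(dtype))
        self._data = np.asarray(values, dtype=_TD_DTYPE)

    @classmethod
    def _simple_new(cls, values):
        # Internal fast path: callers must already have converted, so a
        # mismatch here is a bug in the calling code, not a user error.
        assert values.dtype == _TD_DTYPE, values.dtype
        result = object.__new__(cls)
        result._data = values
        return result
```

Note that plain `assert` statements are stripped under `python -O`, which is one reason to keep them to genuinely internal invariants.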


jbrockmendel · 2018-11-04T21:34:29Z

Good comments, thanks. Also some linting mistakes I need to fix. I'll make another pass and comment when this is ready to be looked at.


jbrockmendel · 2018-11-05T01:24:59Z

Updated with (most) requested edits, basic tests for TimedeltaArray (and fixes to make them pass, particularly implementation of is_monotonic_increasing etc)

jbrockmendel · 2018-11-05T04:57:59Z

Pushed commits fixing tests, and also implemented maybe_validate_freq. I think this can be handled more cleverly in some cases; I will look into this. There was also an issue about removing the verify_integrity argument from some of the constructors. Now would be a decent time to do so.

jbrockmendel · 2018-11-05T04:59:26Z

pandas/core/indexes/datetimes.py

        assert isinstance(subarr, np.ndarray), type(subarr)
        assert subarr.dtype == 'M8[ns]', subarr.dtype

        subarr = cls._simple_new(subarr, name=name, freq=freq, tz=tz)
-        if dtype is not None:
-            if not is_dtype_equal(subarr.dtype, dtype):


This check is made unnecessary by the tz = dtl.validate_tz_from_dtype(dtype, tz) above (implemented a couple PRs ago)

Nope, this isn't quite right. Will revert/fix.

TomAugspurger · 2018-11-05T12:53:10Z

@TomAugspurger can you confirm that the _from_sequences implemented here handle the right cases? It wasn't obvious if object-dtyped array/indexes were supposed to be handled there, but combining those cases was too clean to overlook.

It should handle any sequence where the scalar types are instances of ExtensionArray.dtype.type or NA. So yes, object-dtype arrays should be handled.
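A sketch of that contract for a datetime array, assuming a hypothetical `from_scalars` function that treats `datetime.datetime` as the scalar type and `None` as NA (the real `_from_sequence` also has to cope with other NA values, strings and tz-aware data). Lists, tuples and object-dtype ndarrays all go through the same path.

```python
import datetime

import numpy as np


def from_scalars(scalars):
    """Hypothetical _from_sequence-style constructor.

    Accepts any sequence (list, tuple, object ndarray) whose elements are
    datetime.datetime instances or None, and returns a datetime64[ns] ndarray.
    """
    values = np.array(scalars, dtype=object)
    for x in values:
        if x is not None and not isinstance(x, datetime.datetime):
            raise TypeError(
                "expected datetime or None, got {}".format(type(x).__name__))
    # numpy turns None into NaT when casting object -> datetime64
    return values.astype("M8[ns]")
```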

TomAugspurger

@jbrockmendel could you summarize where this is in the overall TDA/DTA refactor, and how it gets us closer to the goal (primarily disentangling inheritance -> composition? Anything else?)


TomAugspurger · 2018-11-05T13:02:59Z

pandas/core/arrays/datetimes.py

+    def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False):
+        if isinstance(values, (list, tuple)) or is_object_dtype(values):
+            values = cls._from_sequence(values, copy=copy)
+            # TODO: Can we set copy=False here to avoid re-coping?


IIUC, then yes you're OK setting copy=False here. By definition, the conversion to datetime64[ns] will involve a copy.
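A hypothetical helper (`to_m8ns` is not a pandas function) sketching the copy-at-most-once rule: when a dtype conversion is needed it already allocates a new buffer, so `copy` only matters when the input is already datetime64[ns].

```python
import numpy as np

M8NS = np.dtype("M8[ns]")


def to_m8ns(values, copy=False):
    """Hypothetical helper: coerce to datetime64[ns], copying at most once."""
    if isinstance(values, np.ndarray):
        if values.dtype == M8NS:
            # No conversion needed, so ``copy`` is the only reason to allocate.
            return values.copy() if copy else values
        # ndarray.astype (default copy=True) always returns a newly allocated
        # array, so the result is already independent of ``values``.
        return values.astype(M8NS)
    # list/tuple input: np.array builds a fresh array either way.
    return np.array(values, dtype=M8NS)
```

With this shape, `to_m8ns(arr)` hands back `arr` itself when it is already datetime64[ns], and `to_m8ns(arr, copy=True)` returns an independent copy, but no path copies twice.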

Further question: it is not (yet) possible to simply remove this case? (eventually we should not call the DatetimeArray constructor with an array-like of scalars)

it is not (yet) possible to simply remove this case?

Not if we want to share the extant arithmetic tests (which we do)

(eventually we should not call the DatetimeArray constructor with an array-like of scalars)

I don't share this opinion, would prefer to delay this discussion until it is absolutely necessary.

I don't share this opinion,

Then please raise this in the appropriate issue, as we have been discussing this before (I think it is #23212, although there is probably some more scattered discussion on other related PRs)

would prefer to delay this discussion until it is absolutely necessary.

It is here that you are redesigning the constructors for the array refactor, IIUC, so if there is a time we should discuss it, it is now I think?

Not if we want to share the extant arithmetic tests (which we do)

Can you clarify this a little bit? At what point do the arithmetic tests need to deal with array of objects?
Eg for boxing the constructed values into Series/Index/Array, there a properly dtyped array can be used?

Can you clarify this a little bit? At what point do the arithmetic tests need to deal with array of objects?

The pertinent word here is "extant". Many of the tests in tests/arithmetic pass a list into tm.box_expected or klass.

Ignoring the tests for a moment, I thought we were all on board with the goal of the DatetimelikeArray.__init__ being no inference and no copy.

Back to the tests, it looks like you could add an entry to box_expected for DatetimeArray that returns expected = DatetimeArray._from_sequence(expected)?

Ignoring the tests for a moment, I thought we were all on board with the goal of the DatetimelikeArray.__init__ being no inference and no copy.

My comment to Joris below about mothballing this conversation applies. But short answer is no: I did not get on board with that.


TomAugspurger · 2018-11-05T13:07:20Z

pandas/core/arrays/datetimes.py

+                # TODO: Try to do this in just one place
+                tz = values.dt.tz
+            values = np.array(values.view('i8'))
+        elif isinstance(values, DatetimeArrayMixin):


DatetimeArrayMixin -> cls?

And you don't need to get the tz in this case?

DatetimeArrayMixin -> cls?

No. For the moment we are still using inheritance, so this would mess up for DatetimeIndex == DatetimeArray. When we change to composition this check will have to become isinstance(values, (DatetimeArray, ABCDatetimeIndex))
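A toy sketch (stand-in classes, not pandas code) of why checking against `cls` breaks while the Index still inherits from the array mixin: when `__new__` is reached via the subclass, `cls` is the subclass, and a plain array instance is not an instance of it.

```python
class ToyArrayMixin:
    """Stand-in for the datetime array base class."""


class ToyIndex(ToyArrayMixin):
    """Stand-in for an Index subclass that still inherits from the array."""


arr = ToyArrayMixin()
idx = ToyIndex()

# Called via the subclass, ``cls`` is ToyIndex, so ``isinstance(values, cls)``
# would reject a plain array that the user passed in:
print(isinstance(arr, ToyIndex))       # False
# Checking against the base class accepts both arrays and indexes:
print(isinstance(arr, ToyArrayMixin))  # True
print(isinstance(idx, ToyArrayMixin))  # True
```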

TomAugspurger · 2018-11-05T13:12:03Z

pandas/core/arrays/datetimes.py


        # NB: Among other things not yet ported from the DatetimeIndex
        # constructor, this does not call _deepcopy_if_needed
        return result

+    @classmethod
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
+        # list, tuple, or object-dtype ndarray/Index


Looking through to_datetime and _convert_listlike_datetimes, I don't see a conversion to ndarray[object].

TomAugspurger · 2018-11-05T13:15:21Z

pandas/core/arrays/datetimes.py


        # NB: Among other things not yet ported from the DatetimeIndex
        # constructor, this does not call _deepcopy_if_needed
        return result

+    @classmethod
+    def _from_sequence(cls, scalars, dtype=None, copy=False):
+        # list, tuple, or object-dtype ndarray/Index


So is the root problem (referenced in your "circularity" comment, and down below in TimedeltaIndex.__new__) that to_datetime / to_timedelta returns an Index instead of an EA?

Could we have the public to_datetime just be a simple

    array = _to_datetime(...)
    return DatetimeIndex(array)

so the internal _to_datetime returns the array?

TomAugspurger · 2018-11-05T13:16:44Z

pandas/core/arrays/timedeltas.py

@@ -180,6 +203,23 @@ def _generate_range(cls, start, end, periods, freq, closed=None):

        return cls._simple_new(index, freq=freq)

+    # ----------------------------------------------------------------
+    # Array-Like Methods
+    # NB: these are appreciably less efficient than the TimedeltaIndex versions


Because of (lack of) caching? This comment makes it seem like it's slower in general, when (if it's caching) it's just slower on repeated use.

BTW (as mentioned elsewhere), I am not sure we should add them as public methods. If we do so, we should add them to all our EAs, or actually even to the EA interface, and not only to TimedeltaArray (or datetimelike arrays).

If we do so, we should add them to all our EAs, or actually even to the EA interface

I'm not necessarily opposed to this, but this isn't obvious to me.

Because of (lack of) caching? This comment makes it seem like it's slower in general, when (if it's caching) it's just slower on repeated use.

Because the Index version defines monotonic_increasing, monotonic_decreasing, and is_unique in a single call via _engine.
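A numpy sketch of the idea (not the actual libindex engine code, and ignoring NaT): all three properties can be derived from one shared pass over the int64 differences, rather than one scan per property. The function name is hypothetical.

```python
import numpy as np


def monotonic_and_unique(values):
    """Hypothetical helper (not the libindex engine).

    Returns (is_monotonic_increasing, is_monotonic_decreasing, is_unique)
    for a 1-D int64-backed array (e.g. timedelta64[ns]); NaT is not handled.
    """
    i8 = np.asarray(values).view("i8")
    diffs = np.diff(i8)
    increasing = bool((diffs >= 0).all())
    decreasing = bool((diffs <= 0).all())
    if increasing or decreasing:
        # In sorted data any duplicates are adjacent, so the same diffs
        # answer the uniqueness question.
        unique = bool((diffs != 0).all())
    else:
        unique = len(np.unique(i8)) == len(i8)
    return increasing, decreasing, unique
```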

TomAugspurger · 2018-11-05T13:19:31Z

pandas/core/indexes/datetimes.py

    _freq = None
    _comparables = ['name', 'freqstr', 'tz']
    _attributes = ['name', 'freq', 'tz']
+    timetuple = None


I'm curious, what is this? Is it intended to be public? It's not present in the public API for 0.23.4

TomAugspurger · 2018-11-05T13:21:18Z

pandas/core/indexes/datetimes.py

-        if not isinstance(data, (np.ndarray, Index, ABCSeries,
-                                 DatetimeArrayMixin)):
-            if is_scalar(data):
-                raise ValueError('DatetimeIndex() must be called with a '


Ah, this is kinda an API change (raising a TypeError instead of a ValueError).

Seems fine from a consistency P.O.V., but deserves a release note in the API breaking changes section.

I think currently all public cases raise ValueError, so keeping it on that would not give an API change?
(I agree that TypeError is slightly more appropriate though)

My preference would have been to keep it a ValueError and change to a TypeError in a separate PR, but here we are... will add a note in Breaking Changes.
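A sketch of the check in question, assuming a hypothetical `scalar_data_error` body that returns the message string (matching how the diff calls it, `raise TypeError(dtl.scalar_data_error(values, cls))`) and using the public `pandas.api.types.is_scalar`; the message wording here is illustrative only.

```python
import pandas as pd
from pandas.api.types import is_scalar


def scalar_data_error(data, cls):
    # Hypothetical body: returns the message so the caller picks the
    # exception type, as in ``raise TypeError(scalar_data_error(data, cls))``.
    return ("{0}(...) must be called with a collection of some kind, "
            "{1} was passed".format(cls.__name__, repr(data)))


def check_not_scalar(data, cls):
    if is_scalar(data):
        raise TypeError(scalar_data_error(data, cls))


try:
    check_not_scalar(pd.Timestamp("2018-11-04"), pd.DatetimeIndex)
except TypeError as err:
    print(err)
```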


jorisvandenbossche · 2018-11-05T14:24:34Z

pandas/core/arrays/datetimes.py

            tz = values.tz

+        # TODO: what about if freq == 'infer'?


then we should also get the freq from the values as a "cheap" inference? Or would there be cases where an inferred frequency can be different than the actual frequency?

then we should also get the freq from the values as a "cheap" inference?

That's what I'm thinking, yah
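A sketch of that "cheap" inference, using a hypothetical `resolve_freq` helper: reuse a `freq` already attached to the input when there is one, and otherwise fall back to the public `pd.infer_freq`, which scans the values.

```python
import pandas as pd


def resolve_freq(values, freq):
    """Hypothetical helper: resolve freq="infer" for datetime-like input."""
    if not (isinstance(freq, str) and freq == "infer"):
        # An explicit freq (or None) is passed through untouched.
        return freq
    existing = getattr(values, "freq", None)
    if existing is not None:
        # Cheap path: the input already knows its frequency.
        return existing
    # Fallback: scan the values (pd.infer_freq needs at least 3 of them).
    return pd.infer_freq(values)


dti = pd.date_range("2016-01-01", freq="MS", periods=9)
print(resolve_freq(dti, "infer") == dti.freq)  # True
```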

jorisvandenbossche · 2018-11-05T14:27:58Z

pandas/core/arrays/datetimes.py

+    def __new__(cls, values, freq=None, tz=None, dtype=None, copy=False):
+        if isinstance(values, (list, tuple)) or is_object_dtype(values):
+            values = cls._from_sequence(values, copy=copy)
+            # TODO: Can we set copy=False here to avoid re-coping?


Further question: it is not (yet) possible to simply remove this case? (eventually we should not call the DatetimeArray constructor with an array-like of scalars)

jorisvandenbossche · 2018-11-05T14:28:42Z

pandas/core/arrays/datetimes.py

-        if isinstance(values, DatetimeArrayMixin):
+        if lib.is_scalar(values):
+            raise TypeError(dtl.scalar_data_error(values, cls))
+        elif isinstance(values, ABCSeries):


I would get out the _values, and then treat that the same as directly passing a DatetimeIndex/DatetimeArray ?

I'll see if there is a graceful way to do this in the next pass (if I ever manage to catch up with all these comments!)

jorisvandenbossche · 2018-11-05T14:29:19Z

pandas/core/arrays/datetimes.py

+                # TODO: Try to do this in just one place
+                tz = values.dt.tz
+            values = np.array(values.view('i8'))
+        elif isinstance(values, DatetimeArrayMixin):


And you don't need to get the tz in this case?

jorisvandenbossche · 2018-11-05T14:54:03Z

pandas/core/indexes/datetimes.py

-        if not isinstance(data, (np.ndarray, Index, ABCSeries,
-                                 DatetimeArrayMixin)):
-            if is_scalar(data):
-                raise ValueError('DatetimeIndex() must be called with a '


I think currently all public cases raise ValueError, so keeping it on that would not give an API change?
(I agree that TypeError is slightly more appropriate though)

jorisvandenbossche · 2018-11-05T14:55:34Z

pandas/core/indexes/datetimes.py

@@ -199,10 +198,11 @@ def _join_i8_wrapper(joinf, **kwargs):

    _engine_type = libindex.DatetimeEngine

-    tz = None
+    _tz = None


Why is this change needed? (just curious to understand)

jorisvandenbossche · 2018-11-05T15:00:15Z

pandas/core/indexes/timedeltas.py

-                data = data.astype(_TD_DTYPE)
-            else:
-                data = ensure_int64(data).view(_TD_DTYPE)
+        arr = TimedeltaArrayMixin(data, freq=freq)


I don't think we should replace the handling of object dtype with TimedeltaArray constructor (this should not be able to handle object dtype eventually), but if needed that is fine to leave for a later PR that actually does the split / cleans up the TimedeltaArray constructor

I'm on board with the "leave for later" part of this

jorisvandenbossche · 2018-11-05T15:02:37Z

pandas/tests/arrays/test_datetimes.py

+        with pytest.raises(TypeError):
+            pd.DatetimeIndex(pd.Timestamp.now())
+
+    def test_from_sequence_requires_1dim(self):


This can be a test in the EA base tests.

Although, thinking about it now, I am not sure we should require implementors to handle this case, as the method should never be called with 2D data to begin with.

What is the reason you added handling of this to that method?

What is the reason you added handling of this to that method?

I'm not sure I understand the question. Is there a reason not to?

Is there ever a case where we call _from_sequence internally with 2D data? And if so, what case?

I would think that at the point we internally call _from_sequence, we are sure it is a 1D array (eg as the result of some operation).

You're far more competent than I am and can be counted on not to make this particular mistake; I cannot.

If we're not allowing object-dtype/lists in __init__/__new__, then we are basically forcing users to use the (private!) _from_sequence; validation needs to happen somewhere.

The _from_sequence method has a specific purpose in the EA interface: converting an array-like of scalars back to a proper Array (and should never be called by a user). Eg this method is used when doing an astype(EAdtype), but I was thinking that it might well be that in those cases that we call it, it already has been checked the data is 1D.

So I am simply honestly wondering if there is a specific case that you encountered now where this check was needed.

But I suppose the answer is that you are using _from_sequence to validate generic user input, and there the user can pass 2D data and we should invalidate it.

However, it is not only you that do it, we actually already do it when doing Series(..., dtype=EAdtype) (where also user input is processed in _from_sequence), so indeed, we should probably have this check in general.

So long story short: it indeed might make sense to add this type check. But then to repeat my initial comment: this should then be a test in the base EA tests to ensure all EAs do this properly?
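A sketch of both halves, with hypothetical names: a 1-D validation step for user-facing input, and a shared check, in the spirit of a base EA test, that any `_from_sequence`-style constructor could be run against.

```python
import numpy as np


def ensure_1d_scalars(scalars):
    """Hypothetical validation step for a _from_sequence-style constructor."""
    values = np.asarray(scalars, dtype=object)
    if values.ndim != 1:
        raise ValueError(
            "Only 1-dimensional input is supported, got ndim={}".format(values.ndim))
    return values


def assert_rejects_2d(from_sequence, scalars):
    """Hypothetical shared check: every constructor should refuse 2-D input."""
    two_d = [list(scalars), list(scalars)]
    try:
        from_sequence(two_d)
    except (ValueError, TypeError):
        return
    raise AssertionError("2-D input was accepted")
```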

jorisvandenbossche · 2018-11-05T15:06:22Z

pandas/tests/arrays/test_datetimes.py

+        dti = pd.date_range('2016-01-1', freq='MS', periods=9, tz=tz)
+
+        # Fails because np.array(dti, dtype=object) incorrectly returns Longs
+        result = DatetimeArray(np.array(dti, dtype=object), freq='infer')


I think I mentioned above, but I don't think DatetimeArray constructor should handle this case?

That is what we discussed in #23212 (although, @jbrockmendel , you didn't really react there)

Hopefully I've answered this enough times in this thread. I see no reason not to handle object dtype in the DatetimeArray constructor. Series and Index and DataFrame all handle object-dtype and lists; I find it counter-intuitive that we would have a small subset of classes that don't.

But in the short-run, we need to handle these cases if we want to share the extant tests (without significant overhaul) (which we do)

See #23493 (comment) for my answer to a previous related comment

Elsewhere in this thread you've said you're fine with bikeshedding this after we have a working implementation. Has this changed in the last few minutes?

I suppose you are referring to #23493 (comment)? Indeed, I was somewhat inconsistent here, but I think the other comments are quite clear in that I am at least asking why you are adding, or keeping, handling of object dtypes.

If the answer is: because I think that is the way it should be, then let's discuss that and try to get to a consensus (#23212). If the answer is: because of practical reasons for now, then let's discuss the practical reasons and see if that is OK to defer to a follow-up, or if there is an easy way to overcome the practical reason.

jbrockmendel · 2018-11-05T19:21:26Z

@jorisvandenbossche Your attention has been specifically requested in #23514, whereas I am finding it increasingly frustrating. I propose we step back from this conversation so I can spend some time addressing the subset of comments on which there is consensus.

TomAugspurger · 2018-11-05T19:47:36Z

Maybe we can spend a bit of time building consensus on a direction forward? I'll try to build my own thoughts here on a proposal, as a response / concurrence to
#23185 (comment).

FWIW, everything on the TimedeltaArray / DatetimeArray is on my critical path, so I'm going to prioritize reviewing your PRs over everything else.

jorisvandenbossche · 2018-11-05T21:00:24Z

[Tom] I thought we were all on board with the goal of the DatetimelikeArray.__init__ being no inference and no copy.

@jbrockmendel you have to see my comments in the same light. I was assuming we had this discussion and agreed on the design of the constructors, so I was wondering why we couldn't already follow that decision in this PR instead of deferring to a follow-up.
So this mis-assumption might have made things more frustrating than needed for both of us.

Your attention has been specifically requested in #23514, whereas I am finding it increasingly frustrating.

Just as for Tom, this PR (and all issues related to the EA refactor) is high on my priority list, so I will still put time on this topic (which is more than writing comments here).
So you can expect a lot more comments to come (not necessarily now, but when you had the time to update the PR). But please don't see this as a bad sign, but rather as an opportunity to move forward quickly (and as an indicator of the importance of this refactor). Tom's PRs on the SparseArray and PeriodArray also saw a huge amount of comments and discussion, but I think that significantly improved the PRs and moved us relatively quickly to a shared understanding.
And to be clear, it is no problem that it takes some time to process those comments.

If you have specific feedback on how I can make my comments less frustrating apart from the above, I am honestly all ears.

But indeed, let's first try to build consensus on the fundamental design questions. I would propose to not do that here on this PR but on the general issue #23185 and the constructor split-off issue #23212, so we can keep the discussions in this PR on technical implementation details of things we in general already agree on.

I would personally propose to have a chat about this, as a more effective way to discuss things. But @jbrockmendel, I think that only makes sense if you can fix the audio issues we were having last time. If that is not possible right now, I would propose to do it on chat at a given time to at least have it a bit more interactive than on github.

jbrockmendel · 2018-11-05T21:16:03Z

If you have specific feedback on how I can make my comments less frustrating apart from the above, I am honestly all ears.

I appreciate the consideration. At this point I think the short-term solution is for me to take a half-step back and disengage. The request that you do the same was primarily so you don't feel that I'm becoming non-responsive.

Medium-run, I very much think the priority should be getting enough of this in place that we can get the tests working. Later if you want to implement datetime_array mirroring period_array or rename __init__ to _from_sequence (I'll be renaming _from_sequence to _from_objects or something) then that will be entirely trivial to do.

I'll be going AFK (at least on this thread) for a few hours.

jbrockmendel · 2018-11-07T03:25:15Z

Good news, for a certain value of "good". In the next pass of de-duplication I found some subtle bugs in the TimedeltaIndex construction, including some that we are specifically testing. So there will be at least one fixing-things PR before this comes back to the forefront.

In the interim, we can make incremental progress on orthogonal DTA/TDA code in #23415 and #23502.

jorisvandenbossche · 2018-11-07T12:17:58Z

So there will be at least one fixing-things PR before this comes back to the forefront.

I know I am not at all in the position to force things, so I will just give my opinion (and take it as that :-)): I think we should rather focus our effort on moving forward this PR (eg start trying to reach consensus on the discussion points above).
It's really good you found those bugs, and we should certainly fix them, but those are not critical for the refactor, while the things discussed here are.

In the interim, we can make incremental progress on orthogonal DTA/TDA code in #23415 and #23502.

I thought you mentioned #23415 was kind of dormant for now? On #23502 I added some additional comments.

jbrockmendel · 2018-11-07T15:54:09Z

I thought you mentioned #23415 was kind of dormant for now?

You thought correctly, but it may be worthwhile to un-dormant-ize it so as to maintain forward momentum while bugs get sorted out.

but those are not critical for the refactor, while the things discussed here are.

Disagree on both counts. Without the bugs being fixed, we can't institute meaningful tests for the array classes. On the other hand, whether we rename __init__/_simple_new to _from_sequence/__init__ can absolutely be postponed until after we have a fully-working implementation.

jbrockmendel · 2018-11-09T16:59:16Z

Closing. I’ll salvage any useful tests in the PR that comes after #23539.

jbrockmendel added 12 commits November 4, 2018 08:27

implement _from_sequence

f25d24c

Add dtype to periodArray.__init__

e47c200

implement require_m8ns_dtype, from_sequence

2cb7597

small cleanups in datetimeIndex.__new

a4512b7

dispatch parts of TimedeltaIndex.__new__

a5ef959

add copy to constructors

83b04fe

small cleanup

e8abc83

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-constructors

a4c8671

implement maybe_define_freq

98dca45

handle ABSeries

56fd95e

implement basic constructor tests, fix just enough of the comparisons to pass them

5f92cfa

add note

0e15536

jbrockmendel commented Nov 4, 2018


jbrockmendel commented Nov 4, 2018


flake8 fixup

1a015f6

jreback requested changes Nov 4, 2018


jreback added the Datetime (Datetime data dtype) and Timedelta (Timedelta data type) labels Nov 4, 2018

jbrockmendel added 2 commits November 4, 2018 14:22

implement scalar_data_error, with tests

5445a56

docstring

272f4b1

jbrockmendel mentioned this pull request Nov 4, 2018

TST: Tests and Helpers for Datetime/Period Arrays #23502

Merged

jbrockmendel added 2 commits November 4, 2018 17:23

Fix TimedeltaArray infer_freq; implement tests

35195bd

Merge branch 'master' of https://github.com/pandas-dev/pandas into pre-constructors

bb394dc

update tests for changed exception type

510ae3d

jbrockmendel added 2 commits November 4, 2018 20:45

remove redundant dtype check

49cf495

implement maybe_validate_freq

3a62633

jbrockmendel commented Nov 5, 2018


TomAugspurger reviewed Nov 5, 2018


jorisvandenbossche reviewed Nov 5, 2018


jbrockmendel mentioned this pull request Nov 5, 2018

Datetimelike Array Refactor #23185

Closed

jbrockmendel mentioned this pull request Nov 8, 2018

API: Index and Array constructors design #23212

Closed

jbrockmendel closed this Nov 9, 2018

jbrockmendel deleted the pre-constructors branch April 5, 2020 17:38
