Skip to content

Fix index for datetime64 conversion. Fixes #13937 #14446

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

Fix index for datetime64 conversion. Fixes #13937 #14446

wants to merge 1 commit into from

Conversation

amolkahat
Copy link
Contributor

@amolkahat amolkahat commented Oct 18, 2016

Fix index for datetime64 conversion

This PR fixes index for datetime64 conversion. Also,
update tests - pandas/pandas/tests/tests_convert_to.py

Fixes #13937

@codecov-io
Copy link

codecov-io commented Oct 19, 2016

Codecov Report

Merging #14446 into master will decrease coverage by -0.03%.
The diff coverage is 100%.

@@            Coverage Diff             @@
##           master   #14446      +/-   ##
==========================================
- Coverage   91.06%   91.04%   -0.03%     
==========================================
  Files         136      136              
  Lines       49104    49104              
==========================================
- Hits        44718    44705      -13     
- Misses       4386     4399      +13
Impacted Files Coverage Δ
pandas/core/frame.py 97.74% <100%> (-0.1%)
pandas/io/gbq.py 40% <0%> (-46.67%)
pandas/util/decorators.py 67.92% <0%> (-0.95%)
pandas/tools/merge.py 91.78% <0%> (-0.35%)
pandas/core/common.py 91.02% <0%> (-0.34%)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 29d81f3...3806983. Read the comment docs.

@amolkahat
Copy link
Contributor Author

@jreback Please take a look. I updated tests as you suggested in PR #14044.

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype labels Oct 19, 2016
.to_records()
expected = np.rec.array([('x', 'y')], dtype=[('a', 'O'), ('b', 'O')])
tm.assert_almost_equal(result, expected)

def test_to_records_with_tz_gmt(self):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you add a comment for the issue number

df = DataFrame({'datetime': data})
df.set_index('datetime', inplace=True)
dfgmt = df.tz_convert("GMT")
tm.assertIsNot(df.to_records()['datetime'][0].tzinfo,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

construct the expected frame and use assert_frrame_equal

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also simplify the construction of the test data using pandas constructs?

Eg

data = [datetime.datetime.now(pytz.timezone('UTC')) for _ in range(10)]
df = DataFrame({'datetime': data})
df.set_index('datetime', inplace=True)

can be done with (for similar object, not exact same values of course)

df = DataFrame(index=date_range('2016-01-01', periods=10, freq='S', tz='UTC'))

@jreback
Copy link
Contributor

jreback commented Oct 19, 2016

pls add a whatsnew entry for 0.19.1

def test_to_records_with_tz_gmt(self):
"""
This test verifies the bug id #13937
"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add this as a comment instead of docstring? (docstrings hide the test name in the output of nosetests)

Also just the number is fine, like # GH 13937

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll take care of this. :)

df3 = DataFrame({'datetime': dt3})
df4 = df3.set_index('datetime')

tm.assert_frame_equal(df2, df4)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this test related to the bug you fixed?
It does not use to_records

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche Hmm. yes it is not related to my bug fix. I just focused on data frame equality. Actually there is also one problem assert_frame_equal does not accept any other frames expect DataFrame. In my previous PR 14044 Jeff suggest to construct equal data frame. Can you suggest me how i complete Jeff's assert_frame_equal requirement?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jorisvandenbossche @jreback Can you please suggest me how i construct equal data frame?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your question is not fully clear to me.
If you want to test to_records, you can also assert that the resulting array is equal (with assert_numpy_array_equal instead of assert_frame_equal)

df4 = DataFrame({'datetime': dt2})
df5 = df4.set_index('datetime')

tm.assert_frame_equal(df3, df5)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same question for this test.
Was the above code failing before your PR? Or is this not yet tested?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No this code does not falling before my PR or after the fixes.

df = DataFrame({'datetime': data})
df.set_index('datetime', inplace=True)
dfgmt = df.tz_convert("GMT")
tm.assertIsNot(df.to_records()['datetime'][0].tzinfo,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you also simplify the construction of the test data using pandas constructs?

Eg

data = [datetime.datetime.now(pytz.timezone('UTC')) for _ in range(10)]
df = DataFrame({'datetime': data})
df.set_index('datetime', inplace=True)

can be done with (for similar object, not exact same values of course)

df = DataFrame(index=date_range('2016-01-01', periods=10, freq='S', tz='UTC'))

@jorisvandenbossche
Copy link
Member

@amolkahat It looks like something went wrong with rebasing/updating. Normally, doing exactly the following should clean it up:

git fetch upstream
git checkout bug_fixes
git rebase upstream/master
git push origin bug_fixes -f

(supposing your fork is 'origin' and pandas-dev is 'upstream')

@jorisvandenbossche
Copy link
Member

@amolkahat Can you rebase? (see my previous comment how to fix it)

@amolkahat
Copy link
Contributor Author

@jorisvandenbossche Please take a look.

@jreback
Copy link
Contributor

jreback commented Dec 30, 2016

can you add a whatsnew note for 0.20.0

freq='S',
tz='UTC'))

dfgmt = df.tz_convert("GMT")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use result = ....

can you construct a rec-array directly and use tm.assert_numpy_array_equal

def test_from_UTC_to_GMT_as_equal_frame(self):
# GH13937

dt1 = [datetime.datetime(2011,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this last test is not necessary

df1 = DataFrame(index=dt1)
df3 = df1.tz_convert("UTC")

df4 = DataFrame(index=dt2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

however the empty frame is good (but instead of a separate tests, you can do this as part of the other one, IOW)
just use a starting from from the first test

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

try to consolidate these tests into a single test (but with multiple parts is ok), its just confusing to start with 3 different frames, start with 1, then do a

result = ....
expected = ....
tm.assert_numpy_array_equal(....)

# next part
....

# last part
...

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if you updated this. the tests needs to be more concise and focused.

@@ -355,3 +355,7 @@ Bug Fixes


- Bug in ``pd.read_csv()`` for the C engine where ``usecols`` were being indexed incorrectly with ``parse_dates`` (:issue:`14792`)
=======
- Bug in ``DataFrame.to_records()`` with converting datetime64 index with timezone (:issue: `13937`)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so looks like some merge conflicts are remaining here

8,
15,
8,
15,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is really hard to read. please format better.

freq='S',
tz='GMT'))

tm.assert_almost_equal(df_utc.to_records()['datetime'][0],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does this comparison add that the assert_numpy_array_equal does not? (which is the way we would like to compare these)

index=data2)

df_gmt2 = DataFrame(date_range('2016-01-01',
periods=10,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this test actually testing? (which the first test does not)

@jreback
Copy link
Contributor

jreback commented Jan 14, 2017

can you update

@amolkahat
Copy link
Contributor Author

@jreback Please take a look.

@@ -589,3 +589,5 @@ Bug Fixes
- Bug in ``Series.replace`` and ``DataFrame.replace`` which failed on empty replacement dicts (:issue:`15289`)
- Bug in ``pd.melt()`` where passing a tuple value for ``value_vars`` caused a ``TypeError`` (:issue:`15348`)
- Bug in ``.eval()`` which caused multiline evals to fail with local variables not on the first line (:issue:`15342`)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

move into a blank space above so that you don't get conflicts.

with converting a datetime64 index

from pandas.tests.frame.common import TestData


class TestDataFrameConvertTo(tm.TestCase, TestData):
_multiprocess_can_split_ = True
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

remove as we are using pytest now

periods=10,
freq='S',
tz='UTC'))

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do something simpler and much more readable

result = df_utc.tz_convert("GMT").to_records()
expected = df_utc.to_records()

tm.assert_numpy_array_equal(result, expected)


def test_to_records_with_tz(self):
# GH13937
df_utc = DataFrame({'datetime': date_range('2016-01-01',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you format this to make it easier to read


# test DataFrame.to_records() with timezone conversion to UTC

df_gmt = DataFrame({'datetime': date_range('2016-01-01',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same with all of the following, make shorter / simpler / easier-to-read

@jreback
Copy link
Contributor

jreback commented Feb 28, 2017

@amolkahat thanks it looks a bit better. In general, please don't modify whitespace. You must have some editor / settings which are doing things. They are pretty distracting.

@jreback
Copy link
Contributor

jreback commented Mar 1, 2017

you'll want to rebase on master as travis was having some issues

Fix index for datetime64 conversion

This PR fixes index for datetime64 conversion. Also,
update tests - pandas/pandas/tests/tests_convert_to.py

Fixes #13937
@jreback jreback closed this in f000a4e Mar 2, 2017
@jreback jreback added this to the 0.20.0 milestone Mar 2, 2017
@jreback
Copy link
Contributor

jreback commented Mar 2, 2017

thanks!

mcocdawc pushed a commit to mcocdawc/pandas that referenced this pull request Mar 2, 2017
closes pandas-dev#13937

Author: Amol Kahat <[email protected]>

Closes pandas-dev#14446 from amolkahat/bug_fixes and squashes the following commits:

3806983 [Amol Kahat] Modify test cases.
@amolkahat
Copy link
Contributor Author

@jreback Thanks.
Can you tell me on which IRC server & channel pandas-dev discussion happen?

@jorisvandenbossche
Copy link
Member

@amolkahat for chat, there is the gitter chat (https://gitter.im/pydata/pandas), there is also the pandas-dev mailing list (https://mail.python.org/mailman/listinfo/pandas-dev), but most of the discussion happens on github

@amolkahat
Copy link
Contributor Author

@jorisvandenbossche Thank you. 😃

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
closes pandas-dev#13937

Author: Amol Kahat <[email protected]>

Closes pandas-dev#14446 from amolkahat/bug_fixes and squashes the following commits:

3806983 [Amol Kahat] Modify test cases.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode Timezones Timezone data dtype
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: DataFrame.to_records() bug in converting datetime64 index with timezone
4 participants