Skip to content

add test for hour_angle, vectorize #599

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Oct 5, 2018
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
8 changes: 4 additions & 4 deletions pvlib/solarposition.py
Original file line number Diff line number Diff line change
Expand Up @@ -1327,8 +1327,8 @@ def hour_angle(times, longitude, equation_of_time):
equation_of_time_Spencer71
equation_of_time_pvcdrom
"""
hours = np.array([(t - t.tz.localize(
dt.datetime(t.year, t.month, t.day)
)).total_seconds() / 3600. for t in times])
timezone = times.tz.utcoffset(times).total_seconds() / 3600.
tz_info = times.tz
timezone = tz_info.utcoffset(times).total_seconds() / 3600.
Copy link
Member

@wholmgren wholmgren Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this the expected behavior?

In [15]: times = pd.DatetimeIndex(start='20180101', end='20190101', freq='1s', tz='America/Phoenix')

In [16]: timezone = times.tz.utcoffset(times).total_seconds() / 3600.

In [17]: timezone
Out[17]: -7.466666666666667

I am surprised because 1. I expected -7 and 2. times is a vector so I expected to get a result for each time (offset could change for DST).

Is this preferable?

In [27]: np.array((times.tz_localize(None).astype(int) - times.astype(int)) / 1e9 / 3600)
Out[27]: array([-7., -7., -7., ..., -7., -7., -7.])

The np.array wrapper ensures a known type instead of a version-dependent Index/array type.

Note that tz_localize(None) will always work if we bump min pandas to 0.15, as discussed elsewhere.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No it's not what I would expect, I took that approach from this SO answer.

TL;DR: let's just use your approach instead

details
Something very weird is going on with this particular America/Phoenix timezone. PyTz is picking a historical timezone called LMT that was discontinued in 1883 because it is listed first in the transition info list.

>>> import pytz
>>> import datetime
>>> america_phoenix = pytz.timezone('America/Phoenix')
<DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>
>>> dt = datetime.datetime.now().replace(tzinfo=america_phoenix)
datetime.datetime(2018, 10, 5, 12, 58, 34, 427000,
                  tzinfo=<DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>)
>>> america_phoenix.utcoffset(dt)
datetime.timedelta(-1, 59520)

The source for DstTzInfo says that _utcoffset is returned for utcoffset(dt) if localized already, otherwise localize first then return the _utcoffset of the timezone.

>>> america_phoenix._utcoffset
datetime.timedelta(-1, 59520)

And if not localized, then _utcoffset is set to whatever is the first tz info in the transition info table. In this case that's "LMT" which was UTC -7:28:18 until 1883.

>>> america_phoenix._tzinfos
{(datetime.timedelta(-1, 59520),
  datetime.timedelta(0),
  'LMT'): <DstTzInfo 'America/Phoenix' LMT-1 day, 16:32:00 STD>,
 (datetime.timedelta(-1, 61200),
  datetime.timedelta(0),
  'MST'): <DstTzInfo 'America/Phoenix' MST-1 day, 17:00:00 STD>,
 (datetime.timedelta(-1, 64800),
  datetime.timedelta(0, 3600),
  'MDT'): <DstTzInfo 'America/Phoenix' MDT-1 day, 18:00:00 DST>,
 (datetime.timedelta(-1, 64800),
  datetime.timedelta(0, 3600),
  'MWT'): <DstTzInfo 'America/Phoenix' MWT-1 day, 18:00:00 DST>}

>>> america_phoenix._transition_info
[(datetime.timedelta(-1, 59520), datetime.timedelta(0), 'LMT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MWT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MWT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'),
 (datetime.timedelta(-1, 64800), datetime.timedelta(0, 3600), 'MDT'),
 (datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST')]

If the datetime is localized, then PyTz does some smart stuff to figure out which transition info to use, based on the values in _utc_transition_times and returns the corresponding transition info and timezone info, and replaces tzinfo in the datetime.

>>> america_phoenix._utc_transition_times
[datetime.datetime(1, 1, 1, 0, 0),
 datetime.datetime(1901, 12, 13, 20, 45, 52),
 datetime.datetime(1918, 3, 31, 9, 0),
 datetime.datetime(1918, 10, 27, 8, 0),
 datetime.datetime(1919, 3, 30, 9, 0),
 datetime.datetime(1919, 10, 26, 8, 0),
 datetime.datetime(1942, 2, 9, 9, 0),
 datetime.datetime(1944, 1, 1, 6, 1),
 datetime.datetime(1944, 4, 1, 7, 1),
 datetime.datetime(1944, 10, 1, 6, 1),
 datetime.datetime(1967, 4, 30, 9, 0),
 datetime.datetime(1967, 10, 29, 8, 0)]

Since our datetime is 2018-10-05T12:58:34, that's the last index, and we get the the most recent Phoenix timezone from the transition info table.

(datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST')

which is the key using in the timezone info table and returns the correct timezone

(datetime.timedelta(-1, 61200), datetime.timedelta(0), 'MST'):
    <DstTzInfo 'America/Phoenix' MST-1 day, 17:00:00 STD>

When pandas makes a datetime index, I guess it doesn't care about tzinfo but it does do this with Timestamps which are the items in the datetime index.

DatetimeIndex has wrong timezone

>>> import pandas as pd
>>> times = pd.DatetimeIndex(start='20180101', end='20190101', freq='1s', tz='US/Arizona')
>>> times.tz
<DstTzInfo 'US/Arizona' LMT-1 day, 16:32:00 STD>
>>> times.tzinfo  # Alias for tz attribute (in PyTz)
<DstTzInfo 'US/Arizona' LMT-1 day, 16:32:00 STD>

Timestamp has correct timezone

>>> times[0]
Timestamp('2018-01-01 00:00:00-0700', tz='US/Arizona', offset='S')
>>> times[0].tz
<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>
>>> times[0].tzinfo
<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>

Also if to_pydatetime() is called it uses the correct timezone as well.

>>> times.to_pydatetime()
array([ datetime.datetime(2018, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 1, 1, 0, 0, 1, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 1, 1, 0, 0, 2, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       ...,
       datetime.datetime(2018, 12, 31, 23, 59, 58, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2018, 12, 31, 23, 59, 59, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>),
       datetime.datetime(2019, 1, 1, 0, 0, tzinfo=<DstTzInfo 'US/Arizona' MST-1 day, 17:00:00 STD>)], dtype=object)

Anyway, not sure if this is a bug in pandas or not, but who cares? Now I have to go around and fix this everywhere I've used utcoffset(), bummer. ☹️

Copy link
Member Author

@mikofski mikofski Oct 5, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: it's only used 2 places:

Searching 160 files for "utcoffset"

C:\Users\mikm\Projects\pvlib-python\pvlib\clearsky.py:
  664          * automatically determines sample_interval
  665          * requires a reference clear sky series instead calculating one
  666:           from a user supplied location and UTCoffset
  667          * parameters are controllable via keyword arguments
  668          * option to return individual test components and clearsky scaling

C:\Users\mikm\Projects\pvlib-python\pvlib\solarposition.py:
  547          # pyephem drops timezone when converting to its internal datetime
  548          # format, so handle timezone explicitly here
  549:         obs.date = ephem.Date(thetime - thetime.utcoffset())
  550          sunrise.append(_ephem_to_timezone(rising(sun), time.tz))
  551          sunset.append(_ephem_to_timezone(setting(sun), time.tz))
  ...
 1329      """
 1330      tz_info = times.tz
 1331:     timezone = tz_info.utcoffset(times).total_seconds() / 3600.
 1332      hours = 1 / (3600. * 1.e9) * (
 1333          times.astype(np.int64) - times.normalize().astype(np.int64))

So utcoffset() is also a Python builtin which is overloaded by PyTz. The difference is that if utcoffset is called on a Python datetime then it will return the utcoffset of it's tz info called on itself. So in our example above we could have called:

>>> dt.utcoffset()
datetime.timedelta(-1, 59520)  # wrong timezone b/c localize() was not used!

Also FYI: there is an inefficient loop in rise_set_transit_ephem which uses slow to_pydatetime() and utcoffset(), but it's probably addressed in #588?

Other than that I think there's only 2 other instances of utcoffset this one in hour_angle and one in #583. I will fix both. Thanks for catching this!

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: this is also an example of why we should never replace tzinfo manually, but always use the localize() method so that the correct timezone from the transition info table is used.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for looking into it. I've seen this kind of funny behavior with the AZ timezones before but it didn't occur to me that it could be a problem in this function. Some other common timezones have similar behavior (though I don't remember which).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I don't think it's a problem with the function utcoffset(). Honestly, I think this was my mistake for using utcoffset() with a sequence of times like pandas.DatetimeIndex, because there could potentially be a different timezone for each item in the sequence, so it's ambiguous which timezone should be used. Therefore, I think the problem is that the tz (aka tzinfo) that's set in the DatetimeIndex is unreliable and ambiguous, or the problem could be with utcoffset which should be overloaded to return a list of timezones localized to each item in the sequence.

hours = (times.astype(np.int64) - times.normalize().astype(np.int64)) / (
3600. * 1.e9)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

E126 continuation line over-indented for hanging indent

return 15. * (hours - 12. - timezone) + longitude + equation_of_time / 4.
24 changes: 24 additions & 0 deletions pvlib/test/test_solarposition.py
Original file line number Diff line number Diff line change
Expand Up @@ -694,3 +694,27 @@ def test_analytical_azimuth():
azimuths = solarposition.solar_azimuth_analytical(*test_angles.T, zenith=zeniths)

assert not np.isnan(azimuths).any()


def test_hour_angle():
"""
Test conversion from hours to hour angles in degrees given the following
inputs from NREL SPA calculator at Golden, CO
date,times,eot,sunrise,sunset
1/2/2015,7:21:55,-3.935172,-70.699400,70.512721
1/2/2015,16:47:43,-4.117227,-70.699400,70.512721
1/2/2015,12:04:45,-4.026295,-70.699400,70.512721
"""
longitude = -105.1786 # degrees
times = pd.DatetimeIndex([
'2015-01-02 07:21:55.2132',
'2015-01-02 16:47:42.9828',
'2015-01-02 12:04:44.6340'
]).tz_localize('Etc/GMT+7')
eot = np.array([-3.935172, -4.117227, -4.026295])
hours = solarposition.hour_angle(times, longitude, eot)
expected = (-70.682338, 70.72118825000001, 0.000801250)
# FIXME: there are differences from expected NREL SPA calculator values
# sunrise: 4 seconds, sunset: 48 seconds, transit: 0.2 seconds
# but the differences may be due to other SPA input parameters
assert np.allclose(hours, expected)