Add get_srml iotools function; deprecate read_srml_month_from_solardat #1779
Changes from 12 commits
f4e8248
15f6d4c
0f83b2e
8f14cb8
b09144f
7057ede
2b5c39b
d689a14
e85ddf7
48b6310
a308cda
1830b03
a1b3922
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -3,7 +3,10 @@ | |
""" | ||
import numpy as np | ||
import pandas as pd | ||
import urllib | ||
import warnings | ||
|
||
from pvlib._deprecation import deprecated | ||
|
||
# VARIABLE_MAP is a dictionary mapping SRML data element numbers to their | ||
# pvlib names. For most variables, only the first three digits are used, | ||
|
@@ -26,8 +29,9 @@ | |
|
||
def read_srml(filename, map_variables=True): | ||
""" | ||
Read University of Oregon SRML 1min .tsv file into pandas dataframe. The | ||
SRML is described in [1]_. | ||
Read University of Oregon SRML 1min .tsv file into pandas dataframe. | ||
|
||
The SRML is described in [1]_. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -51,14 +55,14 @@ def read_srml(filename, map_variables=True): | |
the time of the row until the time of the next row. This is consistent | ||
with pandas' default labeling behavior. | ||
|
||
See SRML's `Archival Files`_ page for more information. | ||
|
||
.. _Archival Files: http://solardat.uoregon.edu/ArchivalFiles.html | ||
See [2]_ for more information concerning the file format. | ||
|
||
References | ||
---------- | ||
.. [1] University of Oregon Solar Radiation Monitoring Laboratory | ||
`http://solardat.uoregon.edu/ <http://solardat.uoregon.edu/>`_ | ||
.. [2] `Archival (short interval) data files | ||
<http://solardat.uoregon.edu/ArchivalFiles.html>`_ | ||
""" | ||
tsv_data = pd.read_csv(filename, delimiter='\t') | ||
data = _format_index(tsv_data) | ||
|
@@ -168,10 +172,12 @@ def _format_index(df): | |
return df | ||
|
||
|
||
@deprecated('0.10.0', alternative='pvlib.iotools.get_srml', removal='0.11.0') | ||
def read_srml_month_from_solardat(station, year, month, filetype='PO', | ||
map_variables=True): | ||
"""Request a month of SRML data from solardat and read it into | ||
a Dataframe. The SRML is described in [1]_. | ||
"""Request a month of SRML data and read it into a Dataframe. | ||
|
||
The SRML is described in [1]_. | ||
|
||
Parameters | ||
---------- | ||
|
@@ -222,3 +228,94 @@ def read_srml_month_from_solardat(station, year, month, filetype='PO', | |
url = "http://solardat.uoregon.edu/download/Archive/" | ||
data = read_srml(url + file_name, map_variables=map_variables) | ||
return data | ||
|
||
|
||
def get_srml(station, start, end, filetype='PO', map_variables=True, | ||
url="http://solardat.uoregon.edu/download/Archive/"): | ||
Thinking more about minor consistency details, I guess we could standardize the order of
#1767 currently does I don't see one as much better than the other. I guess I'd favor

I actually thought about discussing this for your PR, but I guess I don't have a strong opinion except for consistency. But the logic of having URL last makes sense to me - it's just more of a hassle changing this as there are more existing functions with URL first. I opened #1791 based on this discussion fyi. |
||
"""Request data from UoO SRML and read it into a DataFrame. | ||
|
||
The University of Oregon Solar Radiation Monitoring Laboratory (SRML) is | ||
described in [1]_. A list of stations can be found in [2]_. | ||
|
||
Data is returned for the entire months between and including start and end. | ||
|
||
Parameters | ||
---------- | ||
station : str | ||
Two letter station abbreviation. | ||
start : datetime-like | ||
First day of the requested period. | ||
end : datetime-like | ||
Last day of the requested period. | ||
filetype : string, default: 'PO' | ||
SRML file type to gather. See notes for explanation. | ||
map_variables : bool, default: True | ||
When true, renames columns of the DataFrame to pvlib variable names | ||
where applicable. See variable :const:`VARIABLE_MAP`. | ||
url : str, default: 'http://solardat.uoregon.edu/download/Archive/' | ||
API endpoint URL | ||
|
||
Returns | ||
------- | ||
data : pd.DataFrame | ||
Dataframe with data from SRML. | ||
meta : dict | ||
Metadata. | ||
|
||
Notes | ||
----- | ||
File types designate the time interval of a file and if it contains | ||
raw or processed data. For instance, `RO` designates raw, one minute | ||
data and `PO` designates processed one minute data. The availability | ||
of file types varies between sites. Below is a table of file types | ||
and their time intervals. See [1]_ for site information. | ||
|
||
============= ============ ================== | ||
time interval raw filetype processed filetype | ||
============= ============ ================== | ||
1 minute RO PO | ||
5 minute RF PF | ||
15 minute RQ PQ | ||
hourly RH PH | ||
============= ============ ================== | ||
|
||
Warning | ||
------- | ||
SRML data has nighttime data prefilled with 0s through the end of the | ||
current month (i.e., values are provided for data in the future). | ||
|
||
References | ||
---------- | ||
.. [1] University of Oregon Solar Radiation Monitoring Laboratory | ||
`http://solardat.uoregon.edu/ <http://solardat.uoregon.edu/>`_ | ||
.. [2] Station ID codes - Solar Radiation Monitoring Laboratory | ||
`http://solardat.uoregon.edu/StationIDCodes.html | ||
<http://solardat.uoregon.edu/StationIDCodes.html>`_ | ||
""" | ||
# Use pd.to_datetime so that strings (e.g. '2021-01-01') are accepted | ||
start = pd.to_datetime(start) | ||
end = pd.to_datetime(end) | ||
|
||
# Generate list of months | ||
months = pd.date_range( | ||
start, end.replace(day=1) + pd.DateOffset(months=1), freq='1M') | ||
months_str = months.strftime('%y%m') | ||
|
||
# Generate list of filenames | ||
filenames = [f"{station}{filetype}{m}.txt" for m in months_str] | ||
|
||
dfs = [] # Initialize list of monthly dataframes | ||
for f in filenames: | ||
try: | ||
dfi = read_srml(url + f, map_variables=map_variables) | ||
dfs.append(dfi) | ||
except urllib.error.HTTPError: | ||
warnings.warn(f"The following file was not found: {f}") | ||
|
||
|
||
data = pd.concat(dfs, axis='rows') | ||
How about following this with a

I am not opposed to this suggestion, although we may run into some issues related to timezone. I think this is why the There already exist several functions in pvlib that return a full month when requesting a single day btw, e.g.

Okay, that's a good point. Up to you on what is best here. If we keep the current behavior of returning complete months, might be worth a note in the docstring.

This sounds right. And a complicating factor for SRML was nighttime 0s in the future if we requested a day from the current month.

Here's a raw data file for the current month that shows -999 and similar for future dates. Here's a raw data file that includes -999 and similar as well as 0s. This caused a problem in SFA SolarArbiter/solarforecastarbiter-core#572 and it's reasonable to expect that it would cause a problem with other user code. I don't know if that's pvlib's problem to solve, but I think it's somewhat more likely to come up with this new function that accepts datetimes instead of entire months.

I don't see how the new function makes this more of an issue, currently, I would still request the same months with the old function, but it would just be more manual work. I do think that it deserves a Warning entry and perhaps we can also implement a line that cuts off future data? For example:

I think many users would have different expectations of the new function that accepts start, end datetimes than the old function that accepts a year and a month. Thanks for adding the warning. I'd rather see the previously rejected start:end slicing than .today() slicing. I'm also fine with just adding the warning and seeing if users complain. |
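
For reference, the start:end slicing raised in this thread could look roughly like the sketch below. `slice_to_request` is a hypothetical helper, not part of pvlib or of this PR. It assumes that naive `start`/`end` values should be interpreted in the time zone of the data index; the `Etc/GMT+8` zone and the `ghi` values in the example are made up for illustration. Because the slice stops at `end`, any prefilled rows after `end` are dropped as a side effect.

```python
import pandas as pd


def slice_to_request(data, start, end):
    """Trim a DataFrame of whole months to the requested period.

    Hypothetical helper for illustration only; not part of pvlib.
    """
    start = pd.Timestamp(start)
    end = pd.Timestamp(end)
    tz = data.index.tz
    if tz is not None:
        # Assumption: naive bounds are given in the data's time zone;
        # time-zone-aware bounds are converted to it.
        if start.tzinfo is None:
            start = start.tz_localize(tz)
        else:
            start = start.tz_convert(tz)
        if end.tzinfo is None:
            end = end.tz_localize(tz)
        else:
            end = end.tz_convert(tz)
    # Label-based slicing on a sorted DatetimeIndex includes both endpoints
    return data.loc[start:end]


# Two whole months of made-up daily values standing in for the monthly files
index = pd.date_range('2021-01-01', periods=59, freq='D', tz='Etc/GMT+8')
data = pd.DataFrame({'ghi': range(59)}, index=index)

trimmed = slice_to_request(data, '2021-01-15', '2021-02-10')
```

A date-only `end` resolves to midnight at the start of that day, so with sub-daily data the rest of the final day would be excluded unless `end` is extended accordingly.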
||
|
||
meta = {'filetype': filetype, | ||
'station': station, | ||
'filenames': filenames} | ||
|
||
return data, meta |
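
To illustrate the whole-month behavior described in the docstring ("Data is returned for the entire months between and including start and end"), here is a minimal standard-library sketch of how the monthly file names follow from `start` and `end`. `monthly_filenames` is a hypothetical stand-in for the `pd.date_range` logic in the diff, not the PR's implementation, and `'EU'` is used only as an illustrative station code.

```python
from datetime import date


def monthly_filenames(station, start, end, filetype='PO'):
    """List one file name per calendar month touched by [start, end].

    Hypothetical helper mirroring the f"{station}{filetype}{yymm}.txt"
    pattern from the diff; not part of pvlib.
    """
    names = []
    year, month = start.year, start.month
    # Step month by month until the month containing `end` is included
    while (year, month) <= (end.year, end.month):
        names.append(f"{station}{filetype}{year % 100:02d}{month:02d}.txt")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


# A request from mid-November to early February touches four monthly files
files = monthly_filenames('EU', date(2020, 11, 15), date(2021, 2, 3))
print(files)
```

Requesting a single day therefore still maps to the full file for that month, which is why the returned data can extend beyond the requested period.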