Skip to content

Clarified error in read_sas method when buffer object provided withou… #14947

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 9 commits into from

Conversation

tomrod
Copy link
Contributor

@tomrod tomrod commented Dec 21, 2016

…t format

Added three lines to sasreader.py immediately following line 33 (if format==None:) to handle the case when a buffer object is provided without a format='sas7bdat' or format='xport' situation. Method otherwise works splendidly when a filepath is provided, but a buffer object fails. This is an issue when using sasreader directly on SFTP file objects. I am unaware of any bug request (and am happy to open one), but I came across this issue when using the library.

@@ -31,6 +31,9 @@ def read_sas(filepath_or_buffer, format=None, index=None, encoding=None,
"""

if format is None:
bufferr = "Format unrecognized. If buffer object, specify format")
if type(filesize_or_buffer) != str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use not isinstance(filesize_or_buffer, compat.string_types)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bufferr -> buffer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bufferr -- buffer error. Didn't want to conflict with anything downstream.

Copy link
Contributor Author

@tomrod tomrod Dec 30, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compat.string_types doesn't appear to be a standard library item? Instead using:

if type(filepath_or_buffer) not in [str, unicode]:
    raise TypeError(BuffErr)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pandas import compat
pls do it this way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do, thanks for the pointer.

@@ -271,6 +271,7 @@ Performance Improvements
- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
- Increased performance of ``pd.factorize()`` by releasing the GIL with ``object`` dtype when inferred as strings (:issue:`14859`)

- When reading buffer object in ``read_sas()`` method without specified format, filepath string is inferred rather than buffer object. Error is now thrown if buffer object is provided without format {sas7bdat|xport} specification.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just the first sentence.

put in bug fix section; use this PR number as the issue number.

@jreback
Copy link
Contributor

jreback commented Dec 21, 2016

please add a test (you can use StringIO() as a test buffer)

@jreback jreback added Bug IO SAS SAS: read_sas labels Dec 21, 2016
@tomrod
Copy link
Contributor Author

tomrod commented Dec 21, 2016 via email

@TomAugspurger
Copy link
Contributor

The test should basically be a simplified version of your code that discovered the problem. The test should fail without your fix, and pass with it.

Maybe something like

def test_sas_buffer_format(self):
    b = StringIO("")
    with self.assertRaises(TypeError):
        result = pd.read_sas(b)

Haven't tested that, so double check. That can go with the other tests in pandas/io/tests/sas/test_sas.py

@tomrod
Copy link
Contributor Author

tomrod commented Dec 21, 2016 via email

@jreback
Copy link
Contributor

jreback commented Dec 30, 2016

can you update

@tomrod
Copy link
Contributor Author

tomrod commented Dec 31, 2016

Updated

@@ -271,6 +271,7 @@ Performance Improvements
- Improved performance of ``pd.wide_to_long()`` (:issue:`14779`)
- Increased performance of ``pd.factorize()`` by releasing the GIL with ``object`` dtype when inferred as strings (:issue:`14859`)

- When reading buffer object in ``read_sas()`` method without specified format, filepath string is inferred rather than buffer object.
Copy link
Contributor

@jreback jreback Dec 31, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug when reading a buffer object in pd.read_sas(), without a specified format, a filepath string was inferred rather than buffer object.

move to Bug Fix section, add the issue number (this PR) at the end

if format is None:
buffErr = "Format unrecognized. If buffer object, specify format"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use buffer_error_msg.

If this is a buffer object, rather than a string name, you must specify a format string.

@@ -29,8 +29,11 @@ def read_sas(filepath_or_buffer, format=None, index=None, encoding=None,
DataFrame if iterator=False and chunksize=None, else SAS7BDATReader
or XportReader
"""

from pandas import compat
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import should be at the top of the file (if might be there already)


class TestSasBuff(tm.TestCase):
def test_sas_buffer_format(self):
import StringIO
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

imports go at the top

def test_sas_buffer_format(self):
import StringIO
from pandas.io.sas.sasreader import read_sas
b = StringIO.StringIO("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pandas import read_sas

def test_sas_buffer_format(self):
import StringIO
from pandas.io.sas.sasreader import read_sas
b = StringIO.StringIO("")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from pandas.compat import StringIO

@codecov-io
Copy link

codecov-io commented Dec 31, 2016

Current coverage is 84.75% (diff: 25.00%)

Merging #14947 into master will increase coverage by 0.10%

@@             master     #14947   diff @@
==========================================
  Files           144        145     +1   
  Lines         51021      51150   +129   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          43188      43351   +163   
+ Misses         7833       7799    -34   
  Partials          0          0          

Powered by Codecov. Last update f79bc7a...1285dbb

if format is None:
buffer_error_msg = "If this is a buffer object rather\
than a string name, you must specify a format string"
if not isinstance(filepath_or_buffer,compat.string_types):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have some PEP8 errors (we check for them in the travis build, that is the reason travis failed). For example, here there should be a space after the comma.

Here is the full list (you can find it at the bottom of the failing test (third travis build), but I also recommend setting up your IDE to check for this):

pandas/io/sas/sasreader.py:6:1: E302 expected 2 blank lines, found 1
pandas/io/sas/sasreader.py:35:45: E231 missing whitespace after ','
pandas/io/tests/sas/test_sas.py:5:1: E302 expected 2 blank lines, found 1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I got this covered. I'll reread the Pandas submission info for how to ensure PEP8 compliance tonight.

if format is None:
buffer_error_msg = "If this is a buffer object rather\
than a string name, you must specify a format string"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use

("..."
 "...")

instead of \ for line continuation?

from pandas.compat import StringIO
from pandas import read_sas

class TestSasBuff(tm.TestCase):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can call this just TestSas (and there needs to be a blank line after this one)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@jorisvandenbossche
Copy link
Member

@tomrod travis still reports some PEP8 errors:

pandas/io/sas/sasreader.py:6:1: E302 expected 2 blank lines, found 1
pandas/io/tests/sas/test_sas.py:5:1: E302 expected 2 blank lines, found 1

(I would recommend setting your IDE to check for this as well)

@tomrod
Copy link
Contributor Author

tomrod commented Jan 4, 2017

@jorisvandenbossche now passes $ git diff origin | flake8 --diff. Please let me know if you see any more issues.

@jreback jreback added this to the 0.20.0 milestone Jan 9, 2017
@jreback jreback closed this in e7eefc4 Jan 9, 2017
@jreback
Copy link
Contributor

jreback commented Jan 9, 2017

thanks!

AnkurDedania pushed a commit to AnkurDedania/pandas that referenced this pull request Mar 21, 2017
…ithout a format

Author: tomrod <[email protected]>

Closes pandas-dev#14947 from tomrod/sas_read_format_bugfix and squashes the following commits:

1285dbb [tomrod] flake8 testing
4cf9231 [tomrod] PEP8 compliance
ab76d80 [tomrod] Updating to match pep8 whitespace requirements in sasreader.py
ffdce1d [tomrod] Updating to match pep8 as per @jorisvandenbossche
aa1ada3 [tomrod] More specific error message, moved imports to top of files
bf60d23 [tomrod] Adding tests, creating updated information
5efdb85 [tomrod] Adding tests
f8166fc [tomrod] Updated based on feedback from jreback
b52f204 [tomrod] Clarified error in read_sas method when buffer object provided without format
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug IO SAS SAS: read_sas
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants