-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ENH: Add if_sheet_exists parameter to ExcelWriter #40231
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
mirober
commented
Mar 4, 2021
•
edited
Loading
edited
- closes ENH: Allow overwriting existing sheets when appending to excel files #40230
- tests added / passed
- Ensure all linting tests pass, see here for how to run them
- whatsnew entry
Hello @mrob95! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found: There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻 Comment last updated at 2021-04-21 19:55:25 UTC |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Looking good, some comments below. Also, can you add test in test_writers for attempting to use if_sheet_exists with engines other than openpyxl.
pandas/io/excel/_base.py
Outdated
|
||
* new: Create a new sheet, with a name determined by the engine. | ||
* replace: Delete the contents of the sheet before writing to it. | ||
* overwrite: Write directly to the named sheet |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the use case for overwrite?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Published statistics, at least in the UK, often have sheets which combine headings, date, description, data quality notes, etc with data tables. To automate something like this you would probably have a pre-written template and then write your data from pandas into specific sheets at specific locations.
An example I happened to be looking at recently, England's daily, weekly and monthly vaccination figures.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mrob95 - does the current implementation in this PR overwrite the template formatting? E.g. if a column in the template is formatted to percent and I have a DataFrame with 0.5, will it be displayed in excel as 50%?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rhshadrach It doesn't overwrite cell formatting unless there is an alternative style set. E.g. df.style.set_properties(**{"number-format": "0.00%"})
will overwrite number formatting but otherwise the written cells inherit the previous formatting, including conditional formatting.
The only exception to this I can see is headers and indexes, which have a hardcoded style which I think will always overwrite certain formats (see def header_style
in io/formats/excel.py
). This may not be ideal for certain use cases but seems like a separate issue, about which there is already discussion (#25185).
Between this and pandas own excel formatting options I think the options are pretty good for styling tables written using if_sheet_exists="overwrite"
.
# GH 40230 | ||
df = DataFrame({"fruit": ["pear"]}) | ||
with tm.ensure_clean(ext) as f: | ||
with pytest.raises(ValueError, match=re.escape(msg)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what about other engines?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Openpyxl is the only engine which supports append mode currently, and this option only affects append mode
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so we completely ignore on other engines is fine. is there a reason to raise / warn in that case if its not None?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As it's written at the moment passing if_sheet_exists
in write mode will raise an error with all engines (mostly as a feedback mechanism for the user)
pandas/io/excel/_base.py
Outdated
* replace: Delete the contents of the sheet before writing to it. | ||
* overwrite: Write directly to the named sheet | ||
without deleting the previous contents. | ||
* fail: raise a ValueError. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rename to 'error'
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
pandas/io/excel/_base.py
Outdated
@@ -667,6 +667,17 @@ class ExcelWriter(metaclass=abc.ABCMeta): | |||
be parsed by ``fsspec``, e.g., starting "s3://", "gcs://". | |||
|
|||
.. versionadded:: 1.2.0 | |||
if_sheet_exists : {'new', 'replace', 'overwrite', 'fail'}, default 'new' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the default not 'error'?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mainly to preserve the current behaviour as the default. Happy to change
Not sure - it might be better as it’s own library
…Sent from my iPhone
On Mar 18, 2021, at 3:02 AM, Richard Shadrach ***@***.***> wrote:
@rhshadrach commented on this pull request.
In pandas/io/excel/_base.py:
> @@ -667,6 +667,17 @@ class ExcelWriter(metaclass=abc.ABCMeta):
be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".
.. versionadded:: 1.2.0
+ if_sheet_exists : {'new', 'replace', 'overwrite', 'fail'}, default 'new'
+ How to behave when trying to write to a sheet that already
+ exists (append mode only).
+
+ * new: Create a new sheet, with a name determined by the engine.
+ * replace: Delete the contents of the sheet before writing to it.
We also don't support overwrite with any other type of IO methods
I don't believe other IO methods have the use case of writing to an existing template for formatting purposes, which is what I see as being the primary case here.
and to your point above there is a lot of ambiguity as to what "overwrite" should actually overwrite
That was not my point, I merely wanted to make sure existing formatting was preserved if the DataFrame was not formatted and indeed that is the case. Not certain what you find unclear.
or if it is something we discuss more and want to include in pandas we I think would just want to spend a little bit more time going over the API
Other than what name to call this, what API choices are there?
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or unsubscribe.
|
Hi @WillAyd @jreback @rhshadrach apologies for the delay on this. How would you like me to proceed with this PR? Propose three options:
Happy to hear any thoughts |
@mrob95 - I wonder if this should be a part of the |
the point of the On the subject of this PR. The problem i have with |
I'll remove |
For me, a use case is to write data to an existing excel sheet (a template) that already has a format. In this way, the result is automatically formatted. I've used this several times, and see people desiring to do so on SO. Since there is opposition to this feature, it seems like @mrob95's solution is a good compromise, at least when one has control of the template sheet (works for my use case). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
pandas/io/excel/_base.py
Outdated
) | ||
if if_sheet_exists and "r+" not in mode: | ||
raise ValueError("if_sheet_exists is only valid in append mode (mode='a')") | ||
if if_sheet_exists is None and "r+" in mode: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is the r+
condition here? (e.g. then if_sheet_exists) should always be a str at this point, right? (if its being written will be ignored anyhow), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess not a big deal
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, fixed
# GH 40230 | ||
df = DataFrame({"fruit": ["pear"]}) | ||
with tm.ensure_clean(ext) as f: | ||
with pytest.raises(ValueError, match=re.escape(msg)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, so we completely ignore on other engines is fine. is there a reason to raise / warn in that case if its not None?
looks good, can you merge master and a couple of questions (really not a big deal), but an edge case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. @rhshadrach over to you