Skip to content

ENH: Add if_sheet_exists parameter to ExcelWriter #40231

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 22, 2021

Conversation

mirober
Copy link
Contributor

@mirober mirober commented Mar 4, 2021

@pep8speaks
Copy link

pep8speaks commented Mar 4, 2021

Hello @mrob95! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2021-04-21 19:55:25 UTC

@mirober mirober changed the title FEAT: Add if_exists parameter to ExcelWriter ENH: Add if_exists parameter to ExcelWriter Mar 4, 2021
@jreback jreback added the IO Excel read_excel, to_excel label Mar 5, 2021
@mirober mirober changed the title ENH: Add if_exists parameter to ExcelWriter ENH: Add if_sheet_exists parameter to ExcelWriter Mar 7, 2021
Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Looking good, some comments below. Also, can you add test in test_writers for attempting to use if_sheet_exists with engines other than openpyxl.


* new: Create a new sheet, with a name determined by the engine.
* replace: Delete the contents of the sheet before writing to it.
* overwrite: Write directly to the named sheet
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the use case for overwrite?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Published statistics, at least in the UK, often have sheets which combine headings, date, description, data quality notes, etc with data tables. To automate something like this you would probably have a pre-written template and then write your data from pandas into specific sheets at specific locations.

An example I happened to be looking at recently, England's daily, weekly and monthly vaccination figures.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrob95 - does the current implementation in this PR overwrite the template formatting? E.g. if a column in the template is formatted to percent and I have a DataFrame with 0.5, will it be displayed in excel as 50%?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rhshadrach It doesn't overwrite cell formatting unless there is an alternative style set. E.g. df.style.set_properties(**{"number-format": "0.00%"}) will overwrite number formatting but otherwise the written cells inherit the previous formatting, including conditional formatting.

The only exception to this I can see is headers and indexes, which have a hardcoded style which I think will always overwrite certain formats (see def header_style in io/formats/excel.py). This may not be ideal for certain use cases but seems like a separate issue, about which there is already discussion (#25185).

Between this and pandas own excel formatting options I think the options are pretty good for styling tables written using if_sheet_exists="overwrite".

# GH 40230
df = DataFrame({"fruit": ["pear"]})
with tm.ensure_clean(ext) as f:
with pytest.raises(ValueError, match=re.escape(msg)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what about other engines?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Openpyxl is the only engine which supports append mode currently, and this option only affects append mode

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so we completely ignore on other engines is fine. is there a reason to raise / warn in that case if its not None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As it's written at the moment passing if_sheet_exists in write mode will raise an error with all engines (mostly as a feedback mechanism for the user)

* replace: Delete the contents of the sheet before writing to it.
* overwrite: Write directly to the named sheet
without deleting the previous contents.
* fail: raise a ValueError.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rename to 'error'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do

@@ -667,6 +667,17 @@ class ExcelWriter(metaclass=abc.ABCMeta):
be parsed by ``fsspec``, e.g., starting "s3://", "gcs://".

.. versionadded:: 1.2.0
if_sheet_exists : {'new', 'replace', 'overwrite', 'fail'}, default 'new'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the default not 'error'?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mainly to preserve the current behaviour as the default. Happy to change

@WillAyd
Copy link
Member

WillAyd commented Mar 18, 2021 via email

@mirober
Copy link
Contributor Author

mirober commented Apr 3, 2021

Hi @WillAyd @jreback @rhshadrach apologies for the delay on this. How would you like me to proceed with this PR? Propose three options:

  • Remove the overwrite option as discussed above
  • Keep all of the current options
  • Rethink the PR. I can see that this parameter might seem too specific, wordy or confusing. A smaller change would be to resurrect BUG: update dict of sheets before check #27730, essentially changing the default behaviour of the openpyxl append mode from "new" to "overwrite". I think this is closer to what most people want to do with append mode, and would make it easier to implement the other behaviours in user code.

Happy to hear any thoughts

@rhshadrach
Copy link
Member

@mrob95 - I wonder if this should be a part of the engine_kwargs argument. We'd need to intercept and react to the argument there, and the docstring would need to be updated with a Notes section detailing this. On the one hand, it saves us from adding a new argument. But I also wonder if it might create confusion. Do users expect engine_kwargs to always be passed through to the engine?

cc @jreback and @WillAyd for any thoughts on this.

@jreback
Copy link
Contributor

jreback commented Apr 12, 2021

@mrob95 - I wonder if this should be a part of the engine_kwargs argument. We'd need to intercept and react to the argument there, and the docstring would need to be updated with a Notes section detailing this. On the one hand, it saves us from adding a new argument. But I also wonder if it might create confusion. Do users expect engine_kwargs to always be passed through to the engine?

cc @jreback and @WillAyd for any thoughts on this.

the point of the engine_kwargs is to pass them thru always, so not sure what else one would expect here.

On the subject of this PR. The problem i have with overwrite is that this is a mutating method, whereas we don't have this anywhere else. Why is creating a new sheet a burden?

@mirober
Copy link
Contributor Author

mirober commented Apr 13, 2021

I'll remove overwrite. Writing data in raw form to a back sheet and then referencing it is probably the best way to handle the kind of "data presentation" issues discussed above, and replace allows that.

@rhshadrach
Copy link
Member

@jreback

Why is creating a new sheet a burden?

For me, a use case is to write data to an existing excel sheet (a template) that already has a format. In this way, the result is automatically formatted. I've used this several times, and see people desiring to do so on SO.

Since there is opposition to this feature, it seems like @mrob95's solution is a good compromise, at least when one has control of the template sheet (works for my use case).

Copy link
Member

@rhshadrach rhshadrach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

)
if if_sheet_exists and "r+" not in mode:
raise ValueError("if_sheet_exists is only valid in append mode (mode='a')")
if if_sheet_exists is None and "r+" in mode:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the r+ condition here? (e.g. then if_sheet_exists) should always be a str at this point, right? (if its being written will be ignored anyhow), right?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i guess not a big deal

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, fixed

# GH 40230
df = DataFrame({"fruit": ["pear"]})
with tm.ensure_clean(ext) as f:
with pytest.raises(ValueError, match=re.escape(msg)):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, so we completely ignore on other engines is fine. is there a reason to raise / warn in that case if its not None?

@jreback
Copy link
Contributor

jreback commented Apr 20, 2021

looks good, can you merge master and a couple of questions (really not a big deal), but an edge case.

@jreback jreback added this to the 1.3 milestone Apr 21, 2021
Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. @rhshadrach over to you

@rhshadrach rhshadrach merged commit ac8977f into pandas-dev:master Apr 22, 2021
yeshsurya pushed a commit to yeshsurya/pandas that referenced this pull request May 6, 2021
JulianWgs pushed a commit to JulianWgs/pandas that referenced this pull request Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Enhancement IO Excel read_excel, to_excel
Projects
None yet
Development

Successfully merging this pull request may close these issues.

ENH: Allow overwriting existing sheets when appending to excel files
6 participants