Description
Pandas version checks

- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import os

import pandas as pd

if __name__ == "__main__":
    for n in [10, 1e2, 1e3, 1e4, 1e5]:
        for n_col in [1, 10, 100, 1000, 10000]:
            # Build a DataFrame with n rows of unique string labels.
            df = pd.DataFrame(
                [{"{i}": f"{i}_cat" for col in range(n_col)} for i in range(int(n))]
            )
            # Write the first 100 rows with plain object-dtype columns.
            df.iloc[0:100].to_parquet("a.parquet")
            # Convert every column to categorical and write the same 100 rows again.
            for col in df.columns:
                df[col] = df[col].astype("category")
            df.iloc[0:100].to_parquet("b.parquet")
            a_size_mb = os.stat("a.parquet").st_size / (1024 * 1024)
            b_size_mb = os.stat("b.parquet").st_size / (1024 * 1024)
            print(f"{n} {n_col} {a_size_mb} {b_size_mb} {100 * b_size_mb / a_size_mb:.2f}")
Issue Description
It seems that when saving a slice of a DataFrame with a categorical column to parquet, the file size can grow far beyond the size of the data actually written, roughly linearly with the number of categories in the original DataFrame (see the output below).
This seems to happen because when we save categorical data to parquet, we save the data plus all the categories existing in the original DataFrame. This happens even when those categories are not present in the rows being written.
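A minimal sketch of why this happens, using only standard pandas behavior: slicing a categorical Series keeps the full category index, and (per the observation above) that index is what ends up serialized as the parquet dictionary.

import pandas as pd

# Slicing a categorical keeps the full category index in memory.
s = pd.Series([f"{i}_cat" for i in range(100_000)], dtype="category")
head = s.iloc[0:100]
print(len(head))                 # 100 values in the slice
print(len(head.cat.categories))  # 100000 categories, all retained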
To reproduce the bug, it is enough to run the script above, which produces the following output (columns: n, n_col, a.parquet size in MB, b.parquet size in MB, and the b/a size ratio in %):
10 1 0.0015506744384765625 0.001689910888671875 108.98
10 10 0.0015506744384765625 0.001689910888671875 108.98
10 100 0.0015506744384765625 0.001689910888671875 108.98
10 1000 0.0015506744384765625 0.001689910888671875 108.98
10 10000 0.0015506744384765625 0.001689910888671875 108.98
100.0 1 0.0019960403442382812 0.0021104812622070312 105.73
100.0 10 0.0019960403442382812 0.0021104812622070312 105.73
100.0 100 0.0019960403442382812 0.0021104812622070312 105.73
100.0 1000 0.0019960403442382812 0.0021104812622070312 105.73
100.0 10000 0.0019960403442382812 0.0021104812622070312 105.73
1000.0 1 0.0019960403442382812 0.0053577423095703125 268.42
1000.0 10 0.0019960403442382812 0.0053577423095703125 268.42
1000.0 100 0.0019960403442382812 0.0053577423095703125 268.42
1000.0 1000 0.0019960403442382812 0.0053577423095703125 268.42
1000.0 10000 0.0019960403442382812 0.0053577423095703125 268.42
10000.0 1 0.0019960403442382812 0.042061805725097656 2107.26
10000.0 10 0.0019960403442382812 0.042061805725097656 2107.26
10000.0 100 0.0019960403442382812 0.042061805725097656 2107.26
10000.0 1000 0.0019960403442382812 0.042061805725097656 2107.26
10000.0 10000 0.0019960403442382812 0.042061805725097656 2107.26
100000.0 1 0.0019960403442382812 0.43596935272216797 21841.71
100000.0 10 0.0019960403442382812 0.43596935272216797 21841.71
100000.0 100 0.0019960403442382812 0.43596935272216797 21841.71
100000.0 1000 0.0019960403442382812 0.43596935272216797 21841.71
100000.0 10000 0.0019960403442382812 0.43596935272216797 21841.71
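Reading the categorical file back supports this explanation (a sketch against the last b.parquet written above, i.e. the n = 100000 case; with the pyarrow engine the categorical dtype round-trips):

import pandas as pd

# Sketch: inspect the categorical parquet file from the last iteration.
df2 = pd.read_parquet("b.parquet")
col = df2.columns[0]
print(len(df2))                      # 100 rows were written
print(len(df2[col].cat.categories))  # but the category index is the full set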
Expected Behavior
In my opinion, either:
- The two files should have (almost) the same size, or
- There should be a warning telling the user that such a difference in size is possible.
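A possible user-side workaround, sketched here as an assumption rather than a documented fix: drop unused categories before writing, so only the categories actually present in the slice are serialized (remove_unused_categories is standard pandas API; the helper name is hypothetical).

import pandas as pd

# Workaround sketch: trim unused categories before writing a slice to
# parquet, so the file only stores the categories that actually occur.
def to_parquet_trimmed(df: pd.DataFrame, path: str) -> None:
    out = df.copy()
    for col in out.columns:
        if isinstance(out[col].dtype, pd.CategoricalDtype):
            out[col] = out[col].cat.remove_unused_categories()
    out.to_parquet(path)

With this, writing the 100-row slice from the script above should produce a file close in size to a.parquet.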
Installed Versions
pandas : 2.1.0
numpy : 1.23.5
pytz : 2023.3.post1
dateutil : 2.8.2
setuptools : 67.7.2
pip : 23.1.2
Cython : 3.0.4
pytest : 7.4.3
hypothesis : None
sphinx : 5.0.2
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.3
html5lib : 1.1
pymysql : None
psycopg2 : 2.9.9
jinja2 : 3.1.2
IPython : 7.34.0
pandas_datareader : 0.10.0
bs4 : 4.11.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : 2023.6.0
gcsfs : 2023.6.0
matplotlib : 3.7.1
numba : 0.56.4
numexpr : 2.8.7
odfpy : None
openpyxl : 3.1.2
pandas_gbq : 0.17.9
pyarrow : 9.0.0
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.11.3
sqlalchemy : 2.0.22
tables : 3.8.0
tabulate : 0.9.0
xarray : 2023.7.0
xlrd : 2.0.1
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None