Skip to content

gh-132983: Introduce _zstd bindings module #133027

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 31 commits into
base: main
Choose a base branch
from

Conversation

emmatyping
Copy link
Member

@emmatyping emmatyping commented Apr 26, 2025

This is part 2 (but parallel to part 1) for implementing PEP 784. This is probably the meat of the implementation making up ~half of the LOC in the overall diff.

This change:

  • Adds the pyzstd license, since this is the first code introduced based on pyzstd
  • Adds the code for the _zstd module in Modules/_zstd. Tests will be included when the Python compression.zstd module is added. If people think the tests should be included, I can merge the compression.zstd changes into the PR, but I wanted to separate them to make the PRs a bit more manageable.
  • Adds unix build changes (Windows requires changes in cpython-source-deps and will be added in a follow-up PR)

I added skip news as I'd like to write a holistic NEWS/What's New entry once the entire implementation has landed. If people think each PR should have NEWS I can write something up.

TODO before merge:

  • try running tests on top of this branch
  • format PEP 7 style
  • run buildbots on this PR
  • verify _zstd gets installed

📚 Documentation preview 📚: https://cpython-previews--133027.org.readthedocs.build/

@emmatyping emmatyping requested a review from gpshead April 26, 2025 19:38
@emmatyping emmatyping changed the title gh-132983: Introduce _zstd module gh-132983: Introduce _zstd bindings module Apr 26, 2025
@emmatyping
Copy link
Member Author

emmatyping commented Apr 26, 2025

Hm, I'll dig into the C global failure. Didn't show up when the tests ran on the branch :(

E: now fixed

Copy link
Member

@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Some comments at a glance.

Please remove the module state macros. They make this really difficult to review, and don't really add much (is MS_MEMBER(whatever) really that much easier to type than state->whatever?)

@emmatyping
Copy link
Member Author

Thank you for the review @ZeroIntensity! Some of the decisions are just how things were in pyzstd, so I expect they very well should change for merging the code in. I'll go through your review and read up on ob_lock.

@ZeroIntensity
Copy link
Member

There's probably not too much to read about it other than somewhere in PEP-703. You'll just want to replace the locks with critical sections, which have some magic to prevent deadlocks.

Original implementation by Ma Lin with patches by Rogdham.
Refactored for the CPython standard library by Emma Harper Smith.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have generally stopped updating these but I don't know about new ones. With blame there is not much need.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is nice to credit where code comes from (especially since blame will only show me), but I am fine removing this if that is the convention.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

especially since blame will only show me

Perhaps add git trailers to credit Lin & Rodgham? Originally-Authored-By/Authored-By/etc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the originally part we can leave but the new part should go.

adding them to the commit wouldn’t hurt either.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll leave the module description and remove the rest. Perhaps when merged this PR can include the git trailers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay these should all be updated now.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few preliminary comments, although I'm not sure it can be modified under the given license.

Copy link
Member

@picnixz picnixz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some other comments I managed to find. I didn't review the decompressor but I think most of my comments that I said for the compressor would apply. The most important ones are essentially function signatures where I want you to use PyObject *self and do a manual cast to the expected type inside the function:

#define MyType_CAST(op)	((MyType *)(op))

static void
my_method(PyObject *op) {
    MyType *self = MyType_CAST(op);
    ...
}

The use of a macro (or a static inline function, up to you) is to allow future runtime type checks.

}

typedef struct {
const int parameter;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

General comment: we don't use const T when T is a primitive type such as int even to convene the intent of a constant variable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true in argument positions too? e.g. should I remove the const Py_ssize_t from buffer.h?

Copy link
Member

@picnixz picnixz Apr 27, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have a bit of both.

  • Benefits:
    • avoid mistakes
    • convene the intent of a read-only value
  • Downsides:
    • adds more words to read
    • for a function parameter, it does nothing except for a pointer where it makes sense (it helps in its usage but it doesn't help more the compiler)

It's really up to you though, but I would say that it'd be good to be consistent in the const usage. I however haven't looked in all places to see if structures are declared with a const or not.

But we can discuss this in a separate PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the context. I will think about it. If the tradition is to not have it in functions or structs then I think it would be good to remove them throughout.

list = cp_list;
list_size = Py_ARRAY_LENGTH(cp_list);
type = "compression";
} else {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit, but can you (in a follow-up PR if you want to) make it so that the new code follow PEP-7? this includes writing something like:

if (...) {
...
}
else {
...
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I plan on making the code PEP 7 compliant prior to merge, but want to work through the semantic/functional issues first.

/*[clinic input]
_zstd._train_dict
samples_bytes: PyBytesObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use 4-space indents here instead of 8 spaces in general? TiA.

}

static void
ZstdCompressor_dealloc(ZstdCompressor *self)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Methods for PyObjects must always take a PyObject * as self and make the cast inside otherwise you'll get an "incorrect function signature" as it's an undefined behavior if we call this function with a PyObject* at runtime.

Comment on lines 230 to 238
PyErr_SetString(PyExc_ValueError,
"Items in samples_size_list should be an int "
"object, with a size_t value.");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this exception can be triggered by Python code, we should avoid using C types in it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I went through all of the exception messages and tried to make sure this was done consistently. Hopefully they look better now!

emmatyping and others added 11 commits April 28, 2025 17:52
This commit introduces the `_zstd` module, with bindings to libzstd from
the pyzstd project. It also includes the unix build system configuration.
Windows build system support will be integrated independently as it
depends on integration with cpython-source-deps.
Also removes module state references from the classes in the _zstd
module and instead uses PyType_GetModuleState()
This should avoid races and deadlocks.
The `compress`/`decompress` functions will be moved to Python code for simplicity.
C implementations can always be re-added in the future.

Also, mark _zstd as not requiring the GIL.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants