-
-
Notifications
You must be signed in to change notification settings - Fork 31.7k
gh-132983: Introduce _zstd
bindings module
#133027
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
_zstd
module_zstd
bindings module
E: now fixed |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Some comments at a glance.
Please remove the module state macros. They make this really difficult to review, and don't really add much (is MS_MEMBER(whatever)
really that much easier to type than state->whatever
?)
Thank you for the review @ZeroIntensity! Some of the decisions are just how things were in |
There's probably not too much to read about it other than somewhere in PEP-703. You'll just want to replace the locks with critical sections, which have some magic to prevent deadlocks. |
Modules/_zstd/_zstdmodule.h
Outdated
Original implementation by Ma Lin with patches by Rogdham. | ||
Refactored for the CPython standard library by Emma Harper Smith. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have generally stopped updating these but I don't know about new ones. With blame there is not much need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is nice to credit where code comes from (especially since blame will only show me), but I am fine removing this if that is the convention.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
especially since blame will only show me
Perhaps add git trailers to credit Lin & Rodgham? Originally-Authored-By
/Authored-By
/etc.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the originally part we can leave but the new part should go.
adding them to the commit wouldn’t hurt either.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll leave the module description and remove the rest. Perhaps when merged this PR can include the git trailers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay these should all be updated now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Few preliminary comments, although I'm not sure it can be modified under the given license.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some other comments I managed to find. I didn't review the decompressor but I think most of my comments that I said for the compressor would apply. The most important ones are essentially function signatures where I want you to use PyObject *self
and do a manual cast to the expected type inside the function:
#define MyType_CAST(op) ((MyType *)(op))
static void
my_method(PyObject *op) {
MyType *self = MyType_CAST(op);
...
}
The use of a macro (or a static inline function, up to you) is to allow future runtime type checks.
Modules/_zstd/_zstdmodule.c
Outdated
} | ||
|
||
typedef struct { | ||
const int parameter; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
General comment: we don't use const T
when T
is a primitive type such as int even to convene the intent of a constant variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this true in argument positions too? e.g. should I remove the const Py_ssize_t
from buffer.h?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a bit of both.
- Benefits:
- avoid mistakes
- convene the intent of a read-only value
- Downsides:
- adds more words to read
- for a function parameter, it does nothing except for a pointer where it makes sense (it helps in its usage but it doesn't help more the compiler)
It's really up to you though, but I would say that it'd be good to be consistent in the const
usage. I however haven't looked in all places to see if structures are declared with a const
or not.
But we can discuss this in a separate PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the context. I will think about it. If the tradition is to not have it in functions or structs then I think it would be good to remove them throughout.
list = cp_list; | ||
list_size = Py_ARRAY_LENGTH(cp_list); | ||
type = "compression"; | ||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit, but can you (in a follow-up PR if you want to) make it so that the new code follow PEP-7? this includes writing something like:
if (...) {
...
}
else {
...
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I plan on making the code PEP 7 compliant prior to merge, but want to work through the semantic/functional issues first.
/*[clinic input] | ||
_zstd._train_dict | ||
samples_bytes: PyBytesObject |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use 4-space indents here instead of 8 spaces in general? TiA.
} | ||
|
||
static void | ||
ZstdCompressor_dealloc(ZstdCompressor *self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Methods for PyObjects must always take a PyObject *
as self
and make the cast inside otherwise you'll get an "incorrect function signature" as it's an undefined behavior if we call this function with a PyObject*
at runtime.
Modules/_zstd/_zstdmodule.c
Outdated
PyErr_SetString(PyExc_ValueError, | ||
"Items in samples_size_list should be an int " | ||
"object, with a size_t value."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this exception can be triggered by Python code, we should avoid using C types in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went through all of the exception messages and tried to make sure this was done consistently. Hopefully they look better now!
This commit introduces the `_zstd` module, with bindings to libzstd from the pyzstd project. It also includes the unix build system configuration. Windows build system support will be integrated independently as it depends on integration with cpython-source-deps.
Also removes module state references from the classes in the _zstd module and instead uses PyType_GetModuleState()
Co-authored-by: Stan Ulbrych <[email protected]>
This should avoid races and deadlocks.
The `compress`/`decompress` functions will be moved to Python code for simplicity. C implementations can always be re-added in the future. Also, mark _zstd as not requiring the GIL.
Co-authored-by: Adam Turner <[email protected]>
... for `get_zstd_state_from_type` Co-authored-by: Bénédikt Tran <[email protected]>
PyModule_AddObject is soft-deprecated, so we should use PyModule_AddObjectRef
Co-authored-by: Bénédikt Tran <[email protected]>
Co-authored-by: Bénédikt Tran <[email protected]>
da9c225
to
cd2f085
Compare
We should instead document that to pickle a dictionary a user should use the `.dict_content` attribute.
This is part 2 (but parallel to part 1) for implementing PEP 784. This is probably the meat of the implementation making up ~half of the LOC in the overall diff.
This change:
_zstd
module inModules/_zstd
. Tests will be included when the Pythoncompression.zstd
module is added. If people think the tests should be included, I can merge thecompression.zstd
changes into the PR, but I wanted to separate them to make the PRs a bit more manageable.I added skip news as I'd like to write a holistic NEWS/What's New entry once the entire implementation has landed. If people think each PR should have NEWS I can write something up.
TODO before merge:
_zstd
gets installed📚 Documentation preview 📚: https://cpython-previews--133027.org.readthedocs.build/