blosc2.jit support for pandas UDFs

xref https://github.com/pandas-dev/pandas/issues/61125

We have discussed this informally in the past; this issue describes more clearly how blosc2.jit and pandas can interact.

I'm about to open a PR in pandas to support this:

```python
import numpy as np
import pandas
import blosc2

def my_func(x):
    return np.sin(x * 2)

s = pandas.Series([1, 2, 3], index=list('abc'), name='sample')

# normal call executed by pandas
print(s.map(my_func))

# we let blosc2 handle this
print(s.map(my_func, engine=blosc2.jit))
```

To be able to do this, we would need blosc2 to implement a new interface. The implementation shouldn't be too complex; it could look something like the code below. The example ignores `skip_na` and a second method, `apply`, for column-wise operations (where the function is called with the whole array, not with each scalar):

```python
import numpy as np
import blosc2

# Reference base class: https://github.com/pandas-dev/pandas/blob/main/pandas/core/apply.py#L77
class Blosc2ExecutionEngine:
    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        if not isinstance(data, np.ndarray):
            # we probably received a Series
            if hasattr(data, "values"):
                data = data.values
            else:
                # there is a chance that we call this with a pyarrow object in the future
                raise ValueError(f"blosc2.jit does not support {type(data).__name__}")

        func = decorator(func)
        result = func(data, *args, **kwargs)
        return result


blosc2.jit.__pandas_udf__ = Blosc2ExecutionEngine
```

The advantage of this approach over just decorating the function is that the whole execution loop can be jitted, not only the individual calls.
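To make the difference concrete, here is a minimal sketch of how the dispatch could work. It uses neither pandas nor blosc2: `toy_jit`, `ToyEngine` and `dispatch_map` are hypothetical stand-ins invented for illustration, and the dispatch logic is an assumption about what pandas might do, not its actual implementation. The toy decorator only counts calls, so it shows that the engine path calls the decorated function once with the whole array, while the default path calls it once per element.

```python
import numpy as np


def toy_jit(func):
    """Hypothetical stand-in for blosc2.jit; it only counts calls."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


class ToyEngine:
    """Hypothetical engine using the same map signature as the proposal."""
    last_wrapped = None  # kept only so the example can inspect the call count

    @staticmethod
    def map(data, func, args, kwargs, decorator, skip_na):
        wrapped = decorator(func)
        ToyEngine.last_wrapped = wrapped
        # One call with the whole array instead of one call per element
        return wrapped(np.asarray(data), *args, **kwargs)


# The decorator advertises its engine through the proposed attribute
toy_jit.__pandas_udf__ = ToyEngine


def dispatch_map(values, func, engine=None):
    """Assumed dispatch logic; the real pandas code may differ."""
    if engine is None:
        # Default path: a Python-level loop, one call per element
        return np.array([func(v) for v in values])
    udf_engine = getattr(engine, "__pandas_udf__", None)
    if udf_engine is None:
        raise ValueError("engine does not expose __pandas_udf__")
    return udf_engine.map(
        values, func, args=(), kwargs={}, decorator=engine, skip_na=False
    )


def my_func(x):
    return np.sin(x * 2)


values = np.array([1.0, 2.0, 3.0])
default_result = dispatch_map(values, my_func)
engine_result = dispatch_map(values, my_func, engine=toy_jit)
```

Both paths produce the same values here, but `ToyEngine.last_wrapped.calls` is 1 on the engine path, whereas the default path invokes `my_func` three times.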

What do you think? Is this something you'd like to implement? Any feedback? It's designed in a way that you don't need to add a dependency on pandas. We aim to have Numba and Bodo supporting this same interface, and possibly others.

blosc2.jit support for pandas UDFs #383
