Skip to content

PYTHON-5172 bugfix: Add __repr__ and __eq__ to bson.binary.BinaryVector #2162

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Mar 10, 2025
Merged
12 changes: 10 additions & 2 deletions bson/binary.py
Original file line number Diff line number Diff line change
Expand Up @@ -14,7 +14,6 @@
from __future__ import annotations

import struct
from dataclasses import dataclass
from enum import Enum
from typing import TYPE_CHECKING, Any, Optional, Sequence, Tuple, Type, Union, overload
from uuid import UUID
Expand Down Expand Up @@ -227,7 +226,6 @@ class BinaryVectorDtype(Enum):
PACKED_BIT = b"\x10"


@dataclass
class BinaryVector:
"""Vector of numbers along with metadata for binary interoperability.
.. versionadded:: 4.10
Expand All @@ -247,6 +245,16 @@ def __init__(self, data: Sequence[float | int], dtype: BinaryVectorDtype, paddin
self.dtype = dtype
self.padding = padding

def __repr__(self) -> str:
return f"BinaryVector(dtype={self.dtype}, padding={self.padding}, data={self.data})"

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hold up, the real bug here is that we're using @dataclass but also defining __init__. Why is that? The whole point of dataclass is that it provides __init__ and __repr__ automatically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we had issues with the combo of {dataclass, slots, defaults}. So I've just removed the dataclass decorator.

Copy link
Member

@ShaneHarvey ShaneHarvey Mar 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any other features that will break by removing @dataclass? We may be stuck with it until 5.0 if it's a breaking change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think so. As mentioned in the closed PR, we've done init, and repr, and we don't want to rely on equality except via the binary representation.

Copy link
Contributor Author

@caseyclements caseyclements Mar 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code seems happy to be a dataclass or not. I have no strong preference.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@ShaneHarvey ShaneHarvey Mar 5, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you show an example of == and > before and after this change? I'm wondering if the old code (with @dataclass) even works at all.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. It's easy to try. Put a breakpoint on line 83 of test_bson_binary.

# With the decorator:
vector_obs == vector_exp.as_vector()
True
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'

# Without the decorator:
vector_obs == vector_exp.as_vector()
False
vector_obs > vector_exp.as_vector()
# TypeError: '>' not supported between instances of 'BinaryVector' and 'BinaryVector'

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

== is already broken with our current @dataclass approach:

>>> from bson.binary import *
>>> BinaryVector([1], BinaryVectorDtype.INT8) == BinaryVector([2], BinaryVectorDtype.INT8)
True

So we need to remove dataclass and implement == as well as __repr__.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Take a look. I hate subtleties with == and floats..

def __eq__(self, other: Any) -> bool:
if not isinstance(other, BinaryVector):
return False
return (
self.dtype == other.dtype and self.padding == other.padding and self.data == other.data
)


class Binary(bytes):
"""Representation of BSON binary data.
Expand Down
58 changes: 58 additions & 0 deletions test/test_bson.py
Original file line number Diff line number Diff line change
Expand Up @@ -809,6 +809,64 @@ def test_vector(self):
dtype=BinaryVectorDtype.PACKED_BIT,
) # type: ignore[call-overload]

def assertRepr(self, obj):
new_obj = eval(repr(obj))
self.assertEqual(type(new_obj), type(obj))
self.assertEqual(repr(new_obj), repr(obj))

def test_binaryvector_repr(self):
"""Tests of repr(BinaryVector)"""

data = [1 / 127, -7 / 6]
one = BinaryVector(data, BinaryVectorDtype.FLOAT32)
self.assertEqual(
repr(one), f"BinaryVector(dtype=BinaryVectorDtype.FLOAT32, padding=0, data={data})"
)
self.assertRepr(one)

data = [127, 7]
two = BinaryVector(data, BinaryVectorDtype.INT8)
self.assertEqual(
repr(two), f"BinaryVector(dtype=BinaryVectorDtype.INT8, padding=0, data={data})"
)
self.assertRepr(two)

three = BinaryVector(data, BinaryVectorDtype.INT8, padding=0)
self.assertEqual(
repr(three), f"BinaryVector(dtype=BinaryVectorDtype.INT8, padding=0, data={data})"
)
self.assertRepr(three)

four = BinaryVector(data, BinaryVectorDtype.PACKED_BIT, padding=3)
self.assertEqual(
repr(four), f"BinaryVector(dtype=BinaryVectorDtype.PACKED_BIT, padding=3, data={data})"
)
self.assertRepr(four)

zero = BinaryVector([], BinaryVectorDtype.INT8)
self.assertEqual(
repr(zero), "BinaryVector(dtype=BinaryVectorDtype.INT8, padding=0, data=[])"
)
self.assertRepr(zero)

def test_binaryvector_equality(self):
"""Tests of == __eq__"""
self.assertEqual(
BinaryVector([1.2, 1 - 1 / 3], BinaryVectorDtype.FLOAT32, 0),
BinaryVector([1.2, 1 - 1.0 / 3.0], BinaryVectorDtype.FLOAT32, 0),
)
self.assertNotEqual(
BinaryVector([1.2, 1 - 1 / 3], BinaryVectorDtype.FLOAT32, 0),
BinaryVector([1.2, 6.0 / 9.0], BinaryVectorDtype.FLOAT32, 0),
)
self.assertEqual(
BinaryVector([], BinaryVectorDtype.FLOAT32, 0),
BinaryVector([], BinaryVectorDtype.FLOAT32, 0),
)
self.assertNotEqual(
BinaryVector([1], BinaryVectorDtype.INT8), BinaryVector([2], BinaryVectorDtype.INT8)
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add asserts using assertRepr like we do in test_connection_monitoring.py?:

    def assertRepr(self, obj):
        new_obj = eval(repr(obj))
        self.assertEqual(type(new_obj), type(obj))
        self.assertEqual(repr(new_obj), repr(obj))

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. That's very cool, that one can create an object using eval(repr(obj))


def test_unicode_regex(self):
"""Tests we do not get a segfault for C extension on unicode RegExs.
This had been happening.
Expand Down
Loading