Add NamedData to flat_tensor schema #9571

Merged on Mar 25, 2025 (1 commit)
17 changes: 15 additions & 2 deletions extension/flat_tensor/serialize/flat_tensor.fbs
@@ -35,8 +35,8 @@ table TensorMetadata {
   // To retrieve a given tensor:
   // 1. segment_base_offset: from the file header.
   // 2. segment_offset: segments[segment_index].offset
-  // 3. tensor_offset: segments[segment_offset].tensor_metadata[j].offset
-  //    Find the relevant index j by matching on tensor fqn.
+  // 3. tensor_offset: the offset within the segment. If there is only one item
+  //    in the segment, offset=0.
   offset: uint64;
 }
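
The three-step lookup described in the comment above can be sketched in Python. This is a minimal illustration, not the loader's actual code: it assumes the three offsets are simply summed to get an absolute file position, and `TensorRef` and `absolute_tensor_offset` are hypothetical names introduced here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSegment:
    offset: int  # assumed to be relative to segment_base_offset
    size: int


@dataclass
class TensorRef:
    # Hypothetical, trimmed-down stand-in for TensorMetadata.
    segment_index: int
    offset: int  # offset within the segment


def absolute_tensor_offset(
    segment_base_offset: int, segments: List[DataSegment], tensor: TensorRef
) -> int:
    """Sum the three offsets named in the schema comment (assumed layout)."""
    segment = segments[tensor.segment_index]
    if tensor.offset >= segment.size:
        raise ValueError("tensor offset lies outside its segment")
    return segment_base_offset + segment.offset + tensor.offset


segments = [DataSegment(offset=0, size=64), DataSegment(offset=64, size=32)]
position = absolute_tensor_offset(128, segments, TensorRef(segment_index=1, offset=16))
print(position)  # 128 + 64 + 16 = 208
```

With a single item in a segment, the tensor offset is 0 and the position reduces to the base offset plus the segment offset.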

@@ -55,6 +55,15 @@ table DataSegment {
   size: uint64;
 }

+// Attributes a name to data referenced by FlatTensor.segments.
+table NamedData {
+  // The unique id of the data blob.
+  key: string;
+
+  // Index of the segment in FlatTensor.segments.
+  segment_index: uint32;
+}
+
 // FlatTensor is a flatbuffer-based format for storing and loading tensors.
 table FlatTensor {
   // Schema version.
@@ -70,6 +79,10 @@ table FlatTensor {
   // List of data segments that follow the FlatTensor data in this file, sorted by
   // offset. Elements in this schema can refer to these segments by index.
   segments: [DataSegment];
+
+  // List of blobs keyed by a unique name. Note that multiple 'NamedData'
+  // entries could point to the same segment index.
+  named_data: [NamedData];
 }

 root_type FlatTensor;
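
To illustrate how `named_data` relates keys to segments, here is a small Python sketch of a key-to-segment lookup. The dataclasses are local stand-ins that mirror the schema fields, and `index_named_data` is a hypothetical helper, not part of this PR. Rejecting duplicate keys is an assumption drawn from the "unique id" wording in the schema comment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DataSegment:
    offset: int
    size: int


@dataclass
class NamedData:
    key: str
    segment_index: int


def index_named_data(
    named_data: List[NamedData], segments: List[DataSegment]
) -> Dict[str, DataSegment]:
    """Map each key to the segment it references (hypothetical helper)."""
    index: Dict[str, DataSegment] = {}
    for entry in named_data:
        if entry.key in index:
            raise ValueError(f"duplicate key: {entry.key}")
        if not 0 <= entry.segment_index < len(segments):
            raise IndexError(f"segment_index out of range for key: {entry.key}")
        index[entry.key] = segments[entry.segment_index]
    return index


segments = [DataSegment(offset=0, size=64)]
# Two names may share one segment, as the schema comment allows.
named = [NamedData("weights_a", 0), NamedData("weights_alias", 0)]
lookup = index_named_data(named, segments)
print(lookup["weights_a"] is lookup["weights_alias"])  # True
```

Keys are unique, but segment indices need not be, so several names can resolve to the same bytes.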
7 changes: 7 additions & 0 deletions extension/flat_tensor/serialize/flat_tensor_schema.py
@@ -31,9 +31,16 @@ class DataSegment:
     size: int


+@dataclass
+class NamedData:
+    key: str
+    segment_index: int
+
+
 @dataclass
 class FlatTensor:
     version: int
     tensor_alignment: int
     tensors: List[TensorMetadata]
     segments: List[DataSegment]
+    named_data: List[NamedData]
1 change: 1 addition & 0 deletions extension/flat_tensor/serialize/serialize.py
@@ -282,6 +282,7 @@ def serialize(
             tensor_alignment=self.config.tensor_alignment,
             tensors=flat_tensor_metadata,
             segments=[DataSegment(offset=0, size=len(flat_tensor_data))],
+            named_data=[],
         )

         flatbuffer_payload = _serialize_to_flatbuffer(flat_tensor)
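
A likely reason for the one-line change in `serialize.py`: the `named_data` field added to the `FlatTensor` dataclass has no default, so any constructor call that omits it raises `TypeError`. The sketch below reproduces that behavior with local stand-in dataclasses. `tensors` is typed loosely because `TensorMetadata` is not shown in this diff, and `build_without_named_data` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataSegment:
    offset: int
    size: int


@dataclass
class NamedData:
    key: str
    segment_index: int


@dataclass
class FlatTensor:
    version: int
    tensor_alignment: int
    tensors: List[object]  # stand-in for List[TensorMetadata]
    segments: List[DataSegment]
    named_data: List[NamedData]  # required: no default value


def build_without_named_data() -> str:
    """Return the name of the exception raised when named_data is omitted."""
    try:
        FlatTensor(version=0, tensor_alignment=16, tensors=[], segments=[])
    except TypeError as exc:
        return type(exc).__name__
    return "ok"


print(build_without_named_data())  # TypeError

# Passing an empty list, as serialize() does in this PR, satisfies the field.
flat_tensor = FlatTensor(
    version=0,
    tensor_alignment=16,
    tensors=[],
    segments=[DataSegment(offset=0, size=0)],
    named_data=[],
)
```

An empty list keeps existing callers working while leaving room for later changes to populate the field.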