Open
Description
During model inference, model weights are frozen and do not change between iterations. CPUs prefer a specialized weight layout to accelerate execution, so we need to pre-pack the model weights before execution. This issue covers the items below:
- Analyze how weight pre-packing is done in OpenVINO.
- Provide an RFC on how to do weight pre-packing in MLIR to meet OpenVINO's requirements.
- Implement the weight pre-packing pass in the current CPU pipeline to support BF16 MLP inference.
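To make the idea concrete, below is a minimal sketch of what pre-packing a weight does: the `[K, N]` weight is reordered once, ahead of time, into a blocked layout so that each block of output columns is contiguous for the matmul kernel. The block size and layout here are hypothetical simplifications for illustration, not OpenVINO's actual format.

```python
import numpy as np

def prepack_weight(w, block_n=16):
    """Reorder a [K, N] weight into [N_blocks, K, block_n].

    Pads N up to a multiple of block_n so each output-column block is
    a contiguous [K, block_n] tile -- a simplified stand-in for the
    blocked layouts CPU matmul kernels prefer (illustrative only).
    """
    k, n = w.shape
    n_pad = (n + block_n - 1) // block_n * block_n
    w_padded = np.zeros((k, n_pad), dtype=w.dtype)
    w_padded[:, :n] = w
    return w_padded.reshape(k, n_pad // block_n, block_n).transpose(1, 0, 2)

def blocked_matmul(x, w_packed, n):
    """Multiply [M, K] input against pre-packed weights, block by block."""
    m = x.shape[0]
    n_blocks, k, block_n = w_packed.shape
    out = np.empty((m, n_blocks * block_n), dtype=np.float32)
    for b in range(n_blocks):
        out[:, b * block_n:(b + 1) * block_n] = x @ w_packed[b]
    return out[:, :n]  # drop the padding columns

# Sanity check: the packed matmul matches the plain matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 20)).astype(np.float32)
packed = prepack_weight(w, block_n=16)
assert np.allclose(blocked_matmul(x, packed, 20), x @ w, atol=1e-5)
```

Because the weights are frozen, `prepack_weight` runs once at load/compile time; only `blocked_matmul` runs per inference iteration, which is the property the MLIR pass would need to preserve.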