const weight packing support #146

Open
@ZhennanQin

Description

During model inference, model weights are frozen and do not change between iterations. CPUs prefer a specialized weight layout to accelerate execution, so the model weights should be pre-packed before execution begins. This issue covers the following items:

  • Analyze how weight pre-packing is done in OpenVINO.
  • Provide an RFC describing how to do weight pre-packing in MLIR to meet the OpenVINO requirements.
  • Implement the weight pre-packing pass in the current CPU pipeline to support BF16 MLP inference.
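To illustrate the idea behind the items above, here is a minimal NumPy sketch of what a one-time weight pre-pack can look like. The VNNI-style blocked layout, the block size, and the function name are illustrative assumptions, not the layout this issue will ultimately specify:

```python
import numpy as np

def prepack_weight_vnni(w: np.ndarray, n_block: int = 16) -> np.ndarray:
    """Repack a (K, N) weight into a VNNI-like blocked layout
    (N/n_block, K/2, n_block, 2), so that pairs of K elements per
    output channel are contiguous, matching how BF16 dot-product
    instructions (e.g. AVX512-BF16 vdpbf16ps) consume operands.
    Block size and exact layout here are illustrative assumptions.
    """
    K, N = w.shape
    assert K % 2 == 0 and N % n_block == 0
    # (K, N) -> (K/2, 2, N/n_block, n_block) -> (N/n_block, K/2, n_block, 2)
    return (w.reshape(K // 2, 2, N // n_block, n_block)
             .transpose(2, 0, 3, 1)
             .copy())

# Pack once at model load; reuse the packed buffer every iteration,
# since the weight is constant across inference iterations.
w = np.arange(8 * 32, dtype=np.float32).reshape(8, 32)
packed = prepack_weight_vnni(w)
# Element w[k, n] lands at packed[n // 16, k // 2, n % 16, k % 2].
assert packed[1, 3, 5, 1] == w[7, 21]
```

In a compiler pipeline the same transform would run as a pass over constant weights (e.g. constant folding the layout conversion), so the runtime matmul kernel only ever sees the packed buffer.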
