Open
Description
During model inference, model weights are frozen and do not change between iterations. CPUs prefer a specialized weight layout to accelerate execution, so we need to pre-pack the model weights before execution. This issue covers the items below:
- Analyze how weight pre-packing is done in OpenVINO.
- Provide an RFC on how to do weight pre-packing in MLIR to meet OpenVINO's requirements.
- Implement the weight pre-packing pass in the current CPU pipeline to support BF16 MLP inference.
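To make the idea concrete, below is a minimal sketch of what pre-packing a weight does: the `[K, N]` weight is reordered once, ahead of time, into a blocked layout so that each block of output columns is contiguous for the matmul kernel. The block size and layout here are hypothetical simplifications for illustration, not OpenVINO's actual format.

```python
import numpy as np

def prepack_weight(w, block_n=16):
    """Reorder a [K, N] weight into [N_blocks, K, block_n].

    Pads N up to a multiple of block_n so each output-column block is
    a contiguous [K, block_n] tile -- a simplified stand-in for the
    blocked layouts CPU matmul kernels prefer (illustrative only).
    """
    k, n = w.shape
    n_pad = (n + block_n - 1) // block_n * block_n
    w_padded = np.zeros((k, n_pad), dtype=w.dtype)
    w_padded[:, :n] = w
    return w_padded.reshape(k, n_pad // block_n, block_n).transpose(1, 0, 2)

def blocked_matmul(x, w_packed, n):
    """Multiply [M, K] input against pre-packed weights, block by block."""
    m = x.shape[0]
    n_blocks, k, block_n = w_packed.shape
    out = np.empty((m, n_blocks * block_n), dtype=np.float32)
    for b in range(n_blocks):
        out[:, b * block_n:(b + 1) * block_n] = x @ w_packed[b]
    return out[:, :n]  # drop the padding columns

# Sanity check: the packed matmul matches the plain matmul.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.standard_normal((8, 20)).astype(np.float32)
packed = prepack_weight(w, block_n=16)
assert np.allclose(blocked_matmul(x, packed, 20), x @ w, atol=1e-5)
```

Because the weights are frozen, `prepack_weight` runs once at load/compile time; only `blocked_matmul` runs per inference iteration, which is the property the MLIR pass would need to preserve.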