Add BEiT segmentation model #1024

# Add BEiT to SMP

BEiT-3 is a general-purpose multimodal foundation model from Microsoft for vision and vision-language tasks, including semantic segmentation. It uses a unified Multiway Transformer architecture that supports both deep fusion and modality-specific encoding. It is pretrained with a single masked "language" modeling objective on images ("Imglish"), texts, and image-text pairs, so images are effectively modeled as another language. The paper reports state-of-the-art results at the time of publication on tasks such as object detection, image classification, and semantic segmentation.

 - Reported as the top-ranked result on ADE20K val for semantic segmentation

Papers with Code:
https://paperswithcode.com/paper/image-as-a-foreign-language-beit-pretraining

Paper:
https://arxiv.org/abs/2208.10442

HF reference implementation (note: the `beit` model in `transformers` is the original BEiT, not BEiT-3):
https://huggingface.co/docs/transformers/model_doc/beit
https://github.com/huggingface/transformers/blob/v4.47.1/src/transformers/models/beit/modeling_beit.py
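To make the request more concrete, here is a minimal sketch of how the HF `BeitForSemanticSegmentation` class linked above can be driven end to end. It is not a proposed SMP API. The tiny config values (`image_size=64`, `hidden_size=32`, four layers, three classes) and the helper names `build_tiny_beit_segmenter` and `predict_mask` are assumptions chosen only so the example runs quickly with random weights and no checkpoint download.

```python
import torch
import torch.nn.functional as F
from transformers import BeitConfig, BeitForSemanticSegmentation


def build_tiny_beit_segmenter(num_labels: int = 3) -> BeitForSemanticSegmentation:
    """Build a small, randomly initialised BEiT segmenter.

    All sizes here are illustrative assumptions, not values from a released
    checkpoint. The HF segmentation head expects exactly four feature
    levels, hence four entries in `out_indices`.
    """
    config = BeitConfig(
        image_size=64,
        patch_size=16,
        hidden_size=32,
        num_hidden_layers=4,
        num_attention_heads=2,
        intermediate_size=64,
        out_indices=[1, 2, 3, 4],
        use_auxiliary_head=False,
        num_labels=num_labels,
    )
    # eval() so BatchNorm layers in the decode head use running statistics.
    return BeitForSemanticSegmentation(config).eval()


@torch.no_grad()
def predict_mask(model: BeitForSemanticSegmentation, pixel_values: torch.Tensor) -> torch.Tensor:
    """Return a per-pixel class-index mask at the input resolution."""
    logits = model(pixel_values=pixel_values).logits  # (B, num_labels, h, w), lower resolution
    # Upsample the logits back to the input size before taking the argmax.
    logits = F.interpolate(
        logits, size=pixel_values.shape[-2:], mode="bilinear", align_corners=False
    )
    return logits.argmax(dim=1)


if __name__ == "__main__":
    model = build_tiny_beit_segmenter(num_labels=3)
    images = torch.randn(1, 3, 64, 64)  # random stand-in for a normalised image batch
    mask = predict_mask(model, images)
    print(tuple(mask.shape))
```

An SMP port would presumably wrap the same pieces (ViT-style encoder, four-level feature pyramid, UPerNet-style head) behind SMP's usual model interface; how that is split between encoder and decoder is left to the implementer.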

## Comments

For examples, please see the most recent model additions:

 - https://github.com/qubvel-org/segmentation_models.pytorch/pull/944
 - https://github.com/qubvel-org/segmentation_models.pytorch/pull/926
