This repository was archived by the owner on Mar 19, 2024. It is now read-only.

Commit 722a7cc

prigoyal authored and facebook-github-bot committed
Add beit transformer models (#511)
Summary: Pull Request resolved: #511

as title

Reviewed By: QuentinDuval

Differential Revision: D33793945

fbshipit-source-id: 3e664fb7699beb04d012039e930149e8eb4b7617
1 parent 7337369 commit 722a7cc

File tree

3 files changed: +520 -0 lines changed


vissl/config/defaults.yaml

Lines changed: 28 additions & 0 deletions
@@ -692,6 +692,34 @@ config:
         QKV_BIAS: True # Bias for QKV in attention layers.
         QK_SCALE: False # Scale
 
+      # ------------------------------------------------------------- #
+      # BEiT.
+      # https://github.com/microsoft/unilm/blob/master/beit/modeling_finetune.py
+      # https://arxiv.org/pdf/2106.08254.pdf
+      # ------------------------------------------------------------- #
+      BEIT:
+        IMAGE_SIZE: 224
+        PATCH_SIZE: 16
+        NUM_LAYERS: 12
+        NUM_HEADS: 12
+        HIDDEN_DIM: 768
+        MLP_RATIO: 4.0
+        # MLP and projection layer dropout rate
+        DROPOUT_RATE: 0
+        # Attention dropout rate
+        ATTENTION_DROPOUT_RATE: 0
+        # Stochastic depth dropout rate. Turning on stochastic depth and
+        # using aggressive augmentation is essentially the difference
+        # between a DeiT and a ViT.
+        DROP_PATH_RATE: 0
+        QKV_BIAS: False # Bias for QKV in attention layers.
+        QK_SCALE: False # Scale
+        USE_ABS_POS_EMB: True
+        USE_REL_POS_BIAS: False
+        USE_SHARED_REL_POS_BIAS: False
+        USE_MEAN_POOLING: True
+        INIT_VALUES: False
+
       # ------------------------------------------------------------- #
       # Parameters unique to the ConViT and not used for standard vision
       # transformers
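With IMAGE_SIZE: 224 and PATCH_SIZE: 16, the trunk operates on (224 / 16)^2 = 196 patch tokens. The DROP_PATH_RATE comment above refers to stochastic depth ("drop path"), which randomly skips entire residual branches per sample during training. As a rough illustration of what that rate controls, here is a minimal generic sketch of the drop-path operation and of the common linear ramp of per-block rates across the NUM_LAYERS blocks; this is a sketch of the standard technique, not necessarily the exact code VISSL uses.

import torch

def drop_path(x: torch.Tensor, drop_prob: float = 0.0, training: bool = False) -> torch.Tensor:
    # Generic sketch of stochastic depth, not necessarily VISSL's exact code:
    # zero out a whole residual branch for a random subset of samples and
    # rescale the survivors so the expected output is unchanged.
    if drop_prob == 0.0 or not training:
        return x
    keep_prob = 1.0 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # one Bernoulli draw per sample
    mask = x.new_empty(shape).bernoulli_(keep_prob)
    return x * mask / keep_prob

# Per-block rates are commonly ramped linearly from 0 up to DROP_PATH_RATE:
drop_path_rate, num_layers = 0.1, 12
per_block_rates = [drop_path_rate * i / max(num_layers - 1, 1) for i in range(num_layers)]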

vissl/models/model_helpers.py

Lines changed: 3 additions & 0 deletions
@@ -649,6 +649,9 @@ def __init__(self, drop_prob=None):
     def forward(self, x):
         return drop_path(x, self.drop_prob, self.training)
 
+    def extra_repr(self) -> str:
+        return "p={}".format(self.drop_prob)
+
 
 to_1tuple = _ntuple(1)
 to_2tuple = _ntuple(2)
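The new extra_repr hook simply surfaces the drop probability when a model is printed: torch.nn.Module's default __repr__ places whatever extra_repr() returns inside the parentheses of the module's line. A stripped-down stand-in (not the full VISSL DropPath, whose forward calls the drop_path helper) shows the effect:

import torch.nn as nn

class DropPath(nn.Module):
    # Minimal stand-in used only to demonstrate extra_repr; the real class
    # lives in vissl/models/model_helpers.py.
    def __init__(self, drop_prob=None):
        super().__init__()
        self.drop_prob = drop_prob

    def extra_repr(self) -> str:
        return "p={}".format(self.drop_prob)

print(DropPath(0.1))  # prints: DropPath(p=0.1)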
