
Commit 886eb77

Update README; fix a small discrepancy missed in the adafactor min dim update
1 parent e3e434b commit 886eb77

2 files changed: +20, -3 lines

README.md

Lines changed: 19 additions & 2 deletions
@@ -12,6 +12,20 @@
 
 ## What's New
 
+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Cleanup some docstrings and type annotations re optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add small cs3darknet, quite good for the speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
+
 ## Nov 12, 2024
 * Optimizer factory refactor
   * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
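
A minimal sketch of pulling one of the checkpoints linked in the new notes above through timm's standard `create_model` API; the model/tag name is taken from the Hugging Face URL, the weights download from the Hub, and the 1000-class output assumes the in1k fine-tuned head.

```python
import torch
import timm

# Name/tag taken from https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
model = timm.create_model('mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k', pretrained=True)
model.eval()

with torch.inference_mode():
    logits = model(torch.randn(1, 3, 384, 384))  # fine-tuned at 384x384 per the note above

print(logits.shape)  # expected: torch.Size([1, 1000]) for the in1k head
```
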
@@ -463,12 +477,14 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
 * `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
 * `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
-* `adopt` - adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
+* `adopt` an implementation of ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
 * `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
+* `laprop` LaProp optimizer from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
 * `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
 * `lion` an implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
 * `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
-* `madgrad` - and implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `mars` MARS optimizer from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
 * `nadam` an implementation of Adam w/ Nesterov momentum
 * `nadamw` an implementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
 * `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
@@ -477,6 +493,7 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `sgdw` an implementation of SGD w/ decoupled weight-decay
 * `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
 * `bnb<name>` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
+* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085
 * `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations
 
 ### Augmentations
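
The list above is what the factory resolves by string name. A hedged usage sketch for the newly added entries follows: the `opt` strings (`mars`, `laprop`, `cadamw`) come from the list, the `lr`/`weight_decay` values are illustrative, and exact keyword support may vary by timm version.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet50', pretrained=False)

# MARS and LaProp, added in this commit's release notes
opt_mars = create_optimizer_v2(model, opt='mars', lr=3e-3, weight_decay=0.02)
opt_laprop = create_optimizer_v2(model, opt='laprop', lr=4e-4)

# 'c'-prefixed names select the 'Cautious' variants listed above
opt_cadamw = create_optimizer_v2(model, opt='cadamw', lr=1e-3, weight_decay=0.05)
```
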

timm/optim/adafactor.py

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ def __setstate__(self, state):
         super().__setstate__(state)
         for group in self.param_groups:
             group.setdefault('caution', False)
-            group.setdefault('min_dim_size_to_factor', 32)
+            group.setdefault('min_dim_size_to_factor', 16)
 
     @staticmethod
     def _get_lr(param_group, param_state):
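
The one-line change above keeps the `__setstate__` backfill consistent with the Adafactor constructor's updated default of 16, so optimizer state pickled before the option existed still loads with the intended value. Below is a minimal, illustrative sketch of that backfill pattern; it is not timm's actual Adafactor code, and `step()` is omitted.

```python
import torch


class MyOptimizer(torch.optim.Optimizer):
    """Illustrative optimizer showing the param-group backfill pattern."""

    def __init__(self, params, lr=1e-3, min_dim_size_to_factor=16, caution=False):
        defaults = dict(lr=lr, min_dim_size_to_factor=min_dim_size_to_factor, caution=caution)
        super().__init__(params, defaults)

    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            # Older checkpoints may predate these options; backfill them with
            # values that match the current __init__ defaults (hence 16, not 32).
            group.setdefault('caution', False)
            group.setdefault('min_dim_size_to_factor', 16)
```
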
