
Commit 886eb77

Update README; fix a small discrepancy missed in the adafactor min dim update
1 parent e3e434b commit 886eb77

2 files changed: +20, -3 lines

README.md

Lines changed: 19 additions & 2 deletions
@@ -12,6 +12,20 @@
 
 ## What's New
 
+## Nov 28, 2024
+* More optimizers
+  * Add MARS optimizer (https://arxiv.org/abs/2411.10438, https://github.com/AGI-Arena/MARS)
+  * Add LaProp optimizer (https://arxiv.org/abs/2002.04839, https://github.com/Z-T-WANG/LaProp-Optimizer)
+  * Add masking from 'Cautious Optimizers' (https://arxiv.org/abs/2411.16085, https://github.com/kyleliang919/C-Optim) to Adafactor, Adafactor Big Vision, AdamW (legacy), Adopt, Lamb, LaProp, Lion, NadamW, RMSPropTF, SGDW
+  * Cleanup some docstrings and type annotations re optimizers and factory
+* Add MobileNet-V4 Conv Medium models pretrained on in12k and fine-tuned in1k @ 384x384
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_ad_r384_in12k
+  * https://huggingface.co/timm/mobilenetv4_conv_medium.e180_r384_in12k
+* Add small cs3darknet, quite good for the speed
+  * https://huggingface.co/timm/cs3darknet_focus_s.ra4_e3600_r256_in1k
+
 ## Nov 12, 2024
 * Optimizer factory refactor
   * New factory works by registering optimizers using an OptimInfo dataclass w/ some key traits
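
A minimal sketch of pulling one of the checkpoints linked in the new notes above through timm's standard `create_model` API; the model/tag name is taken from the Hugging Face URL, the weights download from the Hub, and the 1000-class output assumes the in1k fine-tuned head.

```python
import torch
import timm

# Name/tag taken from https://huggingface.co/timm/mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k
model = timm.create_model('mobilenetv4_conv_medium.e250_r384_in12k_ft_in1k', pretrained=True)
model.eval()

with torch.inference_mode():
    logits = model(torch.randn(1, 3, 384, 384))  # fine-tuned at 384x384 per the note above

print(logits.shape)  # expected: torch.Size([1, 1000]) for the in1k head
```
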
@@ -463,12 +477,14 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `adahessian` by [David Samuel](https://github.com/davda54/ada-hessian) - https://arxiv.org/abs/2006.00719
 * `adamp` and `sgdp` by [Naver ClovAI](https://github.com/clovaai) - https://arxiv.org/abs/2006.08217
 * `adan` an implementation of Adan adapted from https://github.com/sail-sg/Adan - https://arxiv.org/abs/2208.06677
-* `adopt` - adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
+* `adopt` an implementation of ADOPT adapted from https://github.com/iShohei220/adopt - https://arxiv.org/abs/2411.02853
 * `lamb` an implementation of Lamb and LambC (w/ trust-clipping) cleaned up and modified to support use with XLA - https://arxiv.org/abs/1904.00962
+* `laprop` LaProp optimizer from https://github.com/Z-T-WANG/LaProp-Optimizer - https://arxiv.org/abs/2002.04839
 * `lars` an implementation of LARS and LARC (w/ trust-clipping) - https://arxiv.org/abs/1708.03888
 * `lion` an implementation of Lion adapted from https://github.com/google/automl/tree/master/lion - https://arxiv.org/abs/2302.06675
 * `lookahead` adapted from impl by [Liam](https://github.com/alphadl/lookahead.pytorch) - https://arxiv.org/abs/1907.08610
-* `madgrad` - and implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `madgrad` an implementation of MADGRAD adapted from https://github.com/facebookresearch/madgrad - https://arxiv.org/abs/2101.11075
+* `mars` MARS optimizer from https://github.com/AGI-Arena/MARS - https://arxiv.org/abs/2411.10438
 * `nadam` an implementation of Adam w/ Nesterov momentum
 * `nadamw` an implementation of AdamW (Adam w/ decoupled weight-decay) w/ Nesterov momentum. A simplified impl based on https://github.com/mlcommons/algorithmic-efficiency
 * `novograd` by [Masashi Kimura](https://github.com/convergence-lab/novograd) - https://arxiv.org/abs/1905.11286
@@ -477,6 +493,7 @@ Included optimizers available via `timm.optim.create_optimizer_v2` factory metho
 * `sgdw` an implementation of SGD w/ decoupled weight-decay
 * `fused<name>` optimizers by name with [NVIDIA Apex](https://github.com/NVIDIA/apex/tree/master/apex/optimizers) installed
 * `bnb<name>` optimizers by name with [BitsAndBytes](https://github.com/TimDettmers/bitsandbytes) installed
+* `cadamw`, `clion`, and more 'Cautious' optimizers from https://github.com/kyleliang919/C-Optim - https://arxiv.org/abs/2411.16085
 * `adam`, `adamw`, `rmsprop`, `adadelta`, `adagrad`, and `sgd` pass through to `torch.optim` implementations
 
 ### Augmentations
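
The list above is what the factory resolves by string name. A hedged usage sketch for the newly added entries follows: the `opt` strings (`mars`, `laprop`, `cadamw`) come from the list, the `lr`/`weight_decay` values are illustrative, and exact keyword support may vary by timm version.

```python
import timm
from timm.optim import create_optimizer_v2

model = timm.create_model('resnet50', pretrained=False)

# MARS and LaProp, added in this commit's release notes
opt_mars = create_optimizer_v2(model, opt='mars', lr=3e-3, weight_decay=0.02)
opt_laprop = create_optimizer_v2(model, opt='laprop', lr=4e-4)

# 'c'-prefixed names select the 'Cautious' variants listed above
opt_cadamw = create_optimizer_v2(model, opt='cadamw', lr=1e-3, weight_decay=0.05)
```
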

timm/optim/adafactor.py

Lines changed: 1 addition & 1 deletion
@@ -83,7 +83,7 @@ def __setstate__(self, state):
         super().__setstate__(state)
         for group in self.param_groups:
             group.setdefault('caution', False)
-            group.setdefault('min_dim_size_to_factor', 32)
+            group.setdefault('min_dim_size_to_factor', 16)
 
     @staticmethod
     def _get_lr(param_group, param_state):
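
The one-line change above keeps the `__setstate__` backfill consistent with the Adafactor constructor's updated default of 16, so optimizer state pickled before the option existed still loads with the intended value. Below is a minimal, illustrative sketch of that backfill pattern; it is not timm's actual Adafactor code, and `step()` is omitted.

```python
import torch


class MyOptimizer(torch.optim.Optimizer):
    """Illustrative optimizer showing the param-group backfill pattern."""

    def __init__(self, params, lr=1e-3, min_dim_size_to_factor=16, caution=False):
        defaults = dict(lr=lr, min_dim_size_to_factor=min_dim_size_to_factor, caution=caution)
        super().__init__(params, defaults)

    def __setstate__(self, state):
        super().__setstate__(state)
        for group in self.param_groups:
            # Older checkpoints may predate these options; backfill them with
            # values that match the current __init__ defaults (hence 16, not 32).
            group.setdefault('caution', False)
            group.setdefault('min_dim_size_to_factor', 16)
```
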
