Genet from timm #344
Conversation
requirements.txt (outdated diff)
@@ -1,4 +1,4 @@
 torchvision>=0.3.0
 pretrainedmodels==0.7.4
 efficientnet-pytorch==0.6.3
-timm==0.3.2
+git+https://github.com/rwightman/pytorch-image-models@d8e69206be253892b2956341fea09fdebfaae4e3
No timm release with GENet is out yet, so we'll have to wait until the next release.
Pip does not support such requirements.
Possible solution: in encoders/__init__.py, wrap the import of this encoder in try/except and suggest installing the latest timm from source (until the new timm release).
Pip does support such requirements, and pip install -r requirements.txt works fine. It is still a temporary solution; timm releases often, and in my opinion it's easier to wait.
import torch.nn as nn


class GERNetEncoder(ByobNet, EncoderMixin):
This inherits from ByobNet, same as the new RepVGG, so we should consider adding a generic ByobNetEncoder to keep up with future timm changes: https://github.com/rwightman/pytorch-image-models/blob/d8e69206be253892b2956341fea09fdebfaae4e3/timm/models/byobnet.py#L645
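A generic encoder could look roughly like the sketch below. This is only an illustration, assuming the usual smp encoder pattern (EncoderMixin from encoders/_base.py, the _depth / _out_channels / _in_channels attributes) and that timm's ByobNet keeps exposing its stem, stages, and head attributes:

from timm.models.byobnet import ByobNet

from ._base import EncoderMixin  # assumed location, as in other smp encoders


class ByobNetEncoder(ByobNet, EncoderMixin):
    """Sketch of a generic encoder for timm ByobNet configs (GENet, RepVGG, ...)."""

    def __init__(self, out_channels, depth=5, **kwargs):
        super().__init__(**kwargs)
        self._depth = depth
        self._out_channels = out_channels
        self._in_channels = 3

        # the classification head is not needed for segmentation
        del self.head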
class GERNetEncoder(ByobNet, EncoderMixin):
    def __init__(self, out_channels, depth=6, **kwargs):
For GENet I've found 6 stages (+ RGB input = 7), while most other encoders have 5. Maybe I've misunderstood the concept, or we should join some stages.
Each stage should reduce spatial resolution; you may combine several stages into one.
Which stages should we combine if the constant-resolution stages differ between versions?
(3, 13, 48, 48, 384, 560, 1920) - s
(3, 32, 128, 192, 640, 640, 2560) - m, l
Combining different stages for different versions seems odd.
These are channel counts, not spatial resolutions.
Try printing the tensor shapes after each stage.
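For example, a quick way to do that (a sketch, assuming timm registers the models as gernet_s / gernet_m / gernet_l and that ByobNet exposes stem, stages, and final_conv, which may change between timm versions):

import torch
import timm

model = timm.create_model("gernet_s", pretrained=False).eval()

x = torch.rand(1, 3, 64, 64)
shapes = [tuple(x.shape)]
with torch.no_grad():
    x = model.stem(x)            # stem output
    shapes.append(tuple(x.shape))
    for stage in model.stages:   # one entry per ByobNet stage
        x = stage(x)
        shapes.append(tuple(x.shape))
    x = model.final_conv(x)      # final 1x1 conv, no extra downsampling
    shapes.append(tuple(x.shape))

print(shapes)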
Suppose you have an image of 224x224x3:
stem    112x112x13
stage_1 64x64x48
stage_2 32x32x48
stage_3 32x32x384
etc.
So you should combine stage_2 and stage_3.
s - [torch.Size([1, 3, 64, 64]), torch.Size([1, 13, 32, 32]), torch.Size([1, 48, 16, 16]), torch.Size([1, 48, 8, 8]), torch.Size([1, 384, 4, 4]), torch.Size([1, 560, 2, 2]), torch.Size([1, 1920, 2, 2])]
m, l - [torch.Size([1, 3, 64, 64]), torch.Size([1, 32, 32, 32]), torch.Size([1, 128, 16, 16]), torch.Size([1, 192, 8, 8]), torch.Size([1, 640, 4, 4]), torch.Size([1, 640, 2, 2]), torch.Size([1, 2560, 2, 2])]
So the last two should be joined, as I understand?
Yes, and out_channels should be modified: (3, 13, 48, 48, 384, 560, 1920) -> (3, 13, 48, 48, 384, 1920)
And depth should be changed to 5.
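Concretely, the merge could live in the encoder's get_stages, roughly as sketched below. This is an assumption-laden sketch: it follows the get_stages convention used by other smp encoders, relies on ByobNet exposing stem, stages, and final_conv, and the exact number of timm stages may differ between variants.

import torch.nn as nn

# method of GERNetEncoder (sketch)
def get_stages(self):
    # identity (RGB), stem, then the timm stages; everything after the last
    # spatial downsampling is merged together with final_conv into one block,
    # so the encoder ends up with exactly self._depth (= 5) downsampling stages
    blocks = [nn.Identity(), self.stem, *self.stages]
    return blocks[:self._depth] + [nn.Sequential(*blocks[self._depth:], self.final_conv)]

With out_channels (3, 13, 48, 48, 384, 1920) and depth 5, the first five entries keep their resolutions and the merged tail ends at 1/32 resolution with 1920 channels.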
for source_name, source_url in sources.items():
    pretrained_settings[model_name][source_name] = {
        "url": source_url,
        'input_size': [3, 224, 224] if not model_name == 'timm-gernet_l' else [3, 256, 256],
Not sure such an if/else inside the builder is appropriate, but it seems better than 3 separate large configs.
input_size is not a necessary kwarg; you may remove it.
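That is, the builder could shrink to something like this (a sketch; whatever other keys the settings dict already has in the PR stay unchanged):

for source_name, source_url in sources.items():
    pretrained_settings[model_name][source_name] = {
        "url": source_url,
        # the remaining keys stay as before; only "input_size" is dropped
    }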
@@ -16,6 +16,7 @@
 from .timm_res2net import timm_res2net_encoders
 from .timm_regnet import timm_regnet_encoders
 from .timm_sknet import timm_sknet_encoders
+from .timm_gernet import timm_gernet_encoders
import warnings

try:
    from .timm_gernet import timm_gernet_encoders
except ImportError as e:  # e.g. the installed timm has no GENet yet
    timm_gernet_encoders = {}
    warnings.warn("timm-gernet encoders are unavailable, install the latest "
                  "timm from source to use them ({})".format(e))
Thanks for the PR, I have left some comments about what needs to be corrected. Also, please add information about the encoder in the following places:
You can use the misc/generate_table.py script to get the number of params for the encoders.
Still got 2 questions, regarding the failing MAnet test and stage combining.
Updated all the points. The only thing left is that MAnet test.
Could you patch the MAnet decoder as follows and run the tests:

reduced_channels = max(1, skip_channels // reduction)

self.SE_ll = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(skip_channels, reduced_channels, 1),
    nn.ReLU(inplace=True),
    nn.Conv2d(reduced_channels, skip_channels, 1),
    nn.Sigmoid(),
)
self.SE_hl = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),
    nn.Conv2d(skip_channels, reduced_channels, 1),
    nn.ReLU(inplace=True),
    nn.Conv2d(reduced_channels, skip_channels, 1),
    nn.Sigmoid(),
)

P.S. The problem is in the stem: it has just 13 channels, and reducing it by 16 leads to 0 channels, which raises an error.
And please add it here (after that line) to run tests with the new encoder in CI.
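For illustration only, the addition might look like the snippet below; the test file and the ENCODERS_TO_TEST name are hypothetical (the thread does not show them), and the _s / _m names just follow the timm-gernet_l pattern used above:

# hypothetical test configuration: register the new encoder names so the
# CI test matrix exercises them
ENCODERS_TO_TEST += [
    "timm-gernet_s",
    "timm-gernet_m",
    "timm-gernet_l",
]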
All passing now.
I will try to test it on some of my tasks, and then merge.
timm is released.
Some concerns are listed in the code comments.
I didn't test it on a real-world task; tests pass and weights are loading.