[LoRA] support more comyui loras for Flux 🚨 #10985
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
This pops up if I do a full text search:
|
There are some |
You mean When doing a LoRA merge in comfy you subtract the base model from the LoRAed model. Maybe diff is hinting at that and these are some temporaries. Probably doing a costly SVD on all those zeros too. |
Yes, indeed. I will push the I think this particular LoRA in question probably stretches the different LoRA shenanigans the most. But the rest of the ComfyUI LoRAs should hopefully be loadable with this PR as is. Thanks for helping! |
FluxDFaeTasticDetails does load and gives a similar style to what is shown in the LoRA gallery. Prompt: a majestic magical koi fish swimming gracefully through a serene pond. The koi fish has shimmering, iridescent scales that glow softly, illuminating the water around it, colored ink painting on parchment, black and white ink with splashes of color |
Nice, that is promising. Do keep testing other LoRAs (apart from the one we discussed already) and ping me if there's something off. |
If |
Regarding parameters ending in .diff or .diff_b: I crawled all my local LoRAs, and they appear only in LoRAs from this guy with the RM_ prefix, where they are all completely zero. I tried "Extract and Save LoRA" myself and found a bias_diff switch. With this checked, extracting a LoRA from a "finetuned" transformer (often, I suspect, just someone merging several LoRAs into the transformer and distributing that) gives me a few nonzero .diff_b parameters. The values are pretty small but not entirely zero. I didn't check extracting from the text_encoder. So in conclusion, I think .diff is supposed to be lora_A.bias and .diff_b is lora_B.bias, with a yet-to-be-determined impact on generations. Whether they matter when extracting from an actually fully finetuned transformer is also still open. |
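For anyone who wants to repeat the crawl described above, here is a minimal sketch of the check. It is not code from this PR: the function name, the toy key names, and the use of plain Python lists in place of tensors are all assumptions made to keep the example self-contained.

```python
from typing import Dict, Iterable

# Suffixes discussed in this thread.
DIFF_SUFFIXES = (".diff", ".diff_b")


def find_diff_keys(state_dict: Dict[str, Iterable[float]]) -> Dict[str, bool]:
    """Map each key ending in .diff or .diff_b to True if all its values are zero."""
    report = {}
    for key, values in state_dict.items():
        if key.endswith(DIFF_SUFFIXES):
            report[key] = all(v == 0 for v in values)
    return report


# Toy state dict with made-up keys; flat lists stand in for real tensors.
toy_state_dict = {
    "transformer_blocks.0.attn.to_q.lora_A.weight": [0.1, -0.2],
    "transformer_blocks.0.attn.to_q.diff": [0.0, 0.0],
    "transformer_blocks.0.attn.to_q.diff_b": [0.0, 1e-4],
}

report = find_diff_keys(toy_state_dict)
for key, is_zero in report.items():
    print(key, "all zero" if is_zero else "has nonzero values")
```

With a real checkpoint, the values would be tensors and the zero test would have to use the tensor library's own comparison instead.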
The xlabs LoRA works for me too |
Ah sorry, I think there is a regression with vanilla Kohya-LoRAs now. |
@spezialspezial that should be fixed now. Would be nice if you could check it again. |
@spezialspezial I have also pushed a couple of updates in 5c4976b to better filter T5 and other |
The WizardWhitebeard LoRA (kind: kohya, dtype: bfloat16, rank: 20) works again |
Not expecting you to fix it but check out this odd bird: https://civitai.com/models/1269030?modelVersionId=1431428
Pay attention to the smaller dimension of the tensors. |
RandomMaxx Illustrify (kind: comfyui, dtype: float16, rank: 32) works |
Yeah it has different ranks for different layers -- something we have supported for a long time. Any errors? |
Yeah, fails for me with: size mismatch for transformer_blocks.17.attn.to_v.lora_B.Masha Babko.weight: copying a param with shape torch.Size([3072, 10]) from checkpoint, the shape in current model is torch.Size([3072, 8]). |
Okay then for some reason, the rank pattern is not getting updated. Will take a look soon after the ICCV submissions. Hope that is okay. |
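As a rough illustration of what a correct rank pattern would have to contain for this LoRA, the sketch below derives per-module ranks from the second dimension of each lora_B weight, which is the dimension the size-mismatch error above complains about. This is not the diffusers or PEFT implementation. The function names are hypothetical, and the shapes are toy values built from the 3072 in the error message and the ranks reported later in this thread.

```python
from collections import Counter
from typing import Dict, Tuple

LORA_B_SUFFIX = ".lora_B.weight"


def ranks_from_shapes(shapes: Dict[str, Tuple[int, int]]) -> Dict[str, int]:
    """Read each module's rank off its lora_B weight, shaped (out_features, rank)."""
    ranks = {}
    for key, shape in shapes.items():
        if key.endswith(LORA_B_SUFFIX):
            module = key[: -len(LORA_B_SUFFIX)]
            ranks[module] = shape[1]
    return ranks


def split_default_and_pattern(ranks: Dict[str, int]) -> Tuple[int, Dict[str, int]]:
    """Use the most common rank as the default and list only the exceptions."""
    default_r = Counter(ranks.values()).most_common(1)[0][0]
    pattern = {module: r for module, r in ranks.items() if r != default_r}
    return default_r, pattern


# Toy shapes only; a real checkpoint has many more entries.
toy_shapes = {
    "transformer_blocks.17.attn.to_v.lora_A.weight": (10, 3072),
    "transformer_blocks.17.attn.to_v.lora_B.weight": (3072, 10),
    "transformer_blocks.17.attn.to_out.0.lora_B.weight": (3072, 8),
    "transformer_blocks.18.attn.to_q.lora_B.weight": (3072, 8),
}

default_r, rank_pattern = split_default_and_pattern(ranks_from_shapes(toy_shapes))
print(default_r, rank_pattern)
```

If the rank for `transformer_blocks.17.attn.to_v` is dropped from the pattern, the module falls back to the default rank, which would produce exactly the kind of 10-versus-8 mismatch reported above.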
You have a talk there? Nice! Is it about fighting sleeplessness from seeing too many obscure LoRAs? |
@BenjaminBossan @hlky the
Code:
from diffusers import DiffusionPipeline
import torch

repo_id = "black-forest-labs/FLUX.1-dev"
lora_name = "babko.safetensors"
pipeline = DiffusionPipeline.from_pretrained(repo_id, torch_dtype=torch.bfloat16)
pipeline.load_lora_weights(
    "sayakpaul/different-lora-from-civitai", weight_name=lora_name
)
Consider the following two keys:
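As a hedged illustration of this kind of overlap (not necessarily the exact pair meant here), take two keys that appear in the rank_pattern of this LoRA: `transformer_blocks.0.attn.to_q` with rank 10 and `single_transformer_blocks.0.attn.to_q` with rank 18. The first is a plain substring of the second. The sketch below assumes the ambiguity check is a substring test and contrasts it with a match that only succeeds on a dot boundary. Both helper functions are hypothetical and are not the diffusers or PEFT code.

```python
import re
from typing import Dict, List, Tuple


def overlapping_keys(rank_pattern: Dict[str, int]) -> List[Tuple[str, str]]:
    """Return (short, long) pairs where short is a substring of long and the ranks differ."""
    pairs = []
    for short in rank_pattern:
        for long in rank_pattern:
            if short != long and short in long and rank_pattern[short] != rank_pattern[long]:
                pairs.append((short, long))
    return pairs


def matches_on_boundary(pattern_key: str, module_name: str) -> bool:
    """Match only if pattern_key is the whole name or follows a '.' separator."""
    return re.fullmatch(rf"(.*\.)?{re.escape(pattern_key)}", module_name) is not None


# Three entries copied from the rank_pattern of the LoRA under test.
toy_rank_pattern = {
    "transformer_blocks.0.attn.to_q": 10,
    "single_transformer_blocks.0.attn.to_q": 18,
    "transformer_blocks.0.ff.net.2": 11,
}

pairs = overlapping_keys(toy_rank_pattern)
print(pairs)
print(matches_on_boundary("transformer_blocks.0.attn.to_q",
                          "single_transformer_blocks.0.attn.to_q"))
```

Under the substring test the two keys collide. Under the boundary-aware match they do not, because `single_` is not followed by a dot.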
This becomes ambiguous for the adjustment method even after #10808 and #11004. I am running out of sane ways to handle this, hence the mention here. Maybe the PEFT PR here could be prioritized.
The LoRA tested in #10808 didn't have the issue because it has the same rank for all the layers, unlike the current LoRA we're testing.
This is what is happening to the config before and after the adjustment:
Before:
{
"r": 4,
"lora_alpha": 10,
"rank_pattern": {
"transformer_blocks.0.attn.to_out.0": 10,
"transformer_blocks.0.attn.to_q": 10,
"transformer_blocks.0.attn.to_k": 10,
"transformer_blocks.0.attn.to_v": 10,
"transformer_blocks.0.ff.net.0.proj": 18,
"transformer_blocks.0.ff.net.2": 11,
"transformer_blocks.0.norm1.linear": 12,
"transformer_blocks.0.attn.to_add_out": 3,
"transformer_blocks.0.ff_context.net.0.proj": 2,
"transformer_blocks.0.norm1_context.linear": 2,
"transformer_blocks.1.attn.to_out.0": 10,
"transformer_blocks.1.attn.to_q": 9,
"transformer_blocks.1.attn.to_k": 9,
"transformer_blocks.1.attn.to_v": 9,
"transformer_blocks.1.ff.net.0.proj": 29,
"transformer_blocks.1.ff.net.2": 9,
"transformer_blocks.1.norm1.linear": 20,
"transformer_blocks.1.attn.to_add_out": 3,
"transformer_blocks.1.attn.add_q_proj": 3,
"transformer_blocks.1.attn.add_k_proj": 3,
"transformer_blocks.1.attn.add_v_proj": 3,
"transformer_blocks.1.ff_context.net.0.proj": 3,
"transformer_blocks.1.ff_context.net.2": 2,
"transformer_blocks.1.norm1_context.linear": 2,
"transformer_blocks.2.attn.to_out.0": 13,
"transformer_blocks.2.attn.to_q": 17,
"transformer_blocks.2.attn.to_k": 17,
"transformer_blocks.2.attn.to_v": 17,
"transformer_blocks.2.ff.net.0.proj": 26,
"transformer_blocks.2.ff.net.2": 8,
"transformer_blocks.2.norm1.linear": 20,
"transformer_blocks.2.attn.to_add_out": 5,
"transformer_blocks.3.attn.to_out.0": 13,
"transformer_blocks.3.attn.to_q": 13,
"transformer_blocks.3.attn.to_k": 13,
"transformer_blocks.3.attn.to_v": 13,
"transformer_blocks.3.ff.net.0.proj": 11,
"transformer_blocks.3.ff.net.2": 5,
"transformer_blocks.3.norm1.linear": 10,
"transformer_blocks.3.attn.to_add_out": 5,
"transformer_blocks.3.attn.add_q_proj": 7,
"transformer_blocks.3.attn.add_k_proj": 7,
"transformer_blocks.3.attn.add_v_proj": 7,
"transformer_blocks.3.ff_context.net.0.proj": 3,
"transformer_blocks.3.ff_context.net.2": 5,
"transformer_blocks.3.norm1_context.linear": 5,
"transformer_blocks.4.attn.to_out.0": 8,
"transformer_blocks.4.attn.to_q": 16,
"transformer_blocks.4.attn.to_k": 16,
"transformer_blocks.4.attn.to_v": 16,
"transformer_blocks.4.ff.net.0.proj": 8,
"transformer_blocks.4.ff.net.2": 6,
"transformer_blocks.4.norm1.linear": 8,
"transformer_blocks.4.attn.to_add_out": 5,
"transformer_blocks.4.attn.add_q_proj": 6,
"transformer_blocks.4.attn.add_k_proj": 6,
"transformer_blocks.4.attn.add_v_proj": 6,
"transformer_blocks.4.ff_context.net.0.proj": 3,
"transformer_blocks.4.norm1_context.linear": 5,
"transformer_blocks.5.attn.to_out.0": 9,
"transformer_blocks.5.attn.to_q": 8,
"transformer_blocks.5.attn.to_k": 8,
"transformer_blocks.5.attn.to_v": 8,
"transformer_blocks.5.ff.net.0.proj": 10,
"transformer_blocks.5.ff.net.2": 7,
"transformer_blocks.5.norm1.linear": 8,
"transformer_blocks.5.attn.add_q_proj": 3,
"transformer_blocks.5.attn.add_k_proj": 3,
"transformer_blocks.5.attn.add_v_proj": 3,
"transformer_blocks.5.ff_context.net.2": 2,
"transformer_blocks.5.norm1_context.linear": 3,
"transformer_blocks.6.attn.to_out.0": 11,
"transformer_blocks.6.attn.to_q": 11,
"transformer_blocks.6.attn.to_k": 11,
"transformer_blocks.6.attn.to_v": 11,
"transformer_blocks.6.ff.net.0.proj": 11,
"transformer_blocks.6.ff.net.2": 5,
"transformer_blocks.6.norm1.linear": 8,
"transformer_blocks.6.attn.to_add_out": 5,
"transformer_blocks.6.attn.add_q_proj": 3,
"transformer_blocks.6.attn.add_k_proj": 3,
"transformer_blocks.6.attn.add_v_proj": 3,
"transformer_blocks.6.ff_context.net.0.proj": 5,
"transformer_blocks.6.norm1_context.linear": 2,
"transformer_blocks.7.attn.to_out.0": 12,
"transformer_blocks.7.attn.to_q": 10,
"transformer_blocks.7.attn.to_k": 10,
"transformer_blocks.7.attn.to_v": 10,
"transformer_blocks.7.ff.net.0.proj": 12,
"transformer_blocks.7.ff.net.2": 5,
"transformer_blocks.7.norm1.linear": 8,
"transformer_blocks.7.attn.add_q_proj": 5,
"transformer_blocks.7.attn.add_k_proj": 5,
"transformer_blocks.7.attn.add_v_proj": 5,
"transformer_blocks.7.ff_context.net.0.proj": 3,
"transformer_blocks.7.norm1_context.linear": 6,
"transformer_blocks.8.attn.to_out.0": 13,
"transformer_blocks.8.attn.to_q": 13,
"transformer_blocks.8.attn.to_k": 13,
"transformer_blocks.8.attn.to_v": 13,
"transformer_blocks.8.ff.net.0.proj": 9,
"transformer_blocks.8.ff.net.2": 6,
"transformer_blocks.8.norm1.linear": 7,
"transformer_blocks.8.attn.to_add_out": 7,
"transformer_blocks.8.ff_context.net.2": 6,
"transformer_blocks.8.norm1_context.linear": 3,
"transformer_blocks.9.attn.to_out.0": 7,
"transformer_blocks.9.attn.to_q": 10,
"transformer_blocks.9.attn.to_k": 10,
"transformer_blocks.9.attn.to_v": 10,
"transformer_blocks.9.ff.net.0.proj": 12,
"transformer_blocks.9.ff.net.2": 6,
"transformer_blocks.9.norm1.linear": 8,
"transformer_blocks.9.attn.to_add_out": 7,
"transformer_blocks.9.attn.add_q_proj": 6,
"transformer_blocks.9.attn.add_k_proj": 6,
"transformer_blocks.9.attn.add_v_proj": 6,
"transformer_blocks.9.ff_context.net.0.proj": 3,
"transformer_blocks.9.ff_context.net.2": 5,
"transformer_blocks.10.attn.to_out.0": 6,
"transformer_blocks.10.attn.to_q": 7,
"transformer_blocks.10.attn.to_k": 7,
"transformer_blocks.10.attn.to_v": 7,
"transformer_blocks.10.ff.net.0.proj": 8,
"transformer_blocks.10.ff.net.2": 5,
"transformer_blocks.10.norm1.linear": 7,
"transformer_blocks.10.attn.to_add_out": 6,
"transformer_blocks.10.attn.add_q_proj": 7,
"transformer_blocks.10.attn.add_k_proj": 7,
"transformer_blocks.10.attn.add_v_proj": 7,
"transformer_blocks.10.ff_context.net.0.proj": 3,
"transformer_blocks.10.norm1_context.linear": 7,
"transformer_blocks.11.attn.to_out.0": 9,
"transformer_blocks.11.attn.to_q": 9,
"transformer_blocks.11.attn.to_k": 9,
"transformer_blocks.11.attn.to_v": 9,
"transformer_blocks.11.ff.net.0.proj": 10,
"transformer_blocks.11.ff.net.2": 6,
"transformer_blocks.11.norm1.linear": 7,
"transformer_blocks.11.attn.to_add_out": 3,
"transformer_blocks.11.attn.add_q_proj": 6,
"transformer_blocks.11.attn.add_k_proj": 6,
"transformer_blocks.11.attn.add_v_proj": 6,
"transformer_blocks.11.ff_context.net.0.proj": 3,
"transformer_blocks.11.ff_context.net.2": 5,
"transformer_blocks.12.attn.to_out.0": 11,
"transformer_blocks.12.attn.to_q": 15,
"transformer_blocks.12.attn.to_k": 15,
"transformer_blocks.12.attn.to_v": 15,
"transformer_blocks.12.ff.net.0.proj": 12,
"transformer_blocks.12.ff.net.2": 6,
"transformer_blocks.12.norm1.linear": 7,
"transformer_blocks.12.attn.to_add_out": 5,
"transformer_blocks.12.attn.add_q_proj": 5,
"transformer_blocks.12.attn.add_k_proj": 5,
"transformer_blocks.12.attn.add_v_proj": 5,
"transformer_blocks.12.ff_context.net.0.proj": 3,
"transformer_blocks.12.norm1_context.linear": 5,
"transformer_blocks.13.attn.to_out.0": 8,
"transformer_blocks.13.attn.to_q": 8,
"transformer_blocks.13.attn.to_k": 8,
"transformer_blocks.13.attn.to_v": 8,
"transformer_blocks.13.ff.net.0.proj": 9,
"transformer_blocks.13.norm1.linear": 5,
"transformer_blocks.13.attn.to_add_out": 5,
"transformer_blocks.13.ff_context.net.0.proj": 2,
"transformer_blocks.13.ff_context.net.2": 3,
"transformer_blocks.13.norm1_context.linear": 6,
"transformer_blocks.14.attn.to_out.0": 11,
"transformer_blocks.14.attn.to_q": 10,
"transformer_blocks.14.attn.to_k": 10,
"transformer_blocks.14.attn.to_v": 10,
"transformer_blocks.14.ff.net.0.proj": 12,
"transformer_blocks.14.ff.net.2": 6,
"transformer_blocks.14.norm1.linear": 5,
"transformer_blocks.14.attn.to_add_out": 3,
"transformer_blocks.14.attn.add_q_proj": 5,
"transformer_blocks.14.attn.add_k_proj": 5,
"transformer_blocks.14.attn.add_v_proj": 5,
"transformer_blocks.14.ff_context.net.0.proj": 3,
"transformer_blocks.14.norm1_context.linear": 3,
"transformer_blocks.15.attn.to_out.0": 9,
"transformer_blocks.15.attn.to_q": 8,
"transformer_blocks.15.attn.to_k": 8,
"transformer_blocks.15.attn.to_v": 8,
"transformer_blocks.15.ff.net.0.proj": 12,
"transformer_blocks.15.ff.net.2": 7,
"transformer_blocks.15.norm1.linear": 6,
"transformer_blocks.15.attn.add_q_proj": 3,
"transformer_blocks.15.attn.add_k_proj": 3,
"transformer_blocks.15.attn.add_v_proj": 3,
"transformer_blocks.16.attn.to_out.0": 10,
"transformer_blocks.16.attn.to_q": 13,
"transformer_blocks.16.attn.to_k": 13,
"transformer_blocks.16.attn.to_v": 13,
"transformer_blocks.16.ff.net.0.proj": 12,
"transformer_blocks.16.ff.net.2": 7,
"transformer_blocks.16.norm1.linear": 6,
"transformer_blocks.16.attn.add_q_proj": 6,
"transformer_blocks.16.attn.add_k_proj": 6,
"transformer_blocks.16.attn.add_v_proj": 6,
"transformer_blocks.16.ff_context.net.2": 5,
"transformer_blocks.16.norm1_context.linear": 3,
"transformer_blocks.17.attn.to_out.0": 8,
"transformer_blocks.17.attn.to_q": 10,
"transformer_blocks.17.attn.to_k": 10,
"transformer_blocks.17.attn.to_v": 10,
"transformer_blocks.17.ff.net.0.proj": 9,
"transformer_blocks.17.ff.net.2": 9,
"transformer_blocks.17.norm1.linear": 8,
"transformer_blocks.17.attn.to_add_out": 5,
"transformer_blocks.17.attn.add_q_proj": 5,
"transformer_blocks.17.attn.add_k_proj": 5,
"transformer_blocks.17.attn.add_v_proj": 5,
"transformer_blocks.17.ff_context.net.0.proj": 2,
"transformer_blocks.17.norm1_context.linear": 2,
"transformer_blocks.18.attn.to_out.0": 5,
"transformer_blocks.18.attn.to_q": 8,
"transformer_blocks.18.attn.to_k": 8,
"transformer_blocks.18.attn.to_v": 8,
"transformer_blocks.18.ff.net.0.proj": 11,
"transformer_blocks.18.ff.net.2": 8,
"transformer_blocks.18.norm1.linear": 6,
"transformer_blocks.18.ff_context.net.0.proj": 3,
"transformer_blocks.18.norm1_context.linear": 7,
"single_transformer_blocks.0.attn.to_q": 18,
"single_transformer_blocks.0.attn.to_k": 18,
"single_transformer_blocks.0.attn.to_v": 18,
"single_transformer_blocks.0.proj_mlp": 18,
"single_transformer_blocks.0.proj_out": 8,
"single_transformer_blocks.0.norm.linear": 7,
"single_transformer_blocks.1.attn.to_q": 22,
"single_transformer_blocks.1.attn.to_k": 22,
"single_transformer_blocks.1.attn.to_v": 22,
"single_transformer_blocks.1.proj_mlp": 22,
"single_transformer_blocks.1.proj_out": 7,
"single_transformer_blocks.1.norm.linear": 7,
"single_transformer_blocks.2.attn.to_q": 18,
"single_transformer_blocks.2.attn.to_k": 18,
"single_transformer_blocks.2.attn.to_v": 18,
"single_transformer_blocks.2.proj_mlp": 18,
"single_transformer_blocks.2.proj_out": 7,
"single_transformer_blocks.2.norm.linear": 7,
"single_transformer_blocks.3.attn.to_q": 21,
"single_transformer_blocks.3.attn.to_k": 21,
"single_transformer_blocks.3.attn.to_v": 21,
"single_transformer_blocks.3.proj_mlp": 21,
"single_transformer_blocks.3.proj_out": 7,
"single_transformer_blocks.3.norm.linear": 8,
"single_transformer_blocks.4.attn.to_q": 18,
"single_transformer_blocks.4.attn.to_k": 18,
"single_transformer_blocks.4.attn.to_v": 18,
"single_transformer_blocks.4.proj_mlp": 18,
"single_transformer_blocks.4.proj_out": 8,
"single_transformer_blocks.4.norm.linear": 8,
"single_transformer_blocks.5.attn.to_q": 20,
"single_transformer_blocks.5.attn.to_k": 20,
"single_transformer_blocks.5.attn.to_v": 20,
"single_transformer_blocks.5.proj_mlp": 20,
"single_transformer_blocks.5.proj_out": 8,
"single_transformer_blocks.5.norm.linear": 9,
"single_transformer_blocks.6.attn.to_q": 20,
"single_transformer_blocks.6.attn.to_k": 20,
"single_transformer_blocks.6.attn.to_v": 20,
"single_transformer_blocks.6.proj_mlp": 20,
"single_transformer_blocks.6.proj_out": 6,
"single_transformer_blocks.6.norm.linear": 10,
"single_transformer_blocks.7.attn.to_q": 21,
"single_transformer_blocks.7.attn.to_k": 21,
"single_transformer_blocks.7.attn.to_v": 21,
"single_transformer_blocks.7.proj_mlp": 21,
"single_transformer_blocks.7.proj_out": 8,
"single_transformer_blocks.7.norm.linear": 10,
"single_transformer_blocks.8.attn.to_q": 19,
"single_transformer_blocks.8.attn.to_k": 19,
"single_transformer_blocks.8.attn.to_v": 19,
"single_transformer_blocks.8.proj_mlp": 19,
"single_transformer_blocks.8.proj_out": 7,
"single_transformer_blocks.8.norm.linear": 7,
"single_transformer_blocks.9.attn.to_q": 22,
"single_transformer_blocks.9.attn.to_k": 22,
"single_transformer_blocks.9.attn.to_v": 22,
"single_transformer_blocks.9.proj_mlp": 22,
"single_transformer_blocks.9.proj_out": 9,
"single_transformer_blocks.9.norm.linear": 8,
"single_transformer_blocks.10.attn.to_q": 22,
"single_transformer_blocks.10.attn.to_k": 22,
"single_transformer_blocks.10.attn.to_v": 22,
"single_transformer_blocks.10.proj_mlp": 22,
"single_transformer_blocks.10.proj_out": 11,
"single_transformer_blocks.10.norm.linear": 11,
"single_transformer_blocks.11.attn.to_q": 22,
"single_transformer_blocks.11.attn.to_k": 22,
"single_transformer_blocks.11.attn.to_v": 22,
"single_transformer_blocks.11.proj_mlp": 22,
"single_transformer_blocks.11.proj_out": 10,
"single_transformer_blocks.11.norm.linear": 10,
"single_transformer_blocks.12.attn.to_q": 23,
"single_transformer_blocks.12.attn.to_k": 23,
"single_transformer_blocks.12.attn.to_v": 23,
"single_transformer_blocks.12.proj_mlp": 23,
"single_transformer_blocks.12.proj_out": 10,
"single_transformer_blocks.12.norm.linear": 8,
"single_transformer_blocks.13.attn.to_q": 22,
"single_transformer_blocks.13.attn.to_k": 22,
"single_transformer_blocks.13.attn.to_v": 22,
"single_transformer_blocks.13.proj_mlp": 22,
"single_transformer_blocks.13.proj_out": 10,
"single_transformer_blocks.13.norm.linear": 12,
"single_transformer_blocks.14.attn.to_q": 24,
"single_transformer_blocks.14.attn.to_k": 24,
"single_transformer_blocks.14.attn.to_v": 24,
"single_transformer_blocks.14.proj_mlp": 24,
"single_transformer_blocks.14.proj_out": 11,
"single_transformer_blocks.14.norm.linear": 11,
"single_transformer_blocks.15.attn.to_q": 25,
"single_transformer_blocks.15.attn.to_k": 25,
"single_transformer_blocks.15.attn.to_v": 25,
"single_transformer_blocks.15.proj_mlp": 25,
"single_transformer_blocks.15.proj_out": 10,
"single_transformer_blocks.15.norm.linear": 11,
"single_transformer_blocks.16.attn.to_q": 26,
"single_transformer_blocks.16.attn.to_k": 26,
"single_transformer_blocks.16.attn.to_v": 26,
"single_transformer_blocks.16.proj_mlp": 26,
"single_transformer_blocks.16.proj_out": 10,
"single_transformer_blocks.16.norm.linear": 9,
"single_transformer_blocks.17.attn.to_q": 28,
"single_transformer_blocks.17.attn.to_k": 28,
"single_transformer_blocks.17.attn.to_v": 28,
"single_transformer_blocks.17.proj_mlp": 28,
"single_transformer_blocks.17.proj_out": 10,
"single_transformer_blocks.17.norm.linear": 9,
"single_transformer_blocks.18.attn.to_q": 28,
"single_transformer_blocks.18.attn.to_k": 28,
"single_transformer_blocks.18.attn.to_v": 28,
"single_transformer_blocks.18.proj_mlp": 28,
"single_transformer_blocks.18.proj_out": 8,
"single_transformer_blocks.18.norm.linear": 11,
"single_transformer_blocks.19.attn.to_q": 27,
"single_transformer_blocks.19.attn.to_k": 27,
"single_transformer_blocks.19.attn.to_v": 27,
"single_transformer_blocks.19.proj_mlp": 27,
"single_transformer_blocks.19.proj_out": 10,
"single_transformer_blocks.19.norm.linear": 10,
"single_transformer_blocks.20.attn.to_q": 29,
"single_transformer_blocks.20.attn.to_k": 29,
"single_transformer_blocks.20.attn.to_v": 29,
"single_transformer_blocks.20.proj_mlp": 29,
"single_transformer_blocks.20.proj_out": 11,
"single_transformer_blocks.20.norm.linear": 12,
"single_transformer_blocks.21.attn.to_q": 26,
"single_transformer_blocks.21.attn.to_k": 26,
"single_transformer_blocks.21.attn.to_v": 26,
"single_transformer_blocks.21.proj_mlp": 26,
"single_transformer_blocks.21.proj_out": 11,
"single_transformer_blocks.21.norm.linear": 11,
"single_transformer_blocks.22.attn.to_q": 26,
"single_transformer_blocks.22.attn.to_k": 26,
"single_transformer_blocks.22.attn.to_v": 26,
"single_transformer_blocks.22.proj_mlp": 26,
"single_transformer_blocks.22.proj_out": 11,
"single_transformer_blocks.22.norm.linear": 10,
"single_transformer_blocks.23.attn.to_q": 30,
"single_transformer_blocks.23.attn.to_k": 30,
"single_transformer_blocks.23.attn.to_v": 30,
"single_transformer_blocks.23.proj_mlp": 30,
"single_transformer_blocks.23.proj_out": 11,
"single_transformer_blocks.23.norm.linear": 11,
"single_transformer_blocks.24.attn.to_q": 28,
"single_transformer_blocks.24.attn.to_k": 28,
"single_transformer_blocks.24.attn.to_v": 28,
"single_transformer_blocks.24.proj_mlp": 28,
"single_transformer_blocks.24.proj_out": 12,
"single_transformer_blocks.24.norm.linear": 11,
"single_transformer_blocks.25.attn.to_q": 28,
"single_transformer_blocks.25.attn.to_k": 28,
"single_transformer_blocks.25.attn.to_v": 28,
"single_transformer_blocks.25.proj_mlp": 28,
"single_transformer_blocks.25.proj_out": 14,
"single_transformer_blocks.25.norm.linear": 9,
"single_transformer_blocks.26.attn.to_q": 28,
"single_transformer_blocks.26.attn.to_k": 28,
"single_transformer_blocks.26.attn.to_v": 28,
"single_transformer_blocks.26.proj_mlp": 28,
"single_transformer_blocks.26.proj_out": 13,
"single_transformer_blocks.26.norm.linear": 9,
"single_transformer_blocks.27.attn.to_q": 29,
"single_transformer_blocks.27.attn.to_k": 29,
"single_transformer_blocks.27.attn.to_v": 29,
"single_transformer_blocks.27.proj_mlp": 29,
"single_transformer_blocks.27.proj_out": 14,
"single_transformer_blocks.27.norm.linear": 8,
"single_transformer_blocks.28.attn.to_q": 29,
"single_transformer_blocks.28.attn.to_k": 29,
"single_transformer_blocks.28.attn.to_v": 29,
"single_transformer_blocks.28.proj_mlp": 29,
"single_transformer_blocks.28.proj_out": 14,
"single_transformer_blocks.28.norm.linear": 12,
"single_transformer_blocks.29.attn.to_q": 30,
"single_transformer_blocks.29.attn.to_k": 30,
"single_transformer_blocks.29.attn.to_v": 30,
"single_transformer_blocks.29.proj_mlp": 30,
"single_transformer_blocks.29.proj_out": 14,
"single_transformer_blocks.29.norm.linear": 10,
"single_transformer_blocks.30.attn.to_q": 32,
"single_transformer_blocks.30.attn.to_k": 32,
"single_transformer_blocks.30.attn.to_v": 32,
"single_transformer_blocks.30.proj_mlp": 32,
"single_transformer_blocks.30.proj_out": 15,
"single_transformer_blocks.30.norm.linear": 9,
"single_transformer_blocks.31.attn.to_q": 30,
"single_transformer_blocks.31.attn.to_k": 30,
"single_transformer_blocks.31.attn.to_v": 30,
"single_transformer_blocks.31.proj_mlp": 30,
"single_transformer_blocks.31.proj_out": 16,
"single_transformer_blocks.31.norm.linear": 11,
"single_transformer_blocks.32.attn.to_q": 30,
"single_transformer_blocks.32.attn.to_k": 30,
"single_transformer_blocks.32.attn.to_v": 30,
"single_transformer_blocks.32.proj_mlp": 30,
"single_transformer_blocks.32.proj_out": 15,
"single_transformer_blocks.32.norm.linear": 9,
"single_transformer_blocks.33.attn.to_q": 32,
"single_transformer_blocks.33.attn.to_k": 32,
"single_transformer_blocks.33.attn.to_v": 32,
"single_transformer_blocks.33.proj_mlp": 32,
"single_transformer_blocks.33.proj_out": 16,
"single_transformer_blocks.33.norm.linear": 8,
"single_transformer_blocks.34.attn.to_q": 30,
"single_transformer_blocks.34.attn.to_k": 30,
"single_transformer_blocks.34.attn.to_v": 30,
"single_transformer_blocks.34.proj_mlp": 30,
"single_transformer_blocks.34.proj_out": 16,
"single_transformer_blocks.34.norm.linear": 7,
"single_transformer_blocks.35.attn.to_q": 31,
"single_transformer_blocks.35.attn.to_k": 31,
"single_transformer_blocks.35.attn.to_v": 31,
"single_transformer_blocks.35.proj_mlp": 31,
"single_transformer_blocks.35.proj_out": 14,
"single_transformer_blocks.35.norm.linear": 8,
"single_transformer_blocks.36.attn.to_q": 30,
"single_transformer_blocks.36.attn.to_k": 30,
"single_transformer_blocks.36.attn.to_v": 30,
"single_transformer_blocks.36.proj_mlp": 30,
"single_transformer_blocks.36.proj_out": 12,
"single_transformer_blocks.36.norm.linear": 9,
"single_transformer_blocks.37.attn.to_q": 17,
"single_transformer_blocks.37.attn.to_k": 17,
"single_transformer_blocks.37.attn.to_v": 17,
"single_transformer_blocks.37.proj_mlp": 17,
"single_transformer_blocks.37.proj_out": 7,
"single_transformer_blocks.37.norm.linear": 9
},
"alpha_pattern": {},
"target_modules": [
"transformer_blocks.11.ff_context.net.2",
"single_transformer_blocks.23.attn.to_k",
"transformer_blocks.8.attn.add_v_proj",
"single_transformer_blocks.5.proj_out",
"transformer_blocks.18.attn.to_add_out",
"single_transformer_blocks.33.attn.to_v",
"transformer_blocks.17.norm1.linear",
"single_transformer_blocks.1.proj_mlp",
"transformer_blocks.4.attn.to_v",
"transformer_blocks.7.attn.to_v",
"transformer_blocks.13.attn.to_add_out",
"transformer_blocks.10.ff.net.2",
"single_transformer_blocks.0.attn.to_v",
"single_transformer_blocks.22.attn.to_v",
"transformer_blocks.6.ff_context.net.2",
"transformer_blocks.15.attn.to_q",
"single_transformer_blocks.25.proj_mlp",
"single_transformer_blocks.17.attn.to_q",
"single_transformer_blocks.26.proj_out",
"single_transformer_blocks.30.attn.to_k",
"single_transformer_blocks.32.attn.to_v",
"transformer_blocks.1.ff.net.2",
"single_transformer_blocks.10.proj_out",
"single_transformer_blocks.14.norm.linear",
"transformer_blocks.15.attn.to_out.0",
"transformer_blocks.17.ff.net.2",
"single_transformer_blocks.34.attn.to_v",
"single_transformer_blocks.4.proj_mlp",
"transformer_blocks.7.attn.to_q",
"transformer_blocks.0.ff.net.0.proj",
"transformer_blocks.14.attn.add_v_proj",
"transformer_blocks.8.ff_context.net.0.proj",
"single_transformer_blocks.28.attn.to_q",
"transformer_blocks.13.attn.add_k_proj",
"single_transformer_blocks.2.proj_mlp",
"single_transformer_blocks.21.attn.to_v",
"single_transformer_blocks.9.attn.to_k",
"single_transformer_blocks.9.attn.to_v",
"transformer_blocks.17.attn.to_v",
"single_transformer_blocks.31.attn.to_k",
"transformer_blocks.10.norm1_context.linear",
"transformer_blocks.17.attn.add_q_proj",
"transformer_blocks.6.ff.net.2",
"transformer_blocks.12.ff.net.0.proj",
"single_transformer_blocks.1.attn.to_k",
"transformer_blocks.1.attn.to_v",
"single_transformer_blocks.23.attn.to_q",
"transformer_blocks.13.ff.net.0.proj",
"single_transformer_blocks.17.proj_mlp",
"transformer_blocks.8.attn.add_k_proj",
"transformer_blocks.5.ff_context.net.0.proj",
"single_transformer_blocks.5.norm.linear",
"transformer_blocks.15.attn.to_k",
"single_transformer_blocks.20.attn.to_v",
"single_transformer_blocks.36.attn.to_q",
"transformer_blocks.9.norm1.linear",
"single_transformer_blocks.5.attn.to_q",
"single_transformer_blocks.10.attn.to_v",
"single_transformer_blocks.30.attn.to_q",
"transformer_blocks.12.attn.to_add_out",
"transformer_blocks.12.ff_context.net.2",
"transformer_blocks.11.ff_context.net.0.proj",
"transformer_blocks.2.ff_context.net.0.proj",
"single_transformer_blocks.6.attn.to_q",
"single_transformer_blocks.3.attn.to_k",
"single_transformer_blocks.13.attn.to_k",
"single_transformer_blocks.16.attn.to_q",
"single_transformer_blocks.27.attn.to_q",
"transformer_blocks.7.ff_context.net.0.proj",
"single_transformer_blocks.23.proj_out",
"transformer_blocks.12.attn.add_k_proj",
"single_transformer_blocks.15.attn.to_q",
"single_transformer_blocks.37.proj_out",
"transformer_blocks.3.attn.to_v",
"transformer_blocks.17.attn.add_v_proj",
"transformer_blocks.2.norm1_context.linear",
"single_transformer_blocks.32.norm.linear",
"transformer_blocks.12.ff.net.2",
"single_transformer_blocks.20.attn.to_k",
"transformer_blocks.2.ff.net.2",
"single_transformer_blocks.3.proj_out",
"transformer_blocks.18.attn.to_q",
"single_transformer_blocks.14.attn.to_q",
"single_transformer_blocks.11.proj_mlp",
"single_transformer_blocks.14.attn.to_v",
"transformer_blocks.13.attn.add_q_proj",
"transformer_blocks.15.ff_context.net.0.proj",
"single_transformer_blocks.8.proj_mlp",
"single_transformer_blocks.16.attn.to_k",
"single_transformer_blocks.27.norm.linear",
"transformer_blocks.8.ff.net.0.proj",
"single_transformer_blocks.33.norm.linear",
"transformer_blocks.10.attn.to_add_out",
"single_transformer_blocks.33.proj_out",
"transformer_blocks.16.attn.add_v_proj",
"transformer_blocks.12.attn.to_out.0",
"transformer_blocks.15.attn.add_q_proj",
"transformer_blocks.2.attn.to_v",
"single_transformer_blocks.17.proj_out",
"transformer_blocks.5.attn.add_k_proj",
"transformer_blocks.12.attn.to_q",
"single_transformer_blocks.0.attn.to_q",
"transformer_blocks.1.attn.add_q_proj",
"transformer_blocks.14.attn.to_out.0",
"single_transformer_blocks.33.attn.to_k",
"transformer_blocks.3.ff.net.0.proj",
"single_transformer_blocks.18.proj_mlp",
"single_transformer_blocks.26.attn.to_k",
"transformer_blocks.4.attn.add_q_proj",
"transformer_blocks.10.attn.to_out.0",
"transformer_blocks.13.ff_context.net.2",
"transformer_blocks.6.ff.net.0.proj",
"transformer_blocks.2.attn.to_q",
"transformer_blocks.17.attn.to_add_out",
"single_transformer_blocks.24.attn.to_q",
"single_transformer_blocks.32.attn.to_q",
"transformer_blocks.16.ff.net.2",
"single_transformer_blocks.12.norm.linear",
"transformer_blocks.4.attn.to_add_out",
"single_transformer_blocks.28.attn.to_k",
"transformer_blocks.18.ff_context.net.2",
"transformer_blocks.9.attn.to_q",
"transformer_blocks.8.norm1.linear",
"transformer_blocks.16.attn.to_k",
"transformer_blocks.13.attn.to_q",
"transformer_blocks.5.attn.to_add_out",
"transformer_blocks.7.attn.to_add_out",
"transformer_blocks.9.attn.add_v_proj",
"transformer_blocks.1.attn.to_add_out",
"transformer_blocks.1.ff_context.net.0.proj",
"transformer_blocks.2.ff.net.0.proj",
"transformer_blocks.7.ff.net.2",
"transformer_blocks.13.attn.add_v_proj",
"single_transformer_blocks.35.attn.to_k",
"single_transformer_blocks.29.norm.linear",
"transformer_blocks.11.attn.to_out.0",
"transformer_blocks.8.ff_context.net.2",
"transformer_blocks.12.norm1.linear",
"transformer_blocks.7.norm1.linear",
"single_transformer_blocks.9.attn.to_q",
"transformer_blocks.9.attn.add_k_proj",
"single_transformer_blocks.14.attn.to_k",
"single_transformer_blocks.3.attn.to_v",
"single_transformer_blocks.37.norm.linear",
"transformer_blocks.13.attn.to_v",
"single_transformer_blocks.0.proj_mlp",
"transformer_blocks.0.attn.to_q",
"transformer_blocks.6.attn.to_add_out",
"transformer_blocks.12.attn.to_v",
"single_transformer_blocks.4.proj_out",
"single_transformer_blocks.24.attn.to_v",
"single_transformer_blocks.4.attn.to_v",
"transformer_blocks.3.attn.to_k",
"transformer_blocks.5.ff_context.net.2",
"single_transformer_blocks.15.proj_out",
"transformer_blocks.14.ff.net.2",
"transformer_blocks.0.attn.add_v_proj",
"single_transformer_blocks.15.norm.linear",
"single_transformer_blocks.19.attn.to_q",
"single_transformer_blocks.35.norm.linear",
"transformer_blocks.11.attn.to_q",
"single_transformer_blocks.6.proj_mlp",
"transformer_blocks.16.attn.to_add_out",
"single_transformer_blocks.14.proj_mlp",
"transformer_blocks.10.attn.add_v_proj",
"transformer_blocks.4.ff_context.net.2",
"transformer_blocks.15.attn.to_add_out",
"single_transformer_blocks.27.attn.to_v",
"transformer_blocks.2.attn.to_k",
"transformer_blocks.7.attn.add_q_proj",
"single_transformer_blocks.1.norm.linear",
"single_transformer_blocks.24.attn.to_k",
"transformer_blocks.18.attn.add_q_proj",
"transformer_blocks.14.ff_context.net.0.proj",
"single_transformer_blocks.6.attn.to_k",
"single_transformer_blocks.37.attn.to_v",
"transformer_blocks.0.attn.add_q_proj",
"transformer_blocks.0.attn.to_out.0",
"single_transformer_blocks.5.attn.to_v",
"single_transformer_blocks.5.attn.to_k",
"transformer_blocks.18.norm1_context.linear",
"transformer_blocks.4.ff.net.2",
"single_transformer_blocks.29.attn.to_k",
"transformer_blocks.6.ff_context.net.0.proj",
"transformer_blocks.9.attn.to_k",
"single_transformer_blocks.12.attn.to_q",
"single_transformer_blocks.37.attn.to_k",
"single_transformer_blocks.22.proj_mlp",
"transformer_blocks.15.attn.add_v_proj",
"transformer_blocks.4.norm1_context.linear",
"single_transformer_blocks.13.norm.linear",
"single_transformer_blocks.36.norm.linear",
"transformer_blocks.5.norm1.linear",
"transformer_blocks.9.ff_context.net.0.proj",
"single_transformer_blocks.36.attn.to_v",
"transformer_blocks.6.norm1.linear",
"transformer_blocks.15.norm1.linear",
"single_transformer_blocks.16.proj_out",
"single_transformer_blocks.15.attn.to_v",
"transformer_blocks.10.attn.add_q_proj",
"transformer_blocks.2.norm1.linear",
"single_transformer_blocks.29.proj_out",
"single_transformer_blocks.28.attn.to_v",
"single_transformer_blocks.0.proj_out",
"single_transformer_blocks.28.proj_out",
"single_transformer_blocks.21.norm.linear",
"single_transformer_blocks.16.proj_mlp",
"single_transformer_blocks.33.attn.to_q",
"single_transformer_blocks.36.proj_mlp",
"single_transformer_blocks.36.proj_out",
"transformer_blocks.11.attn.add_v_proj",
"single_transformer_blocks.26.attn.to_q",
"transformer_blocks.8.attn.to_add_out",
"transformer_blocks.18.attn.to_k",
"single_transformer_blocks.31.attn.to_q",
"single_transformer_blocks.2.attn.to_k",
"single_transformer_blocks.10.attn.to_k",
"transformer_blocks.4.ff_context.net.0.proj",
"single_transformer_blocks.37.proj_mlp",
"transformer_blocks.5.attn.add_q_proj",
"single_transformer_blocks.27.proj_out",
"transformer_blocks.5.ff.net.2",
"transformer_blocks.4.attn.add_v_proj",
"transformer_blocks.17.attn.to_out.0",
"transformer_blocks.0.ff_context.net.0.proj",
"transformer_blocks.11.ff.net.2",
"transformer_blocks.9.attn.to_out.0",
"single_transformer_blocks.32.proj_mlp",
"transformer_blocks.3.norm1.linear",
"single_transformer_blocks.2.attn.to_v",
"single_transformer_blocks.12.attn.to_v",
"single_transformer_blocks.21.proj_out",
"transformer_blocks.5.attn.add_v_proj",
"transformer_blocks.14.attn.to_add_out",
"single_transformer_blocks.7.attn.to_k",
"single_transformer_blocks.11.attn.to_v",
"single_transformer_blocks.4.attn.to_q",
"transformer_blocks.10.attn.to_k",
"single_transformer_blocks.17.attn.to_k",
"single_transformer_blocks.30.norm.linear",
"transformer_blocks.9.attn.add_q_proj",
"transformer_blocks.4.norm1.linear",
"single_transformer_blocks.37.attn.to_q",
"transformer_blocks.16.ff_context.net.0.proj",
"single_transformer_blocks.34.attn.to_q",
"transformer_blocks.13.attn.to_k",
"transformer_blocks.12.norm1_context.linear",
"single_transformer_blocks.29.attn.to_v",
"transformer_blocks.4.attn.to_out.0",
"single_transformer_blocks.35.proj_out",
"transformer_blocks.8.attn.to_q",
"single_transformer_blocks.23.norm.linear",
"transformer_blocks.0.ff_context.net.2",
"transformer_blocks.14.norm1_context.linear",
"transformer_blocks.16.ff.net.0.proj",
"transformer_blocks.10.attn.add_k_proj",
"single_transformer_blocks.34.norm.linear",
"transformer_blocks.17.attn.to_q",
"single_transformer_blocks.6.norm.linear",
"transformer_blocks.1.ff.net.0.proj",
"single_transformer_blocks.15.proj_mlp",
"transformer_blocks.16.ff_context.net.2",
"single_transformer_blocks.8.attn.to_q",
"transformer_blocks.3.attn.to_add_out",
"single_transformer_blocks.20.proj_out",
"single_transformer_blocks.26.attn.to_v",
"single_transformer_blocks.32.attn.to_k",
"single_transformer_blocks.16.attn.to_v",
"single_transformer_blocks.17.norm.linear",
"transformer_blocks.13.norm1_context.linear",
"transformer_blocks.3.ff.net.2",
"transformer_blocks.10.ff_context.net.0.proj",
"transformer_blocks.18.attn.add_k_proj",
"transformer_blocks.0.attn.to_v",
"single_transformer_blocks.12.attn.to_k",
"single_transformer_blocks.19.proj_mlp",
"transformer_blocks.7.ff_context.net.2",
"single_transformer_blocks.25.norm.linear",
"transformer_blocks.0.norm1_context.linear",
"transformer_blocks.9.attn.to_v",
"transformer_blocks.7.ff.net.0.proj",
"single_transformer_blocks.30.proj_out",
"single_transformer_blocks.7.norm.linear",
"single_transformer_blocks.31.attn.to_v",
"transformer_blocks.11.norm1_context.linear",
"transformer_blocks.15.ff.net.2",
"single_transformer_blocks.28.norm.linear",
"transformer_blocks.17.ff_context.net.2",
"single_transformer_blocks.2.proj_out",
"single_transformer_blocks.2.norm.linear",
"transformer_blocks.14.attn.to_v",
"single_transformer_blocks.19.attn.to_k",
"single_transformer_blocks.18.attn.to_k",
"transformer_blocks.8.attn.to_k",
"transformer_blocks.8.norm1_context.linear",
"transformer_blocks.11.attn.add_k_proj",
"single_transformer_blocks.1.proj_out",
"single_transformer_blocks.25.attn.to_k",
"transformer_blocks.8.attn.to_v",
"single_transformer_blocks.28.proj_mlp",
"transformer_blocks.4.ff.net.0.proj",
"transformer_blocks.15.attn.to_v",
"single_transformer_blocks.24.proj_mlp",
"transformer_blocks.5.attn.to_q",
"transformer_blocks.10.ff_context.net.2",
"single_transformer_blocks.9.proj_mlp",
"single_transformer_blocks.25.attn.to_q",
"single_transformer_blocks.16.norm.linear",
"single_transformer_blocks.26.proj_mlp",
"transformer_blocks.14.attn.to_k",
"single_transformer_blocks.6.attn.to_v",
"single_transformer_blocks.24.norm.linear",
"transformer_blocks.2.attn.add_k_proj",
"single_transformer_blocks.17.attn.to_v",
"transformer_blocks.10.attn.to_v",
"single_transformer_blocks.1.attn.to_q",
"transformer_blocks.18.ff.net.2",
"single_transformer_blocks.4.norm.linear",
"transformer_blocks.9.norm1_context.linear",
"single_transformer_blocks.8.proj_out",
"transformer_blocks.1.norm1.linear",
"single_transformer_blocks.18.attn.to_v",
"single_transformer_blocks.20.proj_mlp",
"transformer_blocks.14.norm1.linear",
"single_transformer_blocks.30.attn.to_v",
"transformer_blocks.16.attn.to_q",
"transformer_blocks.17.ff_context.net.0.proj",
"transformer_blocks.6.attn.add_k_proj",
"transformer_blocks.15.norm1_context.linear",
"transformer_blocks.3.attn.to_q",
"transformer_blocks.13.ff.net.2",
"single_transformer_blocks.26.norm.linear",
"single_transformer_blocks.31.proj_mlp",
"single_transformer_blocks.25.proj_out",
"transformer_blocks.1.attn.add_k_proj",
"single_transformer_blocks.5.proj_mlp",
"single_transformer_blocks.29.attn.to_q",
"single_transformer_blocks.13.proj_out",
"transformer_blocks.0.ff.net.2",
"transformer_blocks.2.attn.to_out.0",
"transformer_blocks.6.attn.to_k",
"transformer_blocks.6.attn.add_q_proj",
"single_transformer_blocks.33.proj_mlp",
"transformer_blocks.5.norm1_context.linear",
"transformer_blocks.7.attn.add_k_proj",
"single_transformer_blocks.0.norm.linear",
"transformer_blocks.9.ff_context.net.2",
"transformer_blocks.16.attn.to_out.0",
"single_transformer_blocks.35.attn.to_v",
"single_transformer_blocks.0.attn.to_k",
"single_transformer_blocks.21.attn.to_k",
"transformer_blocks.3.attn.add_k_proj",
"transformer_blocks.7.norm1_context.linear",
"transformer_blocks.18.norm1.linear",
"single_transformer_blocks.22.proj_out",
"single_transformer_blocks.23.proj_mlp",
"single_transformer_blocks.11.attn.to_q",
"single_transformer_blocks.22.norm.linear",
"transformer_blocks.1.attn.to_q",
"transformer_blocks.0.attn.add_k_proj",
"transformer_blocks.13.ff_context.net.0.proj",
"single_transformer_blocks.27.proj_mlp",
"single_transformer_blocks.3.norm.linear",
"transformer_blocks.8.attn.add_q_proj",
"single_transformer_blocks.8.attn.to_k",
"single_transformer_blocks.11.proj_out",
"transformer_blocks.14.ff.net.0.proj",
"transformer_blocks.17.attn.to_k",
"transformer_blocks.6.attn.to_q",
"single_transformer_blocks.24.proj_out",
"transformer_blocks.11.attn.to_v",
"transformer_blocks.2.attn.to_add_out",
"single_transformer_blocks.8.norm.linear",
"transformer_blocks.3.attn.add_v_proj",
"transformer_blocks.12.attn.add_v_proj",
"single_transformer_blocks.12.proj_mlp",
"transformer_blocks.3.ff_context.net.2",
"single_transformer_blocks.30.proj_mlp",
"transformer_blocks.6.attn.add_v_proj",
"transformer_blocks.17.attn.add_k_proj",
"transformer_blocks.15.attn.add_k_proj",
"transformer_blocks.11.norm1.linear",
"transformer_blocks.1.attn.add_v_proj",
"single_transformer_blocks.7.attn.to_v",
"single_transformer_blocks.1.attn.to_v",
"single_transformer_blocks.7.proj_out",
"transformer_blocks.5.ff.net.0.proj",
"single_transformer_blocks.21.proj_mlp",
"single_transformer_blocks.18.attn.to_q",
"single_transformer_blocks.7.attn.to_q",
"transformer_blocks.12.ff_context.net.0.proj",
"single_transformer_blocks.19.attn.to_v",
"transformer_blocks.5.attn.to_out.0",
"single_transformer_blocks.18.norm.linear",
"transformer_blocks.10.norm1.linear",
"single_transformer_blocks.11.norm.linear",
"single_transformer_blocks.32.proj_out",
"single_transformer_blocks.6.proj_out",
"transformer_blocks.12.attn.to_k",
"single_transformer_blocks.2.attn.to_q",
"transformer_blocks.3.norm1_context.linear",
"transformer_blocks.8.attn.to_out.0",
"transformer_blocks.13.attn.to_out.0",
"transformer_blocks.16.norm1.linear",
"single_transformer_blocks.35.attn.to_q",
"transformer_blocks.7.attn.to_out.0",
"transformer_blocks.4.attn.to_q",
"transformer_blocks.12.attn.add_q_proj",
"transformer_blocks.2.attn.add_q_proj",
"transformer_blocks.14.ff_context.net.2",
"transformer_blocks.14.attn.to_q",
"single_transformer_blocks.13.attn.to_q",
"transformer_blocks.18.attn.add_v_proj",
"transformer_blocks.2.attn.add_v_proj",
"transformer_blocks.11.attn.add_q_proj",
"single_transformer_blocks.13.attn.to_v",
"single_transformer_blocks.19.norm.linear",
"single_transformer_blocks.31.norm.linear",
"single_transformer_blocks.25.attn.to_v",
"transformer_blocks.18.ff_context.net.0.proj",
"single_transformer_blocks.10.attn.to_q",
"single_transformer_blocks.13.proj_mlp",
"transformer_blocks.7.attn.add_v_proj",
"single_transformer_blocks.7.proj_mlp",
"single_transformer_blocks.3.attn.to_q",
"single_transformer_blocks.22.attn.to_k",
"single_transformer_blocks.21.attn.to_q",
"transformer_blocks.18.attn.to_v",
"transformer_blocks.13.norm1.linear",
"single_transformer_blocks.10.norm.linear",
"transformer_blocks.14.attn.add_k_proj",
"transformer_blocks.16.attn.add_q_proj",
"transformer_blocks.15.ff_context.net.2",
"transformer_blocks.10.ff.net.0.proj",
"single_transformer_blocks.8.attn.to_v",
"single_transformer_blocks.29.proj_mlp",
"transformer_blocks.1.attn.to_k",
"transformer_blocks.9.ff.net.0.proj",
"single_transformer_blocks.34.attn.to_k",
"transformer_blocks.16.attn.add_k_proj",
"transformer_blocks.16.norm1_context.linear",
"transformer_blocks.18.ff.net.0.proj",
"transformer_blocks.6.attn.to_out.0",
"transformer_blocks.11.attn.to_add_out",
"transformer_blocks.1.ff_context.net.2",
"single_transformer_blocks.20.norm.linear",
"transformer_blocks.8.ff.net.2",
"single_transformer_blocks.19.proj_out",
"transformer_blocks.4.attn.add_k_proj",
"transformer_blocks.5.attn.to_v",
"transformer_blocks.10.attn.to_q",
"transformer_blocks.11.ff.net.0.proj",
"transformer_blocks.4.attn.to_k",
"transformer_blocks.1.norm1_context.linear",
"single_transformer_blocks.31.proj_out",
"single_transformer_blocks.11.attn.to_k",
"single_transformer_blocks.23.attn.to_v",
"single_transformer_blocks.3.proj_mlp",
"single_transformer_blocks.4.attn.to_k",
"transformer_blocks.15.ff.net.0.proj",
"single_transformer_blocks.27.attn.to_k",
"transformer_blocks.3.attn.add_q_proj",
"transformer_blocks.3.attn.to_out.0",
"transformer_blocks.0.attn.to_add_out",
"transformer_blocks.9.attn.to_add_out",
"transformer_blocks.2.ff_context.net.2",
"transformer_blocks.0.attn.to_k",
"transformer_blocks.11.attn.to_k",
"transformer_blocks.6.attn.to_v",
"transformer_blocks.3.ff_context.net.0.proj",
"single_transformer_blocks.14.proj_out",
"transformer_blocks.17.norm1_context.linear",
"single_transformer_blocks.12.proj_out",
"transformer_blocks.18.attn.to_out.0",
"single_transformer_blocks.10.proj_mlp",
"transformer_blocks.0.norm1.linear",
"single_transformer_blocks.22.attn.to_q",
"transformer_blocks.16.attn.to_v",
"transformer_blocks.7.attn.to_k",
"single_transformer_blocks.34.proj_mlp",
"transformer_blocks.9.ff.net.2",
"transformer_blocks.14.attn.add_q_proj",
"single_transformer_blocks.20.attn.to_q",
"transformer_blocks.1.attn.to_out.0",
"single_transformer_blocks.9.norm.linear",
"single_transformer_blocks.34.proj_out",
"single_transformer_blocks.9.proj_out",
"transformer_blocks.5.attn.to_k",
"transformer_blocks.17.ff.net.0.proj",
"transformer_blocks.6.norm1_context.linear",
"single_transformer_blocks.36.attn.to_k",
"single_transformer_blocks.35.proj_mlp",
"single_transformer_blocks.15.attn.to_k",
"single_transformer_blocks.18.proj_out"
],
"use_dora": false,
"lora_bias": false
}
After
{
"r": 10,
"lora_alpha": 10,
"rank_pattern": {
"transformer_blocks.0.attn.to_out.0": 10,
"transformer_blocks.0.ff.net.0.proj": 18,
"transformer_blocks.0.ff.net.2": 11,
"transformer_blocks.0.norm1.linear": 12,
"transformer_blocks.0.attn.to_add_out": 3,
"transformer_blocks.0.ff_context.net.0.proj": 2,
"transformer_blocks.0.norm1_context.linear": 2,
"transformer_blocks.1.attn.to_out.0": 10,
"transformer_blocks.1.ff.net.0.proj": 29,
"transformer_blocks.1.ff.net.2": 9,
"transformer_blocks.1.norm1.linear": 20,
"transformer_blocks.1.attn.to_add_out": 3,
"transformer_blocks.1.attn.add_q_proj": 3,
"transformer_blocks.1.attn.add_k_proj": 3,
"transformer_blocks.1.attn.add_v_proj": 3,
"transformer_blocks.1.ff_context.net.0.proj": 3,
"transformer_blocks.1.ff_context.net.2": 2,
"transformer_blocks.1.norm1_context.linear": 2,
"transformer_blocks.2.attn.to_out.0": 13,
"transformer_blocks.2.ff.net.0.proj": 26,
"transformer_blocks.2.ff.net.2": 8,
"transformer_blocks.2.norm1.linear": 20,
"transformer_blocks.2.attn.to_add_out": 5,
"transformer_blocks.3.attn.to_out.0": 13,
"transformer_blocks.3.ff.net.0.proj": 11,
"transformer_blocks.3.ff.net.2": 5,
"transformer_blocks.3.norm1.linear": 10,
"transformer_blocks.3.attn.to_add_out": 5,
"transformer_blocks.3.attn.add_q_proj": 7,
"transformer_blocks.3.attn.add_k_proj": 7,
"transformer_blocks.3.attn.add_v_proj": 7,
"transformer_blocks.3.ff_context.net.0.proj": 3,
"transformer_blocks.3.ff_context.net.2": 5,
"transformer_blocks.3.norm1_context.linear": 5,
"transformer_blocks.4.attn.to_out.0": 8,
"transformer_blocks.4.ff.net.0.proj": 8,
"transformer_blocks.4.ff.net.2": 6,
"transformer_blocks.4.norm1.linear": 8,
"transformer_blocks.4.attn.to_add_out": 5,
"transformer_blocks.4.attn.add_q_proj": 6,
"transformer_blocks.4.attn.add_k_proj": 6,
"transformer_blocks.4.attn.add_v_proj": 6,
"transformer_blocks.4.ff_context.net.0.proj": 3,
"transformer_blocks.4.norm1_context.linear": 5,
"transformer_blocks.5.attn.to_out.0": 9,
"transformer_blocks.5.ff.net.0.proj": 10,
"transformer_blocks.5.ff.net.2": 7,
"transformer_blocks.5.norm1.linear": 8,
"transformer_blocks.5.attn.add_q_proj": 3,
"transformer_blocks.5.attn.add_k_proj": 3,
"transformer_blocks.5.attn.add_v_proj": 3,
"transformer_blocks.5.ff_context.net.2": 2,
"transformer_blocks.5.norm1_context.linear": 3,
"transformer_blocks.6.attn.to_out.0": 11,
"transformer_blocks.6.ff.net.0.proj": 11,
"transformer_blocks.6.ff.net.2": 5,
"transformer_blocks.6.norm1.linear": 8,
"transformer_blocks.6.attn.to_add_out": 5,
"transformer_blocks.6.attn.add_q_proj": 3,
"transformer_blocks.6.attn.add_k_proj": 3,
"transformer_blocks.6.attn.add_v_proj": 3,
"transformer_blocks.6.ff_context.net.0.proj": 5,
"transformer_blocks.6.norm1_context.linear": 2,
"transformer_blocks.7.attn.to_out.0": 12,
"transformer_blocks.7.ff.net.0.proj": 12,
"transformer_blocks.7.ff.net.2": 5,
"transformer_blocks.7.norm1.linear": 8,
"transformer_blocks.7.attn.add_q_proj": 5,
"transformer_blocks.7.attn.add_k_proj": 5,
"transformer_blocks.7.attn.add_v_proj": 5,
"transformer_blocks.7.ff_context.net.0.proj": 3,
"transformer_blocks.7.norm1_context.linear": 6,
"transformer_blocks.8.attn.to_out.0": 13,
"transformer_blocks.8.ff.net.0.proj": 9,
"transformer_blocks.8.ff.net.2": 6,
"transformer_blocks.8.norm1.linear": 7,
"transformer_blocks.8.attn.to_add_out": 7,
"transformer_blocks.8.ff_context.net.2": 6,
"transformer_blocks.8.norm1_context.linear": 3,
"transformer_blocks.9.attn.to_out.0": 7,
"transformer_blocks.9.ff.net.0.proj": 12,
"transformer_blocks.9.ff.net.2": 6,
"transformer_blocks.9.norm1.linear": 8,
"transformer_blocks.9.attn.to_add_out": 7,
"transformer_blocks.9.attn.add_q_proj": 6,
"transformer_blocks.9.attn.add_k_proj": 6,
"transformer_blocks.9.attn.add_v_proj": 6,
"transformer_blocks.9.ff_context.net.0.proj": 3,
"transformer_blocks.9.ff_context.net.2": 5,
"transformer_blocks.10.attn.to_out.0": 6,
"transformer_blocks.10.ff.net.0.proj": 8,
"transformer_blocks.10.ff.net.2": 5,
"transformer_blocks.10.norm1.linear": 7,
"transformer_blocks.10.attn.to_add_out": 6,
"transformer_blocks.10.attn.add_q_proj": 7,
"transformer_blocks.10.attn.add_k_proj": 7,
"transformer_blocks.10.attn.add_v_proj": 7,
"transformer_blocks.10.ff_context.net.0.proj": 3,
"transformer_blocks.10.norm1_context.linear": 7,
"transformer_blocks.11.attn.to_out.0": 9,
"transformer_blocks.11.ff.net.0.proj": 10,
"transformer_blocks.11.ff.net.2": 6,
"transformer_blocks.11.norm1.linear": 7,
"transformer_blocks.11.attn.to_add_out": 3,
"transformer_blocks.11.attn.add_q_proj": 6,
"transformer_blocks.11.attn.add_k_proj": 6,
"transformer_blocks.11.attn.add_v_proj": 6,
"transformer_blocks.11.ff_context.net.0.proj": 3,
"transformer_blocks.11.ff_context.net.2": 5,
"transformer_blocks.12.attn.to_out.0": 11,
"transformer_blocks.12.ff.net.0.proj": 12,
"transformer_blocks.12.ff.net.2": 6,
"transformer_blocks.12.norm1.linear": 7,
"transformer_blocks.12.attn.to_add_out": 5,
"transformer_blocks.12.attn.add_q_proj": 5,
"transformer_blocks.12.attn.add_k_proj": 5,
"transformer_blocks.12.attn.add_v_proj": 5,
"transformer_blocks.12.ff_context.net.0.proj": 3,
"transformer_blocks.12.norm1_context.linear": 5,
"transformer_blocks.13.attn.to_out.0": 8,
"transformer_blocks.13.ff.net.0.proj": 9,
"transformer_blocks.13.norm1.linear": 5,
"transformer_blocks.13.attn.to_add_out": 5,
"transformer_blocks.13.ff_context.net.0.proj": 2,
"transformer_blocks.13.ff_context.net.2": 3,
"transformer_blocks.13.norm1_context.linear": 6,
"transformer_blocks.14.attn.to_out.0": 11,
"transformer_blocks.14.ff.net.0.proj": 12,
"transformer_blocks.14.ff.net.2": 6,
"transformer_blocks.14.norm1.linear": 5,
"transformer_blocks.14.attn.to_add_out": 3,
"transformer_blocks.14.attn.add_q_proj": 5,
"transformer_blocks.14.attn.add_k_proj": 5,
"transformer_blocks.14.attn.add_v_proj": 5,
"transformer_blocks.14.ff_context.net.0.proj": 3,
"transformer_blocks.14.norm1_context.linear": 3,
"transformer_blocks.15.attn.to_out.0": 9,
"transformer_blocks.15.ff.net.0.proj": 12,
"transformer_blocks.15.ff.net.2": 7,
"transformer_blocks.15.norm1.linear": 6,
"transformer_blocks.15.attn.add_q_proj": 3,
"transformer_blocks.15.attn.add_k_proj": 3,
"transformer_blocks.15.attn.add_v_proj": 3,
"transformer_blocks.16.attn.to_out.0": 10,
"transformer_blocks.16.ff.net.0.proj": 12,
"transformer_blocks.16.ff.net.2": 7,
"transformer_blocks.16.norm1.linear": 6,
"transformer_blocks.16.attn.add_q_proj": 6,
"transformer_blocks.16.attn.add_k_proj": 6,
"transformer_blocks.16.attn.add_v_proj": 6,
"transformer_blocks.16.ff_context.net.2": 5,
"transformer_blocks.16.norm1_context.linear": 3,
"transformer_blocks.17.attn.to_out.0": 8,
"transformer_blocks.17.ff.net.0.proj": 9,
"transformer_blocks.17.ff.net.2": 9,
"transformer_blocks.17.norm1.linear": 8,
"transformer_blocks.17.attn.to_add_out": 5,
"transformer_blocks.17.attn.add_q_proj": 5,
"transformer_blocks.17.attn.add_k_proj": 5,
"transformer_blocks.17.attn.add_v_proj": 5,
"transformer_blocks.17.ff_context.net.0.proj": 2,
"transformer_blocks.17.norm1_context.linear": 2,
"transformer_blocks.18.attn.to_out.0": 5,
"transformer_blocks.18.ff.net.0.proj": 11,
"transformer_blocks.18.ff.net.2": 8,
"transformer_blocks.18.norm1.linear": 6,
"transformer_blocks.18.ff_context.net.0.proj": 3,
"transformer_blocks.18.norm1_context.linear": 7,
"single_transformer_blocks.0.attn.to_q": 18,
"single_transformer_blocks.0.attn.to_k": 18,
"single_transformer_blocks.0.attn.to_v": 18,
"single_transformer_blocks.0.proj_mlp": 18,
"single_transformer_blocks.0.proj_out": 8,
"single_transformer_blocks.0.norm.linear": 7,
"single_transformer_blocks.1.attn.to_q": 22,
"single_transformer_blocks.1.attn.to_k": 22,
"single_transformer_blocks.1.attn.to_v": 22,
"single_transformer_blocks.1.proj_mlp": 22,
"single_transformer_blocks.1.proj_out": 7,
"single_transformer_blocks.1.norm.linear": 7,
"single_transformer_blocks.2.attn.to_q": 18,
"single_transformer_blocks.2.attn.to_k": 18,
"single_transformer_blocks.2.attn.to_v": 18,
"single_transformer_blocks.2.proj_mlp": 18,
"single_transformer_blocks.2.proj_out": 7,
"single_transformer_blocks.2.norm.linear": 7,
"single_transformer_blocks.3.attn.to_q": 21,
"single_transformer_blocks.3.attn.to_k": 21,
"single_transformer_blocks.3.attn.to_v": 21,
"single_transformer_blocks.3.proj_mlp": 21,
"single_transformer_blocks.3.proj_out": 7,
"single_transformer_blocks.3.norm.linear": 8,
"single_transformer_blocks.4.attn.to_q": 18,
"single_transformer_blocks.4.attn.to_k": 18,
"single_transformer_blocks.4.attn.to_v": 18,
"single_transformer_blocks.4.proj_mlp": 18,
"single_transformer_blocks.4.proj_out": 8,
"single_transformer_blocks.4.norm.linear": 8,
"single_transformer_blocks.5.attn.to_q": 20,
"single_transformer_blocks.5.attn.to_k": 20,
"single_transformer_blocks.5.attn.to_v": 20,
"single_transformer_blocks.5.proj_mlp": 20,
"single_transformer_blocks.5.proj_out": 8,
"single_transformer_blocks.5.norm.linear": 9,
"single_transformer_blocks.6.attn.to_q": 20,
"single_transformer_blocks.6.attn.to_k": 20,
"single_transformer_blocks.6.attn.to_v": 20,
"single_transformer_blocks.6.proj_mlp": 20,
"single_transformer_blocks.6.proj_out": 6,
"single_transformer_blocks.6.norm.linear": 10,
"single_transformer_blocks.7.attn.to_q": 21,
"single_transformer_blocks.7.attn.to_k": 21,
"single_transformer_blocks.7.attn.to_v": 21,
"single_transformer_blocks.7.proj_mlp": 21,
"single_transformer_blocks.7.proj_out": 8,
"single_transformer_blocks.7.norm.linear": 10,
"single_transformer_blocks.8.attn.to_q": 19,
"single_transformer_blocks.8.attn.to_k": 19,
"single_transformer_blocks.8.attn.to_v": 19,
"single_transformer_blocks.8.proj_mlp": 19,
"single_transformer_blocks.8.proj_out": 7,
"single_transformer_blocks.8.norm.linear": 7,
"single_transformer_blocks.9.attn.to_q": 22,
"single_transformer_blocks.9.attn.to_k": 22,
"single_transformer_blocks.9.attn.to_v": 22,
"single_transformer_blocks.9.proj_mlp": 22,
"single_transformer_blocks.9.proj_out": 9,
"single_transformer_blocks.9.norm.linear": 8,
"single_transformer_blocks.10.attn.to_q": 22,
"single_transformer_blocks.10.attn.to_k": 22,
"single_transformer_blocks.10.attn.to_v": 22,
"single_transformer_blocks.10.proj_mlp": 22,
"single_transformer_blocks.10.proj_out": 11,
"single_transformer_blocks.10.norm.linear": 11,
"single_transformer_blocks.11.attn.to_q": 22,
"single_transformer_blocks.11.attn.to_k": 22,
"single_transformer_blocks.11.attn.to_v": 22,
"single_transformer_blocks.11.proj_mlp": 22,
"single_transformer_blocks.11.proj_out": 10,
"single_transformer_blocks.11.norm.linear": 10,
"single_transformer_blocks.12.attn.to_q": 23,
"single_transformer_blocks.12.attn.to_k": 23,
"single_transformer_blocks.12.attn.to_v": 23,
"single_transformer_blocks.12.proj_mlp": 23,
"single_transformer_blocks.12.proj_out": 10,
"single_transformer_blocks.12.norm.linear": 8,
"single_transformer_blocks.13.attn.to_q": 22,
"single_transformer_blocks.13.attn.to_k": 22,
"single_transformer_blocks.13.attn.to_v": 22,
"single_transformer_blocks.13.proj_mlp": 22,
"single_transformer_blocks.13.proj_out": 10,
"single_transformer_blocks.13.norm.linear": 12,
"single_transformer_blocks.14.attn.to_q": 24,
"single_transformer_blocks.14.attn.to_k": 24,
"single_transformer_blocks.14.attn.to_v": 24,
"single_transformer_blocks.14.proj_mlp": 24,
"single_transformer_blocks.14.proj_out": 11,
"single_transformer_blocks.14.norm.linear": 11,
"single_transformer_blocks.15.attn.to_q": 25,
"single_transformer_blocks.15.attn.to_k": 25,
"single_transformer_blocks.15.attn.to_v": 25,
"single_transformer_blocks.15.proj_mlp": 25,
"single_transformer_blocks.15.proj_out": 10,
"single_transformer_blocks.15.norm.linear": 11,
"single_transformer_blocks.16.attn.to_q": 26,
"single_transformer_blocks.16.attn.to_k": 26,
"single_transformer_blocks.16.attn.to_v": 26,
"single_transformer_blocks.16.proj_mlp": 26,
"single_transformer_blocks.16.proj_out": 10,
"single_transformer_blocks.16.norm.linear": 9,
"single_transformer_blocks.17.attn.to_q": 28,
"single_transformer_blocks.17.attn.to_k": 28,
"single_transformer_blocks.17.attn.to_v": 28,
"single_transformer_blocks.17.proj_mlp": 28,
"single_transformer_blocks.17.proj_out": 10,
"single_transformer_blocks.17.norm.linear": 9,
"single_transformer_blocks.18.attn.to_q": 28,
"single_transformer_blocks.18.attn.to_k": 28,
"single_transformer_blocks.18.attn.to_v": 28,
"single_transformer_blocks.18.proj_mlp": 28,
"single_transformer_blocks.18.proj_out": 8,
"single_transformer_blocks.18.norm.linear": 11,
"single_transformer_blocks.19.attn.to_q": 27,
"single_transformer_blocks.19.attn.to_k": 27,
"single_transformer_blocks.19.attn.to_v": 27,
"single_transformer_blocks.19.proj_mlp": 27,
"single_transformer_blocks.19.proj_out": 10,
"single_transformer_blocks.19.norm.linear": 10,
"single_transformer_blocks.20.attn.to_q": 29,
"single_transformer_blocks.20.attn.to_k": 29,
"single_transformer_blocks.20.attn.to_v": 29,
"single_transformer_blocks.20.proj_mlp": 29,
"single_transformer_blocks.20.proj_out": 11,
"single_transformer_blocks.20.norm.linear": 12,
"single_transformer_blocks.21.attn.to_q": 26,
"single_transformer_blocks.21.attn.to_k": 26,
"single_transformer_blocks.21.attn.to_v": 26,
"single_transformer_blocks.21.proj_mlp": 26,
"single_transformer_blocks.21.proj_out": 11,
"single_transformer_blocks.21.norm.linear": 11,
"single_transformer_blocks.22.attn.to_q": 26,
"single_transformer_blocks.22.attn.to_k": 26,
"single_transformer_blocks.22.attn.to_v": 26,
"single_transformer_blocks.22.proj_mlp": 26,
"single_transformer_blocks.22.proj_out": 11,
"single_transformer_blocks.22.norm.linear": 10,
"single_transformer_blocks.23.attn.to_q": 30,
"single_transformer_blocks.23.attn.to_k": 30,
"single_transformer_blocks.23.attn.to_v": 30,
"single_transformer_blocks.23.proj_mlp": 30,
"single_transformer_blocks.23.proj_out": 11,
"single_transformer_blocks.23.norm.linear": 11,
"single_transformer_blocks.24.attn.to_q": 28,
"single_transformer_blocks.24.attn.to_k": 28,
"single_transformer_blocks.24.attn.to_v": 28,
"single_transformer_blocks.24.proj_mlp": 28,
"single_transformer_blocks.24.proj_out": 12,
"single_transformer_blocks.24.norm.linear": 11,
"single_transformer_blocks.25.attn.to_q": 28,
"single_transformer_blocks.25.attn.to_k": 28,
"single_transformer_blocks.25.attn.to_v": 28,
"single_transformer_blocks.25.proj_mlp": 28,
"single_transformer_blocks.25.proj_out": 14,
"single_transformer_blocks.25.norm.linear": 9,
"single_transformer_blocks.26.attn.to_q": 28,
"single_transformer_blocks.26.attn.to_k": 28,
"single_transformer_blocks.26.attn.to_v": 28,
"single_transformer_blocks.26.proj_mlp": 28,
"single_transformer_blocks.26.proj_out": 13,
"single_transformer_blocks.26.norm.linear": 9,
"single_transformer_blocks.27.attn.to_q": 29,
"single_transformer_blocks.27.attn.to_k": 29,
"single_transformer_blocks.27.attn.to_v": 29,
"single_transformer_blocks.27.proj_mlp": 29,
"single_transformer_blocks.27.proj_out": 14,
"single_transformer_blocks.27.norm.linear": 8,
"single_transformer_blocks.28.attn.to_q": 29,
"single_transformer_blocks.28.attn.to_k": 29,
"single_transformer_blocks.28.attn.to_v": 29,
"single_transformer_blocks.28.proj_mlp": 29,
"single_transformer_blocks.28.proj_out": 14,
"single_transformer_blocks.28.norm.linear": 12,
"single_transformer_blocks.29.attn.to_q": 30,
"single_transformer_blocks.29.attn.to_k": 30,
"single_transformer_blocks.29.attn.to_v": 30,
"single_transformer_blocks.29.proj_mlp": 30,
"single_transformer_blocks.29.proj_out": 14,
"single_transformer_blocks.29.norm.linear": 10,
"single_transformer_blocks.30.attn.to_q": 32,
"single_transformer_blocks.30.attn.to_k": 32,
"single_transformer_blocks.30.attn.to_v": 32,
"single_transformer_blocks.30.proj_mlp": 32,
"single_transformer_blocks.30.proj_out": 15,
"single_transformer_blocks.30.norm.linear": 9,
"single_transformer_blocks.31.attn.to_q": 30,
"single_transformer_blocks.31.attn.to_k": 30,
"single_transformer_blocks.31.attn.to_v": 30,
"single_transformer_blocks.31.proj_mlp": 30,
"single_transformer_blocks.31.proj_out": 16,
"single_transformer_blocks.31.norm.linear": 11,
"single_transformer_blocks.32.attn.to_q": 30,
"single_transformer_blocks.32.attn.to_k": 30,
"single_transformer_blocks.32.attn.to_v": 30,
"single_transformer_blocks.32.proj_mlp": 30,
"single_transformer_blocks.32.proj_out": 15,
"single_transformer_blocks.32.norm.linear": 9,
"single_transformer_blocks.33.attn.to_q": 32,
"single_transformer_blocks.33.attn.to_k": 32,
"single_transformer_blocks.33.attn.to_v": 32,
"single_transformer_blocks.33.proj_mlp": 32,
"single_transformer_blocks.33.proj_out": 16,
"single_transformer_blocks.33.norm.linear": 8,
"single_transformer_blocks.34.attn.to_q": 30,
"single_transformer_blocks.34.attn.to_k": 30,
"single_transformer_blocks.34.attn.to_v": 30,
"single_transformer_blocks.34.proj_mlp": 30,
"single_transformer_blocks.34.proj_out": 16,
"single_transformer_blocks.34.norm.linear": 7,
"single_transformer_blocks.35.attn.to_q": 31,
"single_transformer_blocks.35.attn.to_k": 31,
"single_transformer_blocks.35.attn.to_v": 31,
"single_transformer_blocks.35.proj_mlp": 31,
"single_transformer_blocks.35.proj_out": 14,
"single_transformer_blocks.35.norm.linear": 8,
"single_transformer_blocks.36.attn.to_q": 30,
"single_transformer_blocks.36.attn.to_k": 30,
"single_transformer_blocks.36.attn.to_v": 30,
"single_transformer_blocks.36.proj_mlp": 30,
"single_transformer_blocks.36.proj_out": 12,
"single_transformer_blocks.36.norm.linear": 9,
"single_transformer_blocks.37.attn.to_q": 17,
"single_transformer_blocks.37.attn.to_k": 17,
"single_transformer_blocks.37.attn.to_v": 17,
"single_transformer_blocks.37.proj_mlp": 17,
"single_transformer_blocks.37.proj_out": 7,
"single_transformer_blocks.37.norm.linear": 9,
"transformer_blocks.8.attn.add_v_proj": 4,
"transformer_blocks.18.attn.to_add_out": 4,
"transformer_blocks.6.ff_context.net.2": 4,
"transformer_blocks.8.ff_context.net.0.proj": 4,
"transformer_blocks.13.attn.add_k_proj": 4,
"transformer_blocks.8.attn.add_k_proj": 4,
"transformer_blocks.5.ff_context.net.0.proj": 4,
"transformer_blocks.12.ff_context.net.2": 4,
"transformer_blocks.2.ff_context.net.0.proj": 4,
"transformer_blocks.2.norm1_context.linear": 4,
"transformer_blocks.13.attn.add_q_proj": 4,
"transformer_blocks.15.ff_context.net.0.proj": 4,
"transformer_blocks.18.ff_context.net.2": 4,
"transformer_blocks.5.attn.to_add_out": 4,
"transformer_blocks.7.attn.to_add_out": 4,
"transformer_blocks.13.attn.add_v_proj": 4,
"transformer_blocks.0.attn.add_v_proj": 4,
"transformer_blocks.16.attn.to_add_out": 4,
"transformer_blocks.4.ff_context.net.2": 4,
"transformer_blocks.15.attn.to_add_out": 4,
"transformer_blocks.18.attn.add_q_proj": 4,
"transformer_blocks.0.attn.add_q_proj": 4,
"transformer_blocks.16.ff_context.net.0.proj": 4,
"transformer_blocks.0.ff_context.net.2": 4,
"transformer_blocks.18.attn.add_k_proj": 4,
"transformer_blocks.7.ff_context.net.2": 4,
"transformer_blocks.11.norm1_context.linear": 4,
"transformer_blocks.17.ff_context.net.2": 4,
"transformer_blocks.10.ff_context.net.2": 4,
"transformer_blocks.2.attn.add_k_proj": 4,
"transformer_blocks.9.norm1_context.linear": 4,
"transformer_blocks.15.norm1_context.linear": 4,
"transformer_blocks.13.ff.net.2": 4,
"transformer_blocks.0.attn.add_k_proj": 4,
"transformer_blocks.8.attn.add_q_proj": 4,
"transformer_blocks.2.attn.add_q_proj": 4,
"transformer_blocks.14.ff_context.net.2": 4,
"transformer_blocks.18.attn.add_v_proj": 4,
"transformer_blocks.2.attn.add_v_proj": 4,
"transformer_blocks.15.ff_context.net.2": 4,
"transformer_blocks.2.ff_context.net.2": 4
},
"alpha_pattern": {
"transformer_blocks.0.attn.to_out.0": 10,
"transformer_blocks.0.ff.net.0.proj": 18,
"transformer_blocks.0.ff.net.2": 11,
"transformer_blocks.0.norm1.linear": 12,
"transformer_blocks.0.attn.to_add_out": 3,
"transformer_blocks.0.ff_context.net.0.proj": 2,
"transformer_blocks.0.norm1_context.linear": 2,
"transformer_blocks.1.attn.to_out.0": 10,
"transformer_blocks.1.ff.net.0.proj": 29,
"transformer_blocks.1.ff.net.2": 9,
"transformer_blocks.1.norm1.linear": 20,
"transformer_blocks.1.attn.to_add_out": 3,
"transformer_blocks.1.attn.add_q_proj": 3,
"transformer_blocks.1.attn.add_k_proj": 3,
"transformer_blocks.1.attn.add_v_proj": 3,
"transformer_blocks.1.ff_context.net.0.proj": 3,
"transformer_blocks.1.ff_context.net.2": 2,
"transformer_blocks.1.norm1_context.linear": 2,
"transformer_blocks.2.attn.to_out.0": 13,
"transformer_blocks.2.ff.net.0.proj": 26,
"transformer_blocks.2.ff.net.2": 8,
"transformer_blocks.2.norm1.linear": 20,
"transformer_blocks.2.attn.to_add_out": 5,
"transformer_blocks.3.attn.to_out.0": 13,
"transformer_blocks.3.ff.net.0.proj": 11,
"transformer_blocks.3.ff.net.2": 5,
"transformer_blocks.3.norm1.linear": 10,
"transformer_blocks.3.attn.to_add_out": 5,
"transformer_blocks.3.attn.add_q_proj": 7,
"transformer_blocks.3.attn.add_k_proj": 7,
"transformer_blocks.3.attn.add_v_proj": 7,
"transformer_blocks.3.ff_context.net.0.proj": 3,
"transformer_blocks.3.ff_context.net.2": 5,
"transformer_blocks.3.norm1_context.linear": 5,
"transformer_blocks.4.attn.to_out.0": 8,
"transformer_blocks.4.ff.net.0.proj": 8,
"transformer_blocks.4.ff.net.2": 6,
"transformer_blocks.4.norm1.linear": 8,
"transformer_blocks.4.attn.to_add_out": 5,
"transformer_blocks.4.attn.add_q_proj": 6,
"transformer_blocks.4.attn.add_k_proj": 6,
"transformer_blocks.4.attn.add_v_proj": 6,
"transformer_blocks.4.ff_context.net.0.proj": 3,
"transformer_blocks.4.norm1_context.linear": 5,
"transformer_blocks.5.attn.to_out.0": 9,
"transformer_blocks.5.ff.net.0.proj": 10,
"transformer_blocks.5.ff.net.2": 7,
"transformer_blocks.5.norm1.linear": 8,
"transformer_blocks.5.attn.add_q_proj": 3,
"transformer_blocks.5.attn.add_k_proj": 3,
"transformer_blocks.5.attn.add_v_proj": 3,
"transformer_blocks.5.ff_context.net.2": 2,
"transformer_blocks.5.norm1_context.linear": 3,
"transformer_blocks.6.attn.to_out.0": 11,
"transformer_blocks.6.ff.net.0.proj": 11,
"transformer_blocks.6.ff.net.2": 5,
"transformer_blocks.6.norm1.linear": 8,
"transformer_blocks.6.attn.to_add_out": 5,
"transformer_blocks.6.attn.add_q_proj": 3,
"transformer_blocks.6.attn.add_k_proj": 3,
"transformer_blocks.6.attn.add_v_proj": 3,
"transformer_blocks.6.ff_context.net.0.proj": 5,
"transformer_blocks.6.norm1_context.linear": 2,
"transformer_blocks.7.attn.to_out.0": 12,
"transformer_blocks.7.ff.net.0.proj": 12,
"transformer_blocks.7.ff.net.2": 5,
"transformer_blocks.7.norm1.linear": 8,
"transformer_blocks.7.attn.add_q_proj": 5,
"transformer_blocks.7.attn.add_k_proj": 5,
"transformer_blocks.7.attn.add_v_proj": 5,
"transformer_blocks.7.ff_context.net.0.proj": 3,
"transformer_blocks.7.norm1_context.linear": 6,
"transformer_blocks.8.attn.to_out.0": 13,
"transformer_blocks.8.ff.net.0.proj": 9,
"transformer_blocks.8.ff.net.2": 6,
"transformer_blocks.8.norm1.linear": 7,
"transformer_blocks.8.attn.to_add_out": 7,
"transformer_blocks.8.ff_context.net.2": 6,
"transformer_blocks.8.norm1_context.linear": 3,
"transformer_blocks.9.attn.to_out.0": 7,
"transformer_blocks.9.ff.net.0.proj": 12,
"transformer_blocks.9.ff.net.2": 6,
"transformer_blocks.9.norm1.linear": 8,
"transformer_blocks.9.attn.to_add_out": 7,
"transformer_blocks.9.attn.add_q_proj": 6,
"transformer_blocks.9.attn.add_k_proj": 6,
"transformer_blocks.9.attn.add_v_proj": 6,
"transformer_blocks.9.ff_context.net.0.proj": 3,
"transformer_blocks.9.ff_context.net.2": 5,
"transformer_blocks.10.attn.to_out.0": 6,
"transformer_blocks.10.ff.net.0.proj": 8,
"transformer_blocks.10.ff.net.2": 5,
"transformer_blocks.10.norm1.linear": 7,
"transformer_blocks.10.attn.to_add_out": 6,
"transformer_blocks.10.attn.add_q_proj": 7,
"transformer_blocks.10.attn.add_k_proj": 7,
"transformer_blocks.10.attn.add_v_proj": 7,
"transformer_blocks.10.ff_context.net.0.proj": 3,
"transformer_blocks.10.norm1_context.linear": 7,
"transformer_blocks.11.attn.to_out.0": 9,
"transformer_blocks.11.ff.net.0.proj": 10,
"transformer_blocks.11.ff.net.2": 6,
"transformer_blocks.11.norm1.linear": 7,
"transformer_blocks.11.attn.to_add_out": 3,
"transformer_blocks.11.attn.add_q_proj": 6,
"transformer_blocks.11.attn.add_k_proj": 6,
"transformer_blocks.11.attn.add_v_proj": 6,
"transformer_blocks.11.ff_context.net.0.proj": 3,
"transformer_blocks.11.ff_context.net.2": 5,
"transformer_blocks.12.attn.to_out.0": 11,
"transformer_blocks.12.ff.net.0.proj": 12,
"transformer_blocks.12.ff.net.2": 6,
"transformer_blocks.12.norm1.linear": 7,
"transformer_blocks.12.attn.to_add_out": 5,
"transformer_blocks.12.attn.add_q_proj": 5,
"transformer_blocks.12.attn.add_k_proj": 5,
"transformer_blocks.12.attn.add_v_proj": 5,
"transformer_blocks.12.ff_context.net.0.proj": 3,
"transformer_blocks.12.norm1_context.linear": 5,
"transformer_blocks.13.attn.to_out.0": 8,
"transformer_blocks.13.ff.net.0.proj": 9,
"transformer_blocks.13.norm1.linear": 5,
"transformer_blocks.13.attn.to_add_out": 5,
"transformer_blocks.13.ff_context.net.0.proj": 2,
"transformer_blocks.13.ff_context.net.2": 3,
"transformer_blocks.13.norm1_context.linear": 6,
"transformer_blocks.14.attn.to_out.0": 11,
"transformer_blocks.14.ff.net.0.proj": 12,
"transformer_blocks.14.ff.net.2": 6,
"transformer_blocks.14.norm1.linear": 5,
"transformer_blocks.14.attn.to_add_out": 3,
"transformer_blocks.14.attn.add_q_proj": 5,
"transformer_blocks.14.attn.add_k_proj": 5,
"transformer_blocks.14.attn.add_v_proj": 5,
"transformer_blocks.14.ff_context.net.0.proj": 3,
"transformer_blocks.14.norm1_context.linear": 3,
"transformer_blocks.15.attn.to_out.0": 9,
"transformer_blocks.15.ff.net.0.proj": 12,
"transformer_blocks.15.ff.net.2": 7,
"transformer_blocks.15.norm1.linear": 6,
"transformer_blocks.15.attn.add_q_proj": 3,
"transformer_blocks.15.attn.add_k_proj": 3,
"transformer_blocks.15.attn.add_v_proj": 3,
"transformer_blocks.16.attn.to_out.0": 10,
"transformer_blocks.16.ff.net.0.proj": 12,
"transformer_blocks.16.ff.net.2": 7,
"transformer_blocks.16.norm1.linear": 6,
"transformer_blocks.16.attn.add_q_proj": 6,
"transformer_blocks.16.attn.add_k_proj": 6,
"transformer_blocks.16.attn.add_v_proj": 6,
"transformer_blocks.16.ff_context.net.2": 5,
"transformer_blocks.16.norm1_context.linear": 3,
"transformer_blocks.17.attn.to_out.0": 8,
"transformer_blocks.17.ff.net.0.proj": 9,
"transformer_blocks.17.ff.net.2": 9,
"transformer_blocks.17.norm1.linear": 8,
"transformer_blocks.17.attn.to_add_out": 5,
"transformer_blocks.17.attn.add_q_proj": 5,
"transformer_blocks.17.attn.add_k_proj": 5,
"transformer_blocks.17.attn.add_v_proj": 5,
"transformer_blocks.17.ff_context.net.0.proj": 2,
"transformer_blocks.17.norm1_context.linear": 2,
"transformer_blocks.18.attn.to_out.0": 5,
"transformer_blocks.18.ff.net.0.proj": 11,
"transformer_blocks.18.ff.net.2": 8,
"transformer_blocks.18.norm1.linear": 6,
"transformer_blocks.18.ff_context.net.0.proj": 3,
"transformer_blocks.18.norm1_context.linear": 7,
"single_transformer_blocks.0.attn.to_q": 18,
"single_transformer_blocks.0.attn.to_k": 18,
"single_transformer_blocks.0.attn.to_v": 18,
"single_transformer_blocks.0.proj_mlp": 18,
"single_transformer_blocks.0.proj_out": 8,
"single_transformer_blocks.0.norm.linear": 7,
"single_transformer_blocks.1.attn.to_q": 22,
"single_transformer_blocks.1.attn.to_k": 22,
"single_transformer_blocks.1.attn.to_v": 22,
"single_transformer_blocks.1.proj_mlp": 22,
"single_transformer_blocks.1.proj_out": 7,
"single_transformer_blocks.1.norm.linear": 7,
"single_transformer_blocks.2.attn.to_q": 18,
"single_transformer_blocks.2.attn.to_k": 18,
"single_transformer_blocks.2.attn.to_v": 18,
"single_transformer_blocks.2.proj_mlp": 18,
"single_transformer_blocks.2.proj_out": 7,
"single_transformer_blocks.2.norm.linear": 7,
"single_transformer_blocks.3.attn.to_q": 21,
"single_transformer_blocks.3.attn.to_k": 21,
"single_transformer_blocks.3.attn.to_v": 21,
"single_transformer_blocks.3.proj_mlp": 21,
"single_transformer_blocks.3.proj_out": 7,
"single_transformer_blocks.3.norm.linear": 8,
"single_transformer_blocks.4.attn.to_q": 18,
"single_transformer_blocks.4.attn.to_k": 18,
"single_transformer_blocks.4.attn.to_v": 18,
"single_transformer_blocks.4.proj_mlp": 18,
"single_transformer_blocks.4.proj_out": 8,
"single_transformer_blocks.4.norm.linear": 8,
"single_transformer_blocks.5.attn.to_q": 20,
"single_transformer_blocks.5.attn.to_k": 20,
"single_transformer_blocks.5.attn.to_v": 20,
"single_transformer_blocks.5.proj_mlp": 20,
"single_transformer_blocks.5.proj_out": 8,
"single_transformer_blocks.5.norm.linear": 9,
"single_transformer_blocks.6.attn.to_q": 20,
"single_transformer_blocks.6.attn.to_k": 20,
"single_transformer_blocks.6.attn.to_v": 20,
"single_transformer_blocks.6.proj_mlp": 20,
"single_transformer_blocks.6.proj_out": 6,
"single_transformer_blocks.6.norm.linear": 10,
"single_transformer_blocks.7.attn.to_q": 21,
"single_transformer_blocks.7.attn.to_k": 21,
"single_transformer_blocks.7.attn.to_v": 21,
"single_transformer_blocks.7.proj_mlp": 21,
"single_transformer_blocks.7.proj_out": 8,
"single_transformer_blocks.7.norm.linear": 10,
"single_transformer_blocks.8.attn.to_q": 19,
"single_transformer_blocks.8.attn.to_k": 19,
"single_transformer_blocks.8.attn.to_v": 19,
"single_transformer_blocks.8.proj_mlp": 19,
"single_transformer_blocks.8.proj_out": 7,
"single_transformer_blocks.8.norm.linear": 7,
"single_transformer_blocks.9.attn.to_q": 22,
"single_transformer_blocks.9.attn.to_k": 22,
"single_transformer_blocks.9.attn.to_v": 22,
"single_transformer_blocks.9.proj_mlp": 22,
"single_transformer_blocks.9.proj_out": 9,
"single_transformer_blocks.9.norm.linear": 8,
"single_transformer_blocks.10.attn.to_q": 22,
"single_transformer_blocks.10.attn.to_k": 22,
"single_transformer_blocks.10.attn.to_v": 22,
"single_transformer_blocks.10.proj_mlp": 22,
"single_transformer_blocks.10.proj_out": 11,
"single_transformer_blocks.10.norm.linear": 11,
"single_transformer_blocks.11.attn.to_q": 22,
"single_transformer_blocks.11.attn.to_k": 22,
"single_transformer_blocks.11.attn.to_v": 22,
"single_transformer_blocks.11.proj_mlp": 22,
"single_transformer_blocks.11.proj_out": 10,
"single_transformer_blocks.11.norm.linear": 10,
"single_transformer_blocks.12.attn.to_q": 23,
"single_transformer_blocks.12.attn.to_k": 23,
"single_transformer_blocks.12.attn.to_v": 23,
"single_transformer_blocks.12.proj_mlp": 23,
"single_transformer_blocks.12.proj_out": 10,
"single_transformer_blocks.12.norm.linear": 8,
"single_transformer_blocks.13.attn.to_q": 22,
"single_transformer_blocks.13.attn.to_k": 22,
"single_transformer_blocks.13.attn.to_v": 22,
"single_transformer_blocks.13.proj_mlp": 22,
"single_transformer_blocks.13.proj_out": 10,
"single_transformer_blocks.13.norm.linear": 12,
"single_transformer_blocks.14.attn.to_q": 24,
"single_transformer_blocks.14.attn.to_k": 24,
"single_transformer_blocks.14.attn.to_v": 24,
"single_transformer_blocks.14.proj_mlp": 24,
"single_transformer_blocks.14.proj_out": 11,
"single_transformer_blocks.14.norm.linear": 11,
"single_transformer_blocks.15.attn.to_q": 25,
"single_transformer_blocks.15.attn.to_k": 25,
"single_transformer_blocks.15.attn.to_v": 25,
"single_transformer_blocks.15.proj_mlp": 25,
"single_transformer_blocks.15.proj_out": 10,
"single_transformer_blocks.15.norm.linear": 11,
"single_transformer_blocks.16.attn.to_q": 26,
"single_transformer_blocks.16.attn.to_k": 26,
"single_transformer_blocks.16.attn.to_v": 26,
"single_transformer_blocks.16.proj_mlp": 26,
"single_transformer_blocks.16.proj_out": 10,
"single_transformer_blocks.16.norm.linear": 9,
"single_transformer_blocks.17.attn.to_q": 28,
"single_transformer_blocks.17.attn.to_k": 28,
"single_transformer_blocks.17.attn.to_v": 28,
"single_transformer_blocks.17.proj_mlp": 28,
"single_transformer_blocks.17.proj_out": 10,
"single_transformer_blocks.17.norm.linear": 9,
"single_transformer_blocks.18.attn.to_q": 28,
"single_transformer_blocks.18.attn.to_k": 28,
"single_transformer_blocks.18.attn.to_v": 28,
"single_transformer_blocks.18.proj_mlp": 28,
"single_transformer_blocks.18.proj_out": 8,
"single_transformer_blocks.18.norm.linear": 11,
"single_transformer_blocks.19.attn.to_q": 27,
"single_transformer_blocks.19.attn.to_k": 27,
"single_transformer_blocks.19.attn.to_v": 27,
"single_transformer_blocks.19.proj_mlp": 27,
"single_transformer_blocks.19.proj_out": 10,
"single_transformer_blocks.19.norm.linear": 10,
"single_transformer_blocks.20.attn.to_q": 29,
"single_transformer_blocks.20.attn.to_k": 29,
"single_transformer_blocks.20.attn.to_v": 29,
"single_transformer_blocks.20.proj_mlp": 29,
"single_transformer_blocks.20.proj_out": 11,
"single_transformer_blocks.20.norm.linear": 12,
"single_transformer_blocks.21.attn.to_q": 26,
"single_transformer_blocks.21.attn.to_k": 26,
"single_transformer_blocks.21.attn.to_v": 26,
"single_transformer_blocks.21.proj_mlp": 26,
"single_transformer_blocks.21.proj_out": 11,
"single_transformer_blocks.21.norm.linear": 11,
"single_transformer_blocks.22.attn.to_q": 26,
"single_transformer_blocks.22.attn.to_k": 26,
"single_transformer_blocks.22.attn.to_v": 26,
"single_transformer_blocks.22.proj_mlp": 26,
"single_transformer_blocks.22.proj_out": 11,
"single_transformer_blocks.22.norm.linear": 10,
"single_transformer_blocks.23.attn.to_q": 30,
"single_transformer_blocks.23.attn.to_k": 30,
"single_transformer_blocks.23.attn.to_v": 30,
"single_transformer_blocks.23.proj_mlp": 30,
"single_transformer_blocks.23.proj_out": 11,
"single_transformer_blocks.23.norm.linear": 11,
"single_transformer_blocks.24.attn.to_q": 28,
"single_transformer_blocks.24.attn.to_k": 28,
"single_transformer_blocks.24.attn.to_v": 28,
"single_transformer_blocks.24.proj_mlp": 28,
"single_transformer_blocks.24.proj_out": 12,
"single_transformer_blocks.24.norm.linear": 11,
"single_transformer_blocks.25.attn.to_q": 28,
"single_transformer_blocks.25.attn.to_k": 28,
"single_transformer_blocks.25.attn.to_v": 28,
"single_transformer_blocks.25.proj_mlp": 28,
"single_transformer_blocks.25.proj_out": 14,
"single_transformer_blocks.25.norm.linear": 9,
"single_transformer_blocks.26.attn.to_q": 28,
"single_transformer_blocks.26.attn.to_k": 28,
"single_transformer_blocks.26.attn.to_v": 28,
"single_transformer_blocks.26.proj_mlp": 28,
"single_transformer_blocks.26.proj_out": 13,
"single_transformer_blocks.26.norm.linear": 9,
"single_transformer_blocks.27.attn.to_q": 29,
"single_transformer_blocks.27.attn.to_k": 29,
"single_transformer_blocks.27.attn.to_v": 29,
"single_transformer_blocks.27.proj_mlp": 29,
"single_transformer_blocks.27.proj_out": 14,
"single_transformer_blocks.27.norm.linear": 8,
"single_transformer_blocks.28.attn.to_q": 29,
"single_transformer_blocks.28.attn.to_k": 29,
"single_transformer_blocks.28.attn.to_v": 29,
"single_transformer_blocks.28.proj_mlp": 29,
"single_transformer_blocks.28.proj_out": 14,
"single_transformer_blocks.28.norm.linear": 12,
"single_transformer_blocks.29.attn.to_q": 30,
"single_transformer_blocks.29.attn.to_k": 30,
"single_transformer_blocks.29.attn.to_v": 30,
"single_transformer_blocks.29.proj_mlp": 30,
"single_transformer_blocks.29.proj_out": 14,
"single_transformer_blocks.29.norm.linear": 10,
"single_transformer_blocks.30.attn.to_q": 32,
"single_transformer_blocks.30.attn.to_k": 32,
"single_transformer_blocks.30.attn.to_v": 32,
"single_transformer_blocks.30.proj_mlp": 32,
"single_transformer_blocks.30.proj_out": 15,
"single_transformer_blocks.30.norm.linear": 9,
"single_transformer_blocks.31.attn.to_q": 30,
"single_transformer_blocks.31.attn.to_k": 30,
"single_transformer_blocks.31.attn.to_v": 30,
"single_transformer_blocks.31.proj_mlp": 30,
"single_transformer_blocks.31.proj_out": 16,
"single_transformer_blocks.31.norm.linear": 11,
"single_transformer_blocks.32.attn.to_q": 30,
"single_transformer_blocks.32.attn.to_k": 30,
"single_transformer_blocks.32.attn.to_v": 30,
"single_transformer_blocks.32.proj_mlp": 30,
"single_transformer_blocks.32.proj_out": 15,
"single_transformer_blocks.32.norm.linear": 9,
"single_transformer_blocks.33.attn.to_q": 32,
"single_transformer_blocks.33.attn.to_k": 32,
"single_transformer_blocks.33.attn.to_v": 32,
"single_transformer_blocks.33.proj_mlp": 32,
"single_transformer_blocks.33.proj_out": 16,
"single_transformer_blocks.33.norm.linear": 8,
"single_transformer_blocks.34.attn.to_q": 30,
"single_transformer_blocks.34.attn.to_k": 30,
"single_transformer_blocks.34.attn.to_v": 30,
"single_transformer_blocks.34.proj_mlp": 30,
"single_transformer_blocks.34.proj_out": 16,
"single_transformer_blocks.34.norm.linear": 7,
"single_transformer_blocks.35.attn.to_q": 31,
"single_transformer_blocks.35.attn.to_k": 31,
"single_transformer_blocks.35.attn.to_v": 31,
"single_transformer_blocks.35.proj_mlp": 31,
"single_transformer_blocks.35.proj_out": 14,
"single_transformer_blocks.35.norm.linear": 8,
"single_transformer_blocks.36.attn.to_q": 30,
"single_transformer_blocks.36.attn.to_k": 30,
"single_transformer_blocks.36.attn.to_v": 30,
"single_transformer_blocks.36.proj_mlp": 30,
"single_transformer_blocks.36.proj_out": 12,
"single_transformer_blocks.36.norm.linear": 9,
"single_transformer_blocks.37.attn.to_q": 17,
"single_transformer_blocks.37.attn.to_k": 17,
"single_transformer_blocks.37.attn.to_v": 17,
"single_transformer_blocks.37.proj_mlp": 17,
"single_transformer_blocks.37.proj_out": 7,
"single_transformer_blocks.37.norm.linear": 9,
"transformer_blocks.8.attn.add_v_proj": 4,
"transformer_blocks.18.attn.to_add_out": 4,
"transformer_blocks.6.ff_context.net.2": 4,
"transformer_blocks.8.ff_context.net.0.proj": 4,
"transformer_blocks.13.attn.add_k_proj": 4,
"transformer_blocks.8.attn.add_k_proj": 4,
"transformer_blocks.5.ff_context.net.0.proj": 4,
"transformer_blocks.12.ff_context.net.2": 4,
"transformer_blocks.2.ff_context.net.0.proj": 4,
"transformer_blocks.2.norm1_context.linear": 4,
"transformer_blocks.13.attn.add_q_proj": 4,
"transformer_blocks.15.ff_context.net.0.proj": 4,
"transformer_blocks.18.ff_context.net.2": 4,
"transformer_blocks.5.attn.to_add_out": 4,
"transformer_blocks.7.attn.to_add_out": 4,
"transformer_blocks.13.attn.add_v_proj": 4,
"transformer_blocks.0.attn.add_v_proj": 4,
"transformer_blocks.16.attn.to_add_out": 4,
"transformer_blocks.4.ff_context.net.2": 4,
"transformer_blocks.15.attn.to_add_out": 4,
"transformer_blocks.18.attn.add_q_proj": 4,
"transformer_blocks.0.attn.add_q_proj": 4,
"transformer_blocks.16.ff_context.net.0.proj": 4,
"transformer_blocks.0.ff_context.net.2": 4,
"transformer_blocks.18.attn.add_k_proj": 4,
"transformer_blocks.7.ff_context.net.2": 4,
"transformer_blocks.11.norm1_context.linear": 4,
"transformer_blocks.17.ff_context.net.2": 4,
"transformer_blocks.10.ff_context.net.2": 4,
"transformer_blocks.2.attn.add_k_proj": 4,
"transformer_blocks.9.norm1_context.linear": 4,
"transformer_blocks.15.norm1_context.linear": 4,
"transformer_blocks.13.ff.net.2": 4,
"transformer_blocks.0.attn.add_k_proj": 4,
"transformer_blocks.8.attn.add_q_proj": 4,
"transformer_blocks.2.attn.add_q_proj": 4,
"transformer_blocks.14.ff_context.net.2": 4,
"transformer_blocks.18.attn.add_v_proj": 4,
"transformer_blocks.2.attn.add_v_proj": 4,
"transformer_blocks.15.ff_context.net.2": 4,
"transformer_blocks.2.ff_context.net.2": 4
},
"target_modules": [
"transformer_blocks.11.ff_context.net.2",
"single_transformer_blocks.23.attn.to_k",
"transformer_blocks.8.attn.add_v_proj",
"single_transformer_blocks.5.proj_out",
"transformer_blocks.18.attn.to_add_out",
"single_transformer_blocks.33.attn.to_v",
"transformer_blocks.17.norm1.linear",
"single_transformer_blocks.1.proj_mlp",
"transformer_blocks.4.attn.to_v",
"transformer_blocks.7.attn.to_v",
"transformer_blocks.13.attn.to_add_out",
"transformer_blocks.10.ff.net.2",
"single_transformer_blocks.0.attn.to_v",
"single_transformer_blocks.22.attn.to_v",
"transformer_blocks.6.ff_context.net.2",
"transformer_blocks.15.attn.to_q",
"single_transformer_blocks.25.proj_mlp",
"single_transformer_blocks.17.attn.to_q",
"single_transformer_blocks.26.proj_out",
"single_transformer_blocks.30.attn.to_k",
"single_transformer_blocks.32.attn.to_v",
"transformer_blocks.1.ff.net.2",
"single_transformer_blocks.10.proj_out",
"single_transformer_blocks.14.norm.linear",
"transformer_blocks.15.attn.to_out.0",
"transformer_blocks.17.ff.net.2",
"single_transformer_blocks.34.attn.to_v",
"single_transformer_blocks.4.proj_mlp",
"transformer_blocks.7.attn.to_q",
"transformer_blocks.0.ff.net.0.proj",
"transformer_blocks.14.attn.add_v_proj",
"transformer_blocks.8.ff_context.net.0.proj",
"single_transformer_blocks.28.attn.to_q",
"transformer_blocks.13.attn.add_k_proj",
"single_transformer_blocks.2.proj_mlp",
"single_transformer_blocks.21.attn.to_v",
"single_transformer_blocks.9.attn.to_k",
"single_transformer_blocks.9.attn.to_v",
"transformer_blocks.17.attn.to_v",
"single_transformer_blocks.31.attn.to_k",
"transformer_blocks.10.norm1_context.linear",
"transformer_blocks.17.attn.add_q_proj",
"transformer_blocks.6.ff.net.2",
"transformer_blocks.12.ff.net.0.proj",
"single_transformer_blocks.1.attn.to_k",
"transformer_blocks.1.attn.to_v",
"single_transformer_blocks.23.attn.to_q",
"transformer_blocks.13.ff.net.0.proj",
"single_transformer_blocks.17.proj_mlp",
"transformer_blocks.8.attn.add_k_proj",
"transformer_blocks.5.ff_context.net.0.proj",
"single_transformer_blocks.5.norm.linear",
"transformer_blocks.15.attn.to_k",
"single_transformer_blocks.20.attn.to_v",
"single_transformer_blocks.36.attn.to_q",
"transformer_blocks.9.norm1.linear",
"single_transformer_blocks.5.attn.to_q",
"single_transformer_blocks.10.attn.to_v",
"single_transformer_blocks.30.attn.to_q",
"transformer_blocks.12.attn.to_add_out",
"transformer_blocks.12.ff_context.net.2",
"transformer_blocks.11.ff_context.net.0.proj",
"transformer_blocks.2.ff_context.net.0.proj",
"single_transformer_blocks.6.attn.to_q",
"single_transformer_blocks.3.attn.to_k",
"single_transformer_blocks.13.attn.to_k",
"single_transformer_blocks.16.attn.to_q",
"single_transformer_blocks.27.attn.to_q",
"transformer_blocks.7.ff_context.net.0.proj",
"single_transformer_blocks.23.proj_out",
"transformer_blocks.12.attn.add_k_proj",
"single_transformer_blocks.15.attn.to_q",
"single_transformer_blocks.37.proj_out",
"transformer_blocks.3.attn.to_v",
"transformer_blocks.17.attn.add_v_proj",
"transformer_blocks.2.norm1_context.linear",
"single_transformer_blocks.32.norm.linear",
"transformer_blocks.12.ff.net.2",
"single_transformer_blocks.20.attn.to_k",
"transformer_blocks.2.ff.net.2",
"single_transformer_blocks.3.proj_out",
"transformer_blocks.18.attn.to_q",
"single_transformer_blocks.14.attn.to_q",
"single_transformer_blocks.11.proj_mlp",
"single_transformer_blocks.14.attn.to_v",
"transformer_blocks.13.attn.add_q_proj",
"transformer_blocks.15.ff_context.net.0.proj",
"single_transformer_blocks.8.proj_mlp",
"single_transformer_blocks.16.attn.to_k",
"single_transformer_blocks.27.norm.linear",
"transformer_blocks.8.ff.net.0.proj",
"single_transformer_blocks.33.norm.linear",
"transformer_blocks.10.attn.to_add_out",
"single_transformer_blocks.33.proj_out",
"transformer_blocks.16.attn.add_v_proj",
"transformer_blocks.12.attn.to_out.0",
"transformer_blocks.15.attn.add_q_proj",
"transformer_blocks.2.attn.to_v",
"single_transformer_blocks.17.proj_out",
"transformer_blocks.5.attn.add_k_proj",
"transformer_blocks.12.attn.to_q",
"single_transformer_blocks.0.attn.to_q",
"transformer_blocks.1.attn.add_q_proj",
"transformer_blocks.14.attn.to_out.0",
"single_transformer_blocks.33.attn.to_k",
"transformer_blocks.3.ff.net.0.proj",
"single_transformer_blocks.18.proj_mlp",
"single_transformer_blocks.26.attn.to_k",
"transformer_blocks.4.attn.add_q_proj",
"transformer_blocks.10.attn.to_out.0",
"transformer_blocks.13.ff_context.net.2",
"transformer_blocks.6.ff.net.0.proj",
"transformer_blocks.2.attn.to_q",
"transformer_blocks.17.attn.to_add_out",
"single_transformer_blocks.24.attn.to_q",
"single_transformer_blocks.32.attn.to_q",
"transformer_blocks.16.ff.net.2",
"single_transformer_blocks.12.norm.linear",
"transformer_blocks.4.attn.to_add_out",
"single_transformer_blocks.28.attn.to_k",
"transformer_blocks.18.ff_context.net.2",
"transformer_blocks.9.attn.to_q",
"transformer_blocks.8.norm1.linear",
"transformer_blocks.16.attn.to_k",
"transformer_blocks.13.attn.to_q",
"transformer_blocks.5.attn.to_add_out",
"transformer_blocks.7.attn.to_add_out",
"transformer_blocks.9.attn.add_v_proj",
"transformer_blocks.1.attn.to_add_out",
"transformer_blocks.1.ff_context.net.0.proj",
"transformer_blocks.2.ff.net.0.proj",
"transformer_blocks.7.ff.net.2",
"transformer_blocks.13.attn.add_v_proj",
"single_transformer_blocks.35.attn.to_k",
"single_transformer_blocks.29.norm.linear",
"transformer_blocks.11.attn.to_out.0",
"transformer_blocks.8.ff_context.net.2",
"transformer_blocks.12.norm1.linear",
"transformer_blocks.7.norm1.linear",
"single_transformer_blocks.9.attn.to_q",
"transformer_blocks.9.attn.add_k_proj",
"single_transformer_blocks.14.attn.to_k",
"single_transformer_blocks.3.attn.to_v",
"single_transformer_blocks.37.norm.linear",
"transformer_blocks.13.attn.to_v",
"single_transformer_blocks.0.proj_mlp",
"transformer_blocks.0.attn.to_q",
"transformer_blocks.6.attn.to_add_out",
"transformer_blocks.12.attn.to_v",
"single_transformer_blocks.4.proj_out",
"single_transformer_blocks.24.attn.to_v",
"single_transformer_blocks.4.attn.to_v",
"transformer_blocks.3.attn.to_k",
"transformer_blocks.5.ff_context.net.2",
"single_transformer_blocks.15.proj_out",
"transformer_blocks.14.ff.net.2",
"transformer_blocks.0.attn.add_v_proj",
"single_transformer_blocks.15.norm.linear",
"single_transformer_blocks.19.attn.to_q",
"single_transformer_blocks.35.norm.linear",
"transformer_blocks.11.attn.to_q",
"single_transformer_blocks.6.proj_mlp",
"transformer_blocks.16.attn.to_add_out",
"single_transformer_blocks.14.proj_mlp",
"transformer_blocks.10.attn.add_v_proj",
"transformer_blocks.4.ff_context.net.2",
"transformer_blocks.15.attn.to_add_out",
"single_transformer_blocks.27.attn.to_v",
"transformer_blocks.2.attn.to_k",
"transformer_blocks.7.attn.add_q_proj",
"single_transformer_blocks.1.norm.linear",
"single_transformer_blocks.24.attn.to_k",
"transformer_blocks.18.attn.add_q_proj",
"transformer_blocks.14.ff_context.net.0.proj",
"single_transformer_blocks.6.attn.to_k",
"single_transformer_blocks.37.attn.to_v",
"transformer_blocks.0.attn.add_q_proj",
"transformer_blocks.0.attn.to_out.0",
"single_transformer_blocks.5.attn.to_v",
"single_transformer_blocks.5.attn.to_k",
"transformer_blocks.18.norm1_context.linear",
"transformer_blocks.4.ff.net.2",
"single_transformer_blocks.29.attn.to_k",
"transformer_blocks.6.ff_context.net.0.proj",
"transformer_blocks.9.attn.to_k",
"single_transformer_blocks.12.attn.to_q",
"single_transformer_blocks.37.attn.to_k",
"single_transformer_blocks.22.proj_mlp",
"transformer_blocks.15.attn.add_v_proj",
"transformer_blocks.4.norm1_context.linear",
"single_transformer_blocks.13.norm.linear",
"single_transformer_blocks.36.norm.linear",
"transformer_blocks.5.norm1.linear",
"transformer_blocks.9.ff_context.net.0.proj",
"single_transformer_blocks.36.attn.to_v",
"transformer_blocks.6.norm1.linear",
"transformer_blocks.15.norm1.linear",
"single_transformer_blocks.16.proj_out",
"single_transformer_blocks.15.attn.to_v",
"transformer_blocks.10.attn.add_q_proj",
"transformer_blocks.2.norm1.linear",
"single_transformer_blocks.29.proj_out",
"single_transformer_blocks.28.attn.to_v",
"single_transformer_blocks.0.proj_out",
"single_transformer_blocks.28.proj_out",
"single_transformer_blocks.21.norm.linear",
"single_transformer_blocks.16.proj_mlp",
"single_transformer_blocks.33.attn.to_q",
"single_transformer_blocks.36.proj_mlp",
"single_transformer_blocks.36.proj_out",
"transformer_blocks.11.attn.add_v_proj",
"single_transformer_blocks.26.attn.to_q",
"transformer_blocks.8.attn.to_add_out",
"transformer_blocks.18.attn.to_k",
"single_transformer_blocks.31.attn.to_q",
"single_transformer_blocks.2.attn.to_k",
"single_transformer_blocks.10.attn.to_k",
"transformer_blocks.4.ff_context.net.0.proj",
"single_transformer_blocks.37.proj_mlp",
"transformer_blocks.5.attn.add_q_proj",
"single_transformer_blocks.27.proj_out",
"transformer_blocks.5.ff.net.2",
"transformer_blocks.4.attn.add_v_proj",
"transformer_blocks.17.attn.to_out.0",
"transformer_blocks.0.ff_context.net.0.proj",
"transformer_blocks.11.ff.net.2",
"transformer_blocks.9.attn.to_out.0",
"single_transformer_blocks.32.proj_mlp",
"transformer_blocks.3.norm1.linear",
"single_transformer_blocks.2.attn.to_v",
"single_transformer_blocks.12.attn.to_v",
"single_transformer_blocks.21.proj_out",
"transformer_blocks.5.attn.add_v_proj",
"transformer_blocks.14.attn.to_add_out",
"single_transformer_blocks.7.attn.to_k",
"single_transformer_blocks.11.attn.to_v",
"single_transformer_blocks.4.attn.to_q",
"transformer_blocks.10.attn.to_k",
"single_transformer_blocks.17.attn.to_k",
"single_transformer_blocks.30.norm.linear",
"transformer_blocks.9.attn.add_q_proj",
"transformer_blocks.4.norm1.linear",
"single_transformer_blocks.37.attn.to_q",
"transformer_blocks.16.ff_context.net.0.proj",
"single_transformer_blocks.34.attn.to_q",
"transformer_blocks.13.attn.to_k",
"transformer_blocks.12.norm1_context.linear",
"single_transformer_blocks.29.attn.to_v",
"transformer_blocks.4.attn.to_out.0",
"single_transformer_blocks.35.proj_out",
"transformer_blocks.8.attn.to_q",
"single_transformer_blocks.23.norm.linear",
"transformer_blocks.0.ff_context.net.2",
"transformer_blocks.14.norm1_context.linear",
"transformer_blocks.16.ff.net.0.proj",
"transformer_blocks.10.attn.add_k_proj",
"single_transformer_blocks.34.norm.linear",
"transformer_blocks.17.attn.to_q",
"single_transformer_blocks.6.norm.linear",
"transformer_blocks.1.ff.net.0.proj",
"single_transformer_blocks.15.proj_mlp",
"transformer_blocks.16.ff_context.net.2",
"single_transformer_blocks.8.attn.to_q",
"transformer_blocks.3.attn.to_add_out",
"single_transformer_blocks.20.proj_out",
"single_transformer_blocks.26.attn.to_v",
"single_transformer_blocks.32.attn.to_k",
"single_transformer_blocks.16.attn.to_v",
"single_transformer_blocks.17.norm.linear",
"transformer_blocks.13.norm1_context.linear",
"transformer_blocks.3.ff.net.2",
"transformer_blocks.10.ff_context.net.0.proj",
"transformer_blocks.18.attn.add_k_proj",
"transformer_blocks.0.attn.to_v",
"single_transformer_blocks.12.attn.to_k",
"single_transformer_blocks.19.proj_mlp",
"transformer_blocks.7.ff_context.net.2",
"single_transformer_blocks.25.norm.linear",
"transformer_blocks.0.norm1_context.linear",
"transformer_blocks.9.attn.to_v",
"transformer_blocks.7.ff.net.0.proj",
"single_transformer_blocks.30.proj_out",
"single_transformer_blocks.7.norm.linear",
"single_transformer_blocks.31.attn.to_v",
"transformer_blocks.11.norm1_context.linear",
"transformer_blocks.15.ff.net.2",
"single_transformer_blocks.28.norm.linear",
"transformer_blocks.17.ff_context.net.2",
"single_transformer_blocks.2.proj_out",
"single_transformer_blocks.2.norm.linear",
"transformer_blocks.14.attn.to_v",
"single_transformer_blocks.19.attn.to_k",
"single_transformer_blocks.18.attn.to_k",
"transformer_blocks.8.attn.to_k",
"transformer_blocks.8.norm1_context.linear",
"transformer_blocks.11.attn.add_k_proj",
"single_transformer_blocks.1.proj_out",
"single_transformer_blocks.25.attn.to_k",
"transformer_blocks.8.attn.to_v",
"single_transformer_blocks.28.proj_mlp",
"transformer_blocks.4.ff.net.0.proj",
"transformer_blocks.15.attn.to_v",
"single_transformer_blocks.24.proj_mlp",
"transformer_blocks.5.attn.to_q",
"transformer_blocks.10.ff_context.net.2",
"single_transformer_blocks.9.proj_mlp",
"single_transformer_blocks.25.attn.to_q",
"single_transformer_blocks.16.norm.linear",
"single_transformer_blocks.26.proj_mlp",
"transformer_blocks.14.attn.to_k",
"single_transformer_blocks.6.attn.to_v",
"single_transformer_blocks.24.norm.linear",
"transformer_blocks.2.attn.add_k_proj",
"single_transformer_blocks.17.attn.to_v",
"transformer_blocks.10.attn.to_v",
"single_transformer_blocks.1.attn.to_q",
"transformer_blocks.18.ff.net.2",
"single_transformer_blocks.4.norm.linear",
"transformer_blocks.9.norm1_context.linear",
"single_transformer_blocks.8.proj_out",
"transformer_blocks.1.norm1.linear",
"single_transformer_blocks.18.attn.to_v",
"single_transformer_blocks.20.proj_mlp",
"transformer_blocks.14.norm1.linear",
"single_transformer_blocks.30.attn.to_v",
"transformer_blocks.16.attn.to_q",
"transformer_blocks.17.ff_context.net.0.proj",
"transformer_blocks.6.attn.add_k_proj",
"transformer_blocks.15.norm1_context.linear",
"transformer_blocks.3.attn.to_q",
"transformer_blocks.13.ff.net.2",
"single_transformer_blocks.26.norm.linear",
"single_transformer_blocks.31.proj_mlp",
"single_transformer_blocks.25.proj_out",
"transformer_blocks.1.attn.add_k_proj",
"single_transformer_blocks.5.proj_mlp",
"single_transformer_blocks.29.attn.to_q",
"single_transformer_blocks.13.proj_out",
"transformer_blocks.0.ff.net.2",
"transformer_blocks.2.attn.to_out.0",
"transformer_blocks.6.attn.to_k",
"transformer_blocks.6.attn.add_q_proj",
"single_transformer_blocks.33.proj_mlp",
"transformer_blocks.5.norm1_context.linear",
"transformer_blocks.7.attn.add_k_proj",
"single_transformer_blocks.0.norm.linear",
"transformer_blocks.9.ff_context.net.2",
"transformer_blocks.16.attn.to_out.0",
"single_transformer_blocks.35.attn.to_v",
"single_transformer_blocks.0.attn.to_k",
"single_transformer_blocks.21.attn.to_k",
"transformer_blocks.3.attn.add_k_proj",
"transformer_blocks.7.norm1_context.linear",
"transformer_blocks.18.norm1.linear",
"single_transformer_blocks.22.proj_out",
"single_transformer_blocks.23.proj_mlp",
"single_transformer_blocks.11.attn.to_q",
"single_transformer_blocks.22.norm.linear",
"transformer_blocks.1.attn.to_q",
"transformer_blocks.0.attn.add_k_proj",
"transformer_blocks.13.ff_context.net.0.proj",
"single_transformer_blocks.27.proj_mlp",
"single_transformer_blocks.3.norm.linear",
"transformer_blocks.8.attn.add_q_proj",
"single_transformer_blocks.8.attn.to_k",
"single_transformer_blocks.11.proj_out",
"transformer_blocks.14.ff.net.0.proj",
"transformer_blocks.17.attn.to_k",
"transformer_blocks.6.attn.to_q",
"single_transformer_blocks.24.proj_out",
"transformer_blocks.11.attn.to_v",
"transformer_blocks.2.attn.to_add_out",
"single_transformer_blocks.8.norm.linear",
"transformer_blocks.3.attn.add_v_proj",
"transformer_blocks.12.attn.add_v_proj",
"single_transformer_blocks.12.proj_mlp",
"transformer_blocks.3.ff_context.net.2",
"single_transformer_blocks.30.proj_mlp",
"transformer_blocks.6.attn.add_v_proj",
"transformer_blocks.17.attn.add_k_proj",
"transformer_blocks.15.attn.add_k_proj",
"transformer_blocks.11.norm1.linear",
"transformer_blocks.1.attn.add_v_proj",
"single_transformer_blocks.7.attn.to_v",
"single_transformer_blocks.1.attn.to_v",
"single_transformer_blocks.7.proj_out",
"transformer_blocks.5.ff.net.0.proj",
"single_transformer_blocks.21.proj_mlp",
"single_transformer_blocks.18.attn.to_q",
"single_transformer_blocks.7.attn.to_q",
"transformer_blocks.12.ff_context.net.0.proj",
"single_transformer_blocks.19.attn.to_v",
"transformer_blocks.5.attn.to_out.0",
"single_transformer_blocks.18.norm.linear",
"transformer_blocks.10.norm1.linear",
"single_transformer_blocks.11.norm.linear",
"single_transformer_blocks.32.proj_out",
"single_transformer_blocks.6.proj_out",
"transformer_blocks.12.attn.to_k",
"single_transformer_blocks.2.attn.to_q",
"transformer_blocks.3.norm1_context.linear",
"transformer_blocks.8.attn.to_out.0",
"transformer_blocks.13.attn.to_out.0",
"transformer_blocks.16.norm1.linear",
"single_transformer_blocks.35.attn.to_q",
"transformer_blocks.7.attn.to_out.0",
"transformer_blocks.4.attn.to_q",
"transformer_blocks.12.attn.add_q_proj",
"transformer_blocks.2.attn.add_q_proj",
"transformer_blocks.14.ff_context.net.2",
"transformer_blocks.14.attn.to_q",
"single_transformer_blocks.13.attn.to_q",
"transformer_blocks.18.attn.add_v_proj",
"transformer_blocks.2.attn.add_v_proj",
"transformer_blocks.11.attn.add_q_proj",
"single_transformer_blocks.13.attn.to_v",
"single_transformer_blocks.19.norm.linear",
"single_transformer_blocks.31.norm.linear",
"single_transformer_blocks.25.attn.to_v",
"transformer_blocks.18.ff_context.net.0.proj",
"single_transformer_blocks.10.attn.to_q",
"single_transformer_blocks.13.proj_mlp",
"transformer_blocks.7.attn.add_v_proj",
"single_transformer_blocks.7.proj_mlp",
"single_transformer_blocks.3.attn.to_q",
"single_transformer_blocks.22.attn.to_k",
"single_transformer_blocks.21.attn.to_q",
"transformer_blocks.18.attn.to_v",
"transformer_blocks.13.norm1.linear",
"single_transformer_blocks.10.norm.linear",
"transformer_blocks.14.attn.add_k_proj",
"transformer_blocks.16.attn.add_q_proj",
"transformer_blocks.15.ff_context.net.2",
"transformer_blocks.10.ff.net.0.proj",
"single_transformer_blocks.8.attn.to_v",
"single_transformer_blocks.29.proj_mlp",
"transformer_blocks.1.attn.to_k",
"transformer_blocks.9.ff.net.0.proj",
"single_transformer_blocks.34.attn.to_k",
"transformer_blocks.16.attn.add_k_proj",
"transformer_blocks.16.norm1_context.linear",
"transformer_blocks.18.ff.net.0.proj",
"transformer_blocks.6.attn.to_out.0",
"transformer_blocks.11.attn.to_add_out",
"transformer_blocks.1.ff_context.net.2",
"single_transformer_blocks.20.norm.linear",
"transformer_blocks.8.ff.net.2",
"single_transformer_blocks.19.proj_out",
"transformer_blocks.4.attn.add_k_proj",
"transformer_blocks.5.attn.to_v",
"transformer_blocks.10.attn.to_q",
"transformer_blocks.11.ff.net.0.proj",
"transformer_blocks.4.attn.to_k",
"transformer_blocks.1.norm1_context.linear",
"single_transformer_blocks.31.proj_out",
"single_transformer_blocks.11.attn.to_k",
"single_transformer_blocks.23.attn.to_v",
"single_transformer_blocks.3.proj_mlp",
"single_transformer_blocks.4.attn.to_k",
"transformer_blocks.15.ff.net.0.proj",
"single_transformer_blocks.27.attn.to_k",
"transformer_blocks.3.attn.add_q_proj",
"transformer_blocks.3.attn.to_out.0",
"transformer_blocks.0.attn.to_add_out",
"transformer_blocks.9.attn.to_add_out",
"transformer_blocks.2.ff_context.net.2",
"transformer_blocks.0.attn.to_k",
"transformer_blocks.11.attn.to_k",
"transformer_blocks.6.attn.to_v",
"transformer_blocks.3.ff_context.net.0.proj",
"single_transformer_blocks.14.proj_out",
"transformer_blocks.17.norm1_context.linear",
"single_transformer_blocks.12.proj_out",
"transformer_blocks.18.attn.to_out.0",
"single_transformer_blocks.10.proj_mlp",
"transformer_blocks.0.norm1.linear",
"single_transformer_blocks.22.attn.to_q",
"transformer_blocks.16.attn.to_v",
"transformer_blocks.7.attn.to_k",
"single_transformer_blocks.34.proj_mlp",
"transformer_blocks.9.ff.net.2",
"transformer_blocks.14.attn.add_q_proj",
"single_transformer_blocks.20.attn.to_q",
"transformer_blocks.1.attn.to_out.0",
"single_transformer_blocks.9.norm.linear",
"single_transformer_blocks.34.proj_out",
"single_transformer_blocks.9.proj_out",
"transformer_blocks.5.attn.to_k",
"transformer_blocks.17.ff.net.0.proj",
"transformer_blocks.6.norm1_context.linear",
"single_transformer_blocks.36.attn.to_k",
"single_transformer_blocks.35.proj_mlp",
"single_transformer_blocks.15.attn.to_k",
"single_transformer_blocks.18.proj_out"
],
"use_dora": false,
"lora_bias": false
}
We see the QKV modules for
Thanks @sayakpaul
Co-authored-by: hlky <[email protected]>
@BenjaminBossan I asked for a review from you for
Generally LGTM, thanks. I added some comments but they're not blockers.
has_norm_diff = any("norm" in k and "diff" in k for k in state_dict)
if has_norm_diff:
    zero_status_diff = state_dict_all_zero(state_dict, "diff")
I'm not familiar with those other state dict formats, just wanted to ask whether it would be safer to use dots in the filter keys, e.g. ".diff." instead of "diff", to prevent accidental matches.
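To make the concern about accidental matches concrete, here is a minimal sketch in plain Python. It is not the diffusers implementation: the helper all_zero_for_filter, the two matcher functions, and the example keys are all hypothetical, and lists of floats stand in for tensors. It compares a loose substring filter with a stricter one that only accepts "diff" as a whole dot-separated segment of the key.

```python
def all_zero_for_filter(state_dict, match):
    """Return True if every value whose key satisfies `match` is all zeros.

    Hypothetical helper: `state_dict` maps key names to lists of floats,
    which stand in for tensors in this sketch.
    """
    selected = [values for key, values in state_dict.items() if match(key)]
    return all(v == 0 for values in selected for v in values)


def substring_match(key):
    # Loose filter: matches any key containing the letters "diff".
    return "diff" in key


def segment_match(key):
    # Stricter filter: "diff" must be a whole dot-separated segment.
    return "diff" in key.split(".")


# Illustrative keys only; real checkpoints will differ.
state_dict = {
    "diffusion_model.img_in.lora_A.weight": [0.5, -0.1],
    "diffusion_model.norm_out.linear.diff": [0.0, 0.0],
}

loose = all_zero_for_filter(state_dict, substring_match)
strict = all_zero_for_filter(state_dict, segment_match)
print(loose, strict)
```

With the loose filter, the prefix "diffusion_model" pulls the non-zero LoRA weight into the check, so the "diff" parameters are wrongly reported as not all zero. The segment-based filter only looks at the key that actually ends in "diff". Note that under this sketch a segment such as "diff_b" would need its own filter.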
Hey @sayakpaul, I have tested this on quite a few LoRAs and it works on most of them. Great news 🎉 I did find some that still do not work. I am sharing them here for visibility. https://civitai.com/models/832858?modelVersionId=951509
@lordsoffallen just pushed some changes that should make it possible to load the first two LoRAs you mentioned. Will work on fixing the third one later. Since the PR is already getting delayed a bit I would prefer doing that in a separate PR. Hope that is fine. @DN6 @BenjaminBossan I have made the change requested by @BenjaminBossan in #10985 (comment). I have also run the integration tests and they pass. LMK if you would like to do another pass before we merge this.
What does this PR do?
Support more ComfyUI Flux LoRAs. Context: #10954 (comment). Depends on huggingface/peft#2419.
This PR has a breaking change explained here:
#10985 (comment)
I have run the LoRA integration tests and they are passing.