[Research Project] Add AnyText: Multilingual Visual Text Generation And Editing #8998

Merged 130 commits on Mar 10, 2025
6e8088f
Add initial template
tolgacangoz Jul 28, 2024
98c2d6e
Second template
tolgacangoz Jul 29, 2024
867bbbf
feat: Add TextEmbeddingModule to AnyTextPipeline
tolgacangoz Jul 30, 2024
8818372
feat: Add AuxiliaryLatentModule template to AnyTextPipeline
tolgacangoz Jul 30, 2024
37c46d8
Merge branch 'main' into Add-AnyText
tolgacangoz Jul 30, 2024
64c63eb
Add bert tokenizer from the anytext repo for now
tolgacangoz Jul 30, 2024
92f8b79
feat: Update AnyTextPipeline's modify_prompt method
tolgacangoz Jul 30, 2024
e9c688c
Fill in the `forward` pass of `AuxiliaryLatentModule`
tolgacangoz Jul 30, 2024
42a41d0
`make style && make quality`
tolgacangoz Jul 30, 2024
9d50f80
`chore: Update bert_tokenizer.py with a TODO comment suggesting the u…
tolgacangoz Jul 30, 2024
22aa69a
Merge branch 'main' into Add-AnyText
tolgacangoz Jul 30, 2024
5e1e515
Update error handling to raise and logging
tolgacangoz Jul 30, 2024
2d10f0c
Add `create_glyph_lines` function into `TextEmbeddingModule`
tolgacangoz Jul 30, 2024
bc197a9
make style
tolgacangoz Jul 30, 2024
e52d8cc
Up
tolgacangoz Jul 31, 2024
8c69d83
Up
tolgacangoz Jul 31, 2024
4a413aa
Up
tolgacangoz Jul 31, 2024
571608b
Up
tolgacangoz Aug 1, 2024
2b1b50d
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 1, 2024
a7d025f
Remove several comments
tolgacangoz Aug 1, 2024
d2c5a65
refactor: Remove ControlNetConditioningEmbedding and update code acco…
tolgacangoz Aug 1, 2024
c1f538c
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 1, 2024
2607b6b
Up
tolgacangoz Aug 1, 2024
a9fe4a0
Up
tolgacangoz Aug 1, 2024
567f553
up
tolgacangoz Aug 1, 2024
a9991d0
refactor: Update AnyTextPipeline to include new optional parameters
tolgacangoz Aug 1, 2024
e69c51e
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 1, 2024
91252e0
up
tolgacangoz Aug 1, 2024
e54f876
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 2, 2024
b9164e3
feat: Add OCR model and its components
tolgacangoz Aug 2, 2024
cd4c9c2
chore: Update `TextEmbeddingModule` to include OCR model components a…
tolgacangoz Aug 2, 2024
0918cbd
chore: Update `AuxiliaryLatentModule` to include VAE model and its de…
tolgacangoz Aug 2, 2024
37ae99f
`make style`
tolgacangoz Aug 2, 2024
15fd4df
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 2, 2024
2e40224
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 4, 2024
b475a3b
refactor: Update `AnyTextPipeline`'s docstring
tolgacangoz Aug 4, 2024
ea957f0
Update `AuxiliaryLatentModule` to include info dictionary so that tex…
tolgacangoz Aug 4, 2024
cc0c6e5
simplify
tolgacangoz Aug 4, 2024
52fb0b4
`make style`
tolgacangoz Aug 4, 2024
187473d
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 4, 2024
9dd4ee9
Converting `TextEmbeddingModule` to ordinary `encode_prompt()` function
tolgacangoz Aug 4, 2024
7dbd4bc
Simplify for now
tolgacangoz Aug 5, 2024
f422423
`make style`
tolgacangoz Aug 5, 2024
62bb2a0
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 5, 2024
8466009
Up
tolgacangoz Aug 5, 2024
2b4be7a
feat: Add scripts to convert AnyText controlnet to diffusers
tolgacangoz Aug 5, 2024
d4718fd
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 5, 2024
1cdbb55
`make style`
tolgacangoz Aug 5, 2024
da67ff7
Fix: Move glyph rendering to `TextEmbeddingModule` from `AuxiliaryLat…
tolgacangoz Aug 6, 2024
af30f0f
make style
tolgacangoz Aug 6, 2024
a8dbbe2
Up
tolgacangoz Aug 6, 2024
12fca1c
Merge branch 'main' of github.com:huggingface/diffusers into Add-AnyText
tolgacangoz Aug 6, 2024
bbfe8f2
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 6, 2024
936c2ff
Simplify
tolgacangoz Aug 6, 2024
73d8144
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 7, 2024
cffa036
Up
tolgacangoz Aug 7, 2024
8b43bc3
feat: Add safetensors module for loading model file
tolgacangoz Aug 7, 2024
f60a72b
Fix device issues
tolgacangoz Aug 7, 2024
be4a319
Up
tolgacangoz Aug 8, 2024
18d3f60
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 8, 2024
f713171
Up
tolgacangoz Aug 8, 2024
da9adbb
merge
tolgacangoz Aug 9, 2024
fdf0275
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 9, 2024
f347ff2
refactor: Simplify
tolgacangoz Aug 9, 2024
d52e973
refactor: Simplify code for loading models and handling data types
tolgacangoz Aug 9, 2024
a3b493f
`make style`
tolgacangoz Aug 9, 2024
020074a
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 9, 2024
4267c84
refactor: Update to() method in FrozenCLIPEmbedderT3 and TextEmbeddin…
tolgacangoz Aug 9, 2024
ab51226
refactor: Update dtype in embedding_manager.py to match proj.weight
tolgacangoz Aug 9, 2024
c961a96
Merge branch 'main' of github.com:huggingface/diffusers
tolgacangoz Aug 9, 2024
3873d02
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 9, 2024
5041d40
Merge branch 'main' of github.com:huggingface/diffusers
tolgacangoz Aug 10, 2024
1521e8f
Up
tolgacangoz Aug 10, 2024
1aa17bb
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 10, 2024
c13c61d
Merge branch 'main' into Add-AnyText
tolgacangoz Aug 12, 2024
1d18f1d
Merge branch 'main' into Add-AnyText
tolgacangoz Sep 7, 2024
44a3a70
Merge branch 'main' into Add-AnyText
tolgacangoz Sep 27, 2024
56992d1
Add attribution and adaptation information to pipeline_anytext.py
tolgacangoz Oct 7, 2024
7ad6865
Update usage example
tolgacangoz Oct 11, 2024
a5edca5
Will refactor `controlnet_cond_embedding` initialization
tolgacangoz Oct 11, 2024
48e88eb
Merge branch 'main' into Add-AnyText
tolgacangoz Oct 13, 2024
2f42e40
Add `AnyTextControlNetConditioningEmbedding` template
tolgacangoz Oct 13, 2024
670fef5
Refactor organization
tolgacangoz Oct 18, 2024
930c37a
style
tolgacangoz Oct 18, 2024
923da7b
Merge branch 'main' into Add-AnyText
tolgacangoz Oct 20, 2024
21c0c35
style
tolgacangoz Oct 20, 2024
c4db96a
Move custom blocks from `AuxiliaryLatentModule` to `AnyTextControlNet…
tolgacangoz Oct 20, 2024
e2e7160
Merge branch 'main' into Add-AnyText
tolgacangoz Oct 21, 2024
4335ebd
Merge branch 'main' into Add-AnyText
tolgacangoz Nov 3, 2024
6bd0b4c
Follow one-file policy
tolgacangoz Nov 3, 2024
b3f98a7
style
tolgacangoz Nov 3, 2024
cccf0f4
Merge branch 'main' into Add-AnyText
tolgacangoz Dec 13, 2024
b5856a6
Merge branch 'main' into Add-AnyText
tolgacangoz Dec 22, 2024
b04d015
Merge branch 'main' into Add-AnyText
tolgacangoz Jan 1, 2025
67f8839
Merge branch 'main' into Add-AnyText
tolgacangoz Jan 6, 2025
d75508e
Merge branch 'main' into Add-AnyText
tolgacangoz Jan 13, 2025
75a0f1f
[Docs] Update README and pipeline_anytext.py to use AnyTextControlNet…
tolgacangoz Jan 13, 2025
d3dcf57
[Docs] Update import statement for AnyTextControlNetModel in pipeline…
tolgacangoz Jan 13, 2025
963fac0
[Fix] Update import path for ControlNetModel, ControlNetOutput in any…
tolgacangoz Jan 13, 2025
0c94143
Merge branch 'main' of github.com:huggingface/diffusers into Add-AnyText
tolgacangoz Jan 13, 2025
2b6f08b
Refactor AnyTextControlNet to use configurable conditioning embedding…
tolgacangoz Jan 13, 2025
971d6ad
Merge branch 'main' into Add-AnyText
tolgacangoz Jan 13, 2025
9c43a65
Complete control net conditioning embedding in AnyTextControlNetModel
tolgacangoz Jan 20, 2025
d46ac3e
Merge branch 'main' into Add-AnyText
tolgacangoz Feb 13, 2025
2ffb80b
Merge branch 'main' into Add-AnyText
tolgacangoz Feb 19, 2025
b8ca0d6
up
tolgacangoz Feb 20, 2025
9657980
[FIX] Ensure embeddings use correct device in AnyTextControlNetModel
tolgacangoz Feb 21, 2025
25ea8be
up
tolgacangoz Feb 21, 2025
2be7bca
up
tolgacangoz Feb 21, 2025
0fc4aab
style
tolgacangoz Feb 21, 2025
5345702
[UPDATE] Revise README and example code for AnyTextPipeline integrati…
tolgacangoz Feb 22, 2025
5b73a1d
[UPDATE] Update example code in anytext.py to use correct font file a…
tolgacangoz Feb 22, 2025
7f87755
down
tolgacangoz Feb 22, 2025
61693a5
[UPDATE] Refactor BasicTokenizer usage to a new Checker class for tex…
tolgacangoz Feb 22, 2025
3b2435f
update pillow
tolgacangoz Feb 22, 2025
3ea49c1
[UPDATE] Remove commented-out code and unnecessary docstring in anyte…
tolgacangoz Feb 22, 2025
299a646
[REMOVE] Delete frozen_clip_embedder_t3.py as it is in the anytext.py…
tolgacangoz Feb 22, 2025
0d44b5b
[UPDATE] Replace edict with dict for configuration in anytext.py and …
tolgacangoz Feb 24, 2025
13b7ecf
🆙
tolgacangoz Feb 24, 2025
d5a6e5f
style
tolgacangoz Feb 26, 2025
09fdd22
[UPDATE] Revise README.md for clarity, remove unused imports in anyte…
tolgacangoz Feb 26, 2025
f8f5edd
Merge branch 'main' into Add-AnyText
tolgacangoz Feb 26, 2025
9495ddb
style
tolgacangoz Feb 26, 2025
13ab248
Merge branch 'main' into Add-AnyText
tolgacangoz Mar 1, 2025
f4abaf2
Update examples/research_projects/anytext/README.md
tolgacangoz Mar 1, 2025
8d313bc
Remove commented-out image preparation code in AnyTextPipeline
tolgacangoz Mar 2, 2025
d02615f
Remove unnecessary blank line in README.md
tolgacangoz Mar 2, 2025
52aa924
Merge branch 'main' into Add-AnyText
tolgacangoz Mar 2, 2025
78447b2
Merge branch 'main' into Add-AnyText
tolgacangoz Mar 7, 2025
3e3f972
Merge branch 'main' into Add-AnyText
tolgacangoz Mar 8, 2025
32 changes: 32 additions & 0 deletions examples/research_projects/anytext/README.md
@@ -0,0 +1,32 @@
# AnyTextPipeline

Project page: https://aigcdesigngroup.github.io/homepage_anytext

"AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy."

Each text line that needs to be generated should be enclosed in double quotes. For any usage questions, please refer to the [paper](https://arxiv.org/abs/2311.03054).
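
To illustrate the double-quote convention, here is a minimal sketch that pulls the quoted text lines out of a prompt. The helper name `quoted_text_lines` is hypothetical, and the regular expression is only an assumption about how such a prompt could be split; it is not taken from the pipeline's own parsing code.

```python
import re


def quoted_text_lines(prompt: str) -> list[str]:
    """Hypothetical helper: return the substrings wrapped in double quotes, in order."""
    # Each match is the text between one pair of straight double quotes.
    return re.findall(r'"([^"]*)"', prompt)


prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
lines = quoted_text_lines(prompt)
print(lines)
```

With the prompt above, the two quoted lines are `Any` and `Text`. A prompt with no double-quoted segments yields an empty list under this sketch.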


```py
import torch
from diffusers import DiffusionPipeline
from diffusers.utils import load_image

# Local module: run this from a directory where anytext_controlnet.py is importable
from anytext_controlnet import AnyTextControlNetModel

# Download a font file first (this one is shared by a Hugging Face staff member):
# !wget https://huggingface.co/spaces/ysharma/TranslateQuotesInImageForwards/resolve/main/arial-unicode-ms.ttf

# Load the AnyText ControlNet weights in half precision
anytext_controlnet = AnyTextControlNetModel.from_pretrained(
    "tolgacangoz/anytext-controlnet",
    torch_dtype=torch.float16,
    variant="fp16",
)

# Build the pipeline around the ControlNet and move it to the GPU
pipe = DiffusionPipeline.from_pretrained(
    "tolgacangoz/anytext",
    font_path="arial-unicode-ms.ttf",
    controlnet=anytext_controlnet,
    torch_dtype=torch.float16,
    trust_remote_code=False,  # Set to True to give permission to run this pipeline's code
).to("cuda")

# Generate an image; each quoted segment of the prompt is one text line to render
prompt = 'photo of caramel macchiato coffee on the table, top-down perspective, with "Any" "Text" written on it using cream'
draw_pos = load_image("https://raw.githubusercontent.com/tyxsspa/AnyText/refs/heads/main/example_images/gen9.png")
image = pipe(prompt, num_inference_steps=20, mode="generate", draw_pos=draw_pos).images[0]
image  # displays the result when run in a notebook
```