AnyText: Multilingual Visual Text Generation And Editing #6407

### Model/Pipeline/Scheduler description

From the [repository](https://github.com/tyxsspa/AnyText): 

> AnyText comprises a diffusion pipeline with two primary elements: an auxiliary latent module and a text embedding module. The former uses inputs like text glyph, position, and masked image to generate latent features for text generation or editing. The latter employs an OCR model for encoding stroke data as embeddings, which blend with image caption embeddings from the tokenizer to generate texts that seamlessly integrate with the background. We employed text-control diffusion loss and text perceptual loss for training to further enhance writing accuracy.

![image](https://github.com/huggingface/diffusers/assets/22957388/e9876733-fa17-4884-a351-c83820ecd77e)
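
To make the text embedding module's description concrete, here is a minimal toy sketch of one way stroke embeddings could "blend" with caption embeddings: each text line in the caption is represented by a placeholder token, and that token's embedding is swapped for an embedding derived from the rendered text. This is an illustration only, not the repository's code. The substitution scheme, the `PLACEHOLDER` marker, and the functions `toy_embed`, `toy_ocr_encode`, and `blend` are all assumptions introduced for this example.

```python
from typing import List

# Hypothetical marker standing in for each text line inside the caption.
PLACEHOLDER = "*"


def toy_embed(token: str, dim: int = 4) -> List[float]:
    """Stand-in for tokenizer + embedding lookup (not the real text encoder)."""
    return [float(len(token) + i) for i in range(dim)]


def toy_ocr_encode(text_line: str, dim: int = 4) -> List[float]:
    """Stand-in for an OCR encoder applied to a rendered glyph image."""
    total = sum(ord(c) for c in text_line)
    return [float(total % (i + 2)) for i in range(dim)]


def blend(caption_tokens: List[str], text_lines: List[str], dim: int = 4) -> List[List[float]]:
    """Replace each placeholder embedding with the matching text-line embedding."""
    if caption_tokens.count(PLACEHOLDER) != len(text_lines):
        raise ValueError("need exactly one text line per placeholder token")
    lines = iter(text_lines)
    blended = []
    for token in caption_tokens:
        if token == PLACEHOLDER:
            # Assumed behaviour: glyph-derived features take the placeholder's slot.
            blended.append(toy_ocr_encode(next(lines), dim))
        else:
            blended.append(toy_embed(token, dim))
    return blended


if __name__ == "__main__":
    tokens = ["a", "sign", "that", "says", PLACEHOLDER]
    embeddings = blend(tokens, ["OPEN"])
    print(len(embeddings), "token embeddings, last one derived from the text line")
```

In this sketch the sequence length is unchanged, so the blended embeddings could feed the same downstream conditioning path as ordinary caption embeddings; whether the actual implementation works this way should be checked against the repository.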


### Open source status

- [X] The model implementation is available.
- [X] The model weights are available (Only relevant if addition is not a scheduler).

### Provide useful links for the implementation

Repository: https://github.com/tyxsspa/AnyText

Paper: https://arxiv.org/abs/2311.03054

Weights and inference code: https://modelscope.cn/models/damo/cv_anytext_text_generation_editing/summary
