Kirazuri (Anima)

Download: bf16 SafeTensor (BF16) • 3.89 GB

Type: Checkpoint Trained

Reviews: 416

Published: Mar 26, 2026

Base Model: Anima

Hash (AutoV2): A83699BC48

Creator: motimalu

License: Anima


Version 2 (Latest)

A full finetune of the Anima preview3-base, predominantly trained on high-resolution 1536x1536 aspect-ratio (AR) buckets.

Expanded the dataset with more recent data and included the full dataset used for my previous model Kirazuri Lazuli (Noobai V-Pred).

Total training dataset of 35,537 manually curated, non-synthetic images, including quality and aesthetic ratings, with a dataset cutoff now of 2026/04/15.

Training Details

Main training used diffusion-pipe commit: d5b78a2c49a07db8f7d9a4c795e4cfe7ba1c3dfe

The final high-resolution stage used a fix in commit: b0aa4f1e03169f3280c8518d37570a448420f8be

  • Samples seen (unbatched steps): ~680,000

  • Training time: ~220 hrs

  • Learning Rate: 4e-6 (General Training) and 2e-6 (Aesthetic)

  • Text Encoder Learning Rate: 8e-7 (General Training) and 2e-7 (Aesthetic)

  • Per-resolution Effective Batch size: 128 (512p), 96 (1024p), and 48 (1536p)

  • Precision: Mixed BF16

  • Optimizer: AdamW8bit with Kahan Summation

  • Weight Decay: 0.01

  • Timestep Sampling Strategy: Logit-Normal

  • Tag Dropout: 30% with protected first 8 tags
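As a rough sanity check on these numbers, the ~680,000 unbatched samples imply the following optimizer-step counts at each effective batch size (illustrative arithmetic only; the actual per-resolution sample split is not stated on the card):

```python
samples_seen = 680_000  # approximate unbatched samples from the card

# Effective batch sizes per resolution stage
batch_sizes = {"512p": 128, "1024p": 96, "1536p": 48}

# Upper bound: if ALL samples were seen at a single resolution,
# the optimizer-step count for that stage would be:
for stage, bs in batch_sizes.items():
    print(f"{stage}: ~{samples_seen // bs:,} optimizer steps")
```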

Additional Features used:

  • Structured dataset by resolutions and manual ratings for staged training

  • multiscale_loss_weight=0.5 and flux_shift=true for high-resolution training

  • Mixed Natural Language captions with diffusion-pipe captions.json format:

    {
        "image_1.jpg": [
            "{tags}",
            "{first_n_tags}.\n{nl_caption}",
            "{dropout_tags1}.\n{nl_caption}",
            "{nl_caption}\n{dropout_tags2}"
        ]
    }
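The four caption variants per image could be generated with a small helper like the following (a sketch only; the function name and the exact dropout logic are assumptions, not part of diffusion-pipe):

```python
import json
import random

def caption_variants(tags, nl_caption, first_n=8, dropout=0.3):
    """Build the four caption variants used in the captions.json format above."""
    protected, rest = tags[:first_n], tags[first_n:]

    def dropped():
        # Independent random dropout per variant; first_n tags are protected
        return ", ".join(protected + [t for t in rest if random.random() > dropout])

    return [
        ", ".join(tags),                              # "{tags}"
        ", ".join(protected) + ".\n" + nl_caption,    # "{first_n_tags}.\n{nl_caption}"
        dropped() + ".\n" + nl_caption,               # "{dropout_tags1}.\n{nl_caption}"
        nl_caption + "\n" + dropped(),                # "{nl_caption}\n{dropout_tags2}"
    ]

entry = {"image_1.jpg": caption_variants(["1girl", "solo", "smile"],
                                         "A girl smiling at the camera.")}
print(json.dumps(entry, indent=4))
```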

Installing and running

Workflow:

Refer to the Anima preview base instructions. The model is natively supported in ComfyUI. The preview image above embeds a workflow; open it in ComfyUI, or drag and drop it to load the workflow.

Note: Most preview images on the model card additionally use the custom comfyui-prompt-control node for schedule-prompting syntax to mix concepts, e.g. [word1|word2].
This custom node is entirely optional, but it is required to exactly recreate those outputs in ComfyUI.

The model files go in their respective folders inside your model directory:
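For a default ComfyUI install, a typical layout would be the following (the path convention is standard for ComfyUI; the filename is illustrative):

```
ComfyUI/
└── models/
    └── checkpoints/
        └── <kirazuri-anima-v2-bf16>.safetensors
```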

Generation Settings

Trained at mixed resolutions for the majority of training, finishing with a dedicated high-resolution stage. Previews are generated mostly at 1536x1024 or 1024x1536.

  • 1280 resolutions, e.g. 1280x1280, 1536x1024, 1024x1536, etc.

  • 1024 resolutions, e.g. 1024x1024, 896x1152, 1152x896, etc.

  • 30-50 steps, CFG 4-5.

The same samplers recommended for the base model work here; I like to use:

  • er_sde: the recommended default for 30-50 steps.

  • sa_solver_pece: can converge with good detail in 15-20 steps.
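Put together, a starting point matching the settings above might look like this in a generation script (the dict keys are illustrative, not a specific API):

```python
# Starting-point generation settings from the card (key names illustrative)
settings = {
    "width": 1536,
    "height": 1024,
    "steps": 30,               # 30-50 recommended
    "cfg": 4.5,                # CFG 4-5
    "sampler_name": "er_sde",  # or "sa_solver_pece" for 15-20 steps
}
print(settings["sampler_name"], settings["steps"], settings["cfg"])
```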

Prompting

Like the base model, this model is trained on Danbooru-style tags, natural language captions, and combinations of tags and captions.

Tag order

[quality/meta/safety tags] [character] [series] [artist] [1girl/1boy/1other etc] [general tags]

Mostly the same order as the base model; only the [1girl/1boy/1other etc] group's position is toward the end in this model's dataset.

The [quality/meta/safety tags], [character], [series], and [artist] tag groups are also not shuffled, so their order may have some influence on generations.
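A minimal helper that assembles a prompt in this order could look like the following (the function and the example tags are hypothetical):

```python
def build_prompt(quality, character, series, artist, count, general):
    # Join tag groups in the order this model expects:
    # quality/meta/safety -> character -> series -> artist -> 1girl/1boy/etc -> general
    groups = [quality, character, series, artist, [count], general]
    return ", ".join(tag for group in groups for tag in group)

prompt = build_prompt(
    ["masterpiece", "very aesthetic"],
    ["hatsune miku"],
    ["vocaloid"],
    [],  # no artist tag in this example
    "1girl",
    ["smile", "flat color"],
)
print(prompt)
```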

Quality and Aesthetic tags

Human score based: masterpiece, best quality, very aesthetic, aesthetic

The very aesthetic and aesthetic tags are where this model diverges from the base, with the intent that these can be used to guide the model toward a different aesthetic - a kind of house model bias.

Meta tags

absurdres, official art, etc

Styles

painterly, chiaroscuro, ligne claire, flat color, no lineart, blending, etc

traditional media, oil painting \(medium\), watercolor \(medium\), etc

Known Limitations & Issues:

Concept Bleeding

Some bleeding of character/outfit details and concepts is noticeable when using short prompts.

Longer tag strings and natural language prompts describing appearance should help somewhat with this.

The intent for future training is to find the right balance: converging faster on new data while preserving more of the existing knowledge.

Recognitions

  • Thanks to CircleStone Labs for the Anima Preview base model.

  • Thanks to tdrussell of CircleStone Labs for the diffusion-pipe trainer.

  • Thanks to bluvoll for support using their fork of diffusion-pipe.

  • Thanks to narugo1992 and the deepghs team for open-sourcing various training sets, image processing tools, and models.

License

This model is released under the same license as the base model.

See the base model for details of the CircleStone Labs Non-Commercial License.

Built on NVIDIA Cosmos