This isn't a rejection of booru tags
This guide isn’t about telling you to stop using booru tags or saying they’re “wrong.” Tags absolutely work, and for a lot of people they’re faster, more controllable, and easier to debug. If that’s your workflow, keep using it.
What I’m sharing here is simply my own process — what worked for me after testing Anima Preview 3 extensively. I personally get more consistent and intentional results using natural English with a structured style block, especially when leveraging Qwen as a text encoder.
Anima can handle both approaches. You’re not locked into one or the other, and mixing tags with natural language is completely valid too.
I'm speaking from my perspective and what works in my workflow.
Anima isn't exactly like Illustrious
Booru tags aren't the best way to prompt in Anima Preview 3.
They work. You can absolutely get good images with them. But if you're treating them as your main method, you're basically letting the model guess what you want instead of actually telling it.
After a bunch of testing, the biggest change for me was switching to straight natural English prompts — especially when I pair them with a solid style block to leverage Qwen's capabilities as a text encoder. That's when things stopped feeling random and started feeling intentional.
What's actually wrong with tags?
Tags are just pattern triggers. They're great at kicking off the usual stuff:
character types
clothing
common anime tropes
But the second you want something more specific, they fall apart. They don't handle:
how things relate to each other in space
what the mood should feel like
how the scene should be composed
what actually matters in the image
So you end up stacking tags like:
1girl, sitting, window, cup, happy, sunlight, room
And yeah… you'll get something. But it's not really your image. It's just a pile of signals the model tries to make sense of. That's why tag prompts often feel inconsistent or generic.
Writing prompts like you actually mean it
What changed everything for me was treating the prompt like I'm describing a scene to another person, not feeding keywords into a parser.
Instead of listing things, I describe what's happening.
Tag version: 1girl, sitting, window, cup, happy, sunlight, room

Natural language version: A quiet indoor scene of a woman sitting beside a sunlit window, holding a cracked porcelain cup in both hands. She gazes outside with a happy expression. Soft light spills across the room, casting long shadows and emphasizing the stillness of the moment.

Same idea — but now the model actually understands what the focus is, how the elements relate, and what kind of feeling I'm going for. You're not hinting anymore. You're directing.
Style blocks — this is where it gets good
This is the part I don't see enough people using properly.
A style block isn't just "throw some style words at the end." It's a set of instructions for how the image should be rendered. Not what's in the scene — but how everything looks once it's there.
I treat my prompts like two layers:
Scene block (changes every time): what's happening, who's in it, where it takes place, the mood
Style block (stays consistent): how lines are drawn, how color is handled, how textures behave, how the composition feels
Here's one:
Stylized graphic-novel illustration blending clean ink linework with watercolor splatter textures. Uses a limited warm earth palette with sharp red accents and muted metallic tones, combining cel-shaded forms with painterly washes.

That's not fluff. It's doing real work. It's telling the model: keep edges clean and inked, stay controlled with color, and mix hard cel shading with painterly washes and texture.
Once you start thinking of it like that, everything clicks.
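The two-layer split can be sketched as plain string composition. This is just a minimal illustration of the idea, not any official API — the `build_prompt` helper and the block names are my own:

```python
# Minimal sketch of the two-layer prompt structure:
# the style block stays fixed across generations; the scene block changes each time.
STYLE_BLOCK = (
    "Stylized graphic-novel illustration blending clean ink linework with "
    "watercolor splatter textures. Uses a limited warm earth palette with "
    "sharp red accents and muted metallic tones, combining cel-shaded forms "
    "with painterly washes."
)

def build_prompt(scene: str) -> str:
    """Combine a per-image scene description with the reusable style block."""
    return f"{scene.strip()} {STYLE_BLOCK}"

scene = (
    "A quiet indoor scene of a woman sitting beside a sunlit window, "
    "holding a cracked porcelain cup in both hands."
)
print(build_prompt(scene))
```

The point is only that the scene varies per image while the style identity is reused verbatim, which is what keeps a series of generations visually consistent.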
Why Qwen makes this even better
Qwen is simply better at understanding actual language than isolated tokens. So when you give it a full scene plus a detailed style block, it keeps the relationships intact, doesn't drop half your intent, and applies the style across the whole image — not just random parts. It feels less like you're fighting the model and more like you're guiding it.
You don't need weights to control style strength
One thing that surprised me: you don't really need to mess with weights to control how strong the style comes through. You can do it just by how you write.
If your style block is very specific, very constrained, and very intentional → the output locks into that identity hard.
If it's shorter and looser → you get more variation and freedom.
So instead of tweaking numbers, just adjust how "opinionated" your style description is.
Something I realized along the way
A lot of people lean on finetunes or LoRAs to get a certain look. And yeah, they work. But they also lock you into someone else's aesthetic.
With a strong style block, you're doing something different — you're building your own style in plain language and letting the base model interpret it. It's way more flexible, and honestly, way more interesting.
Final note
If you like the results you're getting with tags, that's totally fine. But if things feel inconsistent, generic, or out of your control, try this:
Take one of your usual prompts and rewrite it entirely in natural English. Then add a proper style block.
You'll feel the difference immediately.