
Regional Prompting with Regional LoRAs support

Updated: Apr 8, 2026

Tags: tool, regional prompting

Type: Workflows

Published: Apr 6, 2026

Base Model: ZImageBase

File: Archive (Other), 11.51 KB

Hash (AutoV2): 167D2011F3

About v1.1

ControlNet integration within regions can be tricky because ControlNet maps the reference image to the entire output image space, not to the mask space. This means that if the reference image contains a character in the center, the character will also be generated in the center of the output image, regardless of the mask.

To address this, you can use one of the following approaches:

  • Switch set_cond_area in the Regional Subgraph from the default setting to mask bounds (note: this may negatively affect interactions between regions),

  • Or use the newly added "Pad image inside the mask" subgraph, which repositions and pads the reference image so it fits within the mask area.
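The idea behind the padding subgraph can be sketched outside ComfyUI. The following is a simplified numpy re-implementation of the concept (the actual subgraph uses ComfyUI nodes and may differ in details), assuming single-channel images and a non-empty binary mask:

```python
import numpy as np

def pad_into_mask(reference, mask):
    # Simplified sketch: place a nearest-neighbour-resized reference image
    # into the bounding box of the mask, zero-padding everywhere else.
    ys, xs = np.nonzero(mask)
    y0, y1 = ys.min(), ys.max() + 1
    x0, x1 = xs.min(), xs.max() + 1
    h, w = y1 - y0, x1 - x0
    # Map each bbox pixel back to a source pixel in the reference image.
    ry = np.arange(h) * reference.shape[0] // h
    rx = np.arange(w) * reference.shape[1] // w
    out = np.zeros(mask.shape, dtype=reference.dtype)
    out[y0:y1, x0:x1] = reference[ry[:, None], rx[None, :]]
    return out
```

With the reference repositioned into the mask bounds, ControlNet's full-image mapping and the region mask now agree on where the subject should appear.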

A "Person Mask" subgraph was also added; it extracts a person mask from the input image for more precise region mapping.

This subgraph uses a YOLO segmentation model, which should be placed in:

/models/ultralytics/segm

(you can download the model here)

Known Issues with the Workflow

  1. Full mask coverage is required
    The combined masks from all regions must cover the entire image. Any area not covered by at least one mask will not be processed correctly during regional sampling and may appear empty, flat, or incorrectly rendered. If it is not possible to cover the entire image with masks, you can merge all masks into one, invert it, and use the result as an additional (background) region.

  2. Gradient (soft) masks are not supported
    The workflow does not work correctly with gradient (soft) masks. If a mask contains values below 100% (partial transparency), the sampler cannot properly denoise those areas, resulting in visible noise, artifacts, or unstable textures. In some cases you may reduce the noise from non-binary masks by adding an additional KSampler with low CFG, step count, and denoise values, but this is not guaranteed to fully fix the issue.



In theory, the workflow should work with any base model regardless of architecture (so far it has been tested on SDXL models, Chroma, Flux, and ZImage).

The workflow uses the recent ComfyUI subgraph feature, which allows extending the setup from 2 regions to any number of regions by simply duplicating the Regional Subgraph (and corresponding Regional FaceDetailer if needed).

Regional Subgraph structure

Each Regional Subgraph represents one independent region of the image and contains everything required to define a single character or element.

The inputs of the Regional Subgraph are:

  • region mask
    Defines where this region is applied in the image. This can come from any mask generation method (rect masks, segmentation, depth, manual paint, etc.).

  • regional positive conditioning
    The prompt that describes the character or object inside this region.

  • regional negative conditioning
    Required; this can be either the global negative prompt or a region-specific one.

  • model (branched)
    This is a key part of the workflow. Instead of using the base model directly, the model is first branched and modified (for example by applying LoRAs) before being passed into the Regional Subgraph.

Model branching and region-specific LoRAs

Each region has its own model branch. This is done by taking the base model and applying region-specific LoRAs before feeding it into the Regional Subgraph.

Typical setup:

  • Base model with global LoRAs → shared for the whole image and passed directly into the Regional Sampler

  • For each region:

    • duplicate the model path

    • apply one or more LoRAs specific to that region

    • pass the modified model into the Regional Subgraph model input

This allows each region to have completely different:

  • characters

  • styles

  • LoRAs

  • visual identity

Importantly, this avoids the common problem where LoRAs are applied globally and interfere with each other.
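This isolation can be illustrated with plain Python (the dicts and names below are hypothetical stand-ins, not ComfyUI objects): each region gets an independent copy of the base model carrying its own LoRA stack, so regional LoRAs never leak into other branches.

```python
import copy

# Hypothetical representation of a model branch (checkpoint + LoRA stack).
base_model = {"checkpoint": "zimage_base", "loras": ["global_style_lora"]}

def branch_with_loras(model, region_loras):
    # Duplicate the model path, then stack region-specific LoRAs on the copy.
    branch = copy.deepcopy(model)
    branch["loras"] = branch["loras"] + list(region_loras)
    return branch

region_a = branch_with_loras(base_model, ["character_a_lora"])
region_b = branch_with_loras(base_model, ["character_b_lora", "style_b_lora"])
# The shared base model is untouched by either branch.
```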


Extending the workflow

To add more regions:

  1. Copy an existing Regional Subgraph

  2. Connect:

    • a new mask

    • a new prompt

    • a new model branch (with its own LoRAs)

  3. Add the output to:

    • Combine Conditioning

    • CombineRegionalPrompts

This makes the system fully scalable without restructuring the graph.
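Conceptually, adding a region is just repeating the same three connections. A hypothetical sketch of assembling N region definitions before combining them (names are illustrative, not node names):

```python
def make_region(mask, prompt, model):
    # One Regional Subgraph's worth of inputs.
    return {"mask": mask, "positive": prompt, "model": model}

specs = [
    ("mask_left", "a knight in silver armor", "model_branch_a"),
    ("mask_right", "a red-haired sorceress", "model_branch_b"),
]
regions = [make_region(m, p, mdl) for m, p, mdl in specs]
# The per-region prompts would then feed CombineRegionalPrompts.
combined_prompts = [r["positive"] for r in regions]
```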



Customization

Everything except the following can be replaced depending on user preference or task:

  • Regional Subgraphs and Regional Sampler (structure must remain for modularity)

  • Combine Conditioning

  • CombineRegionalPrompts

Users are free to replace:

  • Mask generation (rect masks, segmentation, ControlNet, etc.)

  • Prompt encoding pipeline

  • LoRA setup

  • FaceDetailer or any post-processing

This allows adapting the workflow to:

  • character scenes

  • object composition

  • style mixing

  • inpainting pipelines

  • animation setups

Important note: Base steps vs Regional sampling

One of the most important parameters in this workflow is base_only_steps, which controls how the generation process is split between:

  • Base sampling stage

  • Regional sampling stage

How it works

The total number of steps is defined by the sampler (for example 30 or 50 steps).

These steps are divided into two phases:

  1. Base sampling (base_only_steps)

    • Only the global prompt is applied

    • No regional prompts or masks are used yet

    • This stage defines:

      • composition

      • pose

      • camera angle

      • general structure of the image

  2. Regional sampling (remaining steps)

    • Regional prompts and masks are applied

    • Each region modifies its assigned part of the image

    • This stage refines:

      • character identity

      • local details

      • LoRA-specific features
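The two-phase split can be expressed as a simple rule over the step index (a sketch, assuming steps are numbered from 0):

```python
def phase_for_step(step, base_only_steps):
    # Base phase: global prompt only; regional phase: masks + regional prompts.
    return "base" if step < base_only_steps else "regional"

schedule = [phase_for_step(s, base_only_steps=4) for s in range(10)]
# The first 4 steps use only the global prompt; the remaining 6 apply regions.
```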


Why this matters

The base stage essentially “locks in” the structure of the image.

If too many steps are spent in base sampling:

  • the image becomes stable and coherent

  • but regions have less ability to change it

  • LoRAs may appear weak or not recognizable

If too few steps are spent in base sampling:

  • regions have strong influence

  • characters become more accurate

  • but overall composition may become unstable or inconsistent


A good starting point is:

base_only_steps = 50% of total steps

This provides a balance between:

  • stable composition

  • effective regional control
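For example, at 30 total steps the suggested starting point works out to:

```python
total_steps = 30
base_only_steps = total_steps // 2              # 50% starting point -> 15 base steps
regional_steps = total_steps - base_only_steps  # -> 15 regional steps
```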


When to increase base_only_steps

Increase base steps if:

  • composition is broken or inconsistent

  • characters are not aligned properly

  • perspective or layout is unstable

This gives the base stage more time to establish a solid structure.


When to decrease base_only_steps

Decrease base steps if:

  • region-specific LoRAs are weak

  • characters are not recognizable

  • regional prompts have little effect

This allows the regional stage to have more influence over the final image. However, prefer increasing the total number of steps before decreasing base_only_steps.


Practical intuition

  • Base stage = “what the image looks like”

  • Regional stage = “who/what is inside each part”

Balancing these two is key to getting both:

  • strong composition

  • strong character identity