Updated: May 9, 2026
This ComfyUI workflow is designed for Z-Image Base image upscaling, detail refinement, and high-resolution restoration. It combines a traditional 4x upscale model with Z-Image Base latent refinement, Florence2 automatic captioning, tiled processing, and final image reconstruction. The goal is to turn a lower-resolution or softer image into a cleaner, sharper, and more detailed high-resolution result while keeping the original composition and overall visual identity stable.
This is not a simple one-click ESRGAN upscale workflow. It uses a multi-stage enhancement structure. First, the input image is enlarged with a classic upscale model. Then the image is scaled to a target megapixel size. After that, it is divided into tiles, automatically captioned with Florence2, refined through Z-Image Base, decoded with tiled VAE decoding, and finally stitched back into one complete image. This makes the workflow more useful for large images where direct full-frame processing may be unstable or too memory-heavy.
The workflow uses z_image_bf16.safetensors as the main Z-Image model, qwen_3_4b.safetensors as the text encoder, and ae.safetensors as the VAE. It also uses 4x_NMKD-Siax_200k.pth as the first-stage upscale model. This gives the workflow a hybrid design: the traditional upscaler provides fast resolution expansion, while Z-Image Base adds AI-driven detail reconstruction, texture polishing, and local refinement.
A key part of the workflow is the ImageUpscaleWithModel stage. This step uses the 4x_NMKD-Siax model to enlarge the input image before the Z-Image refinement stage. Traditional upscale models are useful because they preserve the original structure and provide a stable high-resolution base. However, pure traditional upscaling can sometimes look too smooth, too artificial, or lacking in new detail. That is why this workflow continues with a Z-Image refinement pass.
After the first upscale, the workflow uses ImageScaleToTotalPixels to bring the image to a target output size. In the included setup, the image is scaled toward a high megapixel target using Lanczos scaling. This gives users a predictable way to control final resolution without manually calculating width and height. It is useful for social media covers, Civitai showcase images, posters, product visuals, portrait enhancement, and high-resolution AI artwork output.
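The sizing math behind a total-pixel target can be sketched as follows. This is an illustrative reimplementation, not the stock node's code; the rounding to a multiple of 8 is a convenience I added to keep dimensions VAE-friendly:

```python
import math

def scale_to_total_pixels(width, height, megapixels, multiple=8):
    """Compute output dimensions for a total-pixel target while preserving
    aspect ratio (illustrative sketch of ImageScaleToTotalPixels-style sizing;
    rounding to a multiple of 8 is an added convenience, not the node's exact
    behavior)."""
    target_px = megapixels * 1024 * 1024
    scale = math.sqrt(target_px / (width * height))
    new_w = round(width * scale / multiple) * multiple
    new_h = round(height * scale / multiple) * multiple
    return new_w, new_h

# A 1024x1536 image scaled toward a 4-megapixel target:
w, h = scale_to_total_pixels(1024, 1536, 4.0)  # -> (1672, 2512)
```

This is why the megapixel slider is a predictable control: halving the target roughly divides each dimension by 1.4, which is an easy lever when VRAM runs out.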
The workflow then uses TTP tile tools for large-image processing. TTP_Tile_image_size calculates tile size based on image dimensions, width factor, height factor, and overlap rate. TTP_Image_Tile_Batch splits the image into manageable tiles. This is important because high-resolution images can be too large to refine in one pass, especially on limited VRAM systems. Tiled processing allows the workflow to enhance large images while keeping memory usage more manageable.
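The tile-grid arithmetic can be illustrated with a simplified sketch. This is not the exact TTP implementation, just the general pattern of splitting by width/height factors with an overlap rate:

```python
def tile_grid(img_w, img_h, w_factor=2, h_factor=2, overlap_rate=0.1):
    """Split an image into w_factor x h_factor tiles whose edges extend
    into their neighbors by overlap_rate of the base tile size
    (simplified sketch of TTP-style tiling, not the actual node code)."""
    base_w = img_w // w_factor
    base_h = img_h // h_factor
    overlap_w = int(base_w * overlap_rate)
    overlap_h = int(base_h * overlap_rate)
    boxes = []
    for row in range(h_factor):
        for col in range(w_factor):
            x0 = max(col * base_w - overlap_w, 0)
            y0 = max(row * base_h - overlap_h, 0)
            x1 = min((col + 1) * base_w + overlap_w, img_w)
            y1 = min((row + 1) * base_h + overlap_h, img_h)
            boxes.append((x0, y0, x1, y1))
    return boxes

# A 2048x2048 image split 2x2 with 10% overlap:
boxes = tile_grid(2048, 2048)
```

Note that each tile is slightly larger than a strict quarter of the image; that shared margin is what the assembly stage later blends to hide seams.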
Florence2 is used to automatically generate captions for the tiled images. The workflow includes Florence2ModelLoader and Florence2Run, using a caption task to describe the content of each tile. The generated captions are then passed into CLIPTextEncode as the positive prompt for Z-Image refinement. This is useful because each tile may contain different visual content. Automatic captioning helps the model understand what is inside each tile, instead of using one vague global prompt for the entire image.
This caption-guided refinement is one of the most practical parts of the workflow. For example, if one tile contains a face, another tile contains clothing, and another tile contains background lights, Florence2 can produce local descriptions that guide Z-Image to refine each region more appropriately. This helps improve details without forcing the whole image into one single prompt interpretation.
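The caption-to-prompt wiring amounts to pairing each tile with its own positive conditioning. A toy sketch of that pattern, where the captions stand in for Florence2 output and the quality suffix is an illustrative choice rather than something fixed by the workflow:

```python
def build_tile_prompts(captions, suffix="high detail, sharp focus"):
    """Pair each per-tile caption with a shared quality suffix to form the
    per-tile positive prompts (sketch of the Florence2Run -> CLIPTextEncode
    hand-off; the suffix is a hypothetical example)."""
    return [f"{caption}, {suffix}" for caption in captions]

# Captions as Florence2 might describe three different tiles:
captions = [
    "a woman's face with soft studio lighting",
    "a red wool coat with visible fabric weave",
    "blurred city lights in the background",
]
prompts = build_tile_prompts(captions)
```

The practical point is that each tile's KSampler pass receives conditioning that matches its local content, rather than one global prompt stretched over the whole image.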
The Z-Image refinement stage uses VAEEncode, KSampler, and VAEDecodeTiled. The enlarged tile image is encoded into latent space, then processed by Z-Image Base with a low denoise setting. In the included setup, the KSampler uses 20 steps, CFG 3, Euler sampler, simple scheduler, and a denoise value around 0.25. This is a conservative refinement setting. It is designed to improve texture and clarity without heavily changing the original image.
Low denoise is important for upscale workflows. If denoise is too high, the model may redraw the image too aggressively and change faces, clothing, background details, or object structure. If denoise is too low, the result may not gain enough new detail. A value around 0.25 is a good starting point for faithful enhancement. Users can increase it slightly for stronger redraw or reduce it for more conservative preservation.
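A useful mental model for partial denoise, which ComfyUI-style samplers roughly follow, is that the requested steps become the tail of a longer virtual schedule; only the low-noise portion of the diffusion process is re-run. A simplified sketch of that relationship (not the exact sampler implementation):

```python
def effective_schedule(steps, denoise):
    """Approximate how a KSampler-style node interprets denoise < 1:
    the run covers only the last `steps` of a longer virtual schedule,
    so the image is perturbed lightly rather than redrawn from noise
    (simplified model of the behavior)."""
    total = int(steps / denoise)   # length of the virtual full schedule
    start = total - steps          # high-noise steps skipped entirely
    return total, start

total, start = effective_schedule(20, 0.25)  # -> (80, 60)
```

With the included settings, the sampler effectively skips the 60 high-noise steps that would restructure the image, which is why composition survives while texture improves.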
The workflow uses ModelSamplingAuraFlow with a shift setting to match the model’s sampling behavior. This helps the Z-Image Base route work correctly in the refinement stage. The negative prompt includes simple artifact suppression terms such as blurry, ugly, and bad. This keeps the refinement direction clean without overloading the model with excessive negative tags.
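The shift parameter remaps the normalized sampling timesteps toward the high-noise end of the schedule. One common formulation used in flow-matching samplers, and to my understanding the kind of remap this node family applies, looks like this (the shift values are illustrative):

```python
def shift_timestep(t, shift=1.73):
    """Remap a normalized timestep t in [0, 1]: values of shift above 1
    push more of the schedule toward high noise. Common flow-matching
    formulation; presented as a sketch, not the node's verified source."""
    return shift * t / (1 + (shift - 1) * t)
```

The endpoints stay fixed (0 maps to 0, 1 maps to 1) while midpoints move, so the sampler spends its steps where the model expects them.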
After each tile is processed, VAEDecodeTiled decodes the latent result. Tiled decoding helps reduce memory load and supports higher-resolution output. The tiles are then converted back into a batch and reconstructed with TTP_Image_Assy. The padding value helps reduce visible seams between tiles. This final assembly stage is important because tile-based workflows can produce edge artifacts if overlap and padding are not handled properly.
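Seam-free reassembly depends on weighting overlapping pixels rather than hard-pasting tiles. A minimal one-dimensional sketch of the feathering idea, illustrative rather than the TTP_Image_Assy code:

```python
import numpy as np

def blend_1d(left, right, overlap):
    """Blend two 1-D strips whose last/first `overlap` samples cover the
    same image region, using a linear cross-fade (sketch of the seam
    feathering that overlap and padding enable)."""
    fade = np.linspace(0.0, 1.0, overlap)
    mixed = left[-overlap:] * (1 - fade) + right[:overlap] * fade
    return np.concatenate([left[:-overlap], mixed, right[overlap:]])

# Two strips that disagree in their shared 4-sample region:
a = np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0])
b = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
out = blend_1d(a, b, 4)
```

The wider the overlap, the gentler the fade, which is why increasing overlap or padding is the first fix when seams show up after reconstruction.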
The workflow also includes Image Comparer. This allows users to compare the original upscaled image and the final reconstructed image with a slide comparison view. This is useful for checking whether the Z-Image refinement actually improved the result. Good upscale evaluation should look at face detail, hair texture, fabric clarity, edge sharpness, background stability, seam visibility, and whether the image identity has changed too much.
This workflow is suitable for creators who want more than basic resolution expansion. It is useful for AI-generated images that look slightly soft, screenshots that need enhancement, portraits that need more texture, fashion images that need sharper fabric detail, stage photos that need cleaner lights, product images that need better surface clarity, and Civitai examples that need high-resolution polish.
Main features:
- Z-Image Base upscale and refinement workflow
- Uses z_image_bf16.safetensors
- Qwen 3 4B text encoder support
- AE VAE support
- 4x_NMKD-Siax first-stage upscaling
- ImageScaleToTotalPixels for target megapixel control
- Florence2 automatic tile captioning
- TTP tiled image splitting
- Z-Image latent refinement per tile
- Low-denoise detail enhancement
- VAEDecodeTiled for high-resolution decoding
- TTP_Image_Assy tile reconstruction
- Padding and overlap support to reduce seams
- Image Comparer for before/after checking
- Suitable for high-resolution artwork, portraits, covers, and product images
Recommended use cases:
AI image upscaling, high-resolution restoration, portrait enhancement, anime artwork polishing, realistic photo refinement, product image cleanup, fashion image sharpening, social media cover enhancement, Civitai showcase image preparation, poster output, screenshot restoration, background detail recovery, texture improvement, and before/after comparison testing.
Suggested workflow:
Start by loading the image you want to upscale. Use an image with a clear subject and stable composition. The workflow can improve soft or lower-resolution images, but it cannot fully recover information from extremely damaged or heavily compressed images. A clean source image will always produce better results.
Run the first-stage upscale with the 4x_NMKD-Siax model. This creates a larger base image while preserving the overall structure. Then use ImageScaleToTotalPixels to control the final target resolution. If your image becomes too large for your GPU, reduce the megapixel target before running the Z-Image refinement stage.
Use the tiled processing section for large outputs. The workflow splits the image into tiles and processes them separately. If seams appear after reconstruction, increase overlap or padding. If memory usage is too high, use smaller tiles or reduce the final resolution. If details look inconsistent across tiles, try reducing denoise or using more stable caption settings.
Let Florence2 generate captions for the tiles. These captions help guide the Z-Image refinement process. You can review the generated text in the ShowText node. If the captions are inaccurate, you can manually replace or adjust the prompt. For professional output, checking captions is useful because wrong tile descriptions can cause unwanted texture changes.
Use low denoise for faithful enhancement. The included denoise setting around 0.25 is suitable for preserving the original image. If you want stronger detail reconstruction, increase denoise slightly. If faces or objects begin to change too much, reduce denoise and keep the prompt simpler.
Check the final output with Image Comparer. Compare the original enlarged image with the refined image. Look for improvements in sharpness, texture, local detail, and clarity. Also check whether the model introduced artifacts, changed identity, damaged hands or faces, or created visible tile seams.
For portrait images, focus on preserving facial identity and natural skin texture. For anime images, the workflow can be pushed slightly stronger because stylized linework often benefits from sharper redraw. For product images, keep denoise conservative so product shape and branding remain stable. For background-heavy images, inspect seams carefully after tile assembly.
This workflow is designed as a practical Z-Image Base high-resolution enhancement pipeline for ComfyUI users. It combines classic upscaling, target-size control, automatic captioning, tile-based refinement, low-denoise Z-Image polishing, and final reconstruction into one usable graph. It is especially useful for creators who need cleaner, sharper, and more publishable images for Civitai, RunningHub, YouTube thumbnails, Bilibili posts, product visuals, and AIGC content production.
🎥 YouTube Video Tutorial
Want to know what this workflow actually does and how to start fast?
This video explains what the tool is, how to launch the workflow instantly, and shares my core design logic — no local setup, no complicated environment.
Everything starts directly on RunningHub, so you can experience it in action first.
👉 YouTube Tutorial: https://youtu.be/JPA_qq5YusE
Before you begin, I recommend watching the video thoroughly — getting the full context helps you understand the tool faster and avoid common detours.
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2016768153547710466/?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points per daily login — enjoy 4090-class performance with 48 GB of memory!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1D96wBxECM/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
I keep model resources updated on Quark Netdisk:
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

