
Flux/ZiT 2x speedup for GTX 1080 Ti / Pascal (ForgeUI)

Author: stonedbaby

Updated: May 5, 2026

Tags: tool, guide, forge, optimization, fp8, flux

Download: 1 variant available (Archive, Other), 463.69 KB

Details

Type: Other
Published: May 5, 2026
Base Model: Flux.2 Klein 9B
Hash (AutoV2): 0019AEE162


The FLUX.1 [dev] Model is licensed by Black Forest Labs. Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs. Inc.

IN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.

This is a custom drop-in CUDA kernel designed to bring older Pascal GPUs back to life when running heavy FP8 models in ForgeUI.

Since the Pascal architecture lacks Tensor Cores, PyTorch defaults to a painfully slow fallback path when handling FP8 weights. This mod intercepts that process, converting FP8 to INT8 on the fly and computing it using native __dp4a instructions.
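To make the idea concrete, here is a minimal pure-Python sketch of what a __dp4a-based dot product does. It is not the mod's kernel: the symmetric absmax quantization in quantize_absmax is an assumption (the guide does not document how FP8 values are mapped to INT8), and all function names are hypothetical. Only the dp4a semantics (four signed 8-bit products summed into an integer accumulator) mirror the CUDA intrinsic.

```python
import struct


def pack4(values):
    """Pack four signed 8-bit ints into one 32-bit word, one value per byte lane."""
    return struct.unpack("<I", struct.pack("<4b", *values))[0]


def dp4a(a_word, b_word, c):
    """Emulate signed __dp4a: c plus the sum of four int8 * int8 lane products.

    The hardware accumulates in a 32-bit integer; Python ints do not overflow,
    so this emulation ignores wraparound.
    """
    a = struct.unpack("<4b", struct.pack("<I", a_word))
    b = struct.unpack("<4b", struct.pack("<I", b_word))
    return c + sum(x * y for x, y in zip(a, b))


def quantize_absmax(row):
    """Map floats to int8 with one shared scale (ASSUMED scheme, not the mod's)."""
    amax = max(abs(v) for v in row) or 1.0
    scale = amax / 127.0
    return [max(-127, min(127, round(v / scale))) for v in row], scale


def int8_dot(weights, activations):
    """Approximate a float dot product using int8 values and dp4a steps.

    Both inputs must have the same length, and that length must be a multiple of 4.
    """
    qw, sw = quantize_absmax(weights)
    qx, sx = quantize_absmax(activations)
    acc = 0
    for i in range(0, len(qw), 4):
        acc = dp4a(pack4(qw[i:i + 4]), pack4(qx[i:i + 4]), acc)
    # Rescale the integer accumulator back to floating point.
    return acc * sw * sx
```

Each dp4a call handles four multiply-adds at once, which is why packing weights into INT8 lets a card without Tensor Cores do more work per instruction than a per-element floating-point fallback.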

The result: roughly 2x faster generation with no visible loss in image quality.

PyTorch profiler comparison: FP8 stock vs. INT8 (screenshots not included here).

Tested Setup

Hardware: Titan X Pascal (sm_61)
Models tested: Z Image Turbo (ZiT) and Flux 2 Klein 9B (both in FP8)
UI: ForgeUI (Neo branch). It might work on the original main branch if it supports these models, but Neo is fully tested.

IMPORTANT WARNING

Do NOT use this mod if you are on a Turing, Ampere, Ada, or newer GPU (RTX 20xx / 30xx / 40xx and later). RTX cards have included Tensor Cores since Turing, so your native hardware path is already significantly faster than this implementation. This kernel is strictly a rescue patch for the Pascal (sm_61) architecture!
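If you are not sure which architecture your card is, a small check like the following can help. It is only a sketch: the function names are hypothetical, and it relies on PyTorch's torch.cuda.get_device_capability, which returns a (major, minor) tuple. Pascal consumer cards report (6, 1); Turing reports (7, 5), Ampere (8, x), Ada (8, 9).

```python
def should_use_dp4a_patch(capability):
    """Return True only for sm_61 (Pascal), the one architecture this guide targets."""
    return tuple(capability) == (6, 1)


def detect_capability():
    """Return (major, minor) for CUDA device 0, or None if PyTorch/CUDA is unavailable."""
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability(0)


# Example use (run inside the same Python environment ForgeUI uses):
#   cap = detect_capability()
#   if cap is not None and should_use_dp4a_patch(cap):
#       print("Pascal sm_61 detected, the patch applies")
#   else:
#       print("Not sm_61, do not install the patch")
```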

Installation Instructions

No compilation required. Just follow these steps:

1. Go to your ForgeUI backend folder.
2. Find operations.py and make a backup of it (just in case you want to revert later).
3. Replace the original operations.py with my modified version.
4. Inside the backend folder, create a new folder named ext.
5. Inside ext, create another folder named zimage_ext (the path should look like this: backend/ext/zimage_ext/).
6. Drop the provided library file (.pyd) into the zimage_ext folder.
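For those who prefer to script it, the steps above can be sketched in Python roughly as follows. This is an illustration, not an official installer: install_patch, the operations.py.bak backup name, and the assumption that the extracted archive holds operations.py and the .pyd file at its top level are all mine. Adjust the two paths to your own setup.

```python
import shutil
from pathlib import Path


def install_patch(backend_dir, download_dir):
    """Back up operations.py, swap in the modified one, and copy the .pyd library.

    backend_dir:  your ForgeUI backend folder
    download_dir: folder where the archive was extracted (ASSUMED flat layout)
    Returns the names of the .pyd files that were copied.
    """
    backend = Path(backend_dir)
    download = Path(download_dir)

    original = backend / "operations.py"
    backup = backend / "operations.py.bak"  # hypothetical backup name
    if not backup.exists():  # never overwrite an earlier backup
        shutil.copy2(original, backup)

    # Replace the original file with the modified version.
    shutil.copy2(download / "operations.py", original)

    # Create backend/ext/zimage_ext/ if it does not exist yet.
    target = backend / "ext" / "zimage_ext"
    target.mkdir(parents=True, exist_ok=True)

    # Copy whatever .pyd library the archive provides.
    copied = []
    for lib in sorted(download.glob("*.pyd")):
        shutil.copy2(lib, target / lib.name)
        copied.append(lib.name)
    return copied
```

To revert, copy the backup back over operations.py and delete the zimage_ext folder.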

That's it! Restart ForgeUI. Any FP8 model will now be converted automatically and computed in INT8, giving your 10-series card a massive speed boost.