The FLUX.1 [dev] Model is licensed by Black Forest Labs, Inc. under the FLUX.1 [dev] Non-Commercial License. Copyright Black Forest Labs, Inc.
IN NO EVENT SHALL BLACK FOREST LABS, INC. BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH USE OF THIS MODEL.
This is a custom drop-in CUDA kernel designed to bring older Pascal GPUs back to life when running heavy FP8 models in ForgeUI.
Because the Pascal architecture lacks Tensor Cores, PyTorch falls back to a painfully slow path when handling FP8 weights. This mod intercepts that path, converts the FP8 weights to INT8 on the fly, and computes the products with the native __dp4a instruction (a four-way INT8 dot product with 32-bit accumulation, available from sm_61 onward).
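To make the arithmetic concrete, here is a pure-Python emulation of the idea. It is not the mod's CUDA kernel. It assumes the FP8 weights use the E4M3 layout (as in torch.float8_e4m3fn) and that both weights and activations are re-quantized to INT8 with one symmetric scale per vector; the scaling scheme the mod actually uses is not documented here, and all function names are hypothetical.

```python
# Pure-Python emulation (hypothetical, not the mod's kernel) of the
# FP8 -> INT8 -> __dp4a path for a single dot product.
# Assumptions: FP8 weights are E4M3 (float8_e4m3fn layout) and values are
# re-quantized to INT8 with one symmetric scale per vector.

def fp8_e4m3_to_float(byte: int) -> float:
    """Decode one E4M3 byte: 1 sign bit, 4 exponent bits (bias 7), 3 mantissa bits."""
    sign = -1.0 if byte & 0x80 else 1.0
    exp = (byte >> 3) & 0x0F
    man = byte & 0x07
    if exp == 0x0F and man == 0x07:
        return float("nan")  # E4M3FN has no infinities; S.1111.111 is NaN
    if exp == 0:
        return sign * (man / 8.0) * 2.0 ** -6  # subnormal range
    return sign * (1.0 + man / 8.0) * 2.0 ** (exp - 7)


def quantize_int8(values):
    """Symmetric quantization to [-127, 127]; returns (int list, scale)."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale


def pack4(q4):
    """Pack four signed int8 values into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, v in enumerate(q4):
        word |= (v & 0xFF) << (8 * i)
    return word


def dp4a(a_word: int, b_word: int, c: int) -> int:
    """Mimic CUDA __dp4a(int, int, int): c plus the sum of four signed byte products."""
    acc = c
    for i in range(4):
        a = (a_word >> (8 * i)) & 0xFF
        b = (b_word >> (8 * i)) & 0xFF
        a = a - 256 if a > 127 else a  # reinterpret the byte as signed int8
        b = b - 256 if b > 127 else b
        acc += a * b
    return acc


def int8_dot(x, w_fp8_bytes):
    """Dot product of float activations x with one FP8 weight row via INT8 + dp4a."""
    if len(x) != len(w_fp8_bytes) or len(x) % 4:
        raise ValueError("expected equal lengths that are a multiple of 4")
    w = [fp8_e4m3_to_float(b) for b in w_fp8_bytes]  # FP8 -> float
    qw, sw = quantize_int8(w)                          # float -> INT8 (weights)
    qx, sx = quantize_int8(x)                          # float -> INT8 (activations)
    acc = 0
    for i in range(0, len(x), 4):                      # one dp4a per 4 elements
        acc = dp4a(pack4(qx[i:i + 4]), pack4(qw[i:i + 4]), acc)
    return acc * sx * sw                               # dequantize the integer sum
```

For example, int8_dot([1.0, 2.0, -1.0, 0.5], [0x38, 0x40, 0xB8, 0x38]) lands within about 0.04 of the exact result 6.5; the difference comes from the INT8 re-quantization step. On a GPU, the point of this layout is that four int8 lanes fit in one 32-bit register, so a single __dp4a replaces four multiply-adds.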
The Result: Roughly 2x faster generation speed with zero visual quality loss.

PyTorch profiler results (screenshots): FP8 stock vs. INT8
Tested Setup
Hardware: Titan X Pascal (sm_61)
Models tested: Z Image Turbo (ZiT) and Flux 2 Klein 9B (both in FP8)
UI: ForgeUI (Neo branch). It might work on the original main branch if it supports these models, but Neo is fully tested.
IMPORTANT WARNING
Do NOT use this mod if you are on a Turing, Ampere, Ada, or newer GPU (RTX 20xx / 30xx / 40xx). These GPUs have Tensor Cores (introduced with Volta and brought to consumer cards with Turing), so your native hardware path is already significantly faster than this implementation. This kernel is strictly a rescue patch for the Pascal (sm_61) architecture!
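The rule in the warning can be written as a small gate on the GPU's CUDA compute capability. This is a hypothetical helper, not something the mod ships; the return labels "int8_dp4a" and "stock" are made up for illustration.

```python
from typing import Tuple


def recommend_path(capability: Tuple[int, int]) -> str:
    """Return "int8_dp4a" only for Pascal sm_61, otherwise "stock" (hypothetical labels)."""
    major, minor = capability
    if (major, minor) == (6, 1):
        # Consumer Pascal (GTX 10-series, Titan X Pascal): no Tensor Cores,
        # but __dp4a is available, which is the case this patch targets.
        return "int8_dp4a"
    # sm_60 and older lack __dp4a; Volta/Turing and newer have Tensor Cores,
    # so the native path should be kept. Other variants (e.g. sm_62) are untested.
    return "stock"
```

On a machine with PyTorch and CUDA, the tuple can come from torch.cuda.get_device_capability(), which reports (6, 1) on a Titan X Pascal.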
Installation Instructions
No compilation required. Just follow these steps:
1. Go to your ForgeUI backend folder.
2. Find operations.py and make a backup of it (just in case you want to revert later).
3. Replace the original operations.py with my modified version.
4. Inside the backend folder, create a new folder named ext.
5. Inside ext, create another folder named zimage_ext (the path should look like this: backend/ext/zimage_ext/).
6. Drop the provided library file (.pyd) into the zimage_ext folder.
That's it! Restart ForgeUI. Any FP8 model will now be converted to INT8 automatically and computed with __dp4a, giving your Pascal (GTX 10-series / Titan X Pascal) card a roughly 2x speed boost.
