Intro
The goal of this tool is to help you easily convert those big models into something your measly machine can run.
I have successfully converted many Flux 2 Klein 9b (and a few 4b) models to run on my 8gb VRAM + 32gb RAM setup.
The script is identical across the versions posted here (you can run the same one for Flux 2 Klein 4b and 9b models) - I just posted it to each to get more eyes on it.
This script is built for Linux. When run, it creates a gguf-convertor directory in your home directory (~/gguf-convertor, which expands to something like /var/home/username/gguf-convertor on Bazzite). This is where it sets up and builds the tooling.
Once it has run, you will have some tools in the setup directory. The entry point is gguf_convert.sh, which you run like this: gguf_convert.sh path/to/model.safetensors
The default quant is Q5_K_M - if you want a different quality, pass it as an extra argument after the model path: gguf_convert.sh path/to/model.safetensors Q8_0
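As an illustration of the defaulting described above, here is a minimal bash sketch. It is not the actual contents of gguf_convert.sh: the function names pick_quant and output_name are hypothetical, and the output naming is assumed from the model-Q5_K_M.gguf filename mentioned in the install steps.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the argument handling, NOT the real gguf_convert.sh.

# Print the quant level to use: the given value, or Q5_K_M when none is given.
pick_quant() {
    local quant="${1:-Q5_K_M}"
    printf '%s\n' "$quant"
}

# Print the assumed output filename for a model path and quant level,
# e.g. path/to/model.safetensors + Q5_K_M -> model-Q5_K_M.gguf
output_name() {
    local model_path="$1" quant="$2"
    local base
    base="$(basename "$model_path" .safetensors)"
    printf '%s-%s.gguf\n' "$base" "$quant"
}

pick_quant                 # no quant given, falls back to the default
pick_quant Q8_0            # explicit quant wins
output_name path/to/model.safetensors "$(pick_quant)"
```

The ${1:-default} expansion is what makes the quant optional: it only substitutes the default when the argument is missing or empty.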
The script first creates a BF16 GGUF version, then quantizes it to the level specified. You will need enough space to store both the BF16 and the quantized version. For example, a Flux 2 Klein 9b will need about 16gb for the BF16 and 6gb for the Q5_K_M.
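If you want to check your free space before starting, something like the following bash sketch could help. It is not part of the tool; the helper names has_enough_space and free_gb are made up for this example, and the 16 and 6 are just the Flux 2 Klein 9b figures quoted above, so swap in your own numbers for other models or quants.

```shell
#!/usr/bin/env bash
# Sketch only: a pre-flight disk space check, not part of the conversion tool.

# Succeeds when available space covers the BF16 file plus the quantized file.
# All three arguments are whole numbers of GB.
has_enough_space() {
    local available_gb="$1" bf16_gb="$2" quant_gb="$3"
    [ "$available_gb" -ge $(( bf16_gb + quant_gb )) ]
}

# Print the free space, in whole GB, on the filesystem holding a directory.
# df -Pk gives POSIX-format output in 1024-byte blocks; column 4 is "Available".
free_gb() {
    df -Pk "$1" | awk 'NR==2 { print int($4 / 1024 / 1024) }'
}

# Directory where the GGUF files will be written (defaults to the current one).
target_dir="${1:-.}"

if has_enough_space "$(free_gb "$target_dir")" 16 6; then
    echo "Enough space for the BF16 and Q5_K_M files in $target_dir"
else
    echo "You need at least 22 GB free in $target_dir"
fi
```

Once the quantized file is written and you are happy with it, the BF16 file is only an intermediate, so deleting it frees most of that space again.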
You are expected to have Python 3 installed, and possibly other dependencies. I built and tested this on Bazzite (Linux). You will likely need to feed this script to an AI to help adapt it for your Mac or Windows machine - each rig is different.
Use / Installation
As noted, this is an INSTALL script. Meaning, you download it, then run it. It will build a patched version of llama.cpp (the patch works around a bug in handling model weight conversions), set up a Python environment for some of the conversion scripts, and then leave some core files behind for use by the script. The llama.cpp build is local, so it shouldn't mess with your system version if you have one installed.
1. Run ./gguf_setup.sh in your shell (you may need to run chmod +x gguf_setup.sh first to make it executable).
2. Wait for the installs / setups to finish.
3. cd ~/gguf-convertor
4. ./gguf_convert.sh path/to/model.safetensors (optionally pass an extra argument after the model path if you want a different quant than Q5_K_M)
5. Wait for the conversion to finish.
6. Move your new model-Q5_K_M.gguf file to wherever you keep your models for your toolset.
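Before moving the result, a quick sanity check can catch a truncated or failed conversion. GGUF files begin with the four ASCII bytes GGUF, so a bash sketch like the one below can confirm the file at least has the right header. The is_gguf helper is a made-up name for this example and is not something the tool provides; it does not validate the rest of the file.

```shell
#!/usr/bin/env bash
# Sketch only: check that a file starts with the GGUF magic bytes.

# Succeeds if the first four bytes of the file are the ASCII string "GGUF".
is_gguf() {
    local magic
    # tr strips any NUL bytes so the command substitution stays clean.
    magic="$(head -c 4 "$1" 2>/dev/null | tr -d '\0')"
    [ "$magic" = "GGUF" ]
}

# File to check (defaults to the example filename from the steps above).
model_file="${1:-model-Q5_K_M.gguf}"

if is_gguf "$model_file"; then
    echo "$model_file has a GGUF header"
else
    echo "$model_file is missing or does not look like a GGUF file"
fi
```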
Troubleshooting / Warning
Use at your own risk! Just because it works for me doesn't mean it will work exactly the same on your setup. Please read through the script or ask an AI if you have any worries.
I will not provide support here, as this was mostly a personal setup script. I wanted to share it, as I have seen others asking for Quants of models.
Feel free to post here and ask questions if you run into trouble, but honestly - knowing some Python, a little bit about system admin, and how to copy-paste errors into an AI will get you going faster!
DEVELOPERS: You are not allowed to use this or the resulting scripts in your tools without permission - I am happy to approve open source scripts or tools (just ask, and please give credit). This tool is not allowed to be resold or bundled in a service or application that is for sale.


