Creating a Lora Trainer: The Process & Training


Published on Jul 28, 2025

Introduction

This is a rough outline of how it's gone developing the Jupyter trainer that was going to be "SIMPLE", and realizing that simple doesn't mean EASY, nor is any of it as simple as the LLMs assume. Trying to tell an LLM to make something robust AND SIMPLE IS A NIGHTMARE. Read on for the sort of "DEV LOGS" of how the last while has felt developing this with Claude & Gemini. Keep in mind: I don't normally do programming, aside from Dataset Tools & Huggingface Backup, the Diffusers conversion notebooks, and a few other small things. Add in the fever dream of recovering from norovirus, telling myself "I just want to start small", and within a week ending up with something half-to-more-than-half working.

======================================

If you're interested in my LoRA guide on Civitai, go here: https://civitai.com/articles/1716/opinionated-guide-to-all-lora-training-2025-update

If you want to install this while helping me screech at the stupid, go here: https://github.com/Ktiseos-Nyx/Lora_Easy_Training_Jupyter

To read the actual intro info: https://civitai.com/articles/17294/lora-easy-training-jupyter-remix-edition

======================================

Section One: The Architecture 

======================================

Ah yes, let's start with the simplicity of the first part of this nightmare idea I came up with. You already know that I barely understand how to produce the code, but I'm learning the logic, and I'm slowly learning to spot "FAKE DATA". We'll get to that advanced problem in another section. Right now, let's introduce the "it's not really a problem" problem: how it started, and how I planned to solve it.

First of all: there ARE Docker containers for Bmaltais's Gradio-based Kohya_ss GUI, but for the life of me I've struggled with how that one's logic is laid out. Bmaltais is amazing, and it's a wonderful GUI version of KohyaSS. However, 90% of the Colabs I've ever used are based on KohyaSS directly.

Second of all: YES, Holostrawberry's notebook exists for Colab, and in theory I could've just made the same thing without requiring anything in the back end. Jelosus2's Colab, along with AndroidXXL's fork, only didn't work for me personally because Derrian's back-end scripts DO NOT LIKE MY COMPUTER. I have multiple venvs and pyenvs now, and it just DID NOT like any of my Python installs. It kept telling me "Nope, too old", and no matter what fixes I pulled, it just wasn't going to work.

Third of all: I hate Colab, you know I do. It's not anyone's fault, but the Colab AUP feels like it changes with the wind. Even if you're on Pro, LoRA training runs into AUP flip-flopping, and you're limited in that you can't walk away from it (I half-jokingly swear Colab flails if you even go off to the bathroom for two minutes). I've noticed in the past that SOMETIMES the AUP gets applied whenever it feels like it, even though in theory small LoRAs aren't really an issue (and somehow they say SD Web UI isn't a problem, it was just... the one issue, once... and yet anything SD gets flagged sometimes?).

So the end result was: "I WANT TO MASH UP!" Realizing that I hadn't mashed up a notebook or three in two years, and had started to rely on more advanced methods... the time had come to try and "MASH IT TOGETHER"...

Except we all know I left that in a folder for like four months, DID NOT FIX IT PROPERLY, and just left it on GitHub.

So after I finished the LAST MAJOR HITCH of the Dataset Tools thing (apart from the part where I "LIED" about moving to Tkinter... that's another article), I went and decided to pretty much recreate what I remember of Holo's notebook setups and Jelosus2's setup. The exception is that, unlike Jelosus2's and AndroidXXL's, we're not using the FRONT END of Derrian Distro. We're largely just using the back-end scripts, because we want that SWEET SWEET same setup...

Except I'm sure by now I've broken my copy like 20 times over. For reference, I'll zip up my dataset and config TOMLs for the run I'm trying right now; there's a rough sketch of the config shape below. On to the next sections for pure comedy gold.
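
To give a sense of what those configs look like: here's a minimal sketch of a Kohya-style config TOML, written from Python since that's where the notebook lives. The keys follow sd-scripts' train_network options as I understand them; the paths and values are illustrative placeholders, not my exact run.

```python
# A minimal sketch of a Kohya-style config TOML, generated from Python.
# Keys follow sd-scripts' train_network options as I understand them;
# paths and values here are placeholders, not my actual run settings.
import toml  # pip install toml

config = {
    "pretrained_model_name_or_path": "/workspace/model.safetensors",
    "train_data_dir": "/workspace/dataset",
    "output_dir": "/workspace/output",
    "network_module": "networks.lora",
    "network_dim": 32,
    "network_alpha": 16,
    "learning_rate": 1e-4,
    "optimizer_type": "AdamW",
    "lr_scheduler": "cosine",
    "train_batch_size": 2,
    "max_train_epochs": 10,
    "resolution": "1024,1024",
    "enable_bucket": True,  # group images by aspect ratio into buckets
    "mixed_precision": "fp16",
}

with open("config.toml", "w", encoding="utf-8") as f:
    toml.dump(config, f)
```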

Section Two: The Coding + LLM Enslavement 

======================================

"HEY CLAUDE MADE ME A DOCUMENATION ON HOW TO GET THIS STARTED" two hours later: "HEY GEMINI I RAN OUT OF CLAUDE USE CAN YOU LOOK AT THIS AND SEE WHAT WE'RE MISSING FOR DOCUMENTAITON"... 
An hour or three later Gemini had proceeded in making the whole setup, the back end system and the widget system.  The one I still blame Claude for bias for, cry in the shower because neither LLM listen when things go wrong. 

Basically, this is a nightmare and a half when you've only just finished making simple backup notebooks and a couple of basic Python apps... and then decide you're going to tackle something that's a job for seasoned professional developers...

Reminder, people: I'm paying for Claude to use Claude Code right now, and I am NOT A PROFESSIONAL PROGRAMMER. I hilariously feel like I'm not even a professional graphic designer, except that I know more about page bleeds than multiprocessing and threading in PyQt6. I can tell you more about pop-up books, and how hilarious it would be to draft a bunch of poorly proportioned, face-marred SD 1.5 and XL pics with ten fingers flying in the air for a book about the good old "AI DAYS". Go ahead and steal my idea; I don't really want to end up in some book publisher's bad books for being a memehead.

I started decoupling this like 48 hours after having norovirus. I think I sat down that Monday and started untangling everything: looking at Holo's notebooks and the other ones, as well as almost wiring in three other trainers' back ends by accident.

Currently, my partners and my family miss me.

My wallet is crying because I have to keep topping up my VastAI account to test this. Don't forget, my actual reason for doing this is that, despite how amazing Bmaltais's GUI is, I struggle to navigate it. I can't train locally, and I'm sure OneTrainer could be used like Derrian Distro, and the same with SimpleTuner...

But I only know KohyaSS-based systems, or at least the good old Google Colab systems I started on. Civitai uses KohyaSS, and a large reason I'm doing this is that I'm in that echelon of "I need to stop spending all my Buzz". Civitai's trainer isn't bad; it has some great settings to get MANY people started. It's just that I'm trying to graduate back into doing more advanced stuff.

As of writing this, let me tell you one thing: NO MATTER HOW CLEAR or concise your prompts are, THE LLM HAS BIAS. It has safety checks it auto-assumes, and it will just default to its own training and not look at your data.

Section Three: Debugging + Overbaking

======================================

Oh yes, DEBUGGING is a nightmare so far, because I have to RENT A SERVER on top of paying for one LLM and crying at the other one. Luckily I'm aware that installers are a nightmare, but in debugging you find the bias: the LLM decided, one way or another, to suggest something wild while you weren't looking, and you hit auto-accept...

You sort of have to do the due diligence of going through your own files and your own information, researching beyond just feeding the LLM code. Because the LLM is going to auto-assume and treat you like a toddler.

PEBKAC, you say?

PAH. 

OK, fine, yes, I admit I'm not the smoothest of logical people when it comes to prompting an LLM. I have full-on literally sworn at Gemini and watched it get... very sackcloth-and-ashes. But going over it with a fine-tooth comb, asking it to do something and check something, and having it still refuse is... well, a context nightmare.

PEBKAC = Problem Exists Between Keyboard And Chair. It's an old adage from the IT days of the 1980s and 1990s. It's what you tell your baby boomer parents/grandparents when their phone won't turn on.

The problem is, it's not just MY problem. Debugging your own mistakes in terms of what you ASKED it to do versus what it actually did? That's more of a hell. AND A COST ISSUE.

Also, a hilarious PEBKAC moment: using CAME/REX without knowing wtf to do with the settings and overbaking two LoRAs. One of them doesn't even work on v-pred, because it took three days to get Claude to notice the button's logic wasn't being accepted (sketch of the relevant switches below).
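
For reference, and as an assumption on my part: that "button" should boil down to a couple of Kohya-style config keys, something like this sketch. Double-check against whatever your back end actually reads.

```python
# My understanding of what the v-pred toggle should write into the
# config (assumption: these map to sd-scripts' --v_parameterization
# and --zero_terminal_snr flags; verify against your backend).
vpred_overrides = {
    "v_parameterization": True,  # train with v-prediction, not epsilon
    "zero_terminal_snr": True,   # rescale the noise schedule for v-pred models
}
print(vpred_overrides)
```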

Section Four: Are you Bald Yet? (Multiple re-jigs and torments later)

======================================

It's Monday. We've tried to queue for multiple duties (not really) and we've traversed the universe (not really). It's been a week since I started this mess, and now over a week since I got norovirus.

"AM I BALD YET" is the joke because there's a lot of hair tearing in doing this, and because i'm an idiot and don't wait for people to pick it up and help when they see something wrong - Oh no I. DEMAND PERFECTION (this is why i don't leave the house..) 

Short commands for Gemini. 

Don't let Claude Code do the advanced guides.

Give Gemini a bit more context when rewriting a guide and it'll do it, as long as you give it SHORT CONTEXT, not five articles.

PEBKAC strikes again: I had another LoRA almost fail on me, because I'm still in debugging hell and finding things. This one was my fault. NaN loss is BAD JUJU in LoRA training; it means you done fugged up. (You done goofed, you screwed the pooch, you made Elmo feel bad!) Turns out I don't know wtf I clicked. And while yes, I had another issue with "RuntimeError: stack expects each tensor to be equal size, but got [4, 144, 112] at entry 0 and [4, 152, 104] at entry 1" (which, as far as I can tell, means differently sized images landed in the same batch, the exact thing aspect-ratio bucketing is supposed to prevent; toy repro below), I just re-clicked a few things, turned a few things off, reminded myself to do the smart thing, and restarted the training.
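
Here's a toy reproduction of that error, under my assumption about the cause: torch.stack can't batch tensors of different shapes, so two differently-sized image latents in one batch blow up exactly like this.

```python
# Toy reproduction of the stack error (my read on the cause: two
# differently-sized images ended up in the same batch, which bucketing
# should normally prevent by grouping same-shaped latents together).
import torch

a = torch.zeros(4, 144, 112)  # latent from one image size
b = torch.zeros(4, 152, 104)  # latent from a different image size
try:
    torch.stack([a, b])  # batching requires identical shapes
except RuntimeError as err:
    print(err)  # "stack expects each tensor to be equal size..."
```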

You can check the training here: https://wandb.ai/duskfallcrew/network_train/runs/qh4leooh?nw=nwuserduskfallcrew

Section Five: Did I Burn It AGAIN?

======================================

Here we go again; here's an example: https://wandb.ai/duskfallcrew/network_train/runs/s2pkvt2r?nw=nwuserduskfallcrew

NANS. I TELL YOU ALL: NANS!

Also, right now I think the tagger MIGHT still be broken, so I've relied on anything that's pre-tagged. GO ME. That said, the CONFIG.TOML I have in here right now "WORKED" in theory; I have yet to test the output. And with it, I'm trying to train something I last trained on, most likely, SD 1.5.

The example run of that is going on here: https://wandb.ai/duskfallcrew/network_train/runs/1ufym9i1?nw=nwuserduskfallcrew

I'm about to test the Marvel one I just did, but the brilliant thing I'm finding is that even on roughly the usual settings I've used on Civitai, it's only about two hours per train for a normal LoRA.

What MIGHT happen is that it burns this time, because I'm still trying to figure out how to train on NoobAI. Because I'm a whole-ass year behind people, as usual. So if you're into training on NoobAI, and you've trained locally OR via Google Colab? Let me know what I'm missing, because I've burned a few things so far, lol.

Bugs I have to test and figure out: ONNX not working, SmilingWolf's models not downloading, and of course, it not finding CAME or REX interchangeably (download sketch below).
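
For the download bug, this is roughly how I'd expect the tagger fetch to work. The repo id and filenames are my assumption about which of SmilingWolf's taggers the notebook wants; swap in whichever one it actually uses.

```python
# Sanity-check sketch for the tagger download (repo id and filenames
# are my assumption; swap in whichever SmilingWolf tagger the notebook
# actually uses).
from huggingface_hub import hf_hub_download

repo = "SmilingWolf/wd-swinv2-tagger-v3"
model_path = hf_hub_download(repo_id=repo, filename="model.onnx")
tags_path = hf_hub_download(repo_id=repo, filename="selected_tags.csv")
print(model_path)
print(tags_path)
```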

We submoduled Kohya and LyCORIS for a hot minute, but that didn't work because they were already submoduled in Derrian Distro's setup.

There ARE things other trainers do as well as or better than KohyaSS that I might research how to implement. They may not be better, I don't know, but there's some VRAM optimization that's different in OneTrainer.

Section Six: Watered Down & Disillusioned

======================================

So we've come to the point where I'm a week into this, the day after I made like three new LoRAs that were partially burned to a crisp and/or barely working.

I am, yes, on the verge of throwing in the towel as usual, because I've "REINVENTED A WHEEL" nobody really cares about.

Not that people DON'T --

Just that, as usual, I'm met with "Why did you do this? There are like five other local UIs you can use."

"Just use Colab"

"just use CLI"

"Just get Gemini to make you a TOML" 

Same thing I faced with Dataset Tools: "SD Prompt Reader exists", "I can just use PNG Info", "ExifTool exists".

Look, I'm not that proud of how far I've come with programming, because I force myself to do it; there are things I wanna do that nobody else wants to do or offers...

I'm just. 

I'm not UNHAPPY with the progress, either.

I'm just exhausted, and I know that there will be barely any help with this tool because most people just say the above things and run off. 

I'm not telling you not to use the tool, btw; this is an article documenting the progress of development.

And right now it keeps going back to square one with bugs up the ass.

Triton won't work, and the tagger keeps losing itself.


The upside is that the main component, LORA TRAINING ITSELF, works without extra arguments if you just use AdamW (or even REX, if you can figure it out); base LyCORIS works too... it's just that right now I'm not getting very far. A critical point here is that WINDOWS users might be OK with Triton (which this setup wants for AdamW8bit), but Linux isn't behaving (quick check below). I'm very unsure how a lot of non-Forge or non-A1111 Docker containers work, and this costs me a lot to test.
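
A quick notebook-cell sanity check I've been leaning on, assuming the usual package names, to see what's importable before picking an optimizer:

```python
# Quick check of which optimizer dependencies are importable
# (assumes the usual package names: bitsandbytes is what backs
# AdamW8bit in sd-scripts, and Triton is the extra piece that's
# misbehaving on Linux here).
import importlib.util

for pkg in ("torch", "bitsandbytes", "triton"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'available' if found else 'MISSING'}")
```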

Section Seven: Re-Learning Training via Development

======================================

This is just the next stage of this: we're learning that someday, in the next year, maybe less, maybe more, we'll really slow down and mostly retire from doing AI... but we're not done learning yet. We hate relying on an AI to create AI, because we're still in that mindset that comes with a design degree.

Learning by checking things out and asking questions... I'm still very hesitant to ask my peers about training settings. I fear bothering them; I fear, to this day, that I'm lesser than a lot of people on this site.

Using Xypher's tool meant I was able to catch how Novowels does things, and understand what I did wrong in early testing.

It turns out the notebook works better than I assumed. There are a few hefty bugs, sure, but since this is a set of notebooks for Jupyter, it's vast and diverse in what you can use it with. Local, rented GPU... hell, it can work on low VRAM and it can work on high VRAM!

The next stage is to test it no more than once or twice a week, when new features are pushed to it... Grinding on Dataset Tools almost killed us. We're self-sufficient to our detriment and sometimes have ZERO CHILL when people question us. Even if it's "Have you tried this, mate?", we sometimes take it as though we've done something wrong... as though we've offended someone.

Not all of the LLMs understand LoRA training 100%, but we currently pay for Claude Code, and when you give it enough context, it understands the MATHEMATICS behind it; it can demystify how things work. So as much as I can sit here and go "Omfg, how braindead am I for coding with an LLM and just making AI TOOLS", I have to remind myself: I'm not doing this just for myself...

I'm doing what I always do with AI: while I make things that aesthetically please me to a point, I chase community-level goals. I poke into areas I'm not 10001% comfortable with. I'm a designer, I'm an artist, and that's how my training before AI worked: "Have you considered this?"...

I've started to redesign the Ktiseos Nyx website using Claude Code, and it's... just a breath of fresh air. I don't have a high level of aphantasia, but it pokes around when I'm trying to conceptualize things for myself. Websites are VERY difficult when you've seen WordPress, Wix, and many other styles of sites and you're just bored.

So before I go off and test this sucker one more time this week, let me let you in on a secret: humble yourself. You're not going to stop feeling deflated and defeated when you don't understand things, but don't just do things for yourself. Be aware of the others around you, and even if it's reinventing a wheel or two, do it because you think others might enjoy it or have a use for it. Shit, even if it's reverse-engineering the KFC secret spices (lmao, that's been done a few times, I'm sure), or reinventing how someone trains a LoRA: do it.

I'll be back with more results and thought process this week when I get around to testing.
It's KNTerra's (earthnicity) birthday today, and I promised them I wouldn't do a lot of AI today.
