## Technical Guide: Training ACE-Step 1.5 LoRA for Psytrance on an RTX 3090

## 1. Dataset Preprocessing & Surgical Slicing
Standard audio slicing methods destroy the rhythm and phase alignment of dense audio
material like Psytrance. Follow these strict preprocessing rules:
* Zero-Crossing Slicing: Cuts must occur exactly at zero-crossing points (where the waveform amplitude passes through zero) to avoid digital clicks, pops, and phase cancellation.
* Zero Fade/Crossfade Rule: Never apply fade-ins or fade-outs to the exported slices. The diffusion model will interpret them as a musical instruction and learn to fade the track out every 30 seconds.
* 1-Second Overlap: Instead of fading, slice adjacent chunks with a 1-second overlap of shared source material to maintain continuity across sample boundaries (see the slicing sketch after this list).
* Fixed Chunk Length: Slice the source material into exact 30.0-second segments. This
captures complete musical phrases while fitting comfortably into the 24 GB VRAM limit.
* Format Constraints: Export all slices at 44.1 kHz, 16-bit or 24-bit PCM WAV. Avoid MP3
compression to prevent codec artifacts from muddying the high frequencies.
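The rules above fit into a small slicing script. The following is a minimal sketch, assuming numpy and soundfile are available and the source has already been resampled to 44.1 kHz; `nearest_zero_crossing`, the snap window, and the output naming are illustrative helpers, not part of any ACE-Step tooling:

```python
# Minimal slicing sketch (hypothetical helper, not part of any ACE-Step
# tooling): 30.0 s chunks, 1 s overlap, cuts snapped to zero crossings.
import numpy as np
import soundfile as sf

SR = 44100                     # 44.1 kHz, per the format constraints above
CHUNK = int(30.0 * SR)         # fixed 30.0-second segments
HOP = int((30.0 - 1.0) * SR)   # 1-second overlap between adjacent chunks

def nearest_zero_crossing(x: np.ndarray, idx: int, window: int = 2048) -> int:
    """Snap a cut index to the closest sign change within +/- window samples."""
    lo, hi = max(idx - window, 0), min(idx + window, len(x))
    s = np.signbit(x[lo:hi])
    flips = np.where(s[:-1] != s[1:])[0]   # offsets of sign changes from lo
    if flips.size == 0:
        return idx                         # no crossing nearby; keep the raw cut
    return lo + int(flips[np.argmin(np.abs(flips + lo - idx))])

def slice_track(path: str, out_prefix: str) -> None:
    audio, sr = sf.read(path)
    assert sr == SR, "resample the source to 44.1 kHz first"
    mono = audio.mean(axis=1) if audio.ndim == 2 else audio
    start, i = 0, 0
    while start + CHUNK <= len(mono):
        a = nearest_zero_crossing(mono, start)
        a = min(a, len(mono) - CHUNK)      # keep the slice exactly 30.0 s
        sf.write(f"{out_prefix}_{i:03d}.wav", audio[a:a + CHUNK], SR,
                 subtype="PCM_24")         # 24-bit PCM WAV export
        start += HOP
        i += 1
```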
## 2. Text-Caption Tagging Strategy
Audio diffusion models require metadata to isolate tempo and key. Each audio slice requires
a matching .txt file with identical naming.
* BPM & Key Isolation: Explicitly tag the precise BPM and musical key (e.g., 142 bpm, G#
minor). This prevents the model from blending different tempos and scales into a dissonant
mix.
* Sub-Genre Descriptor: Start every caption with a unified anchor tag (e.g., psytrance track).
* Structural Elements: Document specific sonic elements present in that chunk (e.g., rolling
triplet bassline, punchy energetic kickdrum, sharp acid synth leads, rhythmic percussion,
crisp hi-hats).
* Quality Tokens: Append production-quality tags at the end of the caption (e.g., studio master quality, clean professional mix); a complete example caption follows this list.
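Put together, the caption file for a single 142 BPM chunk in G# minor might read as one line of comma-separated tags, mirroring the generation prompt in Section 5:

```
psytrance track, 142 bpm, G# minor, rolling triplet bassline, punchy energetic kickdrum, sharp acid synth leads, rhythmic percussion, crisp hi-hats, studio master quality, clean professional mix
```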
## 3. Training Hyperparameters & VRAM Optimization (RTX 3090)
To maximize the 24 GB VRAM of an RTX 3090 without triggering CUDA out of memory
errors, use these exact network dimensions and pipeline settings:
## Network Architecture (LoRA)
* LoRA Rank ($r$): 64 (Provides sufficient capacity to map distinct keys and tempos into
separate internal slots).
* LoRA Alpha: 32 (Ensures stable weight scaling).
* LoRA Dropout: 0.05 (Prevents overfitting while retaining rapid pattern recognition).
* Target Modules: ["to_q", "to_k", "to_v", "to_out.0", "ff.net.0.proj", "ff.net.2"]
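If your training script builds the adapter with Hugging Face peft (an assumption; ACE-Step trainers may expose these fields through their own config format instead), the dimensions above map to roughly:

```python
# Hedged sketch: the LoRA dimensions above expressed as a peft LoraConfig.
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,               # rank: capacity for distinct key/tempo "slots"
    lora_alpha=32,      # effective scale alpha/r = 0.5 for stable updates
    lora_dropout=0.05,  # light regularization against overfitting
    target_modules=["to_q", "to_k", "to_v", "to_out.0",
                    "ff.net.0.proj", "ff.net.2"],
)
```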
## Optimization & Precision
* Mixed Precision: bf16 (Mandatory for modern GPU compute stability).
* Optimizer: bitsandbytes 8-bit AdamW (Stores optimizer states in 8 bits instead of 32, sharply cutting their VRAM footprint).
* Gradient Checkpointing: True (Recomputes activations during the backward pass to save
massive amounts of VRAM).
* DataLoader Settings: Set num_workers=4, pin_memory=True, and persistent_workers=True (see the wiring sketch after this list).
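As a sketch of how these settings fit together, assuming `model` and `dataset` are placeholders from your trainer setup and that the model exposes the diffusers-style `enable_gradient_checkpointing()` method (the exact hook name may differ in your trainer):

```python
# Hedged wiring sketch; `model` and `dataset` are placeholders.
import bitsandbytes as bnb
from torch.utils.data import DataLoader

optimizer = bnb.optim.AdamW8bit(model.parameters(), lr=7e-5)  # 8-bit states
model.enable_gradient_checkpointing()  # diffusers-style name; may differ
loader = DataLoader(dataset, batch_size=2, num_workers=4,
                    pin_memory=True, persistent_workers=True)
# Run forward/backward under torch.autocast("cuda", dtype=torch.bfloat16)
# to get the bf16 mixed precision described above.
```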
## Training Schedule
* Batch Configuration: Set train_batch_size: 2 and gradient_accumulation_steps: 2 (Creates
an effective total batch size of 4, ensuring smooth gradient updates for complex audio
signals).
* Learning Rate: 0.00007 ($7\cdot10^{-5}$) with a cosine scheduler and 100 warmup steps.
A lower learning rate preserves sharp transient structures like tight kick drums.
* Seed: Set to -1 (Random Seed) across later epochs to shuffle data blocks and improve
generalization.
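A cosine schedule with warmup is available in Hugging Face transformers, assuming that is what your trainer uses under the hood; `total_steps` is a placeholder you would derive from dataset size and epoch count:

```python
# Hedged sketch: cosine LR schedule with 100 warmup steps.
from transformers import get_cosine_schedule_with_warmup

effective_batch = 2 * 2  # train_batch_size * gradient_accumulation_steps = 4
scheduler = get_cosine_schedule_with_warmup(
    optimizer,                       # the 8-bit AdamW from the sketch above
    num_warmup_steps=100,
    num_training_steps=total_steps,  # placeholder: steps for ~40 epochs
)
```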
## 4. Training Phases & Loss Graph Analysis
The training graph shows a textbook convergence curve for a dense audio dataset trained under a randomized seed:
```
Loss
0.55 |\
0.50 | \
0.45 |  \_________
0.40 |            \________
0.35 |                     \______  [Plateau / Saturated Fine-Tuning]
     +------------------------------------
       3100     3300     3500     3700   Step
```
* Phase 1 (Epoch 0 - 30): Macro-Structure Acquisition: The initial loss drops rapidly from
$\sim0.60$ down to $\sim0.45$. The model identifies coarse structural features, including
noise floors, fundamental frequencies, and the main percussive grid.
* Phase 2 (Epoch 30 - 35): Mid-Frequency Stabilization: The curve forms a gentle slope
between step 3100 and 3400. The random data seed (-1) introduces acoustic variety, forcing
the optimizer to consolidate structural patterns across different BPM/Key signatures
simultaneously.
* Phase 3 (Step 3400 - 3800): Micro-Optimization & Transients: The smoothed loss forms a textbook plateau between $0.36$ and $0.38$. The variance of the raw loss values narrows significantly, occasionally hitting micro-troughs near $0.31$. This indicates that the model has fully saturated its learning capacity for the dataset and is purely refining micro-details like phase alignment and crisp transient sharpness. Pushing the loss below $0.30$ is highly discouraged, as it triggers immediate acoustic degradation (overfitting).
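One way to operationalize that stop rule is to track an exponentially smoothed loss and halt once it crosses the floor. A minimal sketch follows; the 0.98 smoothing factor is an assumption, while the 0.30 floor comes from the analysis above:

```python
# Hedged sketch: EMA-smoothed loss monitor for the plateau / stop rule.
def make_loss_monitor(beta: float = 0.98, floor: float = 0.30):
    ema = None
    def update(raw_loss: float) -> bool:
        nonlocal ema
        ema = raw_loss if ema is None else beta * ema + (1 - beta) * raw_loss
        return ema < floor   # True => halt before acoustic degradation
    return update

should_stop = make_loss_monitor()  # call should_stop(loss) each step
```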
## 5. Inference & Audio Generation Configuration
Once training concludes at Epoch 40, halt the script and configure the Inference tab using
these precise generation parameters:
* Inference Backend: Set to PyTorch (Do not use vLLM or Triton on native Windows
environments due to library compatibility issues).
* Base Model Path: Point to checkpoints/acestep-v15-xl-sft.
* LoRA Model Path: Load the target checkpoint (e.g., epoch_35 or epoch_40).
* LoRA Scale: 0.85 to 1.0 (Start at 0.85 to maintain flexibility; increase to 1.0 if the synthetic
output lacks the driving weight of the original data).
* Inference Steps: 50 (Provides clean diffusion generation without blurring the fast
transients).
* CFG Scale: 4.5 to 5.5 (Higher values force strict adherence to the prompt tags, lower
values add acoustic variation).
* Audio Length: Exact 30.0 seconds (Must match the training slice length; generating beyond
this window causes structural collapse).
* Target Generation Prompt: Feed the explicit tokens used during tagging to extract the
clean, isolated style:
```
A high-energy psychedelic trance track, 142 BPM, fast driving rolling bassline, punchy energetic kickdrum, sharp acid synth leads, rhythmic percussion, crisp hi-hats, studio master quality, clean professional mix
```
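As a compact summary, the Inference-tab fields above can be captured in one settings block; the key names here are illustrative, not official ACE-Step API labels, and the LoRA checkpoint path is a placeholder:

```python
# Illustrative summary of the Inference-tab fields; key names are not
# official ACE-Step API and the LoRA path is a placeholder.
inference_settings = {
    "backend": "pytorch",                  # not vLLM/Triton on native Windows
    "base_model_path": "checkpoints/acestep-v15-xl-sft",
    "lora_model_path": "outputs/epoch_40", # or epoch_35; path is illustrative
    "lora_scale": 0.85,                    # raise toward 1.0 if output lacks weight
    "inference_steps": 50,
    "cfg_scale": 5.0,                      # within the 4.5-5.5 band
    "audio_length_s": 30.0,                # must match the training slice length
    "prompt": ("A high-energy psychedelic trance track, 142 BPM, fast driving "
               "rolling bassline, punchy energetic kickdrum, sharp acid synth "
               "leads, rhythmic percussion, crisp hi-hats, studio master "
               "quality, clean professional mix"),
}
```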
