Trained on 88 images (5 repeats each). Changes compared to v1:
no trigger word needed anymore;
more accurate captions (thoroughly revised dataset, no style pollution in captions), which should result in better prompt following and milder "imagination";
no text encoder training.
See the showcase for examples of mixing it with other character and/or style LoRAs.