This is part three of a series of articles I will write on this subject. This one is about Image to Image and Prompt Assist.
You can find part one here:
workflows-a-beginners-tutorial-and-hands-on-walkthrough-part-1
You can find part two here:
workflows-a-beginners-tutorial-and-hands-on-walkthrough-part-2
You can find part four here:
workflows-a-beginners-tutorial-and-hands-on-walkthrough-part-4
You can find part five here:
workflows-a-beginners-tutorial-and-hands-on-walkthrough-part-5
Introduction:
I love building stuff. I'm an engineer and a carpenter, and now I build waterparks for a living. How things work, and creating things, is my passion. I create workflows that get used by a lot of people, and I get asked a lot of questions, especially about ComfyUI in general and how to build with it. So I thought I'd take the time to put together a few articles over the next while (weeks, months, however long it takes) to help people understand the basics, then move on to more advanced workflows.
In this part, I will go through the following:
Image to Image
Sizing/resizing explained
Practical application
Image encoding
Setting up for I2I
Prompt Assist
The fundamentals
Using Florence
Using an LLM version (Qwen)
Joining it with your prompt
Working with Images:
Start by downloading the "Example 1.json" and open it in ComfyUI
Make sure you set the KSampler's "control after generate" to "fixed" and do not change it for this entire exercise
Verify you have a checkpoint loaded. For this example, I used juggernaut-xl
For this exercise, let's use the prompt "photorealistic image of a squirrel eating a plate of spaghetti on a Sunday in church with a fork"
Make sure the height and width are set to 1024 x 1024 in the empty latent node (in purple)
Hit run and our little friend should pop out:

Pay attention to the numbers at the bottom of the image. This is your image size (width x height)
Resizing an Image:
During your journey down the rabbit hole I call Image Generation, there will often be times when you need to resize your image. This can mean shrinking, growing, or upscaling. We will focus on resizing here and discuss upscaling later.
Open the workflow and connect the "Resize Image V2" node to the "Load Image" and "Preview Image" nodes (the empty latent node has been moved so you can see the connection)

💡Bypass the KSampler (above this area) for this exercise by right-clicking it and choosing Bypass, or by clicking on it and hitting Ctrl+B, so you don't keep generating during this example.
Load the image you generated earlier
Note the size setting (512 x 512). Hit "run". A 512 x 512 version of the image is generated
Width and Height:
Sometimes when you are going through several operations, you will need to use the same width and height for consistency.
Size matters. The biggest issue I see with bad images is the size they are generated at, or mismatched sizing. All models are trained on certain sizes or ratios; get to know those sizes for optimal image generation. Generating at the wrong size forces the model to guess what to do. For example, using a 9:16 ratio in SDXL will stretch a torso out, and in ZIT it will cause the image to clone itself or add objects. When using masks, they will misalign, and when upscaling, the image will stretch or skew. Pay attention to this at each and every step!
There are several ways to keep your sizing consistent.

Instead of using an "empty latent image", use a node that has width and height connections
Use the connections on the resize node
Use a get resolution node (my favorite)
These connections can then be hooked directly into the width and height inputs on any node that has that option.
💡Pro Tip: width and height are just integers. They can be manipulated with simple math and controlled in numerous ways. Below are some examples I use to modify them

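To make that concrete, here's a minimal Python sketch of the same idea the math nodes implement. The helper name is made up, and the target of roughly one megapixel in multiples of 64 is an SDXL convention, so adjust for your model:

```python
import math

def snap_size(aspect_w: int, aspect_h: int,
              target_pixels: int = 1024 * 1024,
              multiple: int = 64) -> tuple[int, int]:
    """Pick a width/height near `target_pixels` that matches the
    requested aspect ratio and is divisible by `multiple`."""
    ratio = aspect_w / aspect_h
    height = math.sqrt(target_pixels / ratio)
    width = height * ratio
    # Round each side to the nearest multiple (the same trick the math nodes do)
    width = round(width / multiple) * multiple
    height = round(height / multiple) * multiple
    return int(width), int(height)

print(snap_size(9, 16))  # (768, 1344) -- a common SDXL bucket size
```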
Understanding Sizing methods:

When an image needs to be resized, it can be done in several ways. Most of the methods above (stretch, resize, crop, nearest-exact) are fairly obvious. Pad edge just fills in the edge with a pad_color, or adds pixels or blur. Let's discuss upscale methods, as this is important. Here's what Google has to say about the different upscalers:
Key Differences and Usage Examples:
Bilinear (Average 4 pixels):
Usage: Fast, real-time scaling; good for simple resizing, fast previewing, or upscaling where smoothing is preferred.
Result: Smooth, slightly blurry, fast processing.
Bicubic (Average 16 pixels):
Usage: Best "all-around" option for general downscaling and upscaling, especially for screen captures or gaming graphics.
Result: Sharp and smooth, better edge preservation than bilinear.
Lanczos (Average 36+ pixels/Sinc-based):
Usage: Highest quality downscaling for high-resolution video/photos, particularly effective when scaling down to significantly smaller sizes.
Result: Extremely sharp with high detail preservation, but higher CPU usage.
Obviously, Lanczos is best unless you are working with very large images.
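If you want to compare these filters side by side outside ComfyUI, here's a small Pillow sketch (the filename is a placeholder):

```python
from PIL import Image

img = Image.open("squirrel.png")  # placeholder: any image you generated earlier

# The same 512 x 512 resize with three different interpolation filters:
fast    = img.resize((512, 512), Image.Resampling.BILINEAR)  # smooth, slightly blurry
general = img.resize((512, 512), Image.Resampling.BICUBIC)   # sharp-and-smooth all-rounder
sharp   = img.resize((512, 512), Image.Resampling.LANCZOS)   # sharpest, most CPU

sharp.save("squirrel_512.png")
```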
Image Encoding (I2I)
Understanding Latents & Noise
Watch this short video for an explanation of what a latent image is
Make sure the KSampler's "control after generate" is still set to "fixed" and do not change it for this entire exercise
Using an existing Image as a latent
⚠️ It is important to understand that each model handles this process very differently. Some models are not affected at all (like Ernie or Anima), SDXL models do a good job using it as a reference, and models like Flux Klein_9b or Qwen use conditioning nodes to extract the information from the image. Basically, do not expect it to mimic every part of the image.
👋 If you are wondering how to get an exact image (i.e. strict pose, shape, face, etc.), that is either controlnet, IP adapter, or masking. We will talk about those in a different article.
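For the curious, this is roughly what the "VAE Encode" node does under the hood, sketched with Hugging Face diffusers. The model ID and filename are my assumptions; any VAE that matches your checkpoint behaves the same way:

```python
import torch
from diffusers import AutoencoderKL
from diffusers.image_processor import VaeImageProcessor
from PIL import Image

# Assumed model ID; swap in the VAE that matches your checkpoint.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")
processor = VaeImageProcessor()

image = Image.open("squirrel.png").convert("RGB")   # placeholder filename
pixels = processor.preprocess(image)                # tensor (1, 3, H, W) scaled to [-1, 1]

with torch.no_grad():
    # Encode the pixels into the latent space the KSampler actually samples in.
    latents = vae.encode(pixels).latent_dist.sample() * vae.config.scaling_factor

print(latents.shape)  # 8x smaller per side: (1, 4, 128, 128) for a 1024 x 1024 image
```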
Practical I2I example:
Let's use it in our example to better explain how encoding an existing image works:
Change the prompt to: "photorealistic image of a cat eating a plate of spaghetti on a Sunday in church with a fork" and hit run

The image is similar in theme, but doesn't look much like the original squirrel image.
Let's use I2I

Locate the "VAE Encode" node in the center of the latent area
Hook the image up to it
Go to the checkpoint and hook up the VAE (note: it should also still be hooked up to the "VAE Decode")
Hook the latent up to the KSampler
Hit run

Look at our friend now. Notice how I2I copied the style and composition of the image, but not the image itself? Even the fork is still upside down. This is what I2I does.
💡Take some time and play with other prompts. Hook the empty latent up and go back and forth to better understand what it does.
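Outside ComfyUI, the same I2I idea looks like this in diffusers. The checkpoint ID and strength value are my assumptions; strength plays the same role as the KSampler's denoise, where lower values stay closer to the source image:

```python
import torch
from diffusers import StableDiffusionXLImg2ImgPipeline
from PIL import Image

# Assumed checkpoint ID; substitute whichever SDXL model you actually use.
pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

init = Image.open("squirrel.png").convert("RGB")  # the squirrel render from earlier

result = pipe(
    prompt="photorealistic image of a cat eating a plate of spaghetti "
           "on a Sunday in church with a fork",
    image=init,
    strength=0.6,  # like the KSampler's denoise: lower stays closer to the source
).images[0]
result.save("cat_i2i.png")
```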
Prompt Assist:
This section will focus on prompt-assist/extraction methods for an image
Start by downloading the "Example 2.json" and open it in ComfyUI

What is prompt assist?
Basically, an LLM analyzes your image and writes a prompt based on it.
It is important to note that this is not metadata extraction, a method where the original prompt is taken from the data embedded in the image.
This works on almost every image, but understand that, just like looking through dense fog or at something far away, the less detail or quality there is, the less accurate the prompt will be. That is why it is important to use a good-quality image and to check the text output of the prompt generator.
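For contrast, metadata extraction is just a read of the data ComfyUI embeds when it saves an image, with no LLM involved. A quick sketch, assuming a ComfyUI-saved PNG:

```python
from PIL import Image

img = Image.open("squirrel.png")     # assumed: a PNG saved by ComfyUI
# ComfyUI embeds generation data as PNG text chunks when it saves images.
print(img.info.get("prompt"))        # the node inputs, including the prompt text
print(img.info.get("workflow"))      # the full workflow graph as JSON
```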
Practical Prompt Assist Example:
Florence:

There are several different models and settings for this
Download and load Florence:
There are several versions of PromptGen; all work well. The larger the version, the more memory it takes to process.
💡I'm a huge fan of Florence because once it downloads the model the first time (on first use), it stays local and runs very quickly (compared to other LLMs)
Settings:
task: detailed and more detailed caption: good for natural-language models (Flux, ZIT)
task: prompt_gen_mixed_caption: good all around; best suited for SDXL, Illustrious, Pony, etc.
Here's the result:

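If you'd like to see what the Florence node is doing behind the scenes, here's a hedged sketch using Hugging Face transformers. The model ID is the base Florence-2 release (the PromptGen finetunes load the same way), and the filename is a placeholder:

```python
from transformers import AutoModelForCausalLM, AutoProcessor
from PIL import Image

model_id = "microsoft/Florence-2-base"  # assumed; PromptGen finetunes load the same way
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("squirrel.png").convert("RGB")  # placeholder filename
task = "<MORE_DETAILED_CAPTION>"  # the "more detailed caption" task from the settings above

inputs = processor(text=task, images=image, return_tensors="pt")
ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=256,
)
raw = processor.batch_decode(ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(raw, task=task, image_size=image.size))
```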
Qwen

Hook the image and the response up accordingly
Use 2b-instruct (the other models are heavy).
💡Tip: if accuracy is your thing, this node excels at it. Change the setting to a higher VL instruct model, use 8-bit, and "ultra detailed description". Pack a lunch, though; it takes a while to process 🥪
Set it to 4-bit
It downloads the first time, so expect a wait.
Here's the result:

Note the difference between the two LLMs. Look at the image and read the prompts.
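For reference, the scripted equivalent of the Qwen setup above, as best I can sketch it from the Qwen2-VL docs (the model ID matches the "2b-instruct" option; the image path and instruction text are placeholders):

```python
from transformers import Qwen2VLForConditionalGeneration, AutoProcessor
from qwen_vl_utils import process_vision_info  # helper package from the Qwen docs

model_id = "Qwen/Qwen2-VL-2B-Instruct"  # the "2b-instruct" option above
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

messages = [{"role": "user", "content": [
    {"type": "image", "image": "squirrel.png"},  # placeholder path
    {"type": "text", "text": "Describe this image as a detailed image prompt."},
]}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
images, videos = process_vision_info(messages)
inputs = processor(text=[text], images=images, videos=videos,
                   padding=True, return_tensors="pt").to(model.device)

out = model.generate(**inputs, max_new_tokens=256)
out = [o[len(i):] for i, o in zip(inputs.input_ids, out)]  # strip the echoed input
print(processor.batch_decode(out, skip_special_tokens=True)[0])
```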
Adding your own text to the prompt (joining strings)
Sometimes you want to add your own prompt (or not have to disconnect the LLM every time you want to write your own). That's where the "join strings" node comes into play.
Note: There is also a "text concat" series of nodes. I prefer this method, but that is personal preference.

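Conceptually, the node does nothing more than this little sketch (the names are made up):

```python
def join_strings(string_a: str, string_b: str, delimiter: str = ", ") -> str:
    """What the Join Strings node boils down to."""
    return string_a + delimiter + string_b

llm_prompt = "a squirrel eating spaghetti in a church pew"
print(join_strings(llm_prompt, "on a red table"))
# -> "a squirrel eating spaghetti in a church pew, on a red table"
```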
Use either Qwen or Florence.
Disconnect it from the reroute and hook it into the second connection of the "join strings" node
Hook the "join strings" node into the other reroute where the LLM was hooked up.
Join strings: the delimiter is what separates the two strings. Put a space or comma there.
In the prompt area, type "on a red table"
Hit run
Here is the result. Note how the prompt I added turned the table red, yet the rest of the original prompt's content remained

Summary:
We have gone over how to generate images from existing images by means of I2I, as well as how to extract prompts from them. I hope this article was helpful.
Please comment and let me know what you think
Instagram: https://www.instagram.com/synth.studio.models/
Buy me a☕ https://ko-fi.com/lonecatone
This represents many hours of work. If you enjoy it, please 👍like, 💬comment, and feel free to ⚡tip 😉

