Deploying a ComfyUI Workflow on a Serverless Runpod Worker
About two months ago, I started working on a project to run a ComfyUI workflow on a serverless platform (runpod.io). My goal was to use a combination of models, custom nodes, and workflows that originally worked on a basic ComfyUI server setup.
Step 1: Setting Up Reproducible Builds
The first challenge was creating reproducible builds, which meant packaging all nodes and models (including custom ones) together. At the start, I manually fetched every node and model using a shell script with `wget` commands. This allowed me to boot up an image from the Runpod registry, run the script, and have the instance ready for use.
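A trimmed sketch of what that script looked like; the node and model URLs are placeholders, and the ComfyUI install path is an assumption:

```bash
#!/usr/bin/env bash
# Fetch custom nodes and models for the workflow (illustrative only).
set -euo pipefail

COMFY=/workspace/ComfyUI  # assumed install path on the Runpod image

# Custom nodes go into custom_nodes/
git clone https://github.com/<org>/<custom-node>.git "$COMFY/custom_nodes/<custom-node>"

# Models go into the matching models/ subfolder
wget -c -P "$COMFY/models/checkpoints" \
  "https://huggingface.co/<org>/<repo>/resolve/main/<model>.safetensors"
```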
Roadblock: Heavy Images
The image size exceeded 50 GB, which made installs painfully slow. To address this, I moved the models to a Runpod volume that could be mounted on any instance I booted up.
After installing the required models, I found that the versions of ComfyUI and ComfyUI-Manager in the Runpod image were outdated. To fix this, I added a small shell script to update them automatically.
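The update script is tiny; a sketch, assuming the Runpod image keeps ComfyUI under `/workspace/ComfyUI`:

```bash
#!/usr/bin/env bash
# Pull the latest ComfyUI and ComfyUI-Manager on boot (paths assumed).
set -euo pipefail
git -C /workspace/ComfyUI pull
git -C /workspace/ComfyUI/custom_nodes/ComfyUI-Manager pull
pip install -r /workspace/ComfyUI/requirements.txt
```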
Deploying on Serverless
While researching solutions, I discovered [blib-la/runpod-worker-comfy](https://github.com/blib-la/runpod-worker-comfy): ComfyUI as a serverless API on RunPod.
These folks provide great documentation and a `Dockerfile` for building your own image. Initially, I thought I could just drop my scripts into the `Dockerfile` and call it a day. However, the resulting Docker image was far too heavy.
To resolve this, I added a Runpod volume and updated the paths in the `extra_model_paths.yaml` file. I also found the snapshot feature in newer ComfyUI-Manager versions; a helpful pull request in the runpod-worker-comfy repository added support for loading custom nodes from snapshots, along with the required Python packages.
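For reference, `extra_model_paths.yaml` just maps model folders to a new base path. A sketch, assuming the volume mounts at `/runpod-volume` (the usual mount point for network volumes on Runpod serverless; folder names are examples):

```yaml
# extra_model_paths.yaml — point ComfyUI at the models on the volume.
runpod_volume:
  base_path: /runpod-volume
  checkpoints: models/checkpoints
  loras: models/loras
  controlnet: models/controlnet
  upscale_models: models/upscale_models
```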
After taking a snapshot and updating the paths, I hit another snag: I couldn’t build the image locally on my M1 MacBook Pro, since the image targets linux/amd64 and the M1 is arm64. So I rented a GPU instance on Google Cloud, and after several attempts I finally built the image and pushed it to my Docker Hub repository.
Before running it on a serverless endpoint (which can be difficult to debug), I tested the image on a standard Pod. If you’re doing this yourself, make sure the mount volume path matches the values in `extra_model_paths.yaml`.
The runpod-worker-comfy project conveniently provides an environment variable to simulate serverless endpoint behavior.
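Roughly how a local test run looks; `SERVE_API_LOCALLY` is the switch I believe the project's README documents for this, and the image name, port, and volume path here are placeholders:

```bash
# Run the worker image as a plain container for debugging.
# SERVE_API_LOCALLY makes runpod-worker-comfy expose a local test API
# instead of waiting for serverless jobs. Names/paths are placeholders.
docker run --rm --gpus all -p 8000:8000 \
  -v /path/to/volume:/runpod-volume \
  -e SERVE_API_LOCALLY=true \
  <dockerhub-user>/<image>:latest
```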
During testing, I discovered that insightface models couldn’t be loaded via `extra_model_paths.yaml`. To fix this, I added a symlink command to the `start.sh` file, pointing to my insightface models.
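The workaround is a one-liner in `start.sh`; a sketch, with both paths being assumptions based on my volume layout (runpod-worker-comfy installs ComfyUI under `/comfyui`, if I remember right):

```bash
# insightface ignores extra_model_paths.yaml, so link the volume copy
# into the directory ComfyUI actually scans. Paths are assumptions.
ln -sfn /runpod-volume/models/insightface /comfyui/models/insightface
```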
After this change, the workflow ran seamlessly.
Moving to Serverless
Once the workflow was running, I deployed it to a serverless endpoint and added my S3 bucket credentials (redacted for privacy).
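If I recall correctly, the project reads the bucket settings from plain environment variables on the endpoint, along these lines (names from memory of the runpod-worker-comfy README; values are placeholders):

```
BUCKET_ENDPOINT_URL=https://<bucket>.s3.<region>.amazonaws.com
BUCKET_ACCESS_KEY_ID=<access-key-id>
BUCKET_SECRET_ACCESS_KEY=<secret-access-key>
```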
However, there was an issue: the first few generations timed out.
runpod-worker-comfy provides an environment variable for the polling max retries, but as noted in this issue, those variables fail to cast to integers when set via the Runpod configuration, leading to errors like:

```
Error waiting for image generation: '<' not supported between instances of 'int' and 'str'
```

I resolved this by hardcoding higher defaults and rebuilding the image.
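A cleaner fix would be to cast defensively where the variable is read; a minimal sketch of the idea (the variable name and default are assumptions):

```python
import os

# Fall back to a sane default and force an int so a string value set in
# the Runpod UI can't end up in a numeric comparison.
COMFY_POLLING_MAX_RETRIES = int(os.environ.get("COMFY_POLLING_MAX_RETRIES", "500"))
```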
Eventually, it worked… kind of!
Backend
With the Runpod endpoint set up to boot an image on request, generating output was possible, but impractical: each time I wanted to generate an image, I had to push a large JSON payload to the `/run` endpoint.
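The payload is essentially the exported ComfyUI workflow wrapped in an `input.workflow` field; a heavily trimmed sketch:

```json
{
  "input": {
    "workflow": {
      "3": {
        "class_type": "KSampler",
        "inputs": { "seed": 42, "steps": 20, "cfg": 7.0 }
      },
      "...": "hundreds more lines of exported workflow JSON"
    }
  }
}
```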
To streamline this, I wrote a simple Flask REST API (a sketch follows the list below). It includes an upload endpoint that sends the image to S3 and a generate endpoint that:
- Passes the source image to GPT-4o for a basic description of the person (needed for the workflow).
- Adds custom parameters like background color, aggressiveness, and body complexity.
- Builds the payload and forwards the request to the Runpod serverless endpoint.
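A minimal sketch of the generate endpoint, assuming hypothetical helpers `describe_person` (the GPT-4o call) and `build_workflow` (fills in my workflow template); the parameter names and defaults are placeholders:

```python
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

RUN_URL = f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}/run"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}

@app.post("/generate")
def generate():
    body = request.get_json()
    # Hypothetical helpers: GPT-4o description + workflow templating.
    description = describe_person(body["image_url"])
    workflow = build_workflow(
        image_url=body["image_url"],
        description=description,
        background=body.get("background", "#ffffff"),
        aggressiveness=body.get("aggressiveness", 0.5),
        body_complexity=body.get("body_complexity", "medium"),
    )
    # Forward the wrapped workflow to the Runpod serverless endpoint.
    resp = requests.post(RUN_URL, headers=HEADERS, json={"input": {"workflow": workflow}})
    resp.raise_for_status()
    # The response includes a job id the status endpoint can poll later.
    return jsonify(resp.json())
```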
This resulted in a clean endpoint for uploading images with generation parameters and a status endpoint for updates.
User Interface
Finally, I wrapped the API into a React-based user interface.
With some React magic and long polling, I created a neat interface to manage the serverless endpoint and run concurrent generations.
Overall, setting this up was a fun challenge, but there’s still plenty to improve, including cleaning up the code. I hope this helps anyone attempting something similar have an easier time!