Mike de Geofroy, Software Engineer

Deploying a ComfyUI Workflow on a Serverless Runpod Worker

About two months ago, I started working on a project to run a ComfyUI workflow on a serverless platform (runpod.io). My goal was to use a combination of models, custom nodes, and workflows that originally worked on a basic ComfyUI server setup.


Step 1: Setting Up Reproducible Builds

The first challenge was creating reproducible builds. This meant packaging all nodes and models (including custom ones) together. At the start, I manually fetched every node and model using a shell script full of wget commands. This allowed me to boot an image from the Runpod registry, run the script, and prepare the instance for use.
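For flavor, here's a trimmed-down sketch of what that script looked like; the node repo and model URL below are stand-ins for the longer list I was actually pulling in:

```bash
#!/usr/bin/env bash
# Sketch of the original fetch script -- the node and model below are
# placeholders for the full list.
set -euo pipefail

COMFY=/workspace/ComfyUI

# Custom nodes go into custom_nodes/
git clone https://github.com/ltdrdata/ComfyUI-Manager \
  "$COMFY/custom_nodes/ComfyUI-Manager"

# Models go into the matching models/ subdirectory
wget -O "$COMFY/models/checkpoints/sd_xl_base_1.0.safetensors" \
  "https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0/resolve/main/sd_xl_base_1.0.safetensors"
```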


Roadblock: Heavy Images

The resulting images exceeded 50 GB and took a long time to build and deploy. To address this, I moved the models out of the image and onto a Runpod network volume that can be mounted on any instance I boot up.
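ComfyUI picks up models from the volume via its extra_model_paths.yaml mechanism. A minimal version, assuming the models live under models/ on the volume (note that Runpod mounts network volumes at /workspace on regular pods but /runpod-volume on serverless workers, so the base_path has to match where you are running):

```yaml
# extra_model_paths.yaml -- minimal sketch; adjust base_path to wherever
# the volume is mounted (/workspace on pods, /runpod-volume on serverless).
runpod_volume:
  base_path: /runpod-volume/models
  checkpoints: checkpoints/
  loras: loras/
  controlnet: controlnet/
  upscale_models: upscale_models/
```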


After installing the required models, I found that the versions of ComfyUI and ComfyUI-Manager in the Runpod image were outdated. To fix this, I added a small shell script to update them automatically.
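Something along these lines, assuming ComfyUI lives at /workspace/ComfyUI:

```bash
#!/usr/bin/env bash
# Pull the latest ComfyUI and ComfyUI-Manager before starting the server.
set -euo pipefail

cd /workspace/ComfyUI
git pull
pip install -r requirements.txt

cd custom_nodes/ComfyUI-Manager
git pull
pip install -r requirements.txt
```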

Deploying on Serverless

While researching solutions, I discovered this project:

blib-la/runpod-worker-comfy: ComfyUI as a serverless API on RunPod (https://github.com/blib-la/runpod-worker-comfy)

These folks provided great documentation and a Dockerfile for building your own image. Initially, I thought I could just drop my scripts into the Dockerfile and call it a day. However, the resulting Docker image was far too heavy.

To resolve this, I added a Runpod volume and updated the paths in the extra_model_paths.yaml file. I also found the snapshot feature in newer ComfyUI-Manager versions; a helpful pull request in the runpod-worker-comfy repository added support for restoring custom nodes (along with their required Python packages) from such a snapshot at build time.
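If I remember the flow correctly (check the runpod-worker-comfy README for the exact file name it expects), it boils down to saving a snapshot on a working install and baking it into the image:

```bash
# Save a snapshot of the currently installed custom nodes and pip packages.
python custom_nodes/ComfyUI-Manager/cm-cli.py save-snapshot

# Copy the resulting *snapshot*.json next to the Dockerfile so the build
# can restore the same nodes inside the image. (The snapshots directory
# moves around between ComfyUI-Manager versions.)
cp user/default/ComfyUI-Manager/snapshots/*.json ../runpod-worker-comfy/
```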

After taking a snapshot and updating the paths, I hit another snag: I couldn't build the image locally on my M1 MacBook Pro, since Runpod workers need a linux/amd64 image and cross-building it on Apple Silicon kept failing. So, I rented a GPU instance from Google Cloud, and after several attempts I finally built the image and pushed it to my DockerHub repository.
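The build itself is a one-liner once you are on the right architecture; the tag is a placeholder:

```bash
# Build for linux/amd64 and push straight to DockerHub.
docker buildx build --platform linux/amd64 \
  -t <your-dockerhub-user>/runpod-worker-comfy:dev --push .
```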

Before running it on a serverless endpoint (which can be difficult to debug), I tested the image on a standard Pod. If you’re doing this yourself, make sure to update the mount volume path to match the values in extra_model_paths.yaml.


The runpod-worker-comfy project conveniently provides an environment variable to simulate serverless endpoint behavior.
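That variable is SERVE_API_LOCALLY. Running the image on a pod with it enabled gives you a local API to poke at, plus the regular ComfyUI UI; the ports below are the defaults I remember from the repo's docs:

```bash
# Run the worker image in local-API mode (8000 = API, 8188 = ComfyUI).
docker run --gpus all \
  -e SERVE_API_LOCALLY=true \
  -p 8000:8000 -p 8188:8188 \
  <your-dockerhub-user>/runpod-worker-comfy:dev
```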


During testing, I discovered that the insightface models couldn't be loaded through extra_model_paths.yaml. To fix this, I added a symlink command to start.sh, pointing the directory ComfyUI expects at my insightface models on the volume.
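The change was a one-liner, assuming the worker's ComfyUI install at /comfyui (which is where runpod-worker-comfy puts it) and the models on the network volume:

```bash
# insightface resolves its models relative to the ComfyUI install and
# ignores extra_model_paths.yaml, so link the expected directory to the volume.
rm -rf /comfyui/models/insightface
ln -s /runpod-volume/models/insightface /comfyui/models/insightface
```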


After this change, the workflow ran seamlessly.

Moving to Serverless

Once the workflow was running, I deployed it to a serverless endpoint and added my S3 bucket credentials.
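These are the bucket variables from the runpod-worker-comfy docs; the values here are placeholders:

```bash
BUCKET_ENDPOINT_URL=https://<bucket>.s3.<region>.amazonaws.com
BUCKET_ACCESS_KEY_ID=<redacted>
BUCKET_SECRET_ACCESS_KEY=<redacted>
```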


However, there was an issue: the first few generations timed out.

runpod-worker-comfy provides an environment variable to control the polling max retries, but as noted in this issue, the value isn't cast to an integer when set through the Runpod configuration, so the worker crashes as soon as it compares the retry counter against it. I resolved this by hardcoding higher integer defaults and rebuilding the image.

```
Error waiting for image generation: '<' not supported between instances of 'int' and 'str'
```

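The patch was the unglamorous kind; roughly this, in the worker's handler (a sketch of the idea, not the exact diff):

```python
import os

# Values set through the Runpod UI arrive as strings, so cast explicitly
# before they reach any integer comparison. Defaults bumped up while at it.
COMFY_POLLING_INTERVAL_MS = int(os.environ.get("COMFY_POLLING_INTERVAL_MS", 250))
COMFY_POLLING_MAX_RETRIES = int(os.environ.get("COMFY_POLLING_MAX_RETRIES", 1000))
```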

Eventually, it worked… kind of!

Backend

With the Runpod endpoint set up to boot an image on request, generating output was possible, but impractical: each time I wanted to generate an image, I had to post a large JSON payload to the /run endpoint.

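Heavily trimmed, the payload looked something like this (runpod-worker-comfy expects the workflow in ComfyUI's "Save (API Format)" export, plus any input images as base64):

```json
{
  "input": {
    "workflow": {
      "3": { "class_type": "KSampler", "inputs": { "seed": 42 } },
      "...": "hundreds more lines of exported workflow nodes"
    },
    "images": [
      { "name": "source.png", "image": "<base64-encoded image>" }
    ]
  }
}
```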

To streamline this, I wrote a simple Flask REST API. It includes an upload endpoint to send the image to S3 and a generate endpoint that:

  1. Passes the source image to GPT-4o for a basic description of the person (needed for the workflow).
  2. Adds custom parameters like background color, aggressiveness, and body complexity.
  3. Builds the payload and forwards the request to the Runpod serverless endpoint.

This resulted in a clean endpoint for uploading images with generation parameters and a status endpoint for updates.
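A condensed sketch of that backend; the endpoint names, node ids, and helpers are illustrative rather than the exact production code:

```python
import json
import os

import requests
from flask import Flask, jsonify, request

app = Flask(__name__)

RUNPOD_ENDPOINT = f"https://api.runpod.ai/v2/{os.environ['RUNPOD_ENDPOINT_ID']}"
HEADERS = {"Authorization": f"Bearer {os.environ['RUNPOD_API_KEY']}"}


def describe_person(image_url: str) -> str:
    # In production this sends the image to GPT-4o and returns a short
    # description of the person; elided here.
    return "a person with short dark hair"


def build_workflow(description: str, params: dict) -> dict:
    # Load the workflow exported from ComfyUI in API format and patch in
    # the description and UI parameters ("6" is a made-up node id).
    with open("workflow_api.json") as f:
        workflow = json.load(f)
    workflow["6"]["inputs"]["text"] = (
        f"{description}, {params['background_color']} background"
    )
    return workflow


@app.post("/generate")
def generate():
    body = request.get_json()
    workflow = build_workflow(describe_person(body["image_url"]), {
        "background_color": body.get("background_color", "white"),
        "aggressiveness": body.get("aggressiveness", 0.5),
        "body_complexity": body.get("body_complexity", 1),
    })
    # Forward the assembled payload to the Runpod serverless endpoint.
    r = requests.post(f"{RUNPOD_ENDPOINT}/run", headers=HEADERS,
                      json={"input": {"workflow": workflow}})
    return jsonify(r.json())


@app.get("/status/<job_id>")
def status(job_id: str):
    # The UI long-polls this; Runpod exposes job state at /status/<id>.
    r = requests.get(f"{RUNPOD_ENDPOINT}/status/{job_id}", headers=HEADERS)
    return jsonify(r.json())
```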


User Interface

Finally, I wrapped the API into a React-based user interface.

With some React magic and long polling against the status endpoint, I put together a neat interface for managing the serverless endpoint and running concurrent generations.


Overall, setting this up was a fun challenge, but there’s still plenty to improve, including cleaning up the code. I hope this helps anyone attempting something similar have an easier time!

More

Contact me on Telegram. You can see more of my code on GitHub.