Commit 7c89d8e (0 parents)

This is the very first version of my LLM tools for ComfyUI.

DCO-1.1 Signed-off-by: Patrick Wagstrom <[email protected]>

Showing 6 changed files with 295 additions and 0 deletions.

.gitignore
@@ -0,0 +1 @@
openai.key

LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 Patrick Wagstrom

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md
@@ -0,0 +1,57 @@
ComfyUI-llm-tools
==================

Patrick Wagstrom <[email protected]>

August 2024

Overview
========

This project is inspired by [this Reddit post, where /u/lazyspock used ChatGPT 4o to describe pictures and then fed the descriptions to Flux for image generation](https://old.reddit.com/r/StableDiffusion/comments/1elb3mp/got_pictures_from_pexelscom_asked_chatgpt_4o_to/). Being inspired and having some free time on my hands, I decided to see whether I could turn that into a node for ComfyUI, so you can use GPT-4o to generate your prompts and then do whatever else you want with them.

Example Output
==============

I used an image of the [Village Weaver bird from Wikipedia](https://en.wikipedia.org/wiki/Village_weaver#/media/File:Village_weaver_(Ploceus_cucullatus_cucullatus)_male_with_leaf.jpg) as my example image. Using that and the default prompt, I created the output image below:


Nodes
=====

Currently this package provides four different nodes.

* **OpenAI Vision**: The reason I started this in the first place. Given an image, an API key, a model name, and a prompt, it invokes the GPT-4o model to execute the prompt. My normal use case is experimenting with feeding Flux prompts generated by an LLM.

* **Load Environment Variable:** I needed a way to load the API key into a ComfyUI workflow, and this seemed like the easiest way. I store my API key in the `OPENAI_API_KEY` environment variable when I run ComfyUI, making it easy to grab the variable and pipe it into the `OpenAI Vision` node.

* **Image Dimensions:** This wasn't strictly needed for this package, but I wanted a way to ensure that I was generating images with the same aspect ratio as the original image. Normally I use the Constrain Image node from ComfyUI-Custom-Scripts to make sure that no dimension is greater than 1024, then use the `Image Dimensions` node to grab the width and height and pipe those into the `EmptySD3LatentImage` node (see the sketch after this list).

* **Side by Side Images:** Finally, I wanted a way to visualize the original and generated images side by side. This node takes a single base image and a potential batch of images, and generates composite images with the base image on the left and the generated image on the right.
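
As a rough illustration of the constrain-then-measure math that workflow relies on, here is a minimal sketch (the `constrained_size` helper is hypothetical, not part of this package):

```python
from typing import Tuple
from PIL import Image

def constrained_size(path: str, limit: int = 1024) -> Tuple[int, int]:
    """Scale (width, height) so neither side exceeds `limit`, keeping aspect ratio."""
    width, height = Image.open(path).size
    scale = min(limit / width, limit / height, 1.0)  # never upscale
    return int(width * scale), int(height * scale)

# e.g. a 3000x2000 source becomes 1024x682, preserving the original aspect ratio.
```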

Example Workflow
================

You can see all of these in action by dragging [this example output PNG](docs/output.png) into ComfyUI. It's a modified version of the default Flux workflow that should bring in all the nodes for you. Note: you must have `OPENAI_API_KEY` set as an environment variable with your key for this to work.
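
Because the `Load Environment Variable` node reads `os.environ` directly, a missing key fails mid-workflow with a `KeyError`; a quick pre-flight check before launching ComfyUI might look like this (a sketch, not part of this package):

```python
import os

if "OPENAI_API_KEY" not in os.environ:
    raise SystemExit("Set OPENAI_API_KEY before launching ComfyUI.")
```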

Default Prompt
==============

I'm not 100% sure this is the best prompt, but here's what I've found works pretty well. I've noticed that GPT-4o oftentimes does not like identifying celebrities, so even if Flux could recreate them, it might not have the name of the celebrity to work from. That's not necessarily a bad thing, but considering it missed subjects like "Queen Elizabeth II" in some of my testing, it's a little non-ideal.

```
Describe the image as a prompt that an AI image generator could use to generate a similar image. Be detailed. Note if the image is a photograph, diagram, illustration, painting, etc. Provide details about paintings including the style of art and possible artist. Describe attributes of the photograph, such as being award winning, nature photography, portrait, snapshot, etc. Make note of specific brands, logos, locations, scenes, and individuals if it is possible to identify them. Describe any text in the image - including the size, location, color, and style of the text. Do not include any superfluous text in the output, such as headers or statements like "Create an image" or "This image describes" - the generation model does not need those.
```

Caveats / Bugs / etc
====================

This is my first time creating nodes for ComfyUI, so it's likely that I made some silly mistakes and that these things are not well fleshed out. I may flesh them out more as I play with ComfyUI in the future.

License
=======

Copyright (c) 2024 Patrick Wagstrom

Licensed under the terms of the MIT license.

__init__.py
@@ -0,0 +1,8 @@
from .nodes.nodes import *

NODE_CLASS_MAPPINGS = {
    "OpenAI Vision": OpenAIVision,
    "Load Environment Variable": LoadEnvironmentVariable,
    "Image Dimensions": ImageDimensions,
    "Side by Side Images": SideBySideImage,
}

docs/output.png
Binary file added (not shown).

nodes/nodes.py
@@ -0,0 +1,208 @@
import base64
import torch
import numpy as np
import requests
import os
from PIL import Image
from io import BytesIO
from typing import Tuple, List

DEFAULT_PROMPT = """Describe the image as a prompt that an AI image generator could use to generate a similar image. Be detailed. Note if the image is a photograph, diagram, illustration, painting, etc. Provide details about paintings including the style of art and possible artist. Describe attributes of the photograph, such as being award winning, nature photography, portrait, snapshot, etc. Make note of specific brands, logos, locations, scenes, and individuals if it is possible to identify them. Describe any text in the image - including the size, location, color, and style of the text. Do not include any superfluous text in the output, such as headers or statements like "Create an image" or "This image describes" - the generation model does not need those."""

def resize_image(image: Image.Image, max_size: Tuple[int, int] = (512, 512)) -> Image.Image:
    """
    Resize the given image while retaining the aspect ratio.

    Args:
        image (Image.Image): The image to be resized.
        max_size (Tuple[int, int], optional): The maximum size for the image as a tuple of (max_width, max_height). Defaults to (512, 512).

    Returns:
        Image.Image: The resized image.
    """
    # Get the current size of the image
    original_width, original_height = image.size

    # Check if the image is already within the desired size
    if original_width <= max_size[0] and original_height <= max_size[1]:
        return image

    # Calculate the new size while retaining the aspect ratio
    aspect_ratio = original_width / original_height
    if aspect_ratio > 1:
        # Landscape orientation
        new_width = max_size[0]
        new_height = int(max_size[0] / aspect_ratio)
    else:
        # Portrait (or square) orientation
        new_width = int(max_size[1] * aspect_ratio)
        new_height = max_size[1]

    # Resize the image
    resized_image = image.resize((new_width, new_height))

    return resized_image
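
# Example: with the default (512, 512) cap, a 2048x1024 image is resized to
# 512x256, while a 300x200 image is returned unchanged.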

def pil2tensor(image: Image.Image) -> torch.Tensor:
    """
    Converts a PIL image to a PyTorch tensor.

    Args:
        image: A PIL image object.

    Returns:
        A PyTorch tensor representing the image.
    """
    # ComfyUI IMAGE tensors are [batch, height, width, channels] floats in [0, 1].
    return torch.from_numpy(np.array(image).astype(np.float32) / 255.0).unsqueeze(0)

def tensor2pil(image: torch.Tensor) -> Image.Image:
    """
    Converts a tensor image to a PIL image.

    Parameters:
        image (torch.Tensor): The input tensor image.

    Returns:
        PIL.Image: The converted PIL image.
    """
    # Drop the batch dimension and rescale [0, 1] floats back to 8-bit values.
    return Image.fromarray((image.squeeze().numpy() * 255).astype(np.uint8))

def tensor2base64(image: torch.Tensor, resize: bool = True) -> str:
    """
    Converts a tensor image to a base64 encoded string.

    Args:
        image (torch.Tensor): The input tensor image.
        resize (bool, optional): Whether to resize the image to a maximum size of 512x512. Defaults to True.

    Returns:
        A base64 encoded string representation of the image.
    """
    image = tensor2pil(image)

    if resize:
        image = resize_image(image)

    # JPEG has no alpha channel, so an RGBA input would fail to save without this.
    image = image.convert("RGB")

    buffer = BytesIO()
    image.save(buffer, format="JPEG")

    image_bytes = buffer.getvalue()
    base64_string = base64.b64encode(image_bytes).decode('utf-8')

    return base64_string
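
# Example: given a ComfyUI IMAGE tensor `img` of shape [1, H, W, 3] with float
# values in [0, 1]:
#     b64 = tensor2base64(img)                # JPEG bytes, base64 encoded
#     url = f"data:image/jpeg;base64,{b64}"   # ready for the vision API payload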

class LoadEnvironmentVariable:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
                "env_var": ("STRING", {"multiline": False, "default": "OPENAI_API_KEY"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "get_env_var"
    CATEGORY = "LLMs"

    def get_env_var(self, env_var):
        # Raises KeyError if the variable is not set in the environment.
        return (os.environ[env_var],)
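
# Example: with OPENAI_API_KEY exported in the shell that launched ComfyUI,
#     LoadEnvironmentVariable().get_env_var("OPENAI_API_KEY")
# returns a one-element tuple containing the key.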

class OpenAIVision:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
                "image": ("IMAGE",),
                "api_key": ("STRING", {"forceInput": True}),
                "prompt": ("STRING", {"multiline": True, "default": DEFAULT_PROMPT}),
                "model": ("STRING", {"multiline": False, "default": "gpt-4o-mini"}),
            }
        }

    RETURN_TYPES = ("STRING",)
    FUNCTION = "invoke_gpt"
    CATEGORY = "LLMs"

    def invoke_gpt(self, image, api_key, prompt, model):
        # Getting the base64 string
        base64_image = tensor2base64(image)

        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}"
        }

        payload = {
            "model": model,
            "messages": [
                {
                    "role": "user",
                    "content": [
                        {
                            "type": "text",
                            "text": prompt
                        },
                        {
                            "type": "image_url",
                            "image_url": {
                                "url": f"data:image/jpeg;base64,{base64_image}"
                            }
                        }
                    ]
                }
            ],
            "max_tokens": 500
        }

        response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

        print(response.json())
        try:
            return (response.json()["choices"][0]["message"]["content"],)
        except (KeyError, IndexError):
            # The API returned an error object instead of a completion.
            return ("An error occurred.",)

class ImageDimensions:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
                "image": ("IMAGE",),
            }
        }

    RETURN_TYPES = ("INT", "INT")
    FUNCTION = "get_image_dimensions"
    CATEGORY = "Image Processing"

    def get_image_dimensions(self, image):
        image = tensor2pil(image)
        return (image.width, image.height)

class SideBySideImage:
    @classmethod
    def INPUT_TYPES(cls):
        return {"required": {
                "base_image": ("IMAGE",),
                "images": ("IMAGE",),
            }
        }

    RETURN_TYPES = ("IMAGE",)
    FUNCTION = "side_by_side"
    CATEGORY = "Image Processing"
    OUTPUT_IS_LIST = (True,)

    def side_by_side(self, base_image, images) -> List[torch.Tensor]:
        results = []
        base_img = tensor2pil(base_image)

        for (batch_number, image) in enumerate(images):
            image2 = tensor2pil(image)

            # Create a canvas wide enough for both images and as tall as the
            # taller of the two; the images themselves are not resized.
            new_width = base_img.width + image2.width
            new_height = max(base_img.height, image2.height)
            new_image = Image.new("RGB", (new_width, new_height))

            new_image.paste(base_img, (0, 0))
            new_image.paste(image2, (base_img.width, 0))
            results.append(pil2tensor(new_image))

        return (results,)
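
# Example: with a base image of shape [1, 512, 341, 3] and a batch of two
# generated images of shape [2, 512, 512, 3], side_by_side returns a list of
# two tensors, each of shape [1, 512, 853, 3] (base on the left, generated
# image on the right).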