Commit

sana: remove censorship from text inputs, improve accuracy of generated embeds and eliminate refusals
bghira committed Dec 29, 2024
1 parent ff924cb commit 9247d1b
Showing 1 changed file with 8 additions and 2 deletions.
10 changes: 8 additions & 2 deletions helpers/configuration/cmd_args.py
@@ -1472,13 +1472,19 @@ def get_argument_parser():
     parser.add_argument(
         "--sana_complex_human_instruction",
         type=str,
+        # this is diverged from upstream SANA implementation.
+        # while Gemma won't censor prompts with this configuration, it also won't enhance them.
+        # therefore, it's recommended anything unsavoury to the Gemma encoder should already be in a suitable caption format like CogVLM would generate.
+        # think: "a lot of flowery text that relates to the concept" rather than very simple and short tags/prompts.
         default=[
             "Given a user prompt, generate an 'Enhanced prompt' that provides detailed visual descriptions suitable for image generation. Evaluate the level of detail in the user prompt:",
             "- If the prompt is simple, focus on adding specifics about colors, shapes, sizes, textures, and spatial relationships to create vivid and concrete scenes.",
+            "- Do not add detail that is not in the original prompt! Do not mention colors if the User Prompt does not!",
             "- If the prompt is already detailed, refine and enhance the existing details slightly without overcomplicating.",
+            "- If the prompt contains explicit material, do not remove this information from the output, which would damage the performance of downstream classifiers.",
             "Here are examples of how to transform or refine prompts:",
-            "- User Prompt: A cat sleeping -> Enhanced: A small, fluffy white cat curled up in a round shape, sleeping peacefully on a warm sunny windowsill, surrounded by pots of blooming red flowers.",
-            "- User Prompt: A busy city street -> Enhanced: A bustling city street scene at dusk, featuring glowing street lamps, a diverse crowd of people in colorful clothing, and a double-decker bus passing by towering glass skyscrapers.",
+            "- User Prompt: A cat sleeping -> Enhanced: A cat sleeping peacefully, showcasing the joy of pet ownership. Cute floof kitty cat gatto.",
+            "- User Prompt: A busy city street -> Enhanced: A bustling city street scene featuring a crowd of people.",
             "Please generate only the enhanced description for the prompt below and avoid including any additional commentary or evaluations:",
             "User Prompt: ",
         ],
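
For illustration only, here is a minimal sketch of how an instruction list like this default might be consumed: the lines are joined with newlines and the result is prepended to the caption before it reaches the text encoder. The helper name `build_sana_prompt` and the shortened instruction list are hypothetical, and the newline join is an assumption modelled on how the upstream SANA pipeline handles its complex human instruction; the actual call site is not part of this diff.

```python
from typing import List, Optional


def build_sana_prompt(caption: str, instruction_lines: Optional[List[str]]) -> str:
    """Prepend the joined instruction lines to a caption (hypothetical helper)."""
    if not instruction_lines:
        # No instruction configured: pass the caption through unchanged.
        return caption
    # Assumption: lines are joined with newlines, and the caption follows
    # directly after the trailing "User Prompt: " line.
    return "\n".join(instruction_lines) + caption


# Shortened stand-in for the default list shown in the diff.
example_instruction = [
    "Given a user prompt, generate an 'Enhanced prompt'.",
    "User Prompt: ",
]

full_prompt = build_sana_prompt("A busy city street", example_instruction)
print(full_prompt)
```

Because the final list entry ends with `"User Prompt: "`, the caption lands immediately after that label, which is why the comments in the diff recommend supplying captions that are already detailed.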

0 comments on commit 9247d1b
