scenes
: Descriptions of scenes you want generated, separated by `||`. Each scene can contain multiple prompts, separated by `|`. See the scene specification documentation for details on syntax and usage examples.
scene_prefix
: Prompts prepended to the beginning of each scene.

scene_suffix
: Prompts appended to the end of each scene.
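To make the separators concrete, here is a minimal sketch of a scene specification; the prompt text and weights are hypothetical:

```yaml
# Two scenes separated by ||; each scene contains two weighted prompts separated by |.
scenes: "a misty mountain:3 | pine forest:1 || a desert at noon:3 | sand dunes:1"
```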
interpolation_steps
: Number of steps used to transition smoothly from the previous scene at the start of each scene.

steps_per_scene
: Total number of steps to spend rendering each scene. Should be at least `interpolation_steps`. Along with `save_every`, this will control the total length of an animation.
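As a rough worked example of how these settings combine (all values hypothetical):

```yaml
# Each scene runs for 500 steps; a frame is saved every 50 steps,
# so each scene contributes 500 / 50 = 10 frames to the animation.
# The first 100 steps of each scene blend in from the previous scene.
steps_per_scene: 500
interpolation_steps: 100
save_every: 50
```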
direct_image_prompts
: Paths or urls of images that you want your image to look like in a literal sense, along with `weight_mask` and `stop` values, separated by `|`. Apply masks to direct image prompts with `path or url of image:weight_path or url of mask`. For video masks it must be a path to an mp4 file.
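Read literally, the pattern above gives values like the following sketch; the file names are hypothetical, and the `weight_mask` reading of the syntax is an assumption based on the description above:

```yaml
# reference.jpg is weighted 3 and masked by mask.jpg; style.png is unmasked with weight 1
direct_image_prompts: "reference.jpg:3_mask.jpg | style.png:1"
```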
init_image
: Path or url to an image that will be used to seed the initialization of the image generation process. Useful for creating a central focus or imposing a particular layout on the generated images. If not provided, random noise will be used instead.

direct_init_weight
: Defaults to `0`. When nonzero, the `init_image` is used as a direct image prompt with this weight, equivalent to adding `init_image:direct_init_weight` as a `direct_image_prompt`. Supports weights, masks, and stops.

semantic_init_weight
: Defaults to `0`. When nonzero, the `init_image` is used as a semantic prompt, equivalent to adding `[init_image]:semantic_init_weight` as a prompt to each scene in `scenes`. Supports weights, masks, and stops.
:::{important}
Since this is a semantic prompt, you still need to put the mask in `[ ]` to denote it as a path or url, otherwise it will be read as text instead of a file.
:::
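For example, a semantic prompt masked by an image file might be written as follows (file names hypothetical):

```yaml
# [sky_mask.jpg] is bracketed so it is read as a file path, not as prompt text
scenes: "a blazing sunset:3_[sky_mask.jpg] | rocky coastline:2"
```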
width, height
: Image size. Set one of these to `-1` to derive it from the aspect ratio of the init image.

pixel_size
: Integer image scale factor. Makes the image bigger. Set to `1` for `image_model: VQGAN`.

smoothing_weight
: Makes the image smoother. Defaults to `0` (no smoothing). Can also be negative for that deep fried look.
image_model
: Select how your image will be represented. Supported image models are:
- Limited Palette - Use CLIP to optimize image pixels directly, constrained to a fixed number of colors. Generally used for pixel art.
- Unlimited Palette - Use CLIP to optimize image pixels directly.
- VQGAN - Use CLIP to optimize a VQGAN's latent representation of an image.
vqgan_model
: Select which VQGAN model to use (only considered for `image_model: VQGAN`).

random_initial_palette
: If checked, palettes will start out with random colors. Otherwise they will start out as grayscale. (only for `image_model: Limited Palette`)

palette_size
: Number of colors in each palette. (only for `image_model: Limited Palette`)

palettes
: Total number of palettes. The image will have `palette_size * palettes` colors total. (only for `image_model: Limited Palette`)
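For instance, with the hypothetical values below the image would be limited to 6 * 9 = 54 colors:

```yaml
image_model: Limited Palette
palette_size: 6  # colors per palette
palettes: 9      # 6 * 9 = 54 colors total
```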
gamma
: Relative gamma value. Higher values make the image darker and higher contrast, lower values make the image lighter and lower contrast. (only for `image_model: Limited Palette`)
hdr_weight
: How strongly the optimizer will maintain the `gamma`. Set to `0` to disable. (only for `image_model: Limited Palette`)
palette_normalization_weight
: How strongly the optimizer will maintain the palettes' presence in the image. Prevents the image from losing palettes. (only for `image_model: Limited Palette`)

show_palette
: Display a palette sample each time the image is displayed. (only for `image_model: Limited Palette`)
target_pallete
: Path or url of an image which the model will use to make the palette it uses.

lock_pallete
: Force the model to use the initial palette (most useful from restore, but will force a grayscale image or a wonky palette otherwise).
animation_mode
: Select animation mode or disable animation. Supported animation modes are:
- off
- 2D
- 3D
- Video Source
sampling_mode
: How pixels are sampled during animation. `nearest` will keep the image sharp, but may look bad. `bilinear` will smooth the image out, and `bicubic` is untested :)
infill_mode
: Select how new pixels should be filled if they come in from the edge.
- mirror: reflect image over boundary
- wrap: pull pixels from opposite side
- black: fill with black
- smear: sample closest pixel in image
pre_animation_steps
: Number of steps to run before animation starts, to begin with a stable image.

steps_per_frame
: Number of steps between each image move.

frames_per_second
: Number of frames to render each second. Controls how quickly the time variable `t` used by the animation functions and audio filters advances.
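Putting the basic animation settings together, a 2D setup might look like this sketch (values illustrative, not recommended defaults):

```yaml
animation_mode: 2D
sampling_mode: bilinear
infill_mode: wrap
pre_animation_steps: 250  # settle the image before motion begins
steps_per_frame: 50
frames_per_second: 12
```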
direct_stabilization_weight
: Keeps the current frame as a direct image prompt. For `Video Source` this will use the current frame of the video as a direct image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
semantic_stabilization_weight
: Keeps the current frame as a semantic image prompt. For `Video Source` this will use the current frame of the video as a semantic image prompt. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_[mask.mp4]` or `weight_mask phrase`.
depth_stabilization_weight
: Keeps the depth model output somewhat consistent at a VERY steep performance cost. For `Video Source` this will use the current frame of the video. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
edge_stabilization_weight
: Keeps the image's contours somewhat consistent at very little performance cost. For `Video Source` this will use the current frame of the video as a direct image prompt with a sobel filter. For `2D` and `3D` this will use the shifted version of the previous frame. Also supports masks: `weight_mask.mp4`.
flow_stabilization_weight
: Used for `animation_mode: 3D` and `Video Source` to prevent flickering. Comes with a slight performance cost for `Video Source`, and a great one for `3D`, due to implementation differences. Also supports masks: `weight_mask.mp4`. For video source, the mask should select the part of the frame you want to move, and the rest will be treated as a still background.
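A sketch combining these stabilization weights for a video-sourced animation; the weights and mask file name are hypothetical:

```yaml
animation_mode: Video Source
direct_stabilization_weight: "1_subject_mask.mp4"  # weight 1, masked by an mp4 per the weight_mask syntax
edge_stabilization_weight: "2"
flow_stabilization_weight: "3"
```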
video_path
: Path to an mp4 file for `Video Source`.

frame_stride
: Advance this many frames in the video for each output frame. This is surprisingly useful. Set to `1` to use every frame of the source video.

reencode_each_frame
: Use each video frame as an `init_image` instead of warping each output frame into the init for the next. Cuts will still be detected and trigger a reencode.

flow_long_term_samples
: Sample multiple frames into the past for consistent interpolation even with disocclusion, as described by Manuel Ruder, Alexey Dosovitskiy, and Thomas Brox (2016). Each sample is twice as far back in the past as the last, so the earliest sampled frame is `2**flow_long_term_samples` frames in the past.
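A minimal `Video Source` setup might look like the following sketch (path and values hypothetical). Note how `flow_long_term_samples: 2` implies flow samples 1, 2, and 2**2 = 4 frames back:

```yaml
animation_mode: Video Source
video_path: "/path/to/input.mp4"
frame_stride: 1             # advance one source frame per output frame
reencode_each_frame: false  # warp frames forward instead of re-seeding from the video
flow_long_term_samples: 2   # earliest flow sample is 2**2 = 4 frames in the past
```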
translate_x
: Horizontal image motion as a function of time `t` in seconds.

translate_y
: Vertical image motion as a function of time `t` in seconds.

translate_z_3d
: Forward image motion as a function of time `t` in seconds. (only for `animation_mode: 3D`)

rotate_3d
: Image rotation as a quaternion as a function of time `t` in seconds. (only for `animation_mode: 3D`)

rotate_2d
: Image rotation in degrees as a function of time `t` in seconds. (only for `animation_mode: 2D`)

zoom_x_2d
: Horizontal image zoom as a function of time `t` in seconds. (only for `animation_mode: 2D`)

zoom_y_2d
: Vertical image zoom as a function of time `t` in seconds. (only for `animation_mode: 2D`)

lock_camera
: Prevents scrolling or drifting. Makes for more stable 3D rotations. (only for `animation_mode: 3D`)

field_of_view
: Vertical field of view in degrees. (only for `animation_mode: 3D`)

near_plane
: Closest depth distance in pixels. (only for `animation_mode: 3D`)

far_plane
: Farthest depth distance in pixels. (only for `animation_mode: 3D`)
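The motion settings accept expressions of `t`; a sketch of a slow 3D push-in with a gentle oscillating rotation (all values illustrative):

```yaml
animation_mode: 3D
translate_z_3d: "50"                   # constant forward motion
rotate_3d: "[1, 0, 0.0005*sin(t), 0]"  # small yaw oscillation as a function of t
field_of_view: 60
near_plane: 1
far_plane: 10000
```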
:::{admonition} Experimental Feature
As of 2022-04-24, this section describes features that are available on the 'test' branch but have not yet been merged into the main release.
:::
input_audio
: Path to an audio file.

input_audio_offset
: Timestamp (in seconds) where pytti should start reading audio. Defaults to `0`.

input_audio_filters
: List of specifications for individual Butterworth bandpass filters. For technical details on how these filters work, see: Butterworth Bandpass Filters. Each filter specification has the following fields:

variable_name
: The variable name through which the value of the filter will be referenced in the `weight` expression of the prompt. Subject to the rules of python variable naming.

f_center
: The target frequency of the bandpass filter.

f_width
: The range of frequencies about the central frequency which the filter will be responsive to.

order
: The slope of the frequency response. Default is `5`. The higher the "order" of the filter, the more closely the frequency response will resemble a square/step function. Decreasing the order will make the filter more permissive of frequencies outside of the range strictly specified by the center and width above. See https://en.wikipedia.org/wiki/Butterworth_filter#Transfer_function for details.
:::{admonition} Example: Audio reactivity specification
scenes:"
a photograph of a beautiful spring day:2 |
flowers blooming: 10*fHi |
colorful sparks: (fHi+fLo) |
sun rays: fHi |
forest: fLo |
ominous: fLo/(fLo + fHi) |
hopeful: fHi/(fLo + fHi) |
"
input_audio: '/path/to/audio/source.mp3'
input_audio_offset: 0
input_audio_filters:
- variable_name: fLo
  f_center: 105
  f_width: 65
  order: 5
- variable_name: fHi
  f_center: 900
  f_width: 600
  order: 5
frames_per_second: 30
Would create two filters named `fLo` and `fHi`, which could then be referenced in the scene specification DSL to tie prompt weights to properties of the input audio at the appropriate time stamp per the specified FPS.
:::
file_namespace
: Output directory name.

allow_overwrite
: Check to overwrite existing files in `file_namespace`.

display_every
: How many steps between each time the image is displayed in the notebook.

clear_every
: How many steps between each time the notebook console is cleared.

display_scale
: Image display scale in the notebook.
save_every
: How many steps between each time the image is saved. Set to `steps_per_frame` for consistent animation.
backups
: Number of backups to keep (only the oldest backups are deleted). Large images make very large backups, so be warned. Set to `all` to save all backups. These are used for the `flow_long_term_samples`, so be sure that this is at least `2**flow_long_term_samples + 1` for `Video Source` mode.
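Continuing the hypothetical `flow_long_term_samples: 2` from above, `backups` must be at least 2**2 + 1 = 5:

```yaml
file_namespace: my_project
save_every: 50
backups: 5  # >= 2**flow_long_term_samples + 1 = 5 for Video Source mode
```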
show_graphs
: Display graphs of the loss values each time the image is displayed. Disable this for local runtimes.

approximate_vram_usage
: Currently broken. Don't believe its lies.

ViTB32, ViTB16, RN50, RN50x4...
: Select which CLIP models to use for semantic perception. Multiple models may be selected. Each model requires significant VRAM.

learning_rate
: How quickly the image changes.

reset_lr_each_frame
: The optimizer will adaptively change the learning rate, so this will thwart it.

seed
: Pseudorandom seed. Using a fixed seed will make your process more deterministic, which can be useful for comparing how changing specific settings impacts the generated images.

cutouts
: The number of cutouts from the image that will be scored by the perceiver. Think of each cutout as a "glimpse" at the image. The more glimpses you give the perceptor, the better it will understand what it is looking at. Reduce this to use less VRAM at the cost of quality and speed.
cut_pow
: Should be positive. Large values shrink cutouts, making the image more detailed, small values expand the cutouts, making it more coherent.

cutout_border
: Should be between `0` and `1`. Allows cutouts to poke out over the edges of the image by this fraction of the image size, allowing better detail around the edges of the image.
border_mode
: How to fill cutouts that stick out over the edge of the image. Match with `infill_mode` for consistent infill.
- clamp: move cutouts back onto image
- mirror: reflect image over boundary
- wrap: pull pixels from opposite side
- black: fill with black
- smear: sample closest pixel in image
gradient_accumulation_steps
: How many batches to use to process cutouts. Must divide `cutouts` evenly. Defaults to `1`. If you hit out-of-memory errors with a large `cutouts` value, increasing `gradient_accumulation_steps` may permit you to generate images without reducing the cutouts setting. Setting this higher than `1` will slow down rendering proportionally.
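For example (hypothetical values), 60 cutouts processed in 3 batches of 20:

```yaml
cutouts: 60
gradient_accumulation_steps: 3  # 60 / 3 = 20 cutouts per batch; divides evenly
```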
models_parent_dir
: Parent directory beneath which models will be downloaded. Defaults to `~/.cache/`, a hidden folder in your user namespace. E.g. the default storage location for the AdaBins model is `~/.cache/adabins/AdaBins_nyu.pt`.