Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 25 additions & 23 deletions docs/docs/install/environment-variables.md
Original file line number Diff line number Diff line change
Expand Up @@ -149,29 +149,31 @@ Redis (Sentinel) URL example JSON before encoding:

## Machine Learning

| Variable | Description | Default | Containers |
| :---------------------------------------------------------- | :-------------------------------------------------------------------------------------------------- | :-----------------------------: | :--------------- |
| `MACHINE_LEARNING_MODEL_TTL` | Inactivity time (s) before a model is unloaded (disabled if \<= 0) | `300` | machine learning |
| `MACHINE_LEARNING_MODEL_TTL_POLL_S` | Interval (s) between checks for the model TTL (disabled if \<= 0) | `10` | machine learning |
| `MACHINE_LEARNING_CACHE_FOLDER` | Directory where models are downloaded | `/cache` | machine learning |
| `MACHINE_LEARNING_REQUEST_THREADS`<sup>\*1</sup> | Thread count of the request thread pool (disabled if \<= 0) | number of CPU cores | machine learning |
| `MACHINE_LEARNING_MODEL_INTER_OP_THREADS` | Number of parallel model operations | `1` | machine learning |
| `MACHINE_LEARNING_MODEL_INTRA_OP_THREADS` | Number of threads for each model operation | `2` | machine learning |
| `MACHINE_LEARNING_WORKERS`<sup>\*2</sup> | Number of worker processes to spawn | `1` | machine learning |
| `MACHINE_LEARNING_HTTP_KEEPALIVE_TIMEOUT_S`<sup>\*3</sup> | HTTP Keep-alive time in seconds | `2` | machine learning |
| `MACHINE_LEARNING_WORKER_TIMEOUT` | Maximum time (s) of unresponsiveness before a worker is killed | `120` (`300` if using OpenVINO) | machine learning |
| `MACHINE_LEARNING_PRELOAD__CLIP__TEXTUAL` | Comma-separated list of (textual) CLIP model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__CLIP__VISUAL` | Comma-separated list of (visual) CLIP model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__RECOGNITION` | Comma-separated list of (recognition) facial recognition model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__DETECTION` | Comma-separated list of (detection) facial recognition model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_ANN` | Enable ARM-NN hardware acceleration if supported | `True` | machine learning |
| `MACHINE_LEARNING_ANN_FP16_TURBO` | Execute operations in FP16 precision: increasing speed, reducing precision (applies only to ARM-NN) | `False` | machine learning |
| `MACHINE_LEARNING_ANN_TUNING_LEVEL` | ARM-NN GPU tuning level (1: rapid, 2: normal, 3: exhaustive) | `2` | machine learning |
| `MACHINE_LEARNING_DEVICE_IDS`<sup>\*4</sup> | Device IDs to use in multi-GPU environments | `0` | machine learning |
| `MACHINE_LEARNING_MAX_BATCH_SIZE__FACIAL_RECOGNITION` | Set the maximum number of faces that will be processed at once by the facial recognition model | None (`1` if using OpenVINO) | machine learning |
| `MACHINE_LEARNING_RKNN` | Enable RKNN hardware acceleration if supported | `True` | machine learning |
| `MACHINE_LEARNING_RKNN_THREADS` | How many threads of RKNN runtime should be spinned up while inferencing. | `1` | machine learning |
| `MACHINE_LEARNING_MODEL_ARENA` | Pre-allocates CPU memory to avoid memory fragmentation | true | machine learning |
| Variable | Description | Default | Containers |
| :---------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------- | :-----------------------------: | :--------------- |
| `MACHINE_LEARNING_MODEL_TTL` | Inactivity time (s) before a model is unloaded (disabled if \<= 0) | `300` | machine learning |
| `MACHINE_LEARNING_MODEL_TTL_POLL_S` | Interval (s) between checks for the model TTL (disabled if \<= 0) | `10` | machine learning |
| `MACHINE_LEARNING_CACHE_FOLDER` | Directory where models are downloaded | `/cache` | machine learning |
| `MACHINE_LEARNING_REQUEST_THREADS`<sup>\*1</sup> | Thread count of the request thread pool (disabled if \<= 0) | number of CPU cores | machine learning |
| `MACHINE_LEARNING_MODEL_INTER_OP_THREADS` | Number of parallel model operations | `1` | machine learning |
| `MACHINE_LEARNING_MODEL_INTRA_OP_THREADS` | Number of threads for each model operation | `2` | machine learning |
| `MACHINE_LEARNING_WORKERS`<sup>\*2</sup> | Number of worker processes to spawn | `1` | machine learning |
| `MACHINE_LEARNING_HTTP_KEEPALIVE_TIMEOUT_S`<sup>\*3</sup> | HTTP Keep-alive time in seconds | `2` | machine learning |
| `MACHINE_LEARNING_WORKER_TIMEOUT` | Maximum time (s) of unresponsiveness before a worker is killed | `120` (`300` if using OpenVINO) | machine learning |
| `MACHINE_LEARNING_PRELOAD__CLIP__TEXTUAL` | Comma-separated list of (textual) CLIP model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__CLIP__VISUAL` | Comma-separated list of (visual) CLIP model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__RECOGNITION` | Comma-separated list of (recognition) facial recognition model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION__DETECTION` | Comma-separated list of (detection) facial recognition model(s) to preload and cache | | machine learning |
| `MACHINE_LEARNING_ANN` | Enable ARM-NN hardware acceleration if supported | `True` | machine learning |
| `MACHINE_LEARNING_ANN_FP16_TURBO` | Execute operations in FP16 precision: increasing speed, reducing precision (applies only to ARM-NN) | `False` | machine learning |
| `MACHINE_LEARNING_ANN_TUNING_LEVEL` | ARM-NN GPU tuning level (1: rapid, 2: normal, 3: exhaustive) | `2` | machine learning |
| `MACHINE_LEARNING_DEVICE_IDS`<sup>\*4</sup> | Device IDs to use in multi-GPU environments | `0` | machine learning |
| `MACHINE_LEARNING_MAX_BATCH_SIZE__FACIAL_RECOGNITION` | Set the maximum number of faces that will be processed at once by the facial recognition model | None (`1` if using OpenVINO) | machine learning |
| `MACHINE_LEARNING_MAX_BATCH_SIZE__OCR` | Set the maximum number of boxes that will be processed at once by the OCR model | `6` | machine learning |
| `MACHINE_LEARNING_RKNN` | Enable RKNN hardware acceleration if supported | `True` | machine learning |
| `MACHINE_LEARNING_RKNN_THREADS` | How many threads of RKNN runtime should be spun up while inferencing. | `1` | machine learning |
| `MACHINE_LEARNING_MODEL_ARENA` | Pre-allocates CPU memory to avoid memory fragmentation | true | machine learning |
| `MACHINE_LEARNING_OPENVINO_PRECISION` | If set to FP16, uses half-precision floating-point operations for faster inference with reduced accuracy (one of [`FP16`, `FP32`], applies only to OpenVINO) | `FP32` | machine learning |

\*1: It is recommended to begin with this parameter when changing the concurrency levels of the machine learning service and then tune the other ones.

Expand Down
9 changes: 9 additions & 0 deletions machine-learning/immich_ml/config.py
Original file line number Diff line number Diff line change
Expand Up @@ -13,6 +13,8 @@
from uvicorn import Server
from uvicorn.workers import UvicornWorker

from .schemas import ModelPrecision


class ClipSettings(BaseModel):
textual: str | None = None
Expand All @@ -24,6 +26,11 @@ class FacialRecognitionSettings(BaseModel):
detection: str | None = None


class OcrSettings(BaseModel):
recognition: str | None = None
detection: str | None = None


class PreloadModelData(BaseModel):
clip_fallback: str | None = os.getenv("MACHINE_LEARNING_PRELOAD__CLIP", None)
facial_recognition_fallback: str | None = os.getenv("MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION", None)
Expand All @@ -37,6 +44,7 @@ class PreloadModelData(BaseModel):
del os.environ["MACHINE_LEARNING_PRELOAD__FACIAL_RECOGNITION"]
clip: ClipSettings = ClipSettings()
facial_recognition: FacialRecognitionSettings = FacialRecognitionSettings()
ocr: OcrSettings = OcrSettings()


class MaxBatchSize(BaseModel):
Expand Down Expand Up @@ -70,6 +78,7 @@ class Settings(BaseSettings):
rknn_threads: int = 1
preload: PreloadModelData | None = None
max_batch_size: MaxBatchSize | None = None
openvino_precision: ModelPrecision = ModelPrecision.FP32

@property
def device_id(self) -> str:
Expand Down
14 changes: 14 additions & 0 deletions machine-learning/immich_ml/main.py
Original file line number Diff line number Diff line change
Expand Up @@ -103,6 +103,20 @@ async def load_models(model_string: str, model_type: ModelType, model_task: Mode
ModelTask.FACIAL_RECOGNITION,
)

if preload.ocr.detection is not None:
await load_models(
preload.ocr.detection,
ModelType.DETECTION,
ModelTask.OCR,
)

if preload.ocr.recognition is not None:
await load_models(
preload.ocr.recognition,
ModelType.RECOGNITION,
ModelTask.OCR,
)

if preload.clip_fallback is not None:
log.warning(
"Deprecated env variable: 'MACHINE_LEARNING_PRELOAD__CLIP'. "
Expand Down
5 changes: 5 additions & 0 deletions machine-learning/immich_ml/schemas.py
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,11 @@ class ModelSource(StrEnum):
PADDLE = "paddle"


class ModelPrecision(StrEnum):
FP16 = "FP16"
FP32 = "FP32"


ModelIdentity = tuple[ModelType, ModelTask]


Expand Down
8 changes: 5 additions & 3 deletions machine-learning/immich_ml/sessions/ort.py
Original file line number Diff line number Diff line change
Expand Up @@ -93,10 +93,12 @@ def _provider_options_default(self) -> list[dict[str, Any]]:
case "CUDAExecutionProvider" | "ROCMExecutionProvider":
options = {"arena_extend_strategy": "kSameAsRequested", "device_id": settings.device_id}
case "OpenVINOExecutionProvider":
openvino_dir = self.model_path.parent / "openvino"
device = f"GPU.{settings.device_id}"
options = {
"device_type": f"GPU.{settings.device_id}",
"precision": "FP32",
"cache_dir": (self.model_path.parent / "openvino").as_posix(),
"device_type": device,
"precision": settings.openvino_precision.value,
"cache_dir": openvino_dir.as_posix(),
}
case "CoreMLExecutionProvider":
options = {
Expand Down
Loading
Loading