Perform lightspeed-stack migrations #235

omertuc · 2025-10-24T09:59:29Z

Background

lightspeed-stack does not currently handle database migrations on its own. This means that when we update lightspeed-stack to a new version that requires database schema changes, we need to handle the migrations ourselves. Otherwise, lightspeed-stack will either fail to start or behave incorrectly.

This happened to us recently when we upgraded lightspeed-stack to 0.3.0: that release added a new column, and list conversation calls started failing.

Solution

We will do the migrations using a new `migrate.py` file defined in this repo, which we will add to our containers and run right before starting lightspeed-stack.

Other changes

- We need to set the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` environment variable in the pod spec so that our migration script can connect to the database correctly.
- We need to set the PostgreSQL data volume mount to `:Z` so that SELinux does not block the database from writing to it.
- Added optional comments in the dev pod that can be uncommented to persist PostgreSQL data between pod restarts by using a hostPath volume. Also modified the `run.sh` script to map the PostgreSQL user inside the container to the current host user so that there are no permission issues when using a hostPath volume. This is conditional so that users who do not want to persist data do not suffer the slowdowns of using userns mapping.
- We need to source the `template-params.dev.env` file in our `run.sh` script so that the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` variable is available when we run `podman play kube`.

Summary by CodeRabbit

New Features
- Database migrations run automatically on container startup.
- PostgreSQL SSL mode is now configurable via environment.
- Optional hostPath persistence for database data and host port exposure for Postgres.
- Pod deployment can remap container user IDs when hostPath persistence is enabled.
Documentation
- Deployment config updated with optional persistence and user-mapping instructions.

openshift-ci · 2025-10-24T09:59:36Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omertuc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [omertuc]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai · 2025-10-24T09:59:43Z

Walkthrough

Adds an idempotent PostgreSQL migration script and runs it at container startup; updates container images to copy and invoke the migration, modifies pod spec for Postgres mounting and SSL mode, and alters deployment script to optionally remap user namespaces for hostPath volumes.
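
To illustrate what "idempotent" means for a migration like this one, here is a minimal sketch. It is not the PR's `migrate.py`: SQLite from the Python standard library stands in for PostgreSQL so the example runs without a server, and because SQLite has no `ADD COLUMN IF NOT EXISTS`, the column check is written out explicitly. The function name `add_topic_summary` is hypothetical.

```python
import sqlite3


def add_topic_summary(conn: sqlite3.Connection) -> bool:
    """Add topic_summary to user_conversation if it is missing.

    Returns True only when the column was actually added, so calling
    this at every startup is safe.
    """
    table = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name = 'user_conversation'"
    ).fetchone()
    if table is None:
        # Fresh database: nothing to migrate yet.
        return False

    # PRAGMA table_info yields one row per column; index 1 is the column name.
    columns = [row[1] for row in conn.execute("PRAGMA table_info(user_conversation)")]
    if "topic_summary" in columns:
        # Already migrated: do nothing.
        return False

    conn.execute("ALTER TABLE user_conversation ADD COLUMN topic_summary TEXT")
    conn.commit()
    return True
```

Running the function a second time against the same database is a no-op, which is the property that makes it safe to invoke on every container start.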

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Container Entrypoint Updates `Containerfile.add_llama_to_lightspeed`, `Containerfile.assisted-chat` | Copies `migrate.py` into the image and replaces/updates `ENTRYPOINT` to run `python3.12 /app/migrate.py` before starting `src/lightspeed_stack.py`. |
| PostgreSQL Migration Script `migrate.py` | New script that retries connecting to Postgres (up to 30 attempts, 2s backoff) and performs an idempotent migration to add `topic_summary` text column to `lightspeed-stack.user_conversation` if the schema exists. |
| Pod Configuration `assisted-chat-pod.yaml` | Adds `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` env var, exposes hostPort `5432`, changes `pgdata` mountPath to `/var/lib/pgsql/data:Z`, and includes a commented `hostPath` persistence option. |
| Deployment Script Enhancement `scripts/run.sh` | Sources `template-params.dev.env` and conditionally applies user namespace remapping (`--userns=keep-id:uid=26,gid=26`) and POSTGRES_USER_ID/POSTGRES_GROUP_ID when `assisted-chat-pod.yaml` contains a `hostPath` `pgdata` volume; otherwise runs original pod play command. |

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,230,255,0.3)
    participant Container as Container Startup
    participant Migrate as migrate.py
    participant DB as PostgreSQL
    participant App as lightspeed_stack.py
    end

    Container->>Migrate: Run migrate.py (ENTRYPOINT)
    Migrate->>Migrate: Retry connect (30 attempts, 2s backoff)
    Migrate->>DB: Connect and check schema
    alt schema exists
        Migrate->>DB: ALTER TABLE add topic_summary IF NOT EXISTS
        DB-->>Migrate: OK
    else schema missing
        Migrate-->>Migrate: Skip migrations (fresh DB)
    end
    Migrate->>Migrate: Commit and cleanup
    Migrate-->>Container: Migration completed
    Container->>App: Start lightspeed_stack.py
    App-->>Container: App running

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

enable SSL and CA cert for lightspeed-stack #133 — Related changes to Postgres connection configuration and SSL/CA handling that may overlap with the new LIGHTSPEED_STACK_POSTGRES_SSL_MODE usage.
Bump lightspeed version to 0.3.0 #233 — Modifies Containerfile.add_llama_to_lightspeed; touches the same container build/entrypoint area as this change.

Suggested labels

lgtm

Suggested reviewers

eranco74
jhernand

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | You can run `@coderabbitai generate docstrings` to improve docstring coverage. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "Perform lightspeed-stack migrations" directly and accurately captures the main objective of the pull request. The changeset's primary focus is implementing a migration mechanism for the lightspeed-stack database, which is demonstrated by the addition of the migrate.py script, updates to container startup sequences in both Containerfiles, and supporting infrastructure changes in the pod configuration and deployment script. The title is concise, specific, and clearly communicates the core change without vague language or unnecessary details, making it immediately understandable to anyone reviewing the commit history. |

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (8)

Containerfile.assisted-chat (1)
6-6: Ensure clean signal handling: use exec for the final process.

Without exec, /bin/sh stays PID 1 and may swallow signals. Switch to exec so the app is PID 1.
-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"]
+ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]
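
The same two ideas (start the app only if the migration succeeds, and replace the launcher process instead of leaving it as a parent) can be sketched in Python. This is an illustrative launcher, not something the PR adds; the function name `run_migration_then_exec` and the injectable `exec_fn` parameter are assumptions made so the logic can be exercised without actually replacing the current process.

```python
import os
import subprocess
from typing import Callable, Sequence


def run_migration_then_exec(
    migrate_cmd: Sequence[str],
    app_cmd: Sequence[str],
    exec_fn: Callable[[str, list], object] = os.execvp,
) -> int:
    """Run the migration, then hand the process over to the application.

    Mirrors `migrate && exec app`: a non-zero migration exit code is
    returned and the application is never started.
    """
    result = subprocess.run(list(migrate_cmd))
    if result.returncode != 0:
        return result.returncode

    # os.execvp replaces the current process image, so the application
    # receives signals directly instead of through a wrapper process.
    exec_fn(app_cmd[0], list(app_cmd))
    return 0  # Only reached when exec_fn is a test double.
```

With the default `exec_fn`, a successful call never returns, because the launcher process becomes the application.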
assisted-chat-pod.yaml (3)

24-25: Avoid empty sslmode; define or default it.

If LIGHTSPEED_STACK_POSTGRES_SSL_MODE is unset, migrate.py passes an empty sslmode which can fail. Either ensure this env is always set, or let migrate.py default to "disable" for local dev.
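
One possible way to handle this on the script side is sketched below. It treats both an unset variable and an empty string as "not configured", which matters because a plain `os.getenv(name, default)` only applies the default when the variable is absent, not when it is set to an empty value. The helper name `resolve_sslmode` is hypothetical, and the `"disable"` fallback follows the local-dev default suggested above.

```python
import os
from typing import Mapping, Optional

SSL_MODE_VAR = "LIGHTSPEED_STACK_POSTGRES_SSL_MODE"


def resolve_sslmode(
    env: Optional[Mapping[str, str]] = None,
    default: str = "disable",
) -> str:
    """Return the configured sslmode, falling back to `default`.

    An unset variable and an empty (or whitespace-only) value are
    both treated as missing.
    """
    env = os.environ if env is None else env
    value = env.get(SSL_MODE_VAR, "").strip()
    return value or default
```

The result could then be passed as the `sslmode` argument when opening the connection.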

114-114: Binding postgres to hostPort 5432 can conflict locally.

Intra-pod traffic doesn’t need hostPort. Unless you must access Postgres from the host, drop hostPort: 5432 to avoid bind conflicts with system Postgres.

117-117: Confirm :Z in mountPath is honored by podman play kube.

Kubernetes mountPath doesn’t accept :Z; podman run uses :Z as a volume option. Please verify podman play kube applies SELinux relabel via this syntax; otherwise use an alternative (e.g., pre-create labeled dir or appropriate securityContext).
scripts/run.sh (1)
27-27: Source template-params.dev.env only if present.

Avoid hard failure when the file is missing.
-set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a
+if [[ -f "$PROJECT_ROOT/template-params.dev.env" ]]; then
+  set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a
+fi
Containerfile.add_llama_to_lightspeed (2)
15-15: Use exec so the app becomes PID 1.

Improve signal handling and shutdown.
-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"]
+ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]
14-16: Install psycopg2 for migrate.py if not already present.

If the base doesn’t include psycopg2, add it here.
 RUN cd /app-root/ && python3.12 -m pip install .
+RUN python3.12 -m pip install --no-cache-dir psycopg2-binary==2.9.9
migrate.py (1)

1-14: Docstring nit: grammar and clarity.

Minor wording fix: “can be run multiple times” (not “ran”), and clarify that migrations run at every startup.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1638a68 and da74fd0.

📒 Files selected for processing (5)

Containerfile.add_llama_to_lightspeed (1 hunks)
Containerfile.assisted-chat (1 hunks)
assisted-chat-pod.yaml (2 hunks)
migrate.py (1 hunks)
scripts/run.sh (1 hunks)

🧰 Additional context used

🧬 Code graph analysis (1)

scripts/run.sh (1)

utils/ocm-token.sh (1)

export_ocm_token (29-38)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request

🔇 Additional comments (1)

Containerfile.assisted-chat (1)

5-6: Verify or add psycopg2 to container image.

The review concern is valid: migrate.py imports psycopg2, but Containerfile.assisted-chat contains no explicit installation of it. Either confirm that the base image (quay.io/lightspeed-core/lightspeed-stack:0.3.0) includes psycopg2, or add an installation step before the ENTRYPOINT.

coderabbitai

Actionable comments posted: 2

♻️ Duplicate comments (2)

migrate.py (2)

59-65: Add table existence check and query timeouts.

The migration assumes the user_conversation table exists if the schema exists, but lightspeed-stack might create the schema before creating tables, causing ALTER TABLE to fail on partially-initialized databases. Additionally, without timeouts, the migration could block indefinitely on locked tables.

Apply this diff:

-cur = conn.cursor()
-cur.execute(
-    'ALTER TABLE "lightspeed-stack"."user_conversation" ADD COLUMN IF NOT EXISTS topic_summary text'
-)
-conn.commit()
-cur.close()
+with conn:
+    with conn.cursor() as cur:
+        # Set timeouts to avoid indefinite blocking
+        cur.execute("SET lock_timeout = '5s'; SET statement_timeout = '15s';")
+        
+        # Check if table exists
+        cur.execute(
+            "SELECT to_regclass(%s)", 
+            ('"lightspeed-stack"."user_conversation"',)
+        )
+        if cur.fetchone()[0] is None:
+            print('Table "user_conversation" not found, skipping column migration', file=sys.stderr)
+        else:
+            cur.execute(
+                'ALTER TABLE "lightspeed-stack"."user_conversation" '
+                'ADD COLUMN IF NOT EXISTS topic_summary text'
+            )
+            print("Migration completed")
+
 conn.close()
-print("Migration completed")

27-42: Add sslmode default and connection timeout.

The connection loop lacks resilience:

- sslmode will be None if LIGHTSPEED_STACK_POSTGRES_SSL_MODE is unset, which may cause connection failures.
- No connect_timeout means each attempt could hang indefinitely.
- Printing the full exception may expose sensitive environment details.

Apply this diff:

+sslmode = os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE", "disable")
+connect_timeout = 5
+
 for _ in range(30):
     try:
         conn = psycopg2.connect(
             host=os.getenv("ASSISTED_CHAT_POSTGRES_HOST"),
             port=os.getenv("ASSISTED_CHAT_POSTGRES_PORT"),
             dbname=os.getenv("ASSISTED_CHAT_POSTGRES_NAME"),
             user=os.getenv("ASSISTED_CHAT_POSTGRES_USER"),
             password=os.getenv("ASSISTED_CHAT_POSTGRES_PASSWORD"),
-            sslmode=os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE"),
+            sslmode=sslmode,
+            connect_timeout=connect_timeout,
         )
         break
     except psycopg2.OperationalError as e:
-        print("Waiting for Postgres...", e, file=sys.stderr)
+        print("Waiting for Postgres...", file=sys.stderr)
         time.sleep(2)
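
The bounded retry pattern in this loop can also be factored into a small helper. The following is a sketch under assumptions, not code from the PR: `connect_with_retry` and its parameters are hypothetical names, and the connection function is passed in so the example runs without a database. In the real script the callable would presumably wrap `psycopg2.connect(...)`, with `retry_on` set to `psycopg2.OperationalError`.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 30,
    delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted.

    Waits `delay` seconds between failed attempts and raises
    RuntimeError (chained to the last error) if every attempt fails.
    """
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as exc:
            last_error = exc
            if attempt < attempts:
                # No point sleeping after the final failed attempt.
                sleep(delay)
    raise RuntimeError(f"could not connect after {attempts} attempts") from last_error
```

Raising once the attempts are exhausted makes the failure explicit, rather than leaving the caller to discover later that no connection object was ever created.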

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da74fd0 and 2d7f643.

📒 Files selected for processing (5)

Containerfile.add_llama_to_lightspeed (1 hunks)
Containerfile.assisted-chat (1 hunks)
assisted-chat-pod.yaml (2 hunks)
migrate.py (1 hunks)
scripts/run.sh (1 hunks)

🚧 Files skipped from review as they are similar to previous changes (1)

Containerfile.assisted-chat

🧰 Additional context used

🧬 Code graph analysis (1)

scripts/run.sh (1)

utils/ocm-token.sh (1)

export_ocm_token (29-38)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)

GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request

🔇 Additional comments (7)

assisted-chat-pod.yaml (2)

24-25: LGTM: SSL mode configuration for migration script.

This environment variable is correctly added to support the migration script's database connection.

117-117: LGTM: SELinux label for volume mount.

The :Z flag correctly allows the container to write to the volume by applying the appropriate SELinux label.
scripts/run.sh (2)
27-27: LGTM: Sources template parameters.

Correctly sources template-params.dev.env to make LIGHTSPEED_STACK_POSTGRES_SSL_MODE available for pod deployment.

36-46: Remove the review comment; the code is correct as-is.

The review comment misidentifies the yq tool in use. The project explicitly requires kislyuk's py-yq (pip install yq), not mikefarah's Go yq. kislyuk's py-yq reads YAML from stdin and outputs JSON by default, making the current pipeline syntax correct:
<"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-status
The proposed fix uses -o=json, which is mikefarah's yq syntax and is not supported by kislyuk's py-yq. The existing code already works as intended with the project's configured dependencies.

Likely an incorrect or invalid review comment.
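
For readers less familiar with the `yq | jq` pipeline, the decision it encodes can be expressed in Python as a rough equivalent. This is only an illustration of the logic, operating on an already-parsed pod spec (a plain dict); the function names `pgdata_uses_hostpath` and `podman_play_args` are hypothetical, and the UID/GID of 26 and the `--userns=keep-id` flag are taken from the `run.sh` change in this PR.

```python
from typing import Any, List, Mapping


def pgdata_uses_hostpath(pod_spec: Mapping[str, Any]) -> bool:
    """True when the pod spec has a `pgdata` volume backed by a hostPath."""
    volumes = (pod_spec.get("spec") or {}).get("volumes") or []
    return any(
        volume.get("name") == "pgdata" and volume.get("hostPath") is not None
        for volume in volumes
    )


def podman_play_args(pod_spec: Mapping[str, Any], uid: int = 26, gid: int = 26) -> List[str]:
    """Build the `podman play kube` arguments, adding userns mapping only if needed."""
    args = ["podman", "play", "kube", "--build=false"]
    if pgdata_uses_hostpath(pod_spec):
        # Map the in-container PostgreSQL user to the current host user.
        args.append(f"--userns=keep-id:uid={uid},gid={gid}")
    return args
```

Pods without a hostPath-backed `pgdata` volume get the plain command, so they do not pay the cost of user namespace mapping.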
Containerfile.add_llama_to_lightspeed (1)

14-15: LGTM: Migration integrated into startup sequence.

The migration script is correctly copied and executed before lightspeed_stack.py. The use of && ensures that if the migration fails, the main application won't start, preventing runtime errors from schema mismatches.

migrate.py (2)

1-19: LGTM: Clear documentation.

The docstring clearly explains the purpose, idempotency requirement, and current migrations.

45-56: LGTM: Schema existence check.

Properly checks if the schema exists before attempting migrations, correctly handling fresh database installations.

coderabbitai · 2025-10-24T11:25:11Z

assisted-chat-pod.yaml

          value: assisted-chat
      ports:
        - containerPort: 5432
+          hostPort: 5432


⚠️ Potential issue | 🟡 Minor

Consider security implications of exposing PostgreSQL port.

Exposing PostgreSQL on hostPort: 5432 makes the database accessible from the host network. This is acceptable for local development but ensure this configuration is not used in production or shared environments.

🤖 Prompt for AI Agents

In assisted-chat-pod.yaml around line 114, exposing PostgreSQL via hostPort: 5432 opens the DB to the host network; remove the hostPort mapping (or make it conditional via a dev-only overlay/helm value) and instead expose the DB as a ClusterIP service with containerPort set, using kubectl port-forward or a controlled Service (NodePort with firewall rules or LoadBalancer behind an ingress for controlled access) for non-local use; if you must keep hostPort for development, gate it behind an explicit dev flag and add a clear comment/README warning not to use that config in production.

openshift-ci · 2025-10-24T12:17:31Z

@omertuc: The following test failed. Say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/eval-test | `2d7f643` | link | false | `/test eval-test` |

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

omertuc · 2025-10-27T11:16:39Z

/retest

eranco74

Just a nit
/lgtm

eranco74 · 2025-10-27T11:32:40Z

scripts/run.sh

+if <"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-status; then
+    # Map the PostgreSQL user (UID 26) inside the container to the current host user
+    # This allows the PostgreSQL container to write to host-mounted volumes without permission issues
+    POSTGRES_USER_ID=26
+    POSTGRES_GROUP_ID=26
+    podman play kube --build=false --userns=keep-id:uid=$POSTGRES_USER_ID,gid=$POSTGRES_GROUP_ID <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml)
+else
+    podman play kube --build=false <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml)
+fi


eranco74: Try to replace this with `securityContext: fsGroup: 26` in the assisted-chat-pod.yaml.

omertuc: OK, tracking in #237.

omertuc: Didn't help.

openshift-ci bot requested review from eranco74 and maorfr October 24, 2025 09:59

openshift-ci bot added the approved label Oct 24, 2025

coderabbitai bot reviewed Oct 24, 2025

omertuc force-pushed the ps branch from da74fd0 to 9b81bf6 Compare October 24, 2025 11:15

omertuc force-pushed the ps branch from 9b81bf6 to 2d7f643 Compare October 24, 2025 11:16

coderabbitai bot reviewed Oct 24, 2025

eranco74 reviewed Oct 27, 2025

openshift-ci bot assigned eranco74 Oct 27, 2025

openshift-ci bot added the lgtm label Oct 27, 2025

openshift-merge-bot bot merged commit 662cbf3 into rh-ecosystem-edge:main Oct 27, 2025
7 of 8 checks passed

omertuc mentioned this pull request Oct 27, 2025

Try to remove ID mapping from run script #237

Open
