Skip to content

Conversation

@omertuc
Copy link
Member

@omertuc omertuc commented Oct 24, 2025

Background

lightspeed-stack does not currently handle database migrations on its own. This means that when we update lightspeed-stack to a new version that requires database schema changes, we need to handle the migrations ourselves. Otherwise lightspeed-stack will not start or will function wrong.

This happened to us recently when we upgraded lightspeed-stack to 0.3.0 and since they added a new column, list conversation calls were failing.

Solution

We will do the migrations using a new migrate.py file defined in this repo which we will add to our containers and run right before starting lightspeed-stack.

Other changes

  • We need to set the LIGHTSPEED_STACK_POSTGRES_SSL_MODE environment variable in the pod spec so that our migration script can connect to the database correctly.

  • We need to set the PostgreSQL data volume mount to :Z so that SELinux does not block the database from writing to it.

  • Added optional comments in the dev pod that can be uncommented to persist PostgreSQL data between pod restarts by using a hostPath volume. Also modified the run.sh script to map the PostgreSQL user inside the container to the current host user so that there are no permission issues when using a hostPath volume. This is conditional so that users who do not want to persist data do not suffer the slowdowns of using userns mapping.

  • We need to source the template-params.dev.env file in our run.sh script so that the LIGHTSPEED_STACK_POSTGRES_SSL_MODE variable is available when we run podman play kube.

Summary by CodeRabbit

  • New Features

    • Database migrations run automatically on container startup.
    • PostgreSQL SSL mode is now configurable via environment.
    • Optional hostPath persistence for database data and host port exposure for Postgres.
    • Pod deployment can remap container user IDs when hostPath persistence is enabled.
  • Documentation

    • Deployment config updated with optional persistence and user-mapping instructions.

@openshift-ci openshift-ci bot requested review from eranco74 and maorfr October 24, 2025 09:59
@openshift-ci
Copy link

openshift-ci bot commented Oct 24, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: omertuc

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai
Copy link

coderabbitai bot commented Oct 24, 2025

Walkthrough

Adds an idempotent PostgreSQL migration script and runs it at container startup; updates container images to copy and invoke the migration, modifies pod spec for Postgres mounting and SSL mode, and alters deployment script to optionally remap user namespaces for hostPath volumes.

Changes

Cohort / File(s) Summary
Container Entrypoint Updates
Containerfile.add_llama_to_lightspeed, Containerfile.assisted-chat
Copies migrate.py into the image and replaces/updates ENTRYPOINT to run python3.12 /app/migrate.py before starting src/lightspeed_stack.py.
PostgreSQL Migration Script
migrate.py
New script that retries connecting to Postgres (up to 30 attempts, 2s backoff) and performs an idempotent migration to add topic_summary text column to lightspeed-stack.user_conversation if the schema exists.
Pod Configuration
assisted-chat-pod.yaml
Adds LIGHTSPEED_STACK_POSTGRES_SSL_MODE env var, exposes hostPort 5432, changes pgdata mountPath to /var/lib/pgsql/data:Z, and includes a commented hostPath persistence option.
Deployment Script Enhancement
scripts/run.sh
Sources template-params.dev.env and conditionally applies user namespace remapping (--userns=keep-id:uid=26,gid=26) and POSTGRES_USER_ID/POSTGRES_GROUP_ID when assisted-chat-pod.yaml contains a hostPath pgdata volume; otherwise runs original pod play command.

Sequence Diagram(s)

sequenceDiagram
    rect rgba(200,230,255,0.3)
    participant Container as Container Startup
    participant Migrate as migrate.py
    participant DB as PostgreSQL
    participant App as lightspeed_stack.py
    end

    Container->>Migrate: Run migrate.py (ENTRYPOINT)
    Migrate->>Migrate: Retry connect (30 attempts, 2s backoff)
    Migrate->>DB: Connect and check schema
    alt schema exists
        Migrate->>DB: ALTER TABLE add topic_summary IF NOT EXISTS
        DB-->>Migrate: OK
    else schema missing
        Migrate-->>Migrate: Skip migrations (fresh DB)
    end
    Migrate->>Migrate: Commit and cleanup
    Migrate-->>Container: Migration completed
    Container->>App: Start lightspeed_stack.py
    App-->>Container: App running
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Suggested labels

lgtm

Suggested reviewers

  • eranco74
  • jhernand

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title "Perform lightspeed-stack migrations" directly and accurately captures the main objective of the pull request. The changeset's primary focus is implementing a migration mechanism for the lightspeed-stack database, which is demonstrated by the addition of the migrate.py script, updates to container startup sequences in both Containerfiles, and supporting infrastructure changes in the pod configuration and deployment script. The title is concise, specific, and clearly communicates the core change without vague language or unnecessary details, making it immediately understandable to anyone reviewing the commit history.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (8)
Containerfile.assisted-chat (1)

6-6: Ensure clean signal handling: use exec for the final process.

Without exec, /bin/sh stays PID 1 and may swallow signals. Switch to exec so the app is PID 1.

-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"]
+ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]
assisted-chat-pod.yaml (3)

24-25: Avoid empty sslmode; define or default it.

If LIGHTSPEED_STACK_POSTGRES_SSL_MODE is unset, migrate.py passes an empty sslmode which can fail. Either ensure this env is always set, or let migrate.py default to "disable" for local dev.


114-114: Binding postgres to hostPort 5432 can conflict locally.

Intra-pod traffic doesn’t need hostPort. Unless you must access Postgres from the host, drop hostPort: 5432 to avoid bind conflicts with system Postgres.


117-117: Confirm :Z in mountPath is honored by podman play kube.

Kubernetes mountPath doesn’t accept :Z; podman run uses :Z as a volume option. Please verify podman play kube applies SELinux relabel via this syntax; otherwise use an alternative (e.g., pre-create labeled dir or appropriate securityContext).

scripts/run.sh (1)

27-27: Source template-params.dev.env only if present.

Avoid hard failure when the file is missing.

-set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a
+if [[ -f "$PROJECT_ROOT/template-params.dev.env" ]]; then
+  set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a
+fi
Containerfile.add_llama_to_lightspeed (2)

15-15: Use exec so the app becomes PID 1.

Improve signal handling and shutdown.

-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"]
+ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]

14-16: Install psycopg2 for migrate.py if not already present.

If the base doesn’t include psycopg2, add it here.

 RUN cd /app-root/ && python3.12 -m pip install .
+RUN python3.12 -m pip install --no-cache-dir psycopg2-binary==2.9.9
migrate.py (1)

1-14: Docstring nit: grammar and clarity.

Minor wording fix: “can be run multiple times” (not “ran”), and clarify that migrations run at every startup.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1638a68 and da74fd0.

📒 Files selected for processing (5)
  • Containerfile.add_llama_to_lightspeed (1 hunks)
  • Containerfile.assisted-chat (1 hunks)
  • assisted-chat-pod.yaml (2 hunks)
  • migrate.py (1 hunks)
  • scripts/run.sh (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/run.sh (1)
utils/ocm-token.sh (1)
  • export_ocm_token (29-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (1)
Containerfile.assisted-chat (1)

5-6: Verify or add psycopg2 to container image.

The review concern is valid: migrate.py imports psycopg2, but Containerfile.assisted-chat contains no explicit installation of it. Either confirm that the base image (quay.io/lightspeed-core/lightspeed-stack:0.3.0) includes psycopg2, or add an installation step before the ENTRYPOINT.

# Background

lightspeed-stack does not currently handle database migrations on its
own. This means that when we update lightspeed-stack to a new version
that requires database schema changes, we need to handle the migrations
ourselves. Otherwise lightspeed-stack will not start or will function
wrong.

This happened to us recently when we upgraded lightspeed-stack to 0.3.0
and since they added a new column, list conversation calls were failing.

# Solution

We will do the migrations using a new `migrate.py` file defined in this
repo which we will add to our containers and run right before starting
lightspeed-stack.

# Other changes

- We need to set the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` environment
  variable in the pod spec so that our migration script can connect to
  the database correctly.

- We need to set the PostgreSQL data volume mount to `:Z` so that SELinux
  does not block the database from writing to it.

- Added optional comments in the dev pod that can be uncommented to
  persist PostgreSQL data between pod restarts by using a hostPath
  volume. Also modified the `run.sh` script to map the PostgreSQL user
  inside the container to the current host user so that there are no
  permission issues when using a hostPath volume. This is conditional so
  that users who do not want to persist data do not suffer the slowdowns
  of using userns mapping.

- We need to source the `template-params.dev.env` file in our `run.sh`
  script so that the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` variable is
  available when we run `podman play kube`.
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

♻️ Duplicate comments (2)
migrate.py (2)

59-65: Add table existence check and query timeouts.

The migration assumes the user_conversation table exists if the schema exists, but lightspeed-stack might create the schema before creating tables, causing ALTER TABLE to fail on partially-initialized databases. Additionally, without timeouts, the migration could block indefinitely on locked tables.

Apply this diff:

-cur = conn.cursor()
-cur.execute(
-    'ALTER TABLE "lightspeed-stack"."user_conversation" ADD COLUMN IF NOT EXISTS topic_summary text'
-)
-conn.commit()
-cur.close()
+with conn:
+    with conn.cursor() as cur:
+        # Set timeouts to avoid indefinite blocking
+        cur.execute("SET lock_timeout = '5s'; SET statement_timeout = '15s';")
+        
+        # Check if table exists
+        cur.execute(
+            "SELECT to_regclass(%s)", 
+            ('"lightspeed-stack"."user_conversation"',)
+        )
+        if cur.fetchone()[0] is None:
+            print('Table "user_conversation" not found, skipping column migration', file=sys.stderr)
+        else:
+            cur.execute(
+                'ALTER TABLE "lightspeed-stack"."user_conversation" '
+                'ADD COLUMN IF NOT EXISTS topic_summary text'
+            )
+            print("Migration completed")
+
 conn.close()
-print("Migration completed")

27-42: Add sslmode default and connection timeout.

The connection loop lacks resilience:

  1. sslmode will be None if LIGHTSPEED_STACK_POSTGRES_SSL_MODE is unset, which may cause connection failures.
  2. No connect_timeout means each attempt could hang indefinitely.
  3. Printing the full exception may expose sensitive environment details.

Apply this diff:

+sslmode = os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE", "disable")
+connect_timeout = 5
+
 for _ in range(30):
     try:
         conn = psycopg2.connect(
             host=os.getenv("ASSISTED_CHAT_POSTGRES_HOST"),
             port=os.getenv("ASSISTED_CHAT_POSTGRES_PORT"),
             dbname=os.getenv("ASSISTED_CHAT_POSTGRES_NAME"),
             user=os.getenv("ASSISTED_CHAT_POSTGRES_USER"),
             password=os.getenv("ASSISTED_CHAT_POSTGRES_PASSWORD"),
-            sslmode=os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE"),
+            sslmode=sslmode,
+            connect_timeout=connect_timeout,
         )
         break
     except psycopg2.OperationalError as e:
-        print("Waiting for Postgres...", e, file=sys.stderr)
+        print("Waiting for Postgres...", file=sys.stderr)
         time.sleep(2)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between da74fd0 and 2d7f643.

📒 Files selected for processing (5)
  • Containerfile.add_llama_to_lightspeed (1 hunks)
  • Containerfile.assisted-chat (1 hunks)
  • assisted-chat-pod.yaml (2 hunks)
  • migrate.py (1 hunks)
  • scripts/run.sh (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • Containerfile.assisted-chat
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/run.sh (1)
utils/ocm-token.sh (1)
  • export_ocm_token (29-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
  • GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (7)
assisted-chat-pod.yaml (2)

24-25: LGTM: SSL mode configuration for migration script.

This environment variable is correctly added to support the migration script's database connection.


117-117: LGTM: SELinux label for volume mount.

The :Z flag correctly allows the container to write to the volume by applying the appropriate SELinux label.

scripts/run.sh (2)

27-27: LGTM: Sources template parameters.

Correctly sources template-params.dev.env to make LIGHTSPEED_STACK_POSTGRES_SSL_MODE available for pod deployment.


36-46: Remove the review comment; the code is correct as-is.

The review comment misidentifies the yq tool in use. The project explicitly requires kislyuk's py-yq (pip install yq), not mikefarah's Go yq. kislyuk's py-yq reads YAML from stdin and outputs JSON by default, making the current pipeline syntax correct:

<"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-status

The proposed fix uses -o=json, which is mikefarah's yq syntax and is not supported by kislyuk's py-yq. The existing code already works as intended with the project's configured dependencies.

Likely an incorrect or invalid review comment.

Containerfile.add_llama_to_lightspeed (1)

14-15: LGTM: Migration integrated into startup sequence.

The migration script is correctly copied and executed before lightspeed_stack.py. The use of && ensures that if the migration fails, the main application won't start, preventing runtime errors from schema mismatches.

migrate.py (2)

1-19: LGTM: Clear documentation.

The docstring clearly explains the purpose, idempotency requirement, and current migrations.


45-56: LGTM: Schema existence check.

Properly checks if the schema exists before attempting migrations, correctly handling fresh database installations.

value: assisted-chat
ports:
- containerPort: 5432
hostPort: 5432
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Consider security implications of exposing PostgreSQL port.

Exposing PostgreSQL on hostPort: 5432 makes the database accessible from the host network. This is acceptable for local development but ensure this configuration is not used in production or shared environments.

🤖 Prompt for AI Agents
In assisted-chat-pod.yaml around line 114, exposing PostgreSQL via hostPort:
5432 opens the DB to the host network; remove the hostPort mapping (or make it
conditional via a dev-only overlay/helm value) and instead expose the DB as a
ClusterIP service with containerPort set, using kubectl port-forward or a
controlled Service (NodePort with firewall rules or LoadBalancer behind an
ingress for controlled access) for non-local use; if you must keep hostPort for
development, gate it behind an explicit dev flag and add a clear comment/README
warning not to use that config in production.

@openshift-ci
Copy link

openshift-ci bot commented Oct 24, 2025

@omertuc: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/eval-test 2d7f643 link false /test eval-test

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@omertuc
Copy link
Member Author

omertuc commented Oct 27, 2025

/retest

Copy link
Collaborator

@eranco74 eranco74 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a nit
/lgtm

Comment on lines +38 to +46
if <"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-status; then
# Map the PostgreSQL user (UID 26) inside the container to the current host user
# This allows the PostgreSQL container to write to host-mounted volumes without permission issues
POSTGRES_USER_ID=26
POSTGRES_GROUP_ID=26
podman play kube --build=false --userns=keep-id:uid=$POSTGRES_USER_ID,gid=$POSTGRES_GROUP_ID <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml)
else
podman play kube --build=false <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml)
fi
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to replace this with:

securityContext:
    fsGroup: 26

in the assisted-chat-pod.yaml

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK tracking in #237

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Didn't help

@openshift-ci openshift-ci bot added the lgtm label Oct 27, 2025
@openshift-merge-bot openshift-merge-bot bot merged commit 662cbf3 into rh-ecosystem-edge:main Oct 27, 2025
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants