-
Notifications
You must be signed in to change notification settings - Fork 22
Perform lightspeed-stack migrations #235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: omertuc The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
WalkthroughAdds an idempotent PostgreSQL migration script and runs it at container startup; updates container images to copy and invoke the migration, modifies pod spec for Postgres mounting and SSL mode, and alters deployment script to optionally remap user namespaces for hostPath volumes. Changes
Sequence Diagram(s)sequenceDiagram
rect rgba(200,230,255,0.3)
participant Container as Container Startup
participant Migrate as migrate.py
participant DB as PostgreSQL
participant App as lightspeed_stack.py
end
Container->>Migrate: Run migrate.py (ENTRYPOINT)
Migrate->>Migrate: Retry connect (30 attempts, 2s backoff)
Migrate->>DB: Connect and check schema
alt schema exists
Migrate->>DB: ALTER TABLE add topic_summary IF NOT EXISTS
DB-->>Migrate: OK
else schema missing
Migrate-->>Migrate: Skip migrations (fresh DB)
end
Migrate->>Migrate: Commit and cleanup
Migrate-->>Container: Migration completed
Container->>App: Start lightspeed_stack.py
App-->>Container: App running
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
🧹 Nitpick comments (8)
Containerfile.assisted-chat (1)
6-6: Ensure clean signal handling: use exec for the final process.Without exec, /bin/sh stays PID 1 and may swallow signals. Switch to exec so the app is PID 1.
-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"] +ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]assisted-chat-pod.yaml (3)
24-25: Avoid empty sslmode; define or default it.If LIGHTSPEED_STACK_POSTGRES_SSL_MODE is unset, migrate.py passes an empty sslmode which can fail. Either ensure this env is always set, or let migrate.py default to "disable" for local dev.
114-114: Binding postgres to hostPort 5432 can conflict locally.Intra-pod traffic doesn’t need hostPort. Unless you must access Postgres from the host, drop hostPort: 5432 to avoid bind conflicts with system Postgres.
117-117: Confirm :Z in mountPath is honored by podman play kube.Kubernetes mountPath doesn’t accept :Z; podman run uses :Z as a volume option. Please verify podman play kube applies SELinux relabel via this syntax; otherwise use an alternative (e.g., pre-create labeled dir or appropriate securityContext).
scripts/run.sh (1)
27-27: Source template-params.dev.env only if present.Avoid hard failure when the file is missing.
-set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a +if [[ -f "$PROJECT_ROOT/template-params.dev.env" ]]; then + set -a && source "$PROJECT_ROOT/template-params.dev.env" && set +a +fiContainerfile.add_llama_to_lightspeed (2)
15-15: Use exec so the app becomes PID 1.Improve signal handling and shutdown.
-ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && python3.12 src/lightspeed_stack.py"] +ENTRYPOINT ["/bin/sh", "-c", "python3.12 /app/migrate.py && exec python3.12 src/lightspeed_stack.py"]
14-16: Install psycopg2 for migrate.py if not already present.If the base doesn’t include psycopg2, add it here.
RUN cd /app-root/ && python3.12 -m pip install . +RUN python3.12 -m pip install --no-cache-dir psycopg2-binary==2.9.9migrate.py (1)
1-14: Docstring nit: grammar and clarity.Minor wording fix: “can be run multiple times” (not “ran”), and clarify that migrations run at every startup.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Containerfile.add_llama_to_lightspeed(1 hunks)Containerfile.assisted-chat(1 hunks)assisted-chat-pod.yaml(2 hunks)migrate.py(1 hunks)scripts/run.sh(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/run.sh (1)
utils/ocm-token.sh (1)
export_ocm_token(29-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
🔇 Additional comments (1)
Containerfile.assisted-chat (1)
5-6: Verify or add psycopg2 to container image.The review concern is valid: migrate.py imports psycopg2, but Containerfile.assisted-chat contains no explicit installation of it. Either confirm that the base image (quay.io/lightspeed-core/lightspeed-stack:0.3.0) includes psycopg2, or add an installation step before the ENTRYPOINT.
# Background lightspeed-stack does not currently handle database migrations on its own. This means that when we update lightspeed-stack to a new version that requires database schema changes, we need to handle the migrations ourselves. Otherwise lightspeed-stack will not start or will function wrong. This happened to us recently when we upgraded lightspeed-stack to 0.3.0 and since they added a new column, list conversation calls were failing. # Solution We will do the migrations using a new `migrate.py` file defined in this repo which we will add to our containers and run right before starting lightspeed-stack. # Other changes - We need to set the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` environment variable in the pod spec so that our migration script can connect to the database correctly. - We need to set the PostgreSQL data volume mount to `:Z` so that SELinux does not block the database from writing to it. - Added optional comments in the dev pod that can be uncommented to persist PostgreSQL data between pod restarts by using a hostPath volume. Also modified the `run.sh` script to map the PostgreSQL user inside the container to the current host user so that there are no permission issues when using a hostPath volume. This is conditional so that users who do not want to persist data do not suffer the slowdowns of using userns mapping. - We need to source the `template-params.dev.env` file in our `run.sh` script so that the `LIGHTSPEED_STACK_POSTGRES_SSL_MODE` variable is available when we run `podman play kube`.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
♻️ Duplicate comments (2)
migrate.py (2)
59-65: Add table existence check and query timeouts.The migration assumes the
user_conversationtable exists if the schema exists, but lightspeed-stack might create the schema before creating tables, causingALTER TABLEto fail on partially-initialized databases. Additionally, without timeouts, the migration could block indefinitely on locked tables.Apply this diff:
-cur = conn.cursor() -cur.execute( - 'ALTER TABLE "lightspeed-stack"."user_conversation" ADD COLUMN IF NOT EXISTS topic_summary text' -) -conn.commit() -cur.close() +with conn: + with conn.cursor() as cur: + # Set timeouts to avoid indefinite blocking + cur.execute("SET lock_timeout = '5s'; SET statement_timeout = '15s';") + + # Check if table exists + cur.execute( + "SELECT to_regclass(%s)", + ('"lightspeed-stack"."user_conversation"',) + ) + if cur.fetchone()[0] is None: + print('Table "user_conversation" not found, skipping column migration', file=sys.stderr) + else: + cur.execute( + 'ALTER TABLE "lightspeed-stack"."user_conversation" ' + 'ADD COLUMN IF NOT EXISTS topic_summary text' + ) + print("Migration completed") + conn.close() -print("Migration completed")
27-42: Add sslmode default and connection timeout.The connection loop lacks resilience:
sslmodewill beNoneifLIGHTSPEED_STACK_POSTGRES_SSL_MODEis unset, which may cause connection failures.- No
connect_timeoutmeans each attempt could hang indefinitely.- Printing the full exception may expose sensitive environment details.
Apply this diff:
+sslmode = os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE", "disable") +connect_timeout = 5 + for _ in range(30): try: conn = psycopg2.connect( host=os.getenv("ASSISTED_CHAT_POSTGRES_HOST"), port=os.getenv("ASSISTED_CHAT_POSTGRES_PORT"), dbname=os.getenv("ASSISTED_CHAT_POSTGRES_NAME"), user=os.getenv("ASSISTED_CHAT_POSTGRES_USER"), password=os.getenv("ASSISTED_CHAT_POSTGRES_PASSWORD"), - sslmode=os.getenv("LIGHTSPEED_STACK_POSTGRES_SSL_MODE"), + sslmode=sslmode, + connect_timeout=connect_timeout, ) break except psycopg2.OperationalError as e: - print("Waiting for Postgres...", e, file=sys.stderr) + print("Waiting for Postgres...", file=sys.stderr) time.sleep(2)
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (5)
Containerfile.add_llama_to_lightspeed(1 hunks)Containerfile.assisted-chat(1 hunks)assisted-chat-pod.yaml(2 hunks)migrate.py(1 hunks)scripts/run.sh(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- Containerfile.assisted-chat
🧰 Additional context used
🧬 Code graph analysis (1)
scripts/run.sh (1)
utils/ocm-token.sh (1)
export_ocm_token(29-38)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Red Hat Konflux / assisted-chat-saas-main-on-pull-request
- GitHub Check: Red Hat Konflux / assisted-chat-test-image-saas-main-on-pull-request
🔇 Additional comments (7)
assisted-chat-pod.yaml (2)
24-25: LGTM: SSL mode configuration for migration script.This environment variable is correctly added to support the migration script's database connection.
117-117: LGTM: SELinux label for volume mount.The
:Zflag correctly allows the container to write to the volume by applying the appropriate SELinux label.scripts/run.sh (2)
27-27: LGTM: Sources template parameters.Correctly sources
template-params.dev.envto makeLIGHTSPEED_STACK_POSTGRES_SSL_MODEavailable for pod deployment.
36-46: Remove the review comment; the code is correct as-is.The review comment misidentifies the
yqtool in use. The project explicitly requires kislyuk's py-yq (pip install yq), not mikefarah's Go yq. kislyuk's py-yq reads YAML from stdin and outputs JSON by default, making the current pipeline syntax correct:<"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-statusThe proposed fix uses
-o=json, which is mikefarah's yq syntax and is not supported by kislyuk's py-yq. The existing code already works as intended with the project's configured dependencies.Likely an incorrect or invalid review comment.
Containerfile.add_llama_to_lightspeed (1)
14-15: LGTM: Migration integrated into startup sequence.The migration script is correctly copied and executed before
lightspeed_stack.py. The use of&&ensures that if the migration fails, the main application won't start, preventing runtime errors from schema mismatches.migrate.py (2)
1-19: LGTM: Clear documentation.The docstring clearly explains the purpose, idempotency requirement, and current migrations.
45-56: LGTM: Schema existence check.Properly checks if the schema exists before attempting migrations, correctly handling fresh database installations.
| value: assisted-chat | ||
| ports: | ||
| - containerPort: 5432 | ||
| hostPort: 5432 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider security implications of exposing PostgreSQL port.
Exposing PostgreSQL on hostPort: 5432 makes the database accessible from the host network. This is acceptable for local development but ensure this configuration is not used in production or shared environments.
🤖 Prompt for AI Agents
In assisted-chat-pod.yaml around line 114, exposing PostgreSQL via hostPort:
5432 opens the DB to the host network; remove the hostPort mapping (or make it
conditional via a dev-only overlay/helm value) and instead expose the DB as a
ClusterIP service with containerPort set, using kubectl port-forward or a
controlled Service (NodePort with firewall rules or LoadBalancer behind an
ingress for controlled access) for non-local use; if you must keep hostPort for
development, gate it behind an explicit dev flag and add a clear comment/README
warning not to use that config in production.
|
@omertuc: The following test failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
|
/retest |
eranco74
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a nit
/lgtm
| if <"$PROJECT_ROOT/assisted-chat-pod.yaml" yq | jq '.spec.volumes[] | select(.name == "pgdata").hostPath != null' --exit-status; then | ||
| # Map the PostgreSQL user (UID 26) inside the container to the current host user | ||
| # This allows the PostgreSQL container to write to host-mounted volumes without permission issues | ||
| POSTGRES_USER_ID=26 | ||
| POSTGRES_GROUP_ID=26 | ||
| podman play kube --build=false --userns=keep-id:uid=$POSTGRES_USER_ID,gid=$POSTGRES_GROUP_ID <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml) | ||
| else | ||
| podman play kube --build=false <(envsubst <"$PROJECT_ROOT"/assisted-chat-pod.yaml) | ||
| fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to replace this with:
securityContext:
fsGroup: 26
in the assisted-chat-pod.yaml
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK tracking in #237
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't help
662cbf3
into
rh-ecosystem-edge:main
Background
lightspeed-stack does not currently handle database migrations on its own. This means that when we update lightspeed-stack to a new version that requires database schema changes, we need to handle the migrations ourselves. Otherwise lightspeed-stack will not start or will function wrong.
This happened to us recently when we upgraded lightspeed-stack to 0.3.0 and since they added a new column, list conversation calls were failing.
Solution
We will do the migrations using a new
migrate.pyfile defined in this repo which we will add to our containers and run right before starting lightspeed-stack.Other changes
We need to set the
LIGHTSPEED_STACK_POSTGRES_SSL_MODEenvironment variable in the pod spec so that our migration script can connect to the database correctly.We need to set the PostgreSQL data volume mount to
:Zso that SELinux does not block the database from writing to it.Added optional comments in the dev pod that can be uncommented to persist PostgreSQL data between pod restarts by using a hostPath volume. Also modified the
run.shscript to map the PostgreSQL user inside the container to the current host user so that there are no permission issues when using a hostPath volume. This is conditional so that users who do not want to persist data do not suffer the slowdowns of using userns mapping.We need to source the
template-params.dev.envfile in ourrun.shscript so that theLIGHTSPEED_STACK_POSTGRES_SSL_MODEvariable is available when we runpodman play kube.Summary by CodeRabbit
New Features
Documentation