-
Notifications
You must be signed in to change notification settings - Fork 1.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Should failing entrypoint scripts clean up data directory? #159
Comments
Whew, sorry for the very, very long tail on this. This is definitely something I've thought about a lot and ideally, yes, if the database is not (to our knowledge) "successfully initialized" we'd ideally have a still-empty data directory instead of a potentially-half-initialized one. This has absolutely bitten me directly several times, and I've seen countless examples of other folks being bitten by it. On the other hand, I've also seen folks quietly relying on the current behavior (:weary: yes, I know :disappointed:), so I'm not sure what the best answer here is. This isn't necessarily an obvious "security" win that warrants a large breaking change like we've done previously, so perhaps it would make sense to make it either opt-in or opt-in on old versions and opt-out on new major releases? There's also some details on the implementation that I think could get complicated. I don't think we can initialize to a truly "temporary" directory in the traditional sense safely because both the size of the initialized data and the cross-device issues might be prohibitive. I think the best we could probably do is a temporary subdirectory of the provided Frankly we could probably do something like |
It's not complete enough to be a full PR yet, but I did play around with this a little bit and my preliminary testing was pretty successful: diff --git a/docker-entrypoint.sh b/docker-entrypoint.sh
index 09a7564..2cd765b 100755
--- a/docker-entrypoint.sh
+++ b/docker-entrypoint.sh
@@ -60,6 +60,38 @@ docker_create_db_directories() {
fi
}
+# TODO make these opt-in / opt-out (https://github.com/docker-library/postgres/issues/159#issuecomment-1042466970)
+docker_temp_create_initdb_location() {
+ [ -z "${dockerTempOrigPGDATA:-}" ]
+ dockerTempOrigPGDATA="$PGDATA"
+ export PGDATA="$PGDATA/.docker-initdb"
+ if [ -s "$PGDATA/PG_VERSION" ]; then
+ {
+ echo
+ echo "Error: '$PGDATA' exists and appears to contain a database!"
+ echo ' this usually means a previous initialization attempt failed'
+ echo ' it is recommended to check logs for previous container runs'
+ echo
+ } >&2
+ exit 1
+ fi
+ echo "Using '$PGDATA' as a temporary PGDATA for initialization ..."
+}
+docker_temp_migrate_initdb_location() {
+ if [ -n "${dockerTempOrigPGDATA:-}" ]; then
+ # TODO balk if "$dockerTempOrigPGDATA/PG_VERSION" or if *not* "$PGDATA/PG_VERSION" !!! (if we have two initialized databases or don't have one at all)
+ echo "Migrating '$PGDATA' back to '$dockerTempOrigPGDATA' ..."
+ # TODO use find -mindepth -maxdepth instead of dotglob
+ (
+ shopt -s dotglob
+ mv -ft "$dockerTempOrigPGDATA" "$PGDATA"/*
+ )
+ rmdir "$PGDATA"
+ export PGDATA="$dockerTempOrigPGDATA"
+ unset ORIG_PGDATA
+ fi
+}
+
# initialize empty PGDATA directory with new database via 'initdb'
# arguments to `initdb` can be passed via POSTGRES_INITDB_ARGS or as arguments to this function
# `initdb` automatically creates the "postgres", "template0", and "template1" dbnames
@@ -314,6 +346,8 @@ _main() {
# check dir permissions to reduce likelihood of half-initialized database
ls /docker-entrypoint-initdb.d/ > /dev/null
+ docker_temp_create_initdb_location
+
docker_init_database_dir
pg_setup_hba_conf "$@"
@@ -328,6 +362,8 @@ _main() {
docker_temp_server_stop
unset PGPASSWORD
+ docker_temp_migrate_initdb_location
+
echo
echo 'PostgreSQL init process complete; ready for start up.'
echo |
Thinking about this initialization in a temporary directory and I think the set of users we could break are ones using the full path ( To err on the side of most caution, maybe all current versions ( |
+1 on this, it would be very useful to prevent the daemon from starting if any initdb.d script encounters an error. |
Above patch by @tianon does not cover the case that an initialization script wants to access the cluster (in our case @@ -308,8 +308,19 @@ _main() {
exec gosu postgres "$BASH_SOURCE" "$@"
fi
+ local initFailed="$PGDATA/INITFAILED"
+
+ if [ -s "$initFailed" ]; then
+ cat >&2 <<-'EOE'
+ Error: Previous database initialization failed.
+ EOE
+ exit 1
+ fi
+
# only run initialization on an empty data directory
if [ -z "$DATABASE_ALREADY_EXISTS" ]; then
+ trap "mkdir "$initFailed"" EXIT
+
docker_verify_minimum_env
# check dir permissions to reduce likelihood of half-initialized database
@@ -329,6 +340,8 @@ _main() {
docker_temp_server_stop
unset PGPASSWORD
+ trap - EXIT
+
cat <<-'EOM'
PostgreSQL init process complete; ready for start up. |
I'm not sure I understand what you mean -- my patch still starts the temporary server? 😅 |
@tianon True. Let me clarify: An initialization script might want to access the cluster at the intended (non-temporary) location. Of course, scripts could be patched to find the cluster at the temporary location. I would argue, however, that a less intrusive solution is preferable. |
Urg, |
For instance, I have some scripts that set up some users and such using ENV variables passed to
docker run
. I want to make sure that if those variables were not passed to thedocker run
then the initialization process halts. Currently, the data directory is created and can be left in some undetermined state if one of the entrypoint scripts fails.Should it perhaps create the data directory in a temporary location first, and when all scripts complete successfully - move it to the
$PG_DATA
location and clean everything up if those scripts fail?The text was updated successfully, but these errors were encountered: