This repository has been archived by the owner on Jan 18, 2024. It is now read-only.

timescale can't start up due to /var/lib/postgresql/data not created #72

Closed
lonelyleaf opened this issue Dec 16, 2019 · 6 comments

Comments

@lonelyleaf

I'm using this Helm chart to create an HA Timescale cluster, but the Timescale pod can't run.

Here are my Helm values and the pod log.

---
persistentVolumes:
  wal:
    storageClass: "timescaledb-wal-local-volume"
    enabled: "true"
  data:
    storageClass: "timescaledb-data-local-volume"
    enabled: "true"
loadBalancer:
  enabled: "false"
prometheus:
  enabled: "true"

pod log

install: cannot change owner and permissions of ‘/var/lib/postgresql/data’: No such file or directory
install: cannot change owner and permissions of ‘/var/lib/postgresql/wal/pg_wal’: No such file or directory
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.
The database cluster will be initialized with locale "C.UTF-8".
The default text search configuration will be set to "english".
Data page checksums are disabled.
initdb: could not create directory "/var/lib/postgresql/data": Permission denied
pg_ctl: database system initialization failed
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.7/multiprocessing/process.py", line 297, in _bootstrap
self.run()
File "/usr/lib/python3.7/multiprocessing/process.py", line 99, in run
self._target(*self._args, **self._kwargs)
File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 160, in patroni_main
patroni.run()
File "/usr/lib/python3/dist-packages/patroni/__init__.py", line 125, in run
logger.info(self.ha.run_cycle())
File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1344, in run_cycle
info = self._run_cycle()
File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1253, in _run_cycle
return self.post_bootstrap()
File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1149, in post_bootstrap
self.cancel_initialization()
File "/usr/lib/python3/dist-packages/patroni/ha.py", line 1144, in cancel_initialization
raise PatroniException('Failed to bootstrap cluster')
patroni.exceptions.PatroniException: 'Failed to bootstrap cluster'
creating directory /var/lib/postgresql/data ...
@lonelyleaf
Author

lonelyleaf commented Dec 16, 2019

Well, when I changed to using Ceph as the storageClass, it runs successfully. Before that, I used hostPath (not the local volume type in k8s 1.14) for static PVs, like below.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: timescaledb-data-local-volume
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: timescaledb-wal-local-volume
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: timescaledb-data-vol-0
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: "kubernetes.io/hostname"
              operator: In
              values:
                - "rh2-inf-kafka1"
  storageClassName: timescaledb-data-local-volume
  volumeMode: Filesystem
  hostPath:
    type: DirectoryOrCreate
    path: /data/timescaledb/data

... other PVs like this

So does this mean hostPath is not supported? I must use the local NVMe SSD on the host.

@feikesteenbergen
Member

I'm not sure why this is happening. The error message

install: cannot change owner and permissions of ‘/var/lib/postgresql/data’: No such file or directory

occurs because the user postgres cannot create that directory; however, the StatefulSet explicitly mounts /var/lib/postgresql/data with the postgres user.

For the failing deploy, could you share the following output?

kubectl describe pod/<yourpod>

@feikesteenbergen
Member

feikesteenbergen commented Dec 16, 2019

While thinking about the above, I think the problem is with the permissions on the directory of the HostPath. The Docker image runs as a non-root user, which means the container cannot change ownership of directories.

For the dynamically provisioned Volumes (like your ceph volume), the following piece of code ensures correct ownership:

      securityContext:
        # The postgres user inside the TimescaleDB image has uid=1000.
        # This configuration ensures the permissions of the mounts are suitable
        fsGroup: 1000

https://github.com/timescale/timescaledb-kubernetes/blob/master/charts/timescaledb-single/templates/statefulset-timescaledb.yaml#L36-L39

For troubleshooting, could you set the permissions of your HostPath very liberally (even 0777) and see if that works?

If that solves the problem, the only thing you may need to change is the owner uid of the directory of the HostPath Volume.
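One way to check the ownership is sketched below. It is a minimal sketch, assuming GNU `stat` is available on the node and that the directory is inspected on the node itself rather than inside the container; `check_owner` is a hypothetical helper name, and `1000:1000` is taken from the uid noted in the chart comment above.

```shell
#!/bin/sh
# Hypothetical helper: compare the uid:gid that owns a directory with the
# uid:gid the container expects (1000 per the chart's fsGroup comment).
check_owner() {
  dir=$1
  want=$2
  # GNU stat prints numeric owner and group as "uid:gid".
  have=$(stat -c '%u:%g' "$dir") || return 2
  if [ "$have" = "$want" ]; then
    echo "ok: $dir is owned by $have"
  else
    echo "mismatch: $dir is owned by $have, expected $want"
    return 1
  fi
}

# Example call on the node, using the hostPath from the PV in this issue:
# check_owner /data/timescaledb/data 1000:1000
```

A mismatch here would point at the owner uid rather than the permission bits.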

feikesteenbergen added a commit that referenced this issue Dec 16, 2019
The install command should not fail if permissions are correct or if the
directories already exist. As a failure on the data directory or the wal
directory will also cause Patroni and PostgreSQL to fail, it seems
better to fail fast with the error message rather than to continue,
which will clutter the logs with more error messages.

For example in issue #72, the output would have been very clear that it
is a permission problem.
@lonelyleaf
Author

Thanks for your reply.

Changing the permissions to 0777 did not help, but running a command like chown -R 1000:1000 /data/timescaledb/wal/ did solve the problem.
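For anyone hitting the same error, here is a minimal sketch of that fix. It assumes the command is run with sufficient privileges (typically root) on the Kubernetes node that holds the hostPath directories, which is the node named in the PV's nodeAffinity, and not inside the container, since hostPath directories live on the node's filesystem. `fix_owner` is a hypothetical helper name; the paths are the ones mentioned in this issue.

```shell
#!/bin/sh
# Hypothetical helper: recursively hand the given directories to the uid:gid
# the postgres user has inside the TimescaleDB image (1000:1000 in this issue).
fix_owner() {
  owner=$1
  shift
  for dir in "$@"; do
    # Stop at the first directory that cannot be changed.
    chown -R "$owner" "$dir" || return 1
  done
}

# Example call on the node (needs root for directories owned by another user):
# fix_owner 1000:1000 /data/timescaledb/data /data/timescaledb/wal
```

The pod may need to be deleted afterwards so that the StatefulSet recreates it and the bootstrap is retried.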

feikesteenbergen added a commit that referenced this issue Dec 17, 2019
@matte0080

chown -R 1000:1000 /data/timescaledb/wal/

Hi mate, where did you put this command? I have the same error.

@badokun

badokun commented Mar 31, 2022

chown -R 1000:1000 /data/timescaledb/wal/

Hi mate, where did you put this command? I have the same error.

Same here... where can you run that command? I tried to exec into the container, but it's not running.
