Support Consul for distributed consensus #238

Merged · 57 commits · Jan 17, 2023
9019c7e
add role "consul"
vitabaks Jan 3, 2023
b35c3e3
delete unused
vitabaks Jan 3, 2023
204a84d
consul: yamllint disable rule:braces
vitabaks Jan 3, 2023
e9d4b90
consul: fix yamllint warnings
vitabaks Jan 3, 2023
21b8416
consul: exclude from ansible-lint
vitabaks Jan 3, 2023
1e7d4b1
consul: specify the proxy_env variable at the playbook level
vitabaks Jan 3, 2023
01b8fad
add consul to tags.md
vitabaks Jan 3, 2023
dec44e7
add the pre-downloaded consul archive for installation
vitabaks Jan 3, 2023
16421ea
consul: update vars
vitabaks Jan 3, 2023
17ef486
add dig command
vitabaks Jan 3, 2023
740cb17
Install consul role requirements (ansible.utils)
vitabaks Jan 3, 2023
f9bfa93
Update consul.yml
vitabaks Jan 3, 2023
cab162c
Update system.yml
vitabaks Jan 3, 2023
930b85b
consul: add pre-check
vitabaks Jan 3, 2023
c7802e4
README: consul role requirements
vitabaks Jan 3, 2023
a207a2d
Update inventory
vitabaks Jan 4, 2023
8f622af
add a nameserver entry pointing to localhost for dnsmasq
vitabaks Jan 4, 2023
e1af272
Update vars/main for consul
vitabaks Jan 4, 2023
d9d02b6
add consul_services
vitabaks Jan 4, 2023
1bb35db
consul: fix a typo
vitabaks Jan 4, 2023
3164e24
update consul pre-check
vitabaks Jan 4, 2023
f70a894
consul: make sure the python3-pip package are present
vitabaks Jan 4, 2023
2268e26
consul: update the "Disable systemd-resolved" tasks
vitabaks Jan 4, 2023
bc9b7aa
Update deploy_pgcluster.yml
vitabaks Jan 4, 2023
71ce871
Update main.yml
vitabaks Jan 4, 2023
8af02b1
Update main.yml
vitabaks Jan 4, 2023
3403d6b
molecule: dcs_type: "consul"
vitabaks Jan 8, 2023
deefc73
molecule: add the ability to test dcs_type: "consul"
vitabaks Jan 8, 2023
d51d696
Update molecule.yml
vitabaks Jan 8, 2023
26f0780
Update converge.yml
vitabaks Jan 8, 2023
7e917a6
Update prepare.yml
vitabaks Jan 8, 2023
789c9ab
Update molecule.yml
vitabaks Jan 8, 2023
7462220
Set dcs_type: "consul" for tests
vitabaks Jan 8, 2023
6c3b042
molecule: exposed_ports: 8500 (consul client) for pgnode
vitabaks Jan 8, 2023
c4af678
molecule: add consul_node_role: client for pgnode
vitabaks Jan 8, 2023
fe5f3f4
molecule: install consul client on pgnode
vitabaks Jan 8, 2023
cdf5877
update molecule vars for dcs_type: "consul"
vitabaks Jan 11, 2023
922b766
consul: add AlmaLinux to supported *nix distributions
vitabaks Jan 11, 2023
a19e409
consul: fix yamllint warning
vitabaks Jan 11, 2023
f8566f8
Update .ansible-lint
vitabaks Jan 11, 2023
acf555a
Update asserts.yml
vitabaks Jan 11, 2023
468b301
Update asserts.yml
vitabaks Jan 11, 2023
54a6078
Update asserts.yml
vitabaks Jan 11, 2023
53650a8
consul: add AlmaLinux support
vitabaks Jan 11, 2023
cb9d2e4
molecule: consul_iface_check: false
vitabaks Jan 11, 2023
240c8a5
molecule: consul_gather_server_facts: true
vitabaks Jan 11, 2023
72737b2
Update converge.yml
vitabaks Jan 11, 2023
3b6160c
deploy_pgcluster: prepare the system
vitabaks Jan 12, 2023
04ce592
Update deploy_pgcluster.yml
vitabaks Jan 12, 2023
27ccd72
deploy-finish: update Cluster connection info
vitabaks Jan 13, 2023
17f5628
consul: Check if resolv.conf is pointing to systemd-resolved before s…
vitabaks Jan 16, 2023
158bdd9
Update README.md
vitabaks Jan 16, 2023
c9bb3bc
Update dnsmasq.yml
vitabaks Jan 16, 2023
f57b6f7
Check if systemd-resolved service exists
vitabaks Jan 16, 2023
ca34b3e
Add TypeC image
vitabaks Jan 17, 2023
65d1139
return dcs_type: "etcd" by default
vitabaks Jan 17, 2023
7c52df2
README: add Consul scheme
vitabaks Jan 17, 2023
6 changes: 5 additions & 1 deletion .ansible-lint
@@ -10,7 +10,7 @@ skip_list:
- name[missing] # All tasks should be named.
- name[casing] # TODO: All names should start with an uppercase letter.
- name[template] # Rule for checking task and play names
- jinja[spacing] . # TODO
- jinja[spacing] # TODO
- jinja[invalid] # TODO
- no-handler
- schema[tasks]
@@ -21,4 +21,8 @@ skip_list:
- fqcn[action]
- no-relative-paths

exclude_paths:
- roles/consul/ # TODO - https://github.com/ansible-community/ansible-consul/pull/520

# https://ansible-lint.readthedocs.io/configuring/
# https://ansible-lint.readthedocs.io/rules/
1 change: 1 addition & 0 deletions .gitignore
@@ -0,0 +1 @@
.DS_Store
74 changes: 51 additions & 23 deletions README.md
@@ -7,7 +7,7 @@
[![GitHub license](https://img.shields.io/github/license/vitabaks/postgresql_cluster)](https://github.com/vitabaks/postgresql_cluster/blob/master/LICENSE)
![GitHub stars](https://img.shields.io/github/stars/vitabaks/postgresql_cluster)

### Deploy a Production Ready PostgreSQL High-Availability Cluster (based on "Patroni" and "DCS(etcd)"). Automating with Ansible.
### Deploy a Production Ready PostgreSQL High-Availability Cluster (based on "Patroni" and DCS "etcd" or "consul"). Automating with Ansible.

This Ansible playbook is designed for deploying a PostgreSQL high availability cluster on dedicated physical servers for a production environment. The cluster can also be deployed on virtual machines and in the Cloud.

@@ -23,15 +23,16 @@ In addition to deploying new clusters, this playbook also supports the deployment

## Index
- [Cluster types](#cluster-types)
- [[Type A] PostgreSQL High-Availability with Load Balancing](#type-a-postgresql-high-availability-with-load-balancing)
- [[Type A] PostgreSQL High-Availability with HAProxy Load Balancing](#type-a-postgresql-high-availability-with-haproxy-load-balancing)
- [[Type B] PostgreSQL High-Availability only](#type-b-postgresql-high-availability-only)
- [[Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)](#type-c-postgresql-high-availability-with-consul-service-discovery-dns)
- [Compatibility](#compatibility)
- [Supported Linux Distributions:](#supported-linux-distributions)
- [PostgreSQL versions:](#postgresql-versions)
- [Ansible version](#ansible-version)
- [Requirements](#requirements)
- [Port requirements](#port-requirements)
- [Recommendations](#recommendations)
- [Recommendations](#recommendations)
- [Deployment: quick start](#deployment-quick-start)
- [Variables](#variables)
- [Cluster Scaling](#cluster-scaling)
@@ -54,9 +55,9 @@ In addition to deploying new clusters, this playbook also supports the deployment

## Cluster types

You have two options available for deployment "Type A" and "Type B".
You have three schemes available for deployment:

### [Type A] PostgreSQL High-Availability with Load Balancing
### [Type A] PostgreSQL High-Availability with HAProxy Load Balancing
![TypeA](images/TypeA.png)

> To use this scheme, specify `with_haproxy_load_balancing: true` in variable file vars/main.yml
@@ -91,18 +92,34 @@ In our configuration keepalived checks the status of the HAProxy service and in
[**PgBouncer**](https://pgbouncer.github.io/features.html) is a connection pooler for PostgreSQL.



### [Type B] PostgreSQL High-Availability only
![TypeB](images/TypeB.png)

This is a simple scheme without load balancing (used by default).

To provide a single entry point (VIP) for databases access is used "vip-manager".
"vip-manager" is used to provide a single entry point (VIP) for database access, if the optional variable `cluster_vip` is specified.

[**vip-manager**](https://github.com/cybertec-postgresql/vip-manager) is a service that gets started on all cluster nodes and connects to the DCS. If the local node owns the leader-key, vip-manager starts the configured VIP. In case of a failover, vip-manager removes the VIP on the old leader and the corresponding service on the new leader starts it there. \
Written in Go. Cybertec Schönig & Schönig GmbH https://www.cybertec-postgresql.com


### [Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)
![TypeC](images/TypeC.png)

> To use this scheme, specify `dcs_type: consul` in variable file vars/main.yml

This scheme is suitable for master-only access and for load balancing (using DNS) for reading across replicas. Consul [Service Discovery](https://developer.hashicorp.com/consul/docs/concepts/service-discovery) with [DNS resolving](https://developer.hashicorp.com/consul/docs/discovery/dns) is used as a client access point to the database.

Client access point (example):

- `master.postgres-cluster.service.consul`
- `replica.postgres-cluster.service.consul`

This is also useful for a distributed cluster spanning different data centers: you can specify in advance which data center a database server is located in, and then use this for applications running in the same data center.

Example: `replica.postgres-cluster.service.dc1.consul`, `replica.postgres-cluster.service.dc2.consul`
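As an illustrative sketch (not part of the playbook), the DNS naming convention above can be expressed in code. The service name `postgres-cluster` mirrors the examples; the `conninfo` helper, port `5432`, and `dbname=postgres` are assumptions added for the example:

```python
# Sketch: build Consul DNS names and libpq-style connection strings.
# "postgres-cluster" follows the examples above; the port and dbname
# are assumed defaults -- adjust to your patroni_cluster_name and setup.

def consul_dns_name(role, service="postgres-cluster", datacenter=None):
    """Return the Consul DNS name for the leader ('master') or replicas."""
    parts = [role, service, "service"]
    if datacenter:            # e.g. "dc1" for data-center-aware lookups
        parts.append(datacenter)
    parts.append("consul")
    return ".".join(parts)

def conninfo(role, **kwargs):
    """libpq keyword/value connection string pointing at the Consul name."""
    return f"host={consul_dns_name(role, **kwargs)} port=5432 dbname=postgres"

print(conninfo("master"))
# host=master.postgres-cluster.service.consul port=5432 dbname=postgres
print(conninfo("replica", datacenter="dc1"))
# host=replica.postgres-cluster.service.dc1.consul port=5432 dbname=postgres
```

An application could then pass such a string to its PostgreSQL driver and rely on Consul DNS to resolve the name to a healthy node.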

This requires installing Consul in client mode on each application server for service DNS resolution (or use [forward DNS](https://developer.hashicorp.com/consul/tutorials/networking/dns-forwarding?utm_source=docs) to the remote consul server instead of installing a local consul client).
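For the forward-DNS variant, a minimal dnsmasq sketch might look like the fragment below (the file path is an assumption; Consul's DNS port defaults to 8600, as listed in the port requirements):

```
# /etc/dnsmasq.d/10-consul  (path is an assumption; any dnsmasq conf dir works)
# Forward *.consul queries to a (local or remote) Consul DNS endpoint,
# leaving all other names to the normal upstream resolvers.
server=/consul/127.0.0.1#8600
```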

---
## Compatibility
@@ -149,6 +166,10 @@ This playbook requires root privileges or sudo.

Ansible ([What is Ansible](https://www.ansible.com/resources/videos/quick-start-video)?)

If `dcs_type: "consul"`, install the Consul role requirements on the control node:

`ansible-galaxy install -r roles/consul/requirements.yml`

## Port requirements
List of required TCP ports that must be open for the database cluster:

@@ -157,31 +178,40 @@ List of required TCP ports that must be open for the database cluster:
- `8008` (patroni rest api)
- `2379`, `2380` (etcd)

additionally, for the scheme "[Type A] PostgreSQL High-Availability with Load Balancing":
for the scheme "[Type A] PostgreSQL High-Availability with HAProxy Load Balancing":

- `5000` (haproxy - (read/write) master)
- `5001` (haproxy - (read only) all replicas)
- `5002` (haproxy - (read only) synchronous replica only)
- `5003` (haproxy - (read only) asynchronous replicas only)
- `7000` (optional, haproxy stats)

## Recommendations
for the scheme "[Type C] PostgreSQL High-Availability with Consul Service Discovery (DNS)":

- `8300` (Consul Server RPC)
- `8301` (Consul Serf LAN)
- `8302` (Consul Serf WAN)
- `8500` (Consul HTTP API)
- `8600` (Consul DNS server)


## Recommendations
- **linux (Operation System)**:

Update your operating system on your target servers before deploying;

Make sure time synchronization (NTP) is configured.
Specify `ntp_enabled: 'true'` and `ntp_servers` if you want to install and configure the NTP service.

- **DCS (Distributed Configuration Store)**:
- **DCS (Distributed Consensus Store)**:

Fast drives and a reliable network are the most important factors for the performance and stability of an etcd cluster.
Fast drives and a reliable network are the most important factors for the performance and stability of an etcd (or consul) cluster.

Avoid storing etcd data on the same drive along with other processes (such as the database) that are intensively using the resources of the disk subsystem!
Store the etcd and postgresql data on **different** disks (see `etcd_data_dir` variable), use ssd drives if possible.
See [hardware recommendations](https://etcd.io/docs/v3.3.12/op-guide/hardware/) and [tuning](https://etcd.io/docs/v3.3.12/tuning/) guides.
Avoid storing etcd (or consul) data on the same drive as other processes (such as the database) that make intensive use of the disk subsystem!
Store the etcd (or consul) and postgresql data on **different** disks (see the `etcd_data_dir` and `consul_data_path` variables), and use SSD drives if possible.
See [hardware recommendations](https://etcd.io/docs/v3.3/op-guide/hardware/) and [tuning](https://etcd.io/docs/v3.3/tuning/) guides.

Overloaded (highload) database clusters may require the installation of the etcd cluster on dedicated servers, separate from the database servers.
It is recommended to deploy the DCS cluster on dedicated servers, separate from the database servers.

- **Placement of cluster members in different data centers**:

@@ -227,18 +257,16 @@ To minimize the risk of losing data on autofailover, you can configure settings

###### Minimum set of variables:
- `proxy_env` # if required (*for download packages*)

example:
```
proxy_env:
http_proxy: http://proxy_server_ip:port
https_proxy: http://proxy_server_ip:port
```
- `cluster_vip` # for client access to databases in the cluster (optional)
- `patroni_cluster_name`
- `with_haproxy_load_balancing` `'true'` (Type A) or `'false'`/default (Type B)
- `postgresql_version`
- `postgresql_data_dir`
- `with_haproxy_load_balancing` `'true'` (Type A) or `'false'`/default (Type B)
- `dcs_type` # "etcd" (default) or "consul" (Type C)

If `dcs_type: "consul"`, install the Consul role requirements on the control node:

`ansible-galaxy install -r roles/consul/requirements.yml`

5. Try to connect to hosts

97 changes: 97 additions & 0 deletions consul.yml
@@ -0,0 +1,97 @@
---
- hosts: localhost
any_errors_fatal: true
gather_facts: false
vars_files:
- vars/main.yml
tasks:
- name: Check if the consul role requirements (ansible.utils) are installed
command: ansible-galaxy collection list ansible.utils
changed_when: false
failed_when: false
register: ansible_utils_result

- name: Consul role requirements
fail:
msg:
- "Please install consul role requirements (ansible.utils)"
- "ansible-galaxy install -r roles/consul/requirements.yml"
when:
- ansible_utils_result.stderr is search("unable to find")

- hosts: consul_instances
become: true
become_method: sudo
any_errors_fatal: true
gather_facts: true
vars_files:
- vars/main.yml
- vars/system.yml
environment: "{{ proxy_env | default({}) }}"

pre_tasks:
- name: Include OS-specific variables
include_vars: "vars/{{ ansible_os_family }}.yml"
when: not ansible_os_family == 'Rocky' and not ansible_os_family == 'AlmaLinux'
tags: always

# For compatibility with Ansible old versions
# (support for RockyLinux and AlmaLinux has been added to Ansible 2.11)
- name: Include OS-specific variables
include_vars: "vars/RedHat.yml"
when: ansible_os_family == 'Rocky' or ansible_os_family == 'AlmaLinux'
tags: always

- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
when: ansible_os_family == "Debian" and installation_method == "repo"

- name: Make sure the gnupg and apt-transport-https packages are present
apt:
pkg:
- gnupg
- apt-transport-https
state: present
when: ansible_os_family == "Debian" and installation_method == "repo"

- name: Make sure the python3-pip package is present
package:
name: python3-pip
state: present

- name: Build a firewall_ports_dynamic_var
set_fact:
firewall_ports_dynamic_var: "{{ firewall_ports_dynamic_var | default([]) + (firewall_allowed_tcp_ports_for[item]) }}"
loop: "{{ hostvars[inventory_hostname].group_names }}"
when: firewall_enabled_at_boot|bool
tags: firewall

- name: Build a firewall_rules_dynamic_var
set_fact:
firewall_rules_dynamic_var: "{{ firewall_rules_dynamic_var | default([]) + (firewall_additional_rules_for[item]) }}"
loop: "{{ hostvars[inventory_hostname].group_names }}"
when: firewall_enabled_at_boot|bool
tags: firewall

roles:
- role: ansible-role-firewall
vars:
firewall_allowed_tcp_ports: "{{ firewall_ports_dynamic_var | unique }}"
firewall_additional_rules: "{{ firewall_rules_dynamic_var | unique }}"
when: firewall_enabled_at_boot|bool
tags: firewall

- role: hostname
- role: resolv_conf
vars:
nameservers: [127.0.0.1] # add a nameserver entry pointing to localhost for dnsmasq.
- role: etc_hosts
- role: sysctl
- role: timezone
- role: ntp

- role: consul

...
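The playbook above installs and configures Consul itself; discovery data then becomes available over the HTTP API on port 8500. As a hedged, offline illustration, the sketch below parses a sample payload shaped like Consul's `/v1/health/service/<name>` response (the node names and addresses are invented for the example; a real deployment would fetch this JSON from a Consul agent):

```python
import json

# Sample payload shaped like GET /v1/health/service/postgres-cluster?passing=1
# (invented data; real entries come from a Consul agent's HTTP API, port 8500).
sample = json.loads("""
[
  {"Node": {"Node": "pgnode01", "Address": "10.0.0.1"},
   "Service": {"Service": "postgres-cluster", "Tags": ["master"], "Port": 5432}},
  {"Node": {"Node": "pgnode02", "Address": "10.0.0.2"},
   "Service": {"Service": "postgres-cluster", "Tags": ["replica"], "Port": 5432}}
]
""")

def endpoints(payload, tag):
    """Return (address, port) pairs for service instances carrying a tag."""
    return [
        (e["Node"]["Address"], e["Service"]["Port"])
        for e in payload
        if tag in e["Service"]["Tags"]
    ]

print(endpoints(sample, "master"))    # [('10.0.0.1', 5432)]
print(endpoints(sample, "replica"))   # [('10.0.0.2', 5432)]
```

The `master`/`replica` tags here mirror the DNS names used in the README; filtering by tag is what the `tag.service.service.consul` DNS lookups do on the server side.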
80 changes: 57 additions & 23 deletions deploy_pgcluster.yml
@@ -13,15 +13,71 @@
msg: "Ansible version must be {{ minimal_ansible_version }} or higher"
when: ansible_version.full is version(minimal_ansible_version, '<')

- name: Gathering facts from all servers
- name: Gathering facts from all servers and preparing the system
hosts: all
become: true
become_method: sudo
gather_facts: true
tags: always
vars_files:
- vars/main.yml
- vars/system.yml
environment: "{{ proxy_env | default({}) }}"

roles:
- role: resolv_conf
- role: hostname
- role: etc_hosts
- role: timezone

tasks:
- name: Clean yum cache
command: yum clean all
when:
- ansible_os_family == "RedHat"
- ansible_distribution_major_version == '7'

- name: Clean dnf cache
command: dnf clean all
when:
- ansible_os_family == "RedHat"
- ansible_distribution_major_version is version('8', '>=')

- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
when: ansible_os_family == "Debian"

- name: Make sure the gnupg and apt-transport-https packages are present
apt:
pkg:
- gnupg
- apt-transport-https
state: present
when: ansible_os_family == "Debian"

# Ansible requires the iproute package for network facts to be populated
- name: Make sure that the iproute is installed
package:
name: iproute
state: present
when: ansible_os_family == "RedHat"

- name: Make sure that the iproute is installed
apt:
name: iproute2
state: present
when: ansible_os_family == "Debian"

- import_playbook: etcd_cluster.yml
when: not dcs_exists|bool and dcs_type == "etcd"
tags: etcd

- import_playbook: consul.yml
when: dcs_type == "consul"
tags: consul

- hosts: postgres_cluster
become: true
become_method: sudo
@@ -54,24 +110,6 @@
msg: "{{ ansible_distribution_version }} of {{ ansible_distribution }} is not supported"
when: ansible_distribution_version is version_compare(os_minimum_versions[ansible_distribution], '<')

- name: Update apt cache
apt:
update_cache: true
cache_valid_time: 3600
environment: "{{ proxy_env | default({}) }}"
when: ansible_os_family == "Debian" and installation_method == "repo"
tags: add_repo, install_packages, install_postgres

- name: Make sure the gnupg and apt-transport-https packages are present
apt:
pkg:
- gnupg
- apt-transport-https
state: present
environment: "{{ proxy_env | default({}) }}"
when: ansible_os_family == "Debian" and installation_method == "repo"
tags: add_repo, install_packages, install_postgres

- name: Build a firewall_ports_dynamic_var
set_fact:
firewall_ports_dynamic_var: "{{ firewall_ports_dynamic_var | default([]) + (firewall_allowed_tcp_ports_for[item]) }}"
@@ -95,9 +133,6 @@
when: firewall_enabled_at_boot|bool
tags: firewall

- role: hostname
- role: resolv_conf
- role: etc_hosts
- role: add-repository
- role: packages
- role: sudo
@@ -107,7 +142,6 @@
- role: pam_limits
- role: io-scheduler
- role: locales
- role: timezone
- role: ntp
- role: ssh-keys
- role: copy
Binary file added images/TypeC.png