Postgres for Python Programmers

Debian CLI

  • pg_createcluster - create a new cluster of databases. Useful if you want to run two different versions of postgres concurrently.

  • service postgresql start - start postgres

  • service postgresql stop - stop postgres

DB Creation Tips

  • always create DBs as UTF8

Postgres configuration files

  • pg_hba.conf and postgresql.conf are the main config files. pg_hba.conf controls login, postgresql.conf controls server config.

Important Params

Logging

  • Be generous with logging
  • Low impact
  • Best source of info for debugging
  • log_destination controls how to log e.g. syslog
  • log_directory controls the directory to log into e.g. pg_log

Memory

  • shared_buffers

    • below 2GB of system memory: 20%
    • 2GB-32GB: 25%
    • above 32GB: 8GB
  • work_mem

    • 32-64 MB
    • look for 'temporary file' lines in logs
    • set to 2-3x the largest temp file you see (see the sketch after this list)
    • Huge speed-up if set correctly
    • Remember, each connection can spawn multiple nodes so don't set this too high
  • maintenance_work_mem = 10% of sys memory up to 1GB
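
A minimal sketch of the per-session approach (table and column names are hypothetical): rather than raising work_mem globally for every connection, bump it only for the one reporting query that needs it.

```sql
-- Hypothetical reporting query; raise work_mem for this transaction only.
BEGIN;
SET LOCAL work_mem = '256MB';    -- reverts automatically at COMMIT/ROLLBACK
SELECT customer_id, sum(amount) AS total
FROM orders
GROUP BY customer_id
ORDER BY total DESC;
COMMIT;
```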

Checkpoints

  • At a checkpoint, all dirty buffers in shared_buffers get flushed to disk

  • Potentially a lot of I/O

  • wal_buffers = 16MB

  • checkpoint_completion_target = 0.9

  • checkpoint_timeout = between 10m and 30m

  • checkpoint_segments = arbitrary, start at 32

  • Now look in the log for how often checkpoint entries are occurring. If they are happening more often than checkpoint_timeout, increase the number of checkpoint_segments
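
A minimal sketch of applying the settings above, assuming PostgreSQL 9.4+ where ALTER SYSTEM is available (on older versions, edit postgresql.conf directly):

```sql
ALTER SYSTEM SET checkpoint_completion_target = 0.9;
ALTER SYSTEM SET checkpoint_timeout = '15min';
ALTER SYSTEM SET checkpoint_segments = 32;   -- replaced by max_wal_size in 9.5+
SELECT pg_reload_conf();   -- these settings only need a reload; wal_buffers needs a full restart
```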

Planner settings

  • effective_io_concurrency - Set to the number of I/O channels. Or ignore if you don't know the hardware config.

  • random_page_cost - Default is 4.0; bring it down to 3.0 for RAID, and to 1.1 for SSD or EBS.

DO NOT TOUCH

  • fsync = on
  • synchronous_commit

  • Most settings just need a reload
  • Some, like shared_buffers, need a full server restart
  • Some can be changed on the fly with set local <param> = <value>;

Users and roles

  • A 'role' is a db object that can own other objects
  • A 'user' is simply a role that is allowed to log in
  • Don't use the 'postgres' superuser for apps
  • However, you will probably have to grant schema-modification privileges to your app user
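
A minimal sketch with hypothetical names: a dedicated application role instead of the postgres superuser, plus a UTF8 database it owns.

```sql
CREATE ROLE myapp LOGIN PASSWORD 'change-me';
CREATE DATABASE myapp_db OWNER myapp
    ENCODING 'UTF8' TEMPLATE template0;

-- Connected to myapp_db: let the app role create and modify its own schema objects.
GRANT ALL ON SCHEMA public TO myapp;
```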

User security

  • Turn on SSL

pg_hba.conf

  • Method:
    • trust: anyone who can connect can log in as any database user, no password required
    • peer: if the unix and postgres user match, let them log in
    • md5: md5-hashed password

The WAL (Write-Ahead Log)

  • Write-ahead log governs replication, crash-recovery, etc.
  • When each transaction is committed, it is logged to the WAL
  • Pushed to disk before the transaction is confirmed
  • If the system crashes, the WAL is 'replayed'

pg_xlog

  • WAL is stored in the pg_xlog directory
  • Don't mess with it
  • Postgres recycles them

Archiving

  • archive_command
  • Runs a command each time a WAL segment completes
  • For example: copy and move to someplace safe
  • Therefore you can recover from terrible things like a full crash

synchronous_commit

  • 'on': COMMIT doesn't return until WAL flush is done
  • 'off': COMMIT returns as soon as the WAL flush is queued (see the sketch after this list)
  • Can lose transactions on a crash, but faster
  • Data cannot be corrupted, just lost
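
A minimal sketch (hypothetical table): turn synchronous_commit off per transaction for low-value writes, rather than globally.

```sql
BEGIN;
SET LOCAL synchronous_commit = off;
INSERT INTO page_views (url, viewed_at) VALUES ('/home', now());
COMMIT;  -- returns before the WAL hits disk; a crash now could lose this row, never corrupt it
```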

Backup + Recovery

pg_dump

  • takes a logical snapshot of the db at the moment it starts
  • runs concurrently with everything except schema changes
  • takes no locks that block reads or writes
  • low, non-zero load on the database

pg_restore

  • recreates the DB from a pg_dump output

  • N.B. most of the restore time goes on recreating indexes, since index data isn't included in the dump

  • Back up globals with pg_dumpall --globals-only

  • Dump with pg_dump --format=custom

  • Restore in parallel

PITR backup/recovery

  • Snapshot of the data directory...

  • ...will not be consistent, but if we have the WAL segments that were created after it we can restore consistency.

  • Check out WAL-E

Recovery

  • Copy disk image
  • Set up recovery.conf to point to WAL files
  • Start up Postgres and let it recover
  • More WAL files = slower
  • Takes 10-20% of the time it took to make the WAL files originally
  • N.B. don't have to replay the entire WAL. Can stop at a safe/stable point

Replication

  • What if we just sent the WAL to another server?
  • That is postgres replication

WAL Archiving

  • Each 16MB segment is sent to the secondary when complete
  • Secondary reads and applies it
  • Make sure WAL is copied atomically (e.g. using rsync)

Streaming Replication

  • Pushes WAL down through a TCP/IP connection (postgres connection)
  • Secondary continuously applies these
  • recovery.conf orchestrates this. It lives in $PGDATA
  • restore_command tells Postgres what command to run to fetch a WAL segment, e.g. WAL-E's wal-fetch

Disaster Recovery

  • What if AWS region goes down?
  • Have a recovery plan from a remote site (e.g. Multi-AZ)
  • Probably best done with wal_archive

pg_basebackup

  • Utility for doing a snapshot of a running server
  • Easy way to take a snapshot for a secondary

The bad parts of replication

  • Replicates the entire cluster (all databases) or nothing
  • Can't write anything else to the secondary, even temporary tables. Sucks for data analysis

Trigger-based Replication

  • Only want to replicate a certain set of tables
  • Installs triggers on tables on the master DB
  • A daemon watches these and applies them to secondaries
  • E.g. slony
  • Good:
    • Highly configurable
  • Bad:
    • Really fiddly and complex to set up
    • Schema changes have to be pushed out manually
    • Overhead

Transactions, MVCC and VACUUM

Transactions

  • An atomic thing to add to the DB
  • Once the COMMIT completes, data has been written to storage
  • If the database crashes, each transaction is either fully committed or fully rolled back - all or nothing
  • No transaction can see another transaction's uncommitted changes
  • In postgres, everything is a transaction
  • Even schema changes (e.g. South)
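
A minimal sketch of the all-or-nothing behaviour, including a schema change inside the same transaction (hypothetical names):

```sql
BEGIN;
CREATE TABLE audit_log (id serial PRIMARY KEY, note text);  -- DDL is transactional too
INSERT INTO audit_log (note) VALUES ('first entry');
-- ROLLBACK here would undo the INSERT *and* the CREATE TABLE
COMMIT;   -- only now is any of it visible to other transactions
```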

Warnings

  • Can only rollback the current transaction
  • Resources are held until the end of transactions e.g. locks
  • Be aware of IDLE IN TRANSACTION sessions

MVCC

  • How do we deal with concurrency?

    • Lock the database (old Mongo)
    • Lock the table (new Mongo)
    • Lock the tuple
  • Actually, postgres creates multiple 'versions' of the DB for each transaction

  • Readers don't block readers

  • Readers don't block writers

  • Writers don't block readers

  • Writers block writers to the same tuple

Snapshots

  • Can't 'fork' DB and combine two changes to the same thing

  • So transactions are blocked if they try to hit the same tuple

  • REPEATABLE READ and SERIALIZABLE take snapshots when transaction begins

  • Not every conflict can be detected at the single-tuple level.

  • E.g. two transactions INSERTing values that collide on a unique constraint

  • A consequence of MVCC is that deleted tuples are not immediately freed because there may be multiple snapshots.

  • End up with dead tuples

  • So therefore we have: VACUUM

VACUUM

  • Clears out dead tuples.
  • Run automatically by AUTOVACUUM.
  • AUTOVACUUM also runs ANALYZE, which tells the query planner how to make queries
  • VACUUM suffers with high write rates

Optimizing

  • Reduce the autovacuum sleep time (autovacuum_vacuum_cost_delay)
  • Increase autovacuum time; these can also be tuned per table (see the sketch below)
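
A minimal sketch, hypothetical table name: make autovacuum more aggressive on one heavily-updated table rather than for the whole cluster.

```sql
ALTER TABLE events SET (
    autovacuum_vacuum_cost_delay   = 0,     -- don't sleep between chunks of vacuum work
    autovacuum_vacuum_scale_factor = 0.05   -- vacuum after 5% dead rows instead of the default 20%
);
```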

Schema Design

  • Keep attributes of entities in their entities
  • Consistent plural/singular tables
  • lower_case table names
  • "Normalization" means not repeating data in places
  • "Denormalization" is sometimes useful for storing calculated values so you don't have to do joins over tables loads. For simple values, don't copy

Joins

  • PostgreSQL executes joins efficiently
  • Don't be afraid!
  • Especially don't worry about large sets joining small sets

Types

  • Use the typing system
  • Use domains to create custom types
  • NO POLYMORPHIC FIELDS
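
A minimal sketch of a domain (the regex is illustrative only):

```sql
CREATE DOMAIN email AS text
    CHECK (VALUE ~* '^[^@[:space:]]+@[^@[:space:]]+$');

CREATE TABLE users (
    id    serial PRIMARY KEY,
    email email NOT NULL   -- the domain behaves like any other type
);
```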

Constraints

  • Use them. Cheap + fast.

  • Multi-column constraints are good

  • Exclusion constraints for constraints on multiple rows

  • Avoid EAV (Entity-Attribute-Value) schemas
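
A minimal sketch (hypothetical booking table) of a multi-column unique constraint plus an exclusion constraint that forbids overlapping reservations for the same room:

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets = and && share one GiST index

CREATE TABLE reservations (
    room_id  int       NOT NULL,
    guest_id int       NOT NULL,
    during   tstzrange NOT NULL,
    UNIQUE (room_id, guest_id, during),
    EXCLUDE USING gist (room_id WITH =, during WITH &&)
);
```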

Keys

  • Use UUIDs instead of serials as keys; it makes merging tables/databases much less painful (see the sketch below)
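
A minimal sketch, assuming the pgcrypto extension for gen_random_uuid() (uuid-ossp's uuid_generate_v4() works too):

```sql
CREATE EXTENSION IF NOT EXISTS pgcrypto;

CREATE TABLE accounts (
    id   uuid PRIMARY KEY DEFAULT gen_random_uuid(),
    name text NOT NULL
);
```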

Fast/Slow:

  • If a table has a frequently-updated section and a slowly-updated section, consider splitting the table. (Impact/Engagement?)

Arrays:

  • Good. Often better than many to many.
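
A minimal sketch with a hypothetical tags column: an array in place of a tags many-to-many table, indexable with GIN and queryable with containment operators.

```sql
CREATE TABLE posts (
    id   serial PRIMARY KEY,
    tags text[] NOT NULL DEFAULT '{}'
);

CREATE INDEX posts_tags_idx ON posts USING gin (tags);

SELECT id FROM posts WHERE tags @> ARRAY['postgres'];  -- every post tagged 'postgres'
```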

HStore:

  • Good for optional variables. (Languages on NewsEntities?)

JSONB:

  • Woooooooooo
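
A minimal sketch (hypothetical table, echoing the HStore point above): a JSONB column for sparse/optional attributes, with a GIN index for containment queries. Needs 9.4+.

```sql
CREATE TABLE news_entities (
    id    serial PRIMARY KEY,
    attrs jsonb  NOT NULL DEFAULT '{}'
);

CREATE INDEX news_entities_attrs_idx ON news_entities USING gin (attrs);

SELECT id FROM news_entities WHERE attrs @> '{"language": "de"}';
```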

NULL:

  • NULL sucks
  • only use it to mean missing value
  • NULL != NULL

Very Large Objects

  • Don't store images/video in the DB or you are an idiot
  • Store metadata that points to them
  • Database API is not designed for this

Many-to-many Tables

  • These are big
  • Consider replacing with array fields, either one or both directions
    • can use a trigger to maintain integrity
    • Up to about a megabyte/250k entries is practical
  • M2M = N^2 growth pattern

Time representation

  • TIMESTAMPTZ = timestamp in UTC
  • TIMESTAMP = timestamp in some timezone you hope you know what it is
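
A minimal sketch of the difference: timestamptz stores an absolute instant (normalised to UTC), timestamp stores a wall-clock reading with the zone thrown away.

```sql
SET timezone = 'Europe/Berlin';
SELECT now()::timestamptz AS absolute_instant,    -- the same point in time everywhere
       now()::timestamp   AS wall_clock_reading;  -- zone information is gone
```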

Indexes

  • A database returns identical results with or without indexes - they only affect speed

B-Tree

  • Standard postgres index

  • Single-, multi-column and functional indexes

  • Create when query:

    • is going to select <15% of the rows
    • uses a comparison operator
    • is run frequently
  • Could be ascending or descending

Partial Indexes

  • Can add a WHERE clause to limit the size of an index (activity feed?) - see the sketch below
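
A minimal sketch, hypothetical activity-feed table: index only the rows most queries actually touch.

```sql
CREATE INDEX unread_notifications_idx
    ON notifications (user_id, created_at)
    WHERE is_read = false;
```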

Other index types

  • GiST - geometric data
  • KNN indexes - nearest neighbour
  • GIN - text search

Creating indexes

  • CREATE INDEX CONCURRENTLY : slower but non-blocking
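
A minimal sketch (hypothetical table): build an index on a live table without blocking writes. Note that CONCURRENTLY cannot run inside a transaction block.

```sql
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
```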

Checking index usage

  • pg_stat_user_indexes
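
A minimal sketch: list the least-used indexes, which are candidates for dropping.

```sql
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
ORDER BY idx_scan ASC
LIMIT 20;
```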

Queries

Cost

  • EXPLAIN

  • Measured in arbitrary units

  • First number is startup cost (get first tuple), second is total cost (get all tuples)

  • Costs are inclusive of sub-nodes

  • EXPLAIN ANALYZE reports actual times in ms, but it really executes the query

  • Also returns estimated and actual rows - big differences are bad
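
A minimal sketch with a hypothetical query:

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
-- Each node shows "cost=startup..total rows=<estimate>" alongside
-- "actual time=.. rows=<actual>"; big estimate/actual row gaps are the thing to chase.
```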

Bad things

  • JOINS between two very large tables
  • Sequential scan on large tables
  • SELECT COUNT(*)

Identifying slowness

  • pg_stat_activity
  • tail -f the logs
  • iostat to check I/O usage
  • pg_locks
  • Bounce between stat_activity and locks to identify bad transactions
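
A minimal sketch (column names as of the 9.2+ pg_stat_activity layout): what is running right now, and for how long.

```sql
SELECT pid, state, now() - query_start AS runtime, query
FROM pg_stat_activity
WHERE state <> 'idle'
ORDER BY runtime DESC;
```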

Python Particulars

psycopg2 for python2

  • The result set of a query is loaded into client memory when the query completes
  • Regardless of the size of the result set
  • If you want to scroll through results without loading them all, use a named (server-side) cursor - see the SQL sketch below
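
Roughly what psycopg2 does on the server when you use a named cursor - DECLARE once, then FETCH in batches - sketched here in plain SQL (hypothetical table):

```sql
BEGIN;
DECLARE big_scan CURSOR FOR SELECT * FROM events ORDER BY id;
FETCH FORWARD 1000 FROM big_scan;   -- repeat until no rows come back
CLOSE big_scan;
COMMIT;
```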

py-postgresql

  • Python 3.x

Django

  • If you are running on 1.6+, always use @atomic
  • cluster writes into small transactions
  • multi-DB is awesome for hot standby (writes to master, reads from secondary)

South

  • Use it.
  • Create manual migrations for schema changes that Django can't handle.

Special Situations

Minor version upgrades

  • Do these
  • Could be for security or data corruption
  • Only need to install new binaries

Major version upgrades

  • Requires planning
  • pg_upgrade is now reliable
  • Trigger-based replication is another option for zero downtime
  • A full pg_dump/pg_restore is the way to go for safety
  • All parts of a replication set need to be upgraded at the same time

Bulk-loading Data

  • Use COPY not INSERT
  • VACUUM after
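
A minimal sketch (hypothetical table and path); from psycopg2 the equivalent is COPY ... FROM STDIN driven through copy_expert(). Server-side COPY from a file needs superuser rights, COPY FROM STDIN does not.

```sql
COPY measurements (sensor_id, recorded_at, value)
FROM '/tmp/measurements.csv' WITH (FORMAT csv, HEADER true);

VACUUM ANALYZE measurements;   -- as above: vacuum (and refresh planner stats) after the load
```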

Very high insert rates

  • Reduce shared buffers
  • Drop down

AWS

  • Generally, works like any other system
  • Always have a good backup solution
  • PIOPS (provisioned IOPS) are useful
  • Scale up and down to meet load on AWS

RDS

  • Overall, not bad
  • Automatic failover (AWESOME)
  • No reading from the secondary
  • Expensive
  • Generous set of extensions

Sharding

  • No integrated multi-master solution yet
  • pgpool-II

Tools

Monitoring

  • Nagios

Log Analysis

  • pgbadger
  • pg_stat_statements

Connection Tuning

  • max_connections = maximum number of simultaneous queries Postgres will execute at any moment
  • Typical values: 150-200
  • pgbouncer: accept lots of client connections, pooled down to roughly max_connections - 10 actual database connections