- `pg_createcluster` - create a new cluster of databases. Useful if you want to run two different versions of Postgres concurrently.
- `service postgresql start` - start Postgres
- `service postgresql stop` - stop Postgres
- Always create DBs as UTF8
- `pg_hba.conf` and `postgresql.conf` are the main config files. `pg_hba.conf` controls login, `postgresql.conf` controls server config.
- Be generous with logging
  - Low impact
  - Best source of info for debugging
- `log_destination` controls how to log, e.g. `syslog`
- `log_directory` controls the directory to log into, e.g. `pg_log`
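A sketch of what that might look like in `postgresql.conf` (values are illustrative, not from the talk):

```
log_destination = 'syslog'
log_directory = 'pg_log'
log_min_duration_statement = 250   # log statements slower than 250ms
log_temp_files = 0                 # log every temporary file (see work_mem below)
log_checkpoints = on               # see the checkpoint section below
```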
- `shared_buffers`
  - below 2GB of system memory: 20%
  - above 32GB: 8GB
- `work_mem`
  - Start at 32-64MB
  - Look for 'temporary file' lines in logs
  - Set to 2-3x the largest temp file you see
  - Huge speed-up if set correctly
  - Remember, each connection can spawn multiple nodes, so don't set this too high
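For example, with `log_temp_files` enabled as above, a log line reporting a ~48MB temporary file suggests (at 2-3x that size; the exact value is illustrative):

```
work_mem = 128MB
```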
- `maintenance_work_mem` = 10% of system memory, up to 1GB
- Everything that is in `shared_buffers` gets flushed to disk at each checkpoint
  - Potentially a lot of I/O
- `wal_buffers` = 16MB
- `checkpoint_completion_target` = 0.9
- `checkpoint_timeout` = between 10m and 30m
- `checkpoint_segments` = arbitrary; start at 32
- Now look in the log for how often checkpoint entries are occurring. If they are happening more often than `checkpoint_timeout`, increase the number of `checkpoint_segments`.
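Collected into a `postgresql.conf` sketch (the 15min value is an arbitrary pick from the 10m-30m range):

```
wal_buffers = 16MB
checkpoint_completion_target = 0.9
checkpoint_timeout = 15min
checkpoint_segments = 32   # raise if checkpoints arrive more often than checkpoint_timeout
```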
- `effective_io_concurrency` - set to the number of I/O channels, or ignore if you don't know the hardware config
- `random_page_cost` - 4.0 by default; bring down to 3.0 for RAID; EBS or SSD should be 1.1
- `fsync` = on. Leave it on.
- `synchronous_commit` - controls whether COMMIT waits for the WAL flush (see below)
- Most settings just need a reload
- Some, like `shared_buffers`, need a full server restart
- Some can be changed on the fly with `SET LOCAL <param> = <value>;`
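For example, a per-transaction override (the value is illustrative):

```sql
BEGIN;
SET LOCAL work_mem = '256MB';  -- applies only until COMMIT/ROLLBACK
-- ... run the one big reporting query here ...
COMMIT;
```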
- A 'role' is a db object that can own other objects
- A 'user' is a role that can log in
- Don't use the 'postgres' superuser for apps
- However, you will probably have to grant schema-modification privileges to your app user
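A minimal sketch (role, database, and password are hypothetical):

```sql
CREATE ROLE app_user LOGIN PASSWORD 'secret';
GRANT CONNECT ON DATABASE app_db TO app_user;
GRANT USAGE, CREATE ON SCHEMA public TO app_user;  -- the schema-modification privileges
```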
- Turn on SSL
- Method (the auth method in `pg_hba.conf`):
  - trust: if they can connect to the server, they can do anything - avoid
  - peer: if the unix and postgres user names match, let them log in
  - md5: md5-hashed password
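Sketch of `pg_hba.conf` lines using those methods (the network address is hypothetical; `hostssl` enforces SSL):

```
# TYPE    DATABASE  USER      ADDRESS       METHOD
local     all       postgres                peer
hostssl   all       app_user  10.0.0.0/8    md5
```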
- Write-ahead log governs replication, crash-recovery, etc.
- When each transaction is committed, it is logged to the WAL
- Pushed to disk before the transaction is confirmed
- If the system crashes, the WAL is 'replayed'
- WAL is stored in the `pg_xlog` directory - don't mess with it
- Postgres recycles the segment files itself
- `archive_command`
  - Runs a command each time a WAL segment completes
  - For example: copy it someplace safe
  - Therefore you can recover from terrible things like a full crash
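A sketch of the matching `postgresql.conf` lines (the archive directory is hypothetical; the `test ! -f` guard avoids overwriting existing segments):

```
wal_level = archive
archive_mode = on
archive_command = 'test ! -f /mnt/wal_archive/%f && cp %p /mnt/wal_archive/%f'
```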
- `synchronous_commit`:
  - 'on': COMMIT doesn't return until the WAL flush is done
  - 'off': COMMIT returns when the WAL flush is queued
    - Can lose transactions on a crash, but faster
    - Data cannot be corrupted, just lost
- `pg_dump`
  - Takes a logical snapshot of the DB at the moment it starts
  - Runs concurrently with everything except schema changes
  - No locks/writes
  - Low, non-zero load on the database
- `pg_restore` recreates the DB from a `pg_dump` output
- N.B. most of the time goes into recreating indexes, since these aren't copied over
- Back up globals with `pg_dumpall --globals-only`
- Dump with `pg_dump --format=custom`
- Restore in parallel with `pg_restore --jobs`
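Put together (database names are hypothetical; the restore target must already exist):

```sh
pg_dumpall --globals-only > globals.sql          # roles, tablespaces, etc.
pg_dump --format=custom --file=app_db.dump app_db
pg_restore --jobs=4 --dbname=app_db2 app_db.dump # parallel restore
```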
Snapshot of the data directory...
-
...will not be consistent, but if we have the WAL segments that were created after it we can restore consistency.
-
Check out WAL-E
- Copy disk image
- Set up recovery.conf to point to WAL files
- Start up Postgres and let it recover
- More WAL files = slower recovery
- Takes 10-20% of the time it took to make the WAL files originally
- N.B. don't have to replay the entire WAL. Can stop at a safe/stable point
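A minimal `recovery.conf` sketch for that (archive path and stop point are hypothetical):

```
restore_command = 'cp /mnt/wal_archive/%f %p'
recovery_target_time = '2014-05-01 09:00:00'   # the safe/stable point to stop at
```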
- What if we just sent the WAL to another server?
- That is postgres replication
- Each 16MB segment is sent to the secondary when complete
- Secondary reads and applies it
- Make sure WAL is copied atomically (e.g. using `rsync`)
- Pushes WAL down through a TCP/IP connection (postgres connection)
- Secondary continuously applies these
- `recovery.conf` orchestrates this. Lives in `$PGDATA`
- `restore_command` tells postgres what command to restore from, e.g. WAL-E's `wal-fetch`
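Sketch of a streaming secondary's `recovery.conf` (host and user are hypothetical; `wal-fetch` here is WAL-E's command):

```
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com user=replicator'
restore_command = 'wal-e wal-fetch "%f" "%p"'
```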
- What if an AWS region goes down?
- Have a recovery plan from a remote site (e.g. Multi-AZ)
- Probably best done with WAL archiving (`archive_command`)
- `pg_basebackup`: utility for taking a snapshot of a running server
  - Easy way to take a snapshot for a secondary
- Limitations of built-in replication:
  - Entire DBs or none
  - Can't write anything else to the secondary, even temporary tables. Sucks for data analysis
- Only want to replicate a certain set of tables? Use trigger-based replication
  - Installs triggers on tables on the master DB
  - A daemon watches these and applies the changes to secondaries
  - E.g. Slony
- Good:
  - Highly configurable
- Bad:
  - Really fiddly and complex to set up
  - Schema changes have to be pushed out manually
  - Overhead
- A transaction is an atomic unit of work in the DB
- Once the COMMIT completes, the data has been written to durable storage
- If the database crashes, each transaction will have been fully COMMITted or not - all or nothing
- No transaction can see another's uncommitted changes
- In postgres, everything is a transaction
- Even schema changes (e.g. South)
- Can only roll back the current transaction
- Resources are held until the end of transactions e.g. locks
- Be aware of IDLE IN TRANSACTION sessions
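One way to spot them (assumes the 9.2+ `pg_stat_activity` columns):

```sql
SELECT pid, now() - xact_start AS xact_age, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY xact_age DESC;
```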
- How do we deal with concurrency?
- Lock the database (old Mongo)
- Lock the table (new Mongo)
- Lock the tuple
- Actually, postgres creates multiple 'versions' of the DB for each transaction (MVCC)
- Readers don't block readers
- Readers don't block writers
- Writers don't block readers
- Writers block writers to the same tuple
- Can't 'fork' the DB and combine two changes to the same thing
- So transactions are blocked if they try to hit the same tuple
- REPEATABLE READ and SERIALIZABLE take snapshots when the transaction begins
- Not every conflict can be detected at the single-tuple level
  - E.g. INSERTing values
- A consequence of MVCC is that deleted tuples are not immediately freed, because there may be multiple snapshots that can still see them
- End up with dead tuples
- So therefore we have: VACUUM
  - Clears out dead tuples
  - Run automatically by AUTOVACUUM
  - AUTOVACUUM also runs ANALYZE, which tells the query planner how to plan queries
- VACUUM suffers with high write rates
  - Reduce the autovacuum sleep time
  - Increase the number of autovacuum workers
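A hypothetical `postgresql.conf` nudge along those lines (values are illustrative):

```
autovacuum_naptime = 15s     # default 1min: wake the daemon up more often
autovacuum_max_workers = 6   # default 3
```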
- Keep the attributes of an entity in that entity's own table
- Consistent plural/singular table names
- lower_case table names
- "Normalization" means not repeating data in multiple places
- "Denormalization" is sometimes useful for storing calculated values, so you don't have to do joins over tables constantly. For simple values, don't copy.
- PostgreSQL executes joins efficiently
- Don't be afraid!
- Especially don't worry about large sets joining small sets
- Use the typing system
- Use domains to create custom types (sketch below)
- NO POLYMORPHIC FIELDS
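A minimal sketch of a domain-backed custom type (names and the check are hypothetical):

```sql
CREATE DOMAIN email_address AS text
  CHECK (VALUE ~ '^[^@]+@[^@]+$');

CREATE TABLE subscribers (
  id    uuid PRIMARY KEY,
  email email_address NOT NULL
);
```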
- Constraints: use them. Cheap + fast.
- Multi-column constraints are good
- Exclusion constraints for constraints across multiple rows (sketch below)
- Avoid EAV schemas. (What even is that? Entity-attribute-value: generic entity/attribute/value rows instead of real columns.)
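The exclusion-constraint sketch (table is hypothetical; assumes 9.2+ for `tsrange` and the `btree_gist` extension):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;

CREATE TABLE bookings (
  room   int,
  during tsrange,
  EXCLUDE USING gist (room WITH =, during WITH &&)  -- no two overlapping bookings per room
);
```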
- Use UUIDs instead of serials to avoid key collisions when merging tables
- If a table has a frequently-updated section and a slowly-updated section, consider splitting the table. (Impact/Engagement?)
- Array fields: good. Often better than many-to-many.
- Good for optional variables. (Languages on NewsEntities?)
- Woooooooooo
- NULL sucks
- only use it to mean missing value
- NULL != NULL
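Quick demonstration:

```sql
SELECT NULL = NULL;   -- NULL, not true; use IS NULL or IS DISTINCT FROM
SELECT NULL IS NULL;  -- true
```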
- Don't store images/video in the DB or you are an idiot
- Store metadata that points to them
- Database API is not designed for this
- Many-to-many link tables get big
- Consider replacing with array fields, in either one or both directions
  - Can use a trigger to maintain integrity
  - Up to about a megabyte/250k entries is practical
  - M2M = N^2 growth pattern
- `TIMESTAMPTZ` = timestamp in UTC
- `TIMESTAMP` = timestamp in some timezone - you hope you know which one
- Databases should return identical results with or without indexes - indexes only change speed
- B-tree: the standard postgres index
- Single-column, multi-column and functional indexes
- Create one when a query:
  - is going to select <15% of the rows
  - uses a comparison operator
  - is run frequently
- Could be ascending or descending
- Can add a WHERE clause to make a partial index and limit its size (activity feed?)
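Sketches of those variants (table and columns are hypothetical):

```sql
CREATE INDEX ON events (user_id);                  -- single-column
CREATE INDEX ON events (user_id, created_at DESC); -- multi-column, descending
CREATE INDEX ON users (lower(email));              -- functional
CREATE INDEX ON events (created_at) WHERE unread;  -- partial, e.g. an activity feed
```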
- GiST - geometric data
- KNN indexes - nearest neighbour
- GIN - text search
- `CREATE INDEX CONCURRENTLY`: slower, but non-blocking
- `pg_stat_user_indexes` shows how much each index is actually used
- `EXPLAIN`
  - Costs are measured in arbitrary units
  - First number is the startup cost (get the first tuple), second is the total cost (get all tuples)
  - Costs are inclusive of sub-nodes
  - EXPLAIN ANALYZE uses actual ms, but executes the query
  - Also returns estimated and actual rows - big differences are bad
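For example (the plan and costs are illustrative, against a hypothetical table):

```sql
EXPLAIN SELECT * FROM events WHERE user_id = 42;
-- Index Scan using events_user_id_idx on events
--   (cost=0.43..8.45 rows=12 width=64)
--         ^startup  ^total  ^estimated rows
```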
- Inherently slow operations:
  - JOINs between two very large tables
  - Sequential scans on large tables
  - SELECT COUNT(*)
- `pg_stat_activity` - see what's running right now
- `tail -f` the logs
- `iostat` to check I/O usage
- `pg_locks` - see currently-held locks
- Bounce between `pg_stat_activity` and `pg_locks` to identify bad transactions
- The result set of a query is loaded into client memory when the query completes
- Regardless of the size of the result set
- If you want to scroll through results incrementally, use a named (server-side) cursor
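In plain SQL that looks like the following (table is hypothetical; psycopg2's named cursors do this under the hood):

```sql
BEGIN;
DECLARE feed CURSOR FOR SELECT * FROM events ORDER BY created_at;
FETCH 100 FROM feed;  -- pull rows in batches instead of all at once
CLOSE feed;
COMMIT;
```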
- Python 3.x
- If you are running Django 1.6+, always use @atomic
- cluster writes into small transactions
- multi-DB is awesome for hot standby (writes to master, reads from secondary)
- Use it.
- Create manual migrations for schema changes that Django can't handle.
- Minor version upgrades: do these
  - Could be for security or data corruption fixes
  - Only need to install new binaries
- Major version upgrades require planning
- `pg_upgrade` is now reliable
- Trigger-based replication is another option for zero downtime
- A full `pg_dump`/`pg_restore` is the way to go for safety
- All parts of a replication set need to be upgraded at the same time
- Use COPY, not INSERT (sketch below)
- VACUUM afterwards
- Reduce `shared_buffers`
- Drop indexes and constraints, then rebuild them after the load
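A COPY sketch (file path, columns, and table are hypothetical):

```sql
COPY events (id, user_id, created_at)
FROM '/tmp/events.csv' WITH (FORMAT csv, HEADER);
```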
- AWS: generally works like any other system
- Always have a good backup solution
- PIOPS (provisioned IOPS) are useful
- Scale up and down to meet load
- Overall, not bad
- RDS:
  - Automatic failover (AWESOME)
  - No reading from the secondary
  - Expensive
- Generous set of extensions
- No integrated multi-master solution yet
- pgpool-II for connection pooling
- Nagios for monitoring
- pgbadger for log analysis
- `pg_stat_statements` for per-query statistics
- `max_connections` = maximum number of simultaneous queries postgres can execute at that moment
- Typical values: 150-200
- pgbouncer: accept lots of client connections, funnel them into (`max_connections` - 10) actual database connections
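A hypothetical `pgbouncer.ini` along those lines (names and numbers are illustrative, assuming `max_connections = 150`):

```ini
[databases]
app_db = host=127.0.0.1 port=5432

[pgbouncer]
listen_port = 6432
max_client_conn = 1000    ; lots of client connections...
default_pool_size = 140   ; ...funneled into max_connections - 10 server connections
```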