Replication with PostgreSQL on Ubuntu
In this guide, you will learn how to setup replication between two PostgreSQL servers on Ubuntu. Along with the how and why of what you will learn.
Contents
- Why Replication?
- Ubuntu and PostgreSQL Version Compatibility
- Install Prerequisites
- Install PostgreSQL
- Create Replication User
- Replica Connection Permission
- Configuration for Creating Archive Files for Replication
- Make Folder for Saving Archive To
- Restart Primary Server
- Replication Server
- Configure Replica postgresql.config
- Sync Primary Data to Replica
- Configure recovery.conf for Continuing Replication
- Restart Replica Server
- Test Replciation is Working
- Final Thoughts
Why Replication?
Database replication is fairly important when you want to scale your database horizontally. Services like Amazon RDS make it super easy. However, that can be expensive, or as I found in my case, I couldn't use RDS. We ended up needing to use a plugin that wasn't available for RDS so we have to host our own.
The process of setting up replication contains multiple steps, and if you get them wrong it wont work, and wont easily tell you why. Through this guide I will show you what you need to do to setup replication with ubuntu. Also, hopefully, enough information to figure it out for other distributions as well. I also hope to explain what is going on so you can understand the why of what you are doing.
Ubuntu and PostgreSQL Version Compatibility
Below are the versions of postgres and ubuntu I have tested this guide on to make sure that it works. As new versions of both come out I hope to test them to keep this guide up-to-date.
Ubuntu 14.04 | Ubuntu 16.04 | Ubuntu 18.04 | |
---|---|---|---|
PostgreSQL - 10.0 | |||
PostgreSQL - 9.6 | |||
PostgreSQL - 9.5 |
Install Prerequisites
With ubuntu we need to add the apt repository so that we can install postgres. One of the frustrating sides to notice in this is the pgdg
if you don't get that it wont work. However, you should just be able to copy and paste the following 4 lines and it should work no matter your version of ubuntu.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
sudo apt-get install wget ca-certificates
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
Install PostgreSQL
Next is to actually install PostgreSQL. Through the rest of this guide I'll add the line or file you need to use for each version of postgres that you want to use. Only use that line or frustration will happen. Now all we need to do is a simple apt-get install
and we are good to go.
sudo apt-get install postgresql-10
sudo apt-get install postgresql-9.6
sudo apt-get install postgresql-9.5
Create Replication User
Now that Postgres is installed we need to create our user for doing replication. Postgres requires a user with replication privileges in order to pull data over. By convention people use a user called replication
with the replication
permission.
First thing we need to do is log in to postgres, and the easiest way to do that is to switch users to the postgres
user. Once you are the postgres
user use psql
to log in and execute the CREATE USER
command.
sudo su - postgres
psql
CREATE USER replication REPLICATION LOGIN ENCRYPTED PASSWORD 'password1';
CREATE USER
will create the replication user, set the replication permission, set the user to be able to login, and encrypt the password in the database. It is recommended to use a different password than password1
. Also please do encrypt the password because if you don't it can lead to headaches later on with other tool configurations.
Replica Connection Permission
Now that we have our user we need to give them permission to actually connect to the primary postgres server. We do that in the pg_hba.conf
file. In the file we declare what user, database, and where they are connecting from. In this case our replica server is at ip 192.168.1.102
so we are explicitly telling our server to allow connections from the user replication
from that ip
only. The more servers or users you have the more you add to this list.
/etc/postgresql/9.6/main/pg_hba.conf
/etc/postgresql/9.5/main/pg_hba.conf
host replication replication 192.168.1.102/32 md5
Configuration for Creating Archive Files for Replication
One of the most critical parts of configuring replicaiton is the saving of data to archive files so they can be pulled over to replicas. The following configuration does just this.
So that you know WAL
stand for write-ahead logging
. It is very useful for a few things, but we are only going to focus on replication to other servers in this guide.
The first step is to set the wal_level
. I liken it to logging_level
it is how you configure the intensity, might be a bad analogy, but it is how I keep it in my head. We want to set it to replica
for postgre 9.6+, for 9.5 hot_standby
. We follow the wal_level
with a few other settings that are good defaults.
/etc/postgresql/9.6/main/postgresql.conf
listen_addresses = '*'
wal_level = replica # In releases prior to 9.6, this parameter also allowed the values archive and hot_standby. These are mapped to replica.
max_wal_senders = 2
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/10/main/archive/%f' # Use with PostgreSQL 10
archive_command = 'cp -i %p /var/lib/postgresql/9.6/main/archive/%f' # Use with PostgreSQL 9.6
listen_addresses = '*'
wal_level = hot_standby
max_wal_senders = 2
wal_keep_segments = 32
archive_mode = on
archive_command = 'cp -i %p /var/lib/postgresql/9.5/main/archive/%f'
To me the key to understanding and getting replication working is the archive_command
this is where you set the location of the archive files for our replicas to restore from. In the case of our archive folder we have it in the base of the postgres main install folder. This is an ideal location because later when we pull the data it will pull not only the archive but a similar set of configurations as well. That way we get a good configuration and setup for our replica.
Make Folder for Saving Archive To
So with the last paragraph in mind we need to make the archive
folder, and it needs to have the owner be the postgres
user. You can either use chown
or temporarily change to the postgres
user to create the folder.
sudo su - postgres
mkdir /var/lib/postgresql/10/main/archive
mkdir /var/lib/postgresql/9.6/main/archive
mkdir /var/lib/postgresql/9.5/main/archive
Restart Primary Server
At this point our primary server is done being configured, and we just need to restart postgres.
sudo systemctl restart postgresql # or
sudo service postgresql restart
Replication Server
The replication server(s) is as simple as a standard PostgreSQL server with a couple of configuration tweaks. First install postgres like we have done above, and then follow along below. It is fairly similar in that you install, tweak configuration, run a sync command, create a file and restart the server.
Configure Replica postgresql.config
Much like with the primary server just edit the postgresql.conf
file and add the following lines to the bottom. This sets up that we are going to pull data from the primary server.
/etc/postgresql/9.6/main/postgresql.conf
wal_keep_segments = 32
wal_level = replica
hot_standby = on
wal_log_hints = on
wal_keep_segments = 32
wal_level = hot_standby
hot_standby = on
wal_log_hints = on
Sync Primary Data to Replica
Now that our configuration is set we need to pull over all of our data, and some base files. We are running a main
setup for postgres so we need to remove that folder and all of its contents. Then we need to pull the data over from the primary server. The first is a simple rm -rf
command. The later uses pg_basebackup
pg_basebackup
is called with the host you want to pull data from. Then the directory we want to pull over. Since we deleted the main
folder we need to be sure to pull that one from the primary server. This will pull in the base files, plus all of the files in the archive
folder which we will use for restoring to our replica. You also need to do this as the postgres
user so that it sets proper permissions.
sudo su - postgres
rm -rf /var/lib/postgresql/10/main/
rm -rf /var/lib/postgresql/9.6/main/
rm -rf /var/lib/postgresql/9.5/main/
pg_basebackup -h 192.168.1.103 -D /var/lib/postgresql/10/main -P -U replication
pg_basebackup -h 192.168.1.103 -D /var/lib/postgresql/9.6/main -P -U replication
pg_basebackup -h 192.168.1.103 -D /var/lib/postgresql/9.5/main -P -U replication
Configure recovery.conf for Continuing Replication
At this point we have installed postgres, configured it, and pulled over all of the data. Now we need to add the file that will connect and constantly pull data over. It will also do the restore command of the data. The file needs to be named recovery.conf
, note the y
, and it needs to go in the main
folder we pulled over. This file needs to be added after a sync, and does not belong on the primary server.
The contents set the connection info to the primary server, along with password. It also sets the command for restoring files. This works since we are pulling WAL logs. It is just moving them to where postgres will restore them automatically. This file needs to be owned by the postgres
user.
/var/lib/postgresql/9.6/main/recovery.conf
/var/lib/postgresql/9.5/main/recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=192.168.1.103 port=5432 user=replication password=password1'
trigger_file = '/var/tmp/standby.trig'
restore_command = 'cp /var/lib/postgresql/10/main/archive/%f %p'
restore_command = 'cp /var/lib/postgresql/9.6/main/archive/%f %p'
restore_command = 'cp /var/lib/postgresql/9.5/main/archive/%f %p'
Note the trigger_file
if you create that file it will stop the replication process on the replica server.
Restart Replica Server
Now our replica is configured so we just need to restart. After a restart it should start the process of restoreing replicated data, and pullig over new data as well.
sudo systemctl restart postgresql
Test Replciation is Working
Your database server should be replicating, but lets test it to be sure. It is as simple as creating a new database on the primary server, then logging in on the replica and validating it is there.
On Primary
sudo su - postgres
psql
create database test1;
On Replica
sudo su - postgres
psql
\l
Final Thoughts
This is just the tip of the iceberg with regards to what you can do with replication of postgresql, hopefully though it gives you a jumping off point. This is great for creating a primary server to write to and a replica you can read from so you can distribute some of the load of using a database. Another great thing about the read-only replica is it is ideal for setting up your database backups to pull from there, again reducing load on the primary.
At this point you should be good to go. The next thing I suggest is taking a look at pg-pool
for doing connection pooling between both of the servers. It is a great tool, and I hope to come out with a guide on it soon as well.