What You Should Know About MySQL Replication

on September 28, 2015

Replication in MySQL is the dynamic process of synchronizing data between a primary (main) database server and one or more secondary (replica) database servers in near-real time. Using this process, it’s possible to create copies of one or more databases so that even if the primary server fails, data can still be recovered from one of the secondary servers.

Replication is essential for many applications, and the lack of replication support was a major drawback to MySQL compared to other relational database management systems (RDBMSs). MySQL 3.23 was the first version to introduce replication support, and support has improved continually in subsequent versions. However, MySQL is still best suited for one-way replication, where you have one primary and one or more secondary database.

TIP

As much as possible, try to use the same version of MySQL for both server(s). A version mismatch can sometimes result in erratic replication behavior

Why replication? There are four common reasons.

To create a standby database server. If the primary server fails, the standby can step in, take over, and immediately be current. For any organization that has mission-critical, time-sensitive tasks involving its database, this is a must!
To enable backups without having to bring down or lock out the primary server. After replication takes place, backups are done on the secondary server, rather than on the primary one. This way, the main can be left to do its job without disturbance.
To keep data current across multiple locations. Replication is necessary if several branches of an organization need to work from a current copy of the same database.
To balance the workload of multiple servers. By making it possible to create mirror images of one database on multiple servers, replication can help alleviate the woes of a single overloaded database server by splitting queries between multiple servers, each running on separate hardware.

Now that you have an idea why you might want to set up replication, let’s look at some of the concepts on which it’s based.

The Primary-Secondary Server Relationship

As previously stated, replication requires at least two servers. The servers are set up such that the first server, called the primary, enters into a relationship with the other server, called the secondary. Periodically, the latest changes to the database on the primary server are transferred to the secondary. Through this replication relationship, an updated database can be propagated throughout an enterprise into multiple secondary servers, but only one primary server can be in a replication relationship at any one time. It’s also possible to “promote” a secondary server, if necessary.

As a necessary prelude to configuring servers for replication, both primary and secondary servers must be synchronized so that the databases being replicated are the same at both ends of the replication connection. Once this is accomplished, it becomes critical for all updates to be done on the primary server, and not on the secondary, to avoid confusion about the sequence of the updates.

In addition, binary update logging must be enabled on the main server for replication to take place. This is because updates are transferred from the primary server to the secondary via the primary server’s binary update logs. Replication is based on the concept that the primary keeps track of the changes to the database through the binary logs and the secondary updates its copy of the database by executing the changes recorded on the same logs.

Once the primary and secondary servers are configured, the process begins with the replica contacting the main and requesting updates. Permissions for this must be enabled on the replica server(s). The replica informs the main server of the point in the binary log where the last update occurred, and then it begins the process of adding the new updates. Once completed, the secondary notes where it left off and connects periodically to the primary server, checking for the next round of changes. This process continues for as long as replication is enabled.

Replication Threads

Three threads are involved in replication: one on the main server and two on the replica. The I/O thread on the replica connects to the main server and requests the binary update log. The binary log dump thread on the primary server sends the binary update log to the secondary on request. Once on the secondary, the I/O thread reads the data sent by the primary server and copies it to the relay log in the secondary’s data directory. The third thread, also on the secondary server, is the SQL thread, which reads and executes the queries from the relay log to bring the secondary in alignment with the primary server.

The relay logs on the replica are in the same format as binary logs. Once all the events in the relay log are executed, the SQL thread automatically deletes the log. A new relay log is automatically created when an I/O thread starts. It’s worth pointing out that MySQL replication is asynchronous and so the replica needn’t be connected to the main server all the time; it has the capability to keep track of where it left off and automatically get itself current, regardless of how much time has passed since the last update took place.

Note

The reason for two separate secondary threads? Performance! By being independent of each other, the processes of reading and writing on the secondary can occur simultaneously. Because the execution of the SQL commands on the secondary takes longer than reading and copying the binary logs to the relay logs, splitting these two functions also makes sense in terms of efficiency on the primary server. The binary logs can be safely purged from the primary because a copy of them already exists on the secondary, even if all the updates to the secondary haven’t yet been committed.

Learn more about MySQL replication methods here.