Replication in MySQL is the dynamic process of synchronizing data between a primary (master) database server and one or more secondary (slave) database servers in near-real time. Using this process, it’s possible to create copies of one or more databases so that even if the primary server fails, data can still be recovered from one of the secondary servers.
Replication is essential for many applications, and the lack of replication support was a major drawback to MySQL compared to other relational database management systems (RDBMSs). MySQL 3.23 was the first version to introduce replication support, and support has improved continually in subsequent versions. However, MySQL is still best suited for one-way replication, where you have one master and one or more slaves.
TIP
As much as possible, try to use the same version of MySQL for both the master and slave server(s). A version mismatch can sometimes result in erratic replication behavior
Why replication? There are four common reasons.
- To create a standby database server. If the primary server fails, the standby can step in, take over, and immediately be current. For any organization that has mission-critical, time-sensitive tasks involving its database, this is a must!
- To enable backups without having to bring down or lock out the master server. After replication takes place, backups are done on the slave, rather than on the master. This way, the master can be left to do its job without disturbance.
- To keep data current across multiple locations. Replication is necessary if several branches of an organization need to work from a current copy of the same database.
- To balance the workload of multiple servers. By making it possible to create mirror images of one database on multiple servers, replication can help alleviate the woes of a single overloaded database server by splitting queries between multiple servers, each running on separate hardware.
Now that you have an idea why you might want to set up replication, let’s look at some of the concepts on which it’s based.
The Master-Slave Relationship
As previously stated, replication requires at least two servers. The servers are set up such that the first server, called the master, enters into a relationship with the other server, called the slave. Periodically, the latest changes to the database on the master are transferred to the slave. Through this replication relationship, an updated database can be propagated throughout an enterprise into multiple slave servers, but only one master can be in a replication relationship at any one time. It’s also possible to “promote” a slave to a master, if necessary.
As a necessary prelude to configuring servers for replication, both master and slave servers must be synchronized so that the databases being replicated are the same at both ends of the replication connection. Once this is accomplished, it becomes critical for all updates to be done on the master, and not on the slave(s), to avoid confusion about the sequence of the updates.
In addition, binary update logging must be enabled on the master for replication to take place. This is because updates are transferred from the master to the slave via the master server’s binary update logs. Replication is based on the concept that the master keeps track of the changes to the database through the binary logs and the slave updates its copy of the database by executing the changes recorded on the same logs.
Once the master and slave servers are configured, the process begins with the slave contacting the master and requesting updates. Permissions for this must be enabled on the slave server(s). The slave informs the master of the point in the binary log where the last update occurred, and then it begins the process of adding the new updates. Once completed, the slave notes where it left off and connects periodically to the master, checking for the next round of changes. This process continues for as long as replication is enabled. Figure 1 illustrates this relationship.
Figure 1 The master-slave replication relationship
Replication Threads
Three threads are involved in replication: one on the master and two on the slave. The I/O thread on the slave connects to the master and requests the binary update log. The binary log dump thread on the master sends the binary update log to the slave on request. Once on the slave, the I/O thread reads the data sent by the master and copies it to the relay log in the slave’s data directory. The third thread, also on the slave, is the SQL thread, which reads and executes the queries from the relay log to bring the slave in alignment with the master.
The relay logs on the slave are in the same format as binary logs. Once all the events in the relay log are executed, the SQL thread automatically deletes the log. A new relay log is automatically created when an I/O thread starts. It’s worth pointing out that MySQL replication is asynchronous and so the slave needn’t be connected to the master all the time; it has the capability to keep track of where it left off and automatically get itself current, regardless of how much time has passed since the last update took place.
NOTE
The reason for two separate slave threads? Performance! By being independent of each other, the processes of reading and writing on the slave can occur simultaneously. Because the execution of the SQL commands on the slave takes longer than reading and copying the binary logs to the relay logs, splitting these two functions also makes sense in terms of efficiency on the master. The binary logs can be safely purged from the master because a copy of them already exists on the slave, even if all the updates to the slave haven’t yet been committed.
Learn more about MySQL replication methods here.
Leave a Reply