Transactional Replication

The estimated reading time for this post is 7 minutes

SQL Server Replication is a technology that you can use to synchronize data across multiple SQL Server instances, making it possible to distribute data throughout an enterprise organization. Replication can also improve availability by helping to ensure that if one server becomes unavailable, others can serve users; it also helps you to improve scalability by enabling multiple servers to share a common workload.

Implementation of SQL Server Replication has many advantages including: Load balancing, Improved availability, Reduced network traffic, Offline processing.

SQL Server replication uses a publishing metaphor to describe the key components in a replication system.

  • Publisher: A publisher is a SQL Server instance that makes data and database objects available to other locations by using replication.
  • Article: An article is a database object that is exposed through replication; for example, a table, a subset of data from a table, a view, or a stored procedure.
  • Publication: A publication is a collection of one or more articles.
  • Subscriber: A subscriber is a SQL Server instance that receives data published by a publisher.
  • Subscription: A subscription is a request for a subscriber to receive a copy of a publication.
  • Distributor: A distributor is a SQL Server instance that hosts one or more publications while it is being moved to subscribers.
  • Node: A node is a SQL Server instance that sends or receives data in the replication process; for example, a publisher or a subscriber.

SQL Server supports both push subscriptions and pull subscriptions. For push subscriptions, the articles are pushed from the distributor to the subscribers. For pull subscriptions, the articles are pulled from the distributor by the subscribers.

Latency defines the period of time during which copies of the replicated data may not be identical to each other because the replication process is incomplete. Depending on the type of replication that you are using, latency can vary from a few seconds to many days.

Autonomy defines the ability to use the replicated database without connecting to other databases in the replication system. Again, the type of replication that you use determines the autonomy of your databases.

Types of Replication

Snapshot replication: Snapshot is the simplest type of replication. When you initiate this, SQL Server takes a copy of the publication at that point in time and distributes it to subscribers. It does not monitor changes made to the publication after that point in time; however, if you later opt to synchronize a subscriber, a new copy of the database is taken and redistributed to the subscriber.

Snapshot replication exhibits high latency because the changes made at the publisher are only sent to the subscriber when a new snapshot is distributed. It is also highly autonomous because the subscriber has a complete copy of the data and database objects, and can work without a connection to the original publisher.

Snapshot replication can take a long time to complete because it copies the entire data set each time; therefore, it is best used when data is updated infrequently.

This type of replication is useful when one or more of the following are true:

  • You have data that changes very infrequently.
  • You do not need all copies of the data to always be completely up-to-date.
  • You are replicating small datasets.
  • You have disconnected sites and high latency is tolerable.

 

Transactional replication: Transactional replication enables you to track and distribute changes, including both data and schema changes, made to a publication after the initial snapshot is distributed. Changes are tracked at the publisher, passed to the distributor, and then on to the subscribers as they occur. This results in low latency because the subscribers are only marginally out of synchronization with the publisher. Data changes are applied to subscribers in the same order that they occurred at the publisher and with identical transactional boundaries. This helps to ensure that the data remains consistent across all copies of the database. 

This type of replication is useful when one or more of the following are true:

  • You have data that changes frequently at one central location.
  • You need all copies of the data to be as up-to date as possible.
  • You need all individual changes that occur at the publisher to be applied to the subscriber, not just the net result of those changes.

There is a specialized form of transactional replication, peer-to-peer (P2P) transactional replication that enables changes to be made at other nodes in the system.

P2P transactional replication extends the standard transactional replication and enables changes to be made to data at any node in the system; however, an individual row of data should be changed only one node at a time. Each node contains a writeable copy of the data and acts as both a publisher and subscriber, so that changes made at any node can be published from that node and subscribed at all others.

This type of replication is useful when one or more of the following are true:

  • You have subsets of data that change frequently at different locations within the system.
  • You need changes that occur at any node to be synchronized to all other nodes.
  • You need all copies of data to be as up-to-date as possible.

 

Merge replication: Merge replication takes transactional replication one stage further by enabling users to make changes to data and schema at any subscriber, propagate these changes back to the publisher, and then propagate them on to the other subscribers. In merge replication, transaction ordering is not maintained; changed rows are simply replicated between databases. You can either partition data across the subscribers to avoid the same data being changed in two locations or implement conflict detection and resolution to handle conflicts when they occur.

When you configure merge replication, SQL Server modifies the subscription database to include a unique identifier for each row in an article (unless the rows already contain a column with the ROWGUIDCOL property) and to add change tracking tables to the database. These new tables track which rows in an article are changed at that node so that only the changed rows are merged during synchronization.

This type of replication is useful when one or more of the following are true:

  • You have data that can be logically partitioned, with each partition being updated at a different location.
  • You have autonomous sites that need to update data while they are disconnected for the majority of the time, and then need to synchronize changes later with the publisher and other subscribers.

 

Replication agents

  • Snapshot Agent: The Snapshot Agent runs at the distributor. It creates the snapshot schema and data files for both snapshot and other types of replication. It stores the files in the snapshot folder and also records the synchronization of jobs in the distribution database. There is one snapshot agent per publication.
  • Distribution Agent: The Distribution Agent runs at the distributor for push subscriptions and at the subscribers for pull subscriptions. It applies the initial snapshot to subscribers in snapshot and transactional replication. When using transactional replication, it also moves the transactions that are stored in the distribution database to the subscribers.
  • Log Reader Agent: The Log Reader Agent runs at the distributor for transactional replication. It monitors the transaction log of published databases and copies the transactions that occur at the publisher to the distribution database on the distributor.
  • Merge Agent: The Merge Agent runs at the distributor for push subscriptions and at the subscriber for pull subscriptions in merge replication. It applies the initial snapshot to the subscriber, and then copies the data changes made at the publisher and subscribers to the other nodes in the replication system. By default, it copies the changes made at the subscriber to the publisher before copying the changes made at the publisher to the subscriber. It also resolves any conflicts that occur during the merge process. There is one merge agent per merge subscription.

The replication of data can be configured as a continuous, scheduled, or on-demand process.

Continuous replication reduces latency between publishers and subscribers; however, it can lead to an increase in network traffic. It is commonly used in transactional replication scenarios.

Scheduled replication aids off-hours replication and is commonly used for snapshot replication to avoid overloading servers during peak hours.

On-demand replication is commonly used for applications that provide a synchronization facility as part of their core functionality and is commonly used for merge replications.

There is some related subject such as replication security and agents roles, data validation and monitoring, troubleshooting replication… which I can not explain in this blog due to avoiding of being boring blog post.

Now lets go to implement Transactional Replication by SSMS step by step. Order of photos is from left to right side.

Now lets verify implemented Replication

I wish this blog post would be useful. Stay tuned for more blog posts.

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

avatar
1000
  Subscribe  
Notify of