Oracle RAC vs. SQL Server AlwaysOn

The estimated reading time for this post is 16 minutes

sqlserverblog_05

Oracle RAC and its predecessor Oracle Parallel Server (OPS) have been around for about 20 years since the first release of OPS in 1992. Oracle RAC is a complex and archaic architecture that might not be suitable for customers, especially in the cloud computing era. Newer cloud computing applications require much larger scale (more nodes) than the practical scalability limits of Oracle RAC. Customers should look at cloud computing architectures for their applications, considering longer term needs (> 12 months) in application scalability in this new cloud computing era. Microsoft has a strong vision and offerings to usher customers to the full benefits of next generation cloud computing paradigm.
In addition, Microsoft offers database solution to meet customers’ requirements with much better value than Oracle RAC. Customers need to pay a much higher cost for Oracle RAC compared with SQL Server solution that will satisfy the equivalent requirements. The main question customers must ask themselves is whether it is justified to deploy Oracle RAC as a solution where it could cost more than 5X the SQL Server solution. Given that it can cost 5X more, does Oracle RAC provide more than 5X of better performance, scalability, and high availability than SQL Server?
We believe the answer is “no”, and customers should be looking at SQL Server solutions instead. Oracle customers seem to agree and understand as Oracle RAC customers make up a very small percentage of the entire Oracle install base (< 5%). Very few customers have deployed Oracle RAC in production because the cost and complexity of Oracle RAC deployment, management, and troubleshooting simply outweigh the benefits. In fact, in some cases, customers who have implemented or evaluated Oracle RAC have since decided to switch to SQL Server.

Finally, with economy conditions requiring IT to do more with less, Oracle RAC might not be the soundest investment. SQL Server can meet the same requirements with lower TCO (initial implementation cost and ongoing maintenance) than Oracle RAC. Customers should contact their Microsoft representatives and perform a SQL Server technology evaluation today.

Scalability with Microsoft Cloud Computing

Choosing the right database technology at the right time is crucial for all organizations and plays an important role in the company’s ability to plan for current and future market directions. Over the past several years, the concept of elastic cloud computing (the ability to scale out to hundreds of nodes as well as down) to meet user demand has become more realistic and affordable.
Microsoft has started to provide the tools that architects and developers need to keep their companies thriving in a competitive marketplace. The most recent capability from Microsoft is SQL Azure Federation. Federations in SQL Azure are a way to achieve greater scalability (hundreds of nodes) and performance from the database tier of the application through horizontal partitioning. Federations bring in the sharding pattern into SQL Azure as a first class citizen. Sharding pattern is used for building many sites on the web such as social networking sites, auction sites or scalable email applications such as Facebook, eBay and Hotmail. By bringing in the sharding pattern into SQL Azure, federations enable building scalable and elastic database tiers and greatly simplify developing and managing modern multi-tenant cloud applications.

Federations in SQL Azure answers two common challenges for scale out by making it easier to distribute/repartition data and provides built-in routing support that requires no downtime for application even when repartitioning operations are moving data around in the system.

  1. For online partitioning of data, SPLIT operation allows spreading of a federation into multiple members of data (collection of atomic units). MERGE operation allows gluing back of federation members data together. These operations can be easily done by administrators without downtime.
  2. For connection routing, all connections are established to the database containing the federation (a.k.a. root database). This eliminates the need for applications to cache any routing or directory information when working with federations. SQL Azure Federation guarantees that applications will always be connected to the correct federation member regardless of any repartitioning operation.

Deploying Oracle RAC in Production

Recently, Oracle released pre-installed RAC in Exadata to help overcome installation complexity. However, Exadata does not take away complexity in ongoing management and troubleshooting. As a result, very few customers (<5% of all Oracle customers) have deployed the technology after evaluation because of its complexity in deployment, management, and troubleshooting.

  • According to Oracle, total Oracle database customers worldwide: 380,000 
  • According to Gartner, total production Oracle RAC customers worldwide: 15,000 (<5%)

This result shows that most customers do understand that the cost and complexity of Oracle RAC far outweigh the benefits that Oracle claims to provide. Customers who are currently evaluating Oracle RAC understand the risk of using a database technology that is complex and expensive. These customers should also look at the high value solutions that Microsoft provides to address their requirements.

Oracle RAC is Extremely Costly

Customers constantly look to implement IT solutions that both meets their requirements and are at the same time affordable. An Oracle RAC solution is not a good investment because it is too costly. Rather, customers should look at SQL Server solutions as an alternative that satisfies their database requirements. For example, SQL Server gives a comparable response in terms of recovery time compared to Oracle RAC (see table below) while providing better overall pricing.

Technology  Recovery Time (Approximately)
Oracle RAC 30-60 Seconds
 SQL Server AlwaysOn Availability Group <45 Seconds
SQL Server AlwaysOn Failover Cluster Seconds to Minutes
According to Oracle document entitled “Oracle Database High Availability Best Practices 11g Release 2” on recovery times.

The following will illustrate why an Oracle RAC implementation is costly in both software and hardware terms. In order to implement Oracle RAC successfully, customers need to follow Oracle’s Maximum Availability Architecture (MAA). MAA recommends having the exact same configuration of Oracle RAC in QA, Production, and DR environment. Plus, the Active Data Guard option is needed to ensure high availability between Production and DR. This is a massive requirement that means additional licenses are required for Oracle RAC in QA and DR environment as well as a different technology called Active Data Guard. In addition, special Oracle-certified switches are required to implement Cache Fusion, a way for Oracle RAC to share memory among member nodes. And usually redundant copies of switches are required for high-availability. These will add to the cost of implementing Oracle RAC and every customer should know that. The very simplified comparison table below illustrates an example of SQL Server vs. Oracle RAC software processor cost comparison for a 2-node production environment without the consideration of QA and DR environment (each node is a 2 CPUs, quad core machine).

Pricing is based on price list for Enterprise Edition of SQL Server 2012 and Oracle 11gR2. 
 Component SQL Server Solution Oracle RAC Solution
 Core Database USD115,000 USD380,000
 High Availability Option Included USD184,000
Total Cost USD115,000 USD564,000
Note: The cost of the Oracle solution will be considerably higher if we factor in QA and DR environment where Oracle recommends Active Data Guard (additional USD$10K per processor) and an exact replica of Oracle RAC.

Customers need to pay a much higher cost for Oracle RAC compared with SQL Server solutions that will satisfy the equivalent requirements. Another question customers must ask themselves is whether it is justified to deploy Oracle RAC as a solution where it cost more than 5X the SQL Server solution. Does Oracle RAC provide more than 5X better performance, scalability, and high-availability than SQL Server? We believe the answer is “no” and customers should be looking at a SQL Server solution instead.

Lower Total Cost Ownership with Microsoft SQL Server

Microsoft SQL Server is known to provide great performance and scalability in the enterprise. The latest release of SQL Server 2014 provides many major enhancements. SQL Server 2014 is a cloud-ready information platform that will help organizations unlock breakthrough insights across the organization and quickly build solutions to extend data across on-premises and public cloud, backed by mission critical confidence.

In the following sections we will examine common scenarios for database solutions, outline the main reasons Oracle RAC is not suitable for these scenarios causing few customers to deploy, and provide the best SQL Server solution.

High Availability

In this section we will examine the scenario of high-availability (HA). Usually this scenario involves having redundant computers or nodes which are then used to provide the database service when one of the computers or nodes fails. Usually the HA system detects hardware or software faults and immediately transfers all the service to the standby server without administrative intervention, a process known as failover.

  • Oracle RAC is not a good solution for HA scenario. It is known to have a stability issue that happens in Linux environment where node is randomly evicted from the cluster for no apparent reason. This creates additional downtime risk for customers and extra burden for the DBAs, because the need to monitor the RAC cluster more closely requires installing an additional monitoring tool from Oracle. The most recent news about an outage experienced by major bank JP Morgan Chase shows the seriousness of database outages for businesses.
    In addition, Oracle RAC uses many shared components such as processes (cluster ready services) and hardware (voting disk and SAN) that has a single point of failure. This means the entire cluster will be down if any of those components fails. It is also important to note that patching Oracle RAC almost always requires shutting down the entire cluster. The most recent blog posting by EMC engineers showed the role of Oracle RAC in enterprise database deployments becoming less every day.

 

  • On the other hand, SQL Server 2014 provides a new capability called AlwaysOn. AlwaysOn is a new integrated, flexible, cost-efficient high availability and disaster recovery solution that was built on a very stable failover clustering platform. AlwaysOn Failover Cluster instances provides server-level redundancy on a certified Microsoft Cluster Services configuration and enables seamless failover capabilities in the event of a CPU, memory, or other non-storage hardware failure by sharing disk access between nodes and automatically restarting SQL Server on a working node in the event of a failure. AlwaysOn Failover Cluster instances support multisite clustering across subnets, which enables cross-data-center failover of SQL Server instances.
    Also new, AlwaysOn Availability Groups greatly enhance the capabilities of database mirroring and help ensure availability of application databases, and they enable zero data loss through log-based data movement for data protection without shared disks. AlwaysOn Availability Groups provide an integrated set of options including automatic and manual failover of a logical group of databases, support for up to four secondary replicas, fast application failover, and automatic page repair.
    SQL Server 2014 also supports deployments on Windows Server Core, a minimal, streamlined deployment option for Windows Server 2012 and Windows Server 2012 R2. This operating system configuration can reduce planned downtime by minimizing operating system patching requirements by as much as 60 percent according to internal Microsoft study. SQL Server 2014 AlwaysOn can facilitate rolling upgrades and patching of instances, which helps significantly to reduce application downtime. Enhanced support for online operations like LOB re-indexing and adding columns with default values also helps to reduce downtime during database maintenance operations.

Scale-Out OLTP Applications

In this section we will examine the scenario of scale-out online transaction processing (OLTP) applications. Scaling usually refers to the process of adding resources to a tier so that it can handle increased workloads. Scale-out increases the processing power of a system designed in a modular fashion, such as becoming a cluster of computers, by adding one or more additional computers (also called nodes) to the system. Scale-out usually has some initial hardware cost advantages – eight of four-processor servers generally cost less than one 32-processor server. But this advantage is often cancelled out when licensing and maintenance costs are included.

  • Oracle RAC is not a good solution for scale-out OLTP scenarios. Oracle RAC is not ideal for database operations such as bulk load, long-running transactions, and high-update applications because those operations require a bigger buffer. Overall performance will suffer because Oracle RAC needs to transfer large amount of buffer among the nodes through the interconnect Cache Fusion. Another type of application that uses a lot of serialization (such as Oracle’s sequence request and index update) will require a waiting period by the rest of the nodes and the operations cannot be scalable.
    According to an Oracle OpenWorld presentation customers need to redesign their applications to use hash partitioning to mitigate that issue. Oracle RAC best practice says that it cannot make a non-scalable application scale and suggests data partitioning to minimize the traffic inside the interconnect. Coincidentally, this approach is the same of using Data Dependent Routing (DDR) and customers do not need to pay for the extra cost of Oracle RAC to implement DDR. Moreover, Oracle documentation states that an application will not scale on Oracle RAC if it does not scale on a SMP system.

 

  • SQL Server provides a technology called Distributed Partitioned View (DPV) which adds a scale-out capability to the database backend. This implementation is designed for high-end OLTP applications with update-intensive applications.
    SQL Server also provides Data Dependent Routing (DDR), where the data is partitioned among databases. In this architecture, SQL Server understands how the data is partitioned, and decides where to go to find the data. DDR requires a data layer that understands how to locate and access data entities from the database where it is stored. The DDR solution is designed for high transaction volumes, and therefore it is a clear winner for applications with very high update frequencies. Other SQL Server architectures that support scale out are Scalable Shared Databases and Peer-to-peer replication.
    In order to scale database applications to hundreds of nodes and supporting an elastic cloud computing paradigm, SQL Azure Federation provides a great way to bring sharding pattern into existing database in order to do scale out. With Federations, database tiers can be elastically scaled-out much like the middle and front tiers of the application based on application workload. Using federations, applications can expand and contract the number of nodes that service the database workload without requiring any downtime

Oracle RAC Misunderstandings

Because of long and extensive product marketing by Oracle, many customers have misunderstandings about the real capabilities and limitations of Oracle RAC. We list some of the common myths about Oracle RAC by sharing the truth behind its market perception.

Oracle RAC has 100% availability and no failover downtime.

In reality, according to Oracle document the failover time for Oracle RAC is around 30-60 seconds depending on configuration and workload. In other words, when an Oracle RAC node fails, the cluster needs some recovery time to perform rebalancing processes that includes the following: 

  • Redistribution of resource management to the surviving nodes. 
  • Recovery of database blocks by reading the redo log of the failed node. 
  • Collect all database blocks that need to be recovered.  
  • Perform the first step of database recovery process called rolling forward where all the collected redo logs are applied to the database. 
  • Perform the second step of the database recovery process called undo where all the uncommitted transactions are applied to the database.

It is important to note that the cluster is not available during most of the rebalancing process. In fact, to achieve maximum availability, Oracle recommends customers use another technology called Data Guard in addition to Oracle RAC. 

Oracle RAC has gone mainstream.

Gartner published a report that Oracle RAC has gained mainstream implementation. In reality, very small percentages (<5%) of Oracle customers are deploying RAC after 20 years of launching the technology. 

Oracle RAC has no maintenance downtime.

In reality, few Oracle-certified patches for rolling upgrades are available. Moreover, rolling upgrade of patches is currently available for one-off patches only and it is not available for patchsets. In real life situations, very few Oracle DBAs will patch their database using one-off patches but rather by applying patchsets. Again, to achieve maximum availability, Oracle recommends another technology called Data Guard for applying patchsets in rolling upgrade fashion according to Oracle’s documentation. It is also important to note that major version upgrades are also not supported according to Oracle’s documentation

Oracle RAC provides linear scale out.

In reality, as nodes increases, more Cache Fusion traffic will occur among nodes and performance degrades. This prevents Oracle RAC from scaling linearly. Oracle also provides many ways to tune Oracle RAC by changing the application. In the best scenario using application tuning such as partitioning, adding nodes will achieve up to 80% of scalability. However, this requires application changes and adds to its complexity.

 Any applications will run in Oracle RAC without modifications.

In reality, according to an Oracle OpenWorld presentation on “Oracle RAC Performance Experts Reveal All”, any serialization operation will seriously impact the performance of the Oracle RAC cluster. For example, an Oracle sequence request and an index update will require a sequential waiting period. In other words, this operation cannot be scalable regardless of how many nodes are in the cluster. In order to mitigate this, Oracle recommends customers re-design their applications by using hash partitioning or re-write their applications without using sequences.

Oracle RAC provides failover transparency.

In reality, application code must be modified to be able to re-run statements that occurred after the last commit to restore other elements of an active database connection, such as active DML transactions and serverside package state. It is also important to note that failover transparency feature in Oracle RAC called Transparent Application Failover (TAF) works only for SELECT queries and not INSERT or UPDATE or DELETE queries.

Oracle RAC will be able to scale my applications more than SMP.

 In reality, according to an Oracle RAC best practice white paper applications will not scale on Oracle RAC if they do not scale on SMP. In other words, Oracle RAC cannot make a non-scalable application scale. Scalability is a direct function of the degree of contention introduced by the application for system resources and data. If the application won’t scale in moving from a 4 processor to an 8 processor SMP configuration, then it certainly won’t scale going from a single 4 processor box to a cluster of 2 4 –way boxes. A similar reference is found in another Oracle’s document: Oracle 11g Oracle RAC administration & deployment guide which says that if an application does not scale on an SMP system, then moving the application to an Oracle RAC database cannot improve performance.

Oracle RAC is cheap because Oracle Standard Edition comes with Oracle RAC.

In reality, although Oracle Standard Edition comes with Oracle RAC, Standard Edition is positioned to be a low-end database and has hardware and scale limitations. In fact, Oracle Standard Edition lacks the ability to use some of the more standard features in the database such as compression, partitioning, advanced security, and Active Data Guard. This makes an Oracle RAC implementation on Oracle Standard Edition incomplete and it does not support DR environment for maximum high-availability.

Conclusion

Microsoft SQL Server offers database solution to meet customers’ requirements with lower TCO than Oracle RAC in terms of initial implementation cost and ongoing maintenance. In some cases, customers who have implemented or evaluated Oracle RAC have since decided to switch to SQL Server.

 

 

Hamid J. Fard

I am SQL Server Data Platform Expert with more than 9 years’ of professional experience, I am currently Microsoft Certified Master: SQL Server 2008, Microsoft Certified Solutions Master: Charter-Data Platform, Microsoft Data Platform MVP and CIW Database Design Specialist. I also do Pig, Hive and Sqoop development regarding Hadoop Big Data platform. After a few years of being a production database administrator I jumped into the role of Data Platform Expert. Being a consultant allows me to work directly with customers to help solve questions regarding database issues for SQL Server.

More Posts

Follow Me:
FacebookLinkedIn


Leave a Comment

Be the First to Comment!

Notify of
avatar
1000
wpDiscuz