For Exchange Server 2016 the high availability building block is the database availability group (DAG). Exchange 2016 DAGs are very similar to Exchange 2013 DAGs; however, there are some new features and behaviours to be aware of, which I’ll demonstrate in this article series. I’ll also cover:
- Installing a new Exchange Server 2016 database availability group
- Managing database copies for Exchange Server 2016 database availability groups
- Database switchovers and failovers for Exchange Server 2016 database availability groups
- Reseeding a failed database copy in an Exchange Server 2016 DAG
- Recovering a failed Exchange Server 2016 DAG member
Let’s begin with an overview of database availability group concepts.
Exchange Server 2016 DAG Concepts
Database availability groups can contain up to 16 Exchange 2016 mailbox servers, each of which hosts copies of one or more databases that are replicated with database copies on other members of the same DAG.
When a DAG is first created it has zero members. A minimum of two members is required for the DAG to provide high availability. Two-member DAGs are reasonably common as a simple HA deployment of Exchange. For example, two Exchange 2016 servers and a file share witness can make up a database availability group.
Database Availability Groups and Quorum
Exchange Server DAGs make use of an underlying Windows Failover Cluster. You don’t need to create, configure, or even touch the Windows Failover Cluster using cluster management tools, except in specific maintenance scenarios that are clearly documented. When you add members to a DAG the failover clustering components are automatically installed and configured for you.
Quorum is the voting process that the cluster uses to determine whether the DAG should remain online or go offline. If the DAG goes offline all of the databases in the DAG are dismounted and inaccessible to end users, causing an outage.
There are two quorum models:
- Node Majority – when the DAG has an odd number of members the file share witness is not required for the quorum voting process, because the DAG members can determine a “majority” themselves. For example, in a three-member DAG, if one DAG member fails, 2/3 DAG members are still online (a majority) and the DAG can remain online. If two DAG members fail, only 1/3 DAG members are still online (not a majority), which may result in quorum being lost and the DAG going offline.
- Node and File Share Majority – when the DAG has an even number of members the file share witness is included in the quorum voting process to ensure that a “majority” can be determined. For example, in a two-member DAG if one member fails, 1/2 members are still online (not a majority), but you would expect the DAG to be able to withstand a single node failure. The file share witness is used as the tie-breaker, meaning 2/3 “votes” are still available, and the DAG can stay online. Similarly, in a four-member DAG, if two members fail, the file share witness means there are still 3/5 “votes” online, so the DAG can stay online.
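The vote counting in the two quorum models above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not how the cluster service is implemented, and the function names `uses_witness` and `has_quorum` are made up for this example.

```python
def uses_witness(member_count: int) -> bool:
    """Even member count -> Node and File Share Majority (the witness gets a vote)."""
    return member_count % 2 == 0


def has_quorum(member_count: int, members_online: int, witness_online: bool = True) -> bool:
    """Return True if the online votes are a strict majority of all votes.

    Hypothetical helper for illustration only: it models static vote
    counting and ignores Dynamic Quorum.
    """
    if not 0 <= members_online <= member_count:
        raise ValueError("members_online must be between 0 and member_count")
    witness_votes = uses_witness(member_count)
    total_votes = member_count + (1 if witness_votes else 0)
    online_votes = members_online + (1 if witness_votes and witness_online else 0)
    # A majority means strictly more than half of all votes.
    return online_votes * 2 > total_votes


if __name__ == "__main__":
    print(has_quorum(3, 2))  # three-member DAG, one member failed
    print(has_quorum(3, 1))  # three-member DAG, two members failed
    print(has_quorum(2, 1))  # two-member DAG, one member failed, witness online
    print(has_quorum(4, 2))  # four-member DAG, two members failed, witness online
```

Note that in the two-member case the result depends on the file share witness being reachable. If the witness and one member are unavailable at the same time, only 1/3 votes remain.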
I wrote above that in some failure scenarios the DAG may lose quorum and go offline. In some circumstances the DAG can sustain a majority of nodes being offline if there have been sequential failures. This is thanks to a feature of Windows Server 2012 clusters called Dynamic Quorum.
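To see why sequential failures differ from simultaneous ones, here is a deliberately simplified toy model in Python. It assumes that after each failure the surviving members recalculate the vote total, and it ignores the file share witness and any tie-breaking behaviour. It is not the actual Windows algorithm, and the function name `survives_failures` is hypothetical.

```python
from typing import Iterable


def survives_failures(member_count: int, failure_batches: Iterable[int]) -> bool:
    """Toy model of vote recalculation after each batch of failures.

    Each entry in failure_batches is the number of members that fail at the
    same moment. After a batch, quorum holds only if the survivors are a
    strict majority of the votes counted before that batch; the vote total
    then shrinks to the survivors. Assumption: no witness, no tie-breaking.
    """
    voters = member_count
    for batch in failure_batches:
        if batch < 0 or batch > voters:
            raise ValueError("invalid failure batch size")
        online = voters - batch
        if online * 2 <= voters:
            return False  # survivors are not a majority: quorum is lost
        voters = online  # survivors recalculate the vote total
    return True


if __name__ == "__main__":
    # Five-member DAG: three members fail one after another.
    print(survives_failures(5, [1, 1, 1]))
    # Five-member DAG: the same three members fail at once.
    print(survives_failures(5, [3]))
```

In this model a five-member DAG that loses three members one at a time keeps quorum with only two members online, while losing the same three members at once does not.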
Database Copies and Continuous Replication
Each member of the Exchange 2016 DAG hosts one or more database copies, and participates in the process of continuous replication to keep those database copies updated with changes. The Exchange 2016 server edition determines how many database copies a DAG member can host. A Standard edition server can host up to 5 database copies, and an Enterprise edition server can host up to 100 database copies.
Exchange 2016 DAG members can host a mix of active and passive database copies, because the switchover/failover occurs at the database level, not the server level. So there is no concept of an “active server” or a “passive server”.
During continuous replication the transaction log data that is generated on the active database copy is shipped across the network to the DAG members hosting passive database copies. Those DAG members then replay the transaction log data to update their passive database copy. Replay can occur immediately, or it can be deliberately delayed by configuring the passive copy as a lagged database copy.
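The difference between immediate replay and a lagged copy can be illustrated with a small Python sketch. This is a conceptual model only, assuming each shipped log has a timestamp and that a lagged copy holds logs back until they are older than a configured delay. The `LogFile` class and `logs_ready_for_replay` function are hypothetical and do not correspond to any Exchange API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class LogFile:
    """A shipped transaction log (hypothetical model, not an Exchange type)."""
    generation: int
    shipped_at: datetime


def logs_ready_for_replay(
    logs: List[LogFile],
    now: datetime,
    replay_lag: timedelta = timedelta(0),
) -> List[int]:
    """Return the generations a passive copy may replay at time `now`.

    With the default zero lag every shipped log is eligible immediately.
    With a non-zero lag, only logs at least `replay_lag` old are replayed.
    """
    return [log.generation for log in logs if now - log.shipped_at >= replay_lag]


if __name__ == "__main__":
    now = datetime(2016, 1, 1, 12, 0)
    logs = [
        LogFile(1, now - timedelta(hours=30)),
        LogFile(2, now - timedelta(hours=2)),
        LogFile(3, now - timedelta(minutes=5)),
    ]
    print(logs_ready_for_replay(logs, now))                     # non-lagged copy
    print(logs_ready_for_replay(logs, now, timedelta(days=1)))  # copy lagged by one day
```

The logs held back by the lagged copy are still shipped and stored on the DAG member. They are simply not replayed into the database until the delay has passed.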
There is not a special installation of Exchange Server 2016 for DAG members. An Exchange 2016 mailbox server can be added to a DAG, or removed from a DAG, at any time without impacting the databases and other services hosted on that server. Incremental deployment makes it possible for organizations to deploy a single server today, and then scale out to a DAG at a later time if necessary, without any impact to production services.
Database Availability Group Networks
A DAG network is one or more IP subnets that the DAG members are directly connected to. Every Exchange 2016 database availability group has at least one DAG network that is used for client traffic. A DAG can also have one or more separate, dedicated networks for database replication traffic.
With the speed of modern networks it is generally recommended to use only one DAG network, which is simpler to manage and creates a more predictable failure scenario.
An Exchange 2016 database availability group provides high availability for Exchange within a single datacenter or Active Directory site. Exchange 2016 DAGs can also be deployed across multiple datacenters to provide site resilience as well, allowing the Exchange services to remain online in the event of a complete datacenter outage.