For those of you who are regular visitors to this blog, this topic may seem rather basic. However, I was recently asked to write an article on this subject and thought that if it’s good enough for that venue, perhaps readers here will find it useful as well. So here are some highlights from that piece:
Clients often come to us for help with data center consolidation or new data center implementation projects. The discussion quickly comes around to the appropriate “Tier Level” for their IT facilities. What we’re talking about here (to a large extent) is an industry-standard way of describing the availability of the data center facility. Availability, in this case, refers to the degree to which the facility can support constant, uninterrupted operation of the data processing systems it houses.
We know that the systems themselves can be architected with high-availability configurations. Autonomous failover of network connections, clustered server environments, and so on are ways that the systems can sustain operation even if, say, a server crashes. What the Tier Levels of a data center refer to, though, is the capability of the facility itself to support the systems it serves. Utility power can fail, the temperature in the building can rise high enough to damage equipment, and so on. These are facilities issues, and they are the foundation upon which any amount of data processing fault tolerance stands.
A bit of history on standards for data center facilities
As with most areas of business, we rely on industry standards as reference-able qualifications and guidelines to avoid the dangers and ambiguities of qualitative references. Instead of, “very robust,” “fully fault-tolerant,” “top notch,” or “Class-A,” we use industry standards to articulate quantifiable guidelines for exactly how “top notch” a certain thing really is.
The American National Standards Institute (ANSI) and the Telecommunications Industry Association (TIA) are examples of organizations that formulate standards for the industry to follow. The TIA developed a specification entitled TIA-942: Telecommunications Infrastructure Standard for Data Centers. This is perhaps the most widely referenced standard when talking about data center facility availability. From the title, one might be inclined to think this is a specification of telecom for data centers. It is that, but it is much broader, also covering cabling, space layout, site selection criteria, and infrastructure tiers/availability. This last topic, covered by Annex G of the TIA-942 standard, is where all the talk about data center tiers comes from.
As it turns out, the TIA relied upon an organization called The Uptime Institute (or “the Institute” for short) to develop this part of the standard. The Institute’s charter is to provide research-based information on high density computing and mission critical facilities in a vendor-neutral manner. As such, the Institute has become the industry’s trusted source of information in this regard. The Institute continually gathers benchmark data by surveying existing data centers and data center projects. Its research includes models for estimating implementation costs and Total Cost of Ownership (TCO) for data center projects.
Perhaps the one piece of research that most strongly defines the Institute’s work is its definition of tier classifications for data center performance: the four-tier system for classifying Data Center capabilities. Thus, when we hear people speak of, say, a “Tier-4 Data Center,” they are referring to the tier classifications from TIA-942, or the Uptime Institute.
The Tier Classifications for Data Centers
The tier classification model provides an objective basis for comparing or describing the functionality, capacity, and cost of a data center’s facility architecture. In particular, the tier classification model is focused on the Availability of the facility itself, and is driven by the infrastructure to power and cool the data processing environment.
The power and cooling capabilities of a facility are delivered by its Mechanical, Electrical, and Plumbing (MEP) infrastructure. The Mechanical systems provide cooling to the environment in which the data processing equipment is installed. They include air handlers, air conditioners, chillers, plenums to channel air flow, and so on. The Electrical systems provide power to the data processing equipment. They include the utility service to the facility, transfer switches, generators, Uninterruptible Power Supplies (UPS), batteries, Power Distribution Units (PDUs), load banks, breaker panels, copper cabling, and so on. The Plumbing systems support the Mechanical and Electrical systems by routing cabling, air, water, fire suppression gases, and so on. A facility has multiple plumbing circuits, which together are analogous to the vascular system of the building.
Very simply put, the tier classifications refer to the degree of resilience the facility has to failures of MEP systems. Resilience to failures is provided by redundancy and topology of the infrastructure design. In the tier classification model, a Tier-1 facility is the least resilient and a Tier-4 is the most resilient. Said another way, a Tier-1 facility has the lowest availability and a Tier-4 has the highest availability. Said yet another way, a Tier-1 facility carries the highest risk to the business and a Tier-4 carries the lowest risk to the business.
Let’s discuss each of the four tier levels and compare their capabilities:
Tier-1: Basic Data Center Infrastructure
A Tier-1 facility has no redundant capacity components. It provides basic power and cooling to the data processing footprint with no excess capacity for backup or failover, and has no redundancy in the MEP distribution paths.
In this type of facility, any unplanned outage or failure of a capacity component or distribution element will impact the data processing equipment and end-users. Whenever maintenance is needed for the MEP infrastructure (utility work, replacement of components, certification testing, preventative maintenance, and so on) the impact is just as if there were an unplanned outage. All systems and users are affected.
Per Institute benchmark data, Tier-1 sites typically experience two separate 12-hour site-wide shutdowns per year for repair work. In addition, Tier-1 sites typically experience 1.2 equipment or distribution component failures on average each year. Statistically, this means 28.8 hours of downtime per year, or 99.67% availability.
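The arithmetic behind these figures is simple: availability is the fraction of the year’s hours during which the facility is up. The short Python sketch below assumes a 365-day (8,760-hour) year and reproduces the Tier-1 number. It also backs out the average length of each equipment or distribution failure that the benchmark totals imply (about 4 hours); that per-failure figure is an inference from the totals above, not a number the Institute states.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; assumption: leap years ignored


def availability_pct(downtime_hours: float) -> float:
    """Percent of the year the facility is up, given annual downtime in hours."""
    return 100.0 * (1.0 - downtime_hours / HOURS_PER_YEAR)


# Tier-1 benchmark figures quoted in the text.
planned_hours = 2 * 12      # two separate 12-hour site-wide shutdowns
total_hours = 28.8          # total annual downtime
failures_per_year = 1.2     # average equipment/distribution failures

# Inferred, not stated by the Institute: average hours lost per failure.
hours_per_failure = (total_hours - planned_hours) / failures_per_year

print(f"Implied hours per failure: {hours_per_failure:.1f}")
print(f"Tier-1 availability: {availability_pct(total_hours):.2f}%")
```

The same function applied to the 22 hours quoted for Tier-2 gives 99.75%.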
What kind of business is suited for this type of facility? In general, a Tier-1 facility is suitable for small businesses or start-ups where IT is an enhancement to internal business processes, where the principal use of web-presence is for passive marketing, or where there is no enforceable financial penalty to customer quality-of-service commitments.
Tier-2: Data Center with Redundant Capacity Components
A Tier-2 Data Center has redundant capacity components, but only a single non-redundant distribution path serving the data processing equipment. The benefit of this level is that any redundant capacity component can be removed from service on a planned basis (e.g., for preventative maintenance) without causing the data processing to be shut down. However, an unplanned outage or failure of any capacity component or any disruption to the distribution path may impact the computer equipment.
On average, Tier-2 sites have one unplanned outage per year, and schedule three maintenance activities over a two-year period. The annual impact to operations is 22 hours of downtime per year, or 99.75% availability.
Businesses appropriate for Tier-2 facilities are small businesses whose IT requirements are mostly limited to traditional 9-5 business hours, companies without serious financial penalties for customer quality-of-service commitments, services without real-time delivery obligations, and call centers with multiple sites.
Tier-3: Concurrently Maintainable
A Tier-3 Data Center has redundant capacity components and multiple independent distribution paths serving the data processing footprint. There is sufficient MEP capacity to meet the needs of the data processing systems even when one of these redundant MEP components has been removed from the infrastructure. In a Tier-3 Data Center, maintenance activities and certain unplanned events can occur without interruption to the computing systems.
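One way to make “sufficient capacity with one component removed” concrete is an N+1 check: the load must still be covered after the largest single unit is taken out of service. The Python sketch below is a simplification that assumes a single type of capacity component rated in kW (for example, UPS modules) and a hypothetical 400 kW load; the function name is illustrative. It tests only capacity redundancy, not the independent distribution paths that Tier-3 also requires.

```python
from typing import Sequence


def survives_single_unit_loss(unit_capacities_kw: Sequence[float], load_kw: float) -> bool:
    """True if the load is still covered with any one unit out of service."""
    if not unit_capacities_kw:
        return False
    total = sum(unit_capacities_kw)
    # Removing the largest unit is the worst case for remaining capacity.
    return total - max(unit_capacities_kw) >= load_kw


# Hypothetical example: 400 kW critical load served by 150 kW UPS modules.
print(survives_single_unit_loss([150, 150, 150, 150], 400))  # True: 450 kW remain
print(survives_single_unit_loss([150, 150, 150], 400))       # False: 300 kW remain
```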
Because of the concurrently maintainable characteristic of Tier-3 facilities, no annual shutdowns for routine maintenance are required. This allows very aggressive preventative maintenance programs to be implemented, further extending the service life of the MEP components. The Institute has concluded that Tier-3 Data Centers have unplanned events totaling only 1.6 hours per year. Tier-3 sites, then, deliver 99.98% availability.
Notice that both the Tier-1 and Tier-2 levels deliver “two-nines” availability, but the step to Tier-3 delivers “three-nines.” This is a big improvement in uptime, and comes with a cost as well, which we’ll discuss later.
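The “nines” shorthand can be made precise: a common convention counts them as the floor of -log10 of the fraction of time the facility is down. The Python sketch below applies that convention to the annual downtime figures quoted above, under the same 8,760-hour-year assumption; the function name is illustrative.

```python
import math

HOURS_PER_YEAR = 24 * 365  # assumption: 365-day year


def nines(downtime_hours: float) -> int:
    """Number of nines: floor(-log10(unavailability))."""
    if downtime_hours <= 0:
        raise ValueError("zero downtime has no finite number of nines")
    unavailability = downtime_hours / HOURS_PER_YEAR
    return math.floor(-math.log10(unavailability))


# Annual downtime per tier, in hours, as quoted in this article.
tier_downtime = {"Tier-1": 28.8, "Tier-2": 22.0, "Tier-3": 1.6}

for tier, hours in tier_downtime.items():
    pct = 100.0 * (1.0 - hours / HOURS_PER_YEAR)
    print(f"{tier}: {pct:.2f}% -> {nines(hours)} nines")
```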
Businesses appropriately supported by a Tier-3 Data Center are companies that serve both internal and external customers 24×7 and whose IT resources support automation of business processes, so that the customer impact of short shutdowns due to facility outages is manageable. Tier-3 is appropriate for businesses that span multiple time zones, with correspondingly diverse geographic distribution of employees and customers. Businesses that have significant financial exposure due to customer quality-of-service issues are also well supported by Tier-3 facilities.
Tier-4: Fault Tolerant
Tier-4 facilities have multiple, independent, and physically separate systems that each have redundant capacity components and multiple, independent, diverse, AND active distribution paths supporting the data processing footprint. In a Tier-4 Data Center, any single failure of an MEP component or distribution path has no negative impact to the data processing systems, and the infrastructure automatically responds to the failure to prevent further impact to the facility.
Because of the degree of redundancy and fault-tolerance in Tier-4 infrastructures, facility-related failures that impact the data processing equipment are statistically reduced to 0.8 hours per year. This yields 99.99% availability.
Companies that operate in international markets, with a “24 by forever” services commitment, and in a market space in which processes are continuous are well served by Tier-4 facilities. This also includes businesses that are based upon market transactions, financial settlement, e-commerce, or where customer access to applications or employee access to IT is competitively advantageous.
Notes of caution about Tier Classifications
Those of us working in the Data Center and mission critical facility fields rely heavily on the TIA-942/Uptime Institute tier classification guidelines. Some additional background and words of caution are important to keep in mind as well.
There has been a temptation to stretch the definition of the Institute’s tier classifications. One may hear of a “strong Tier-2” Data Center or a “Tier-3 +” Data Center. In actuality, there is no such thing. The best one can conclude from such terms is that the facility qualifies at the stated tier (Tier-3, in the case of a “Tier-3 +”) and has some other availability-supportive features. If any single system in a Tier-n Data Center does not meet the Tier-n requirements, then the facility as a whole is not Tier-n. There is no such thing as a fractional tier rating.
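The “no fractional tiers” rule amounts to taking a minimum: a facility rates only as high as its weakest system. The Python sketch below assumes each system has already been assessed against the tier requirements; the function name, system names, and ratings are hypothetical.

```python
from typing import Mapping


def facility_tier(system_tiers: Mapping[str, int]) -> int:
    """A facility rates only as high as its weakest system; no fractional tiers."""
    if not system_tiers:
        raise ValueError("need at least one system rating")
    return min(system_tiers.values())


# Hypothetical assessment: electrical and plumbing meet Tier-4, cooling only Tier-3.
site = {"electrical": 4, "mechanical": 3, "plumbing": 4}
print(facility_tier(site))  # 3, not "Tier-3 +" or "Tier-3.5"
```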
Similarly, one cannot “back into” a tier rating based on the empirical uptime of the site. That is, if one has determined that a certain facility has been up a sufficient number of hours over a five-year period to show 99.95% availability, that in and of itself does not qualify it as a Tier-3 facility. The tier rating is determined by the capacity and topology of the MEP infrastructure, not by its uptime record.
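To illustrate that a rating follows from design rather than from measured uptime, here is a deliberately simplified Python classifier built only from the tier descriptions in this article. The class, field, and function names are hypothetical, and the actual TIA-942/Uptime criteria contain far more detail than these four attributes; note that the function takes no uptime input at all.

```python
from dataclasses import dataclass


@dataclass
class Topology:
    redundant_capacity: bool   # spare capacity components beyond the load
    independent_paths: int     # independent distribution paths to the IT load
    all_paths_active: bool     # every path carries load at the same time
    physically_separate: bool  # redundant systems are physically separated


def tier_from_topology(t: Topology) -> int:
    """Simplified mapping from MEP design to tier; no uptime parameter by design."""
    if not t.redundant_capacity:
        return 1  # basic: no redundant capacity components
    if t.independent_paths < 2:
        return 2  # redundant components, single distribution path
    if t.all_paths_active and t.physically_separate:
        return 4  # fault tolerant: active, diverse, separated paths and systems
    return 3      # concurrently maintainable


print(tier_from_topology(Topology(True, 2, False, False)))  # 3
print(tier_from_topology(Topology(True, 2, True, True)))    # 4
```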
Finally, a word or two about costs. The cost of any mission critical facility is driven predominantly by the MEP infrastructure: the power and cooling capacity necessary for the Data Center is the dominant driver of cost in implementing a new facility. This cannot be overstated. Similarly, because the tier classifications imply redundancy, the higher the intended tier rating, the higher the cost. The cost does not rise linearly from one tier to the next, so estimating the costs of a new facility project should involve your Data Center consultant.
Tier Classifications and your project
Data Center consolidation and Data Center construction or outsourcing are top of mind for many CIOs these days. Many companies have ’90s-vintage IT facilities that not only lack the availability to align with the business’s operating model but also struggle to keep up with the power and cooling demands of contemporary computing systems. The deployment of multi-core processors and blade-based systems has pulled the rug out from beneath many a facility manager. The rapidly growing energy consumption and cost of the Data Center have caused many a CFO to define facility operational costs as an IT problem.
Whatever the motivation for your Data Center project, you will need to become familiar with both the spirit of the TIA-942 tier classifications and their nuances to exercise the proper degree of due care in planning these very expensive projects.