The Uptime Institute recently released new guidance regarding operational behaviors supporting data center Tier levels. In several other articles, we’ve discussed the notion that The Uptime Institute’s tier models for mission-critical facilities center on the topology of MEP infrastructure for increasing levels of site availability, but do not significantly take into account operational maturity. We propose that operational maturity is predominantly responsible for availability performance, regardless of the topology of the infrastructure. It is the reason that lower tier designs can demonstrate historical availability performance equal to or better than that predicted for higher tier designs (the converse is also true where higher tier designs are run under poor operational frameworks). In this post, we share parts of what The Uptime Institute has published regarding this new guidance, and offer our own thoughts and comments along the way.
The new guidance from The Uptime Institute includes characteristics of data center operations in the categories of Management and Operations, Building Characteristics, and Site Location. This augmentation of the four-tier model has been long anticipated, and we think these are appropriate categories to encompass the quality of operations procedures (MOPs/SOPs) as well as the risk management factors that directly affect the successful management and operation of the facility.
The point of adding the operational sustainability factors to the Tier standard is to allow a score to be computed for a data center’s operational elements, reflecting their potential impact on the long-term availability performance of a site with a given topology rating. This score is then expressed as a suffix to the tier rating at one of three levels: Bronze, Silver, or Gold. For example, a Tier III data center with an exceptional assessment of operational elements may receive a rating of Tier III-Gold.
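To make the combined rating concrete, here is a minimal sketch of how a topology tier and an operational sustainability level could be joined into a single label such as Tier III-Gold. The class and method names (Tier, OperationalRating, SiteRating, label) are our own illustrative assumptions, not anything published by The Uptime Institute, and the sketch says nothing about how the score itself is determined.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    """Topology tier levels of the four-tier model."""
    I = "I"
    II = "II"
    III = "III"
    IV = "IV"


class OperationalRating(Enum):
    """Operational sustainability levels used as the rating suffix."""
    BRONZE = "Bronze"
    SILVER = "Silver"
    GOLD = "Gold"


@dataclass(frozen=True)
class SiteRating:
    """Hypothetical container pairing a tier with an operational level."""
    tier: Tier
    operations: OperationalRating

    def label(self) -> str:
        # Suffix the operational level to the tier, e.g. "Tier III-Gold".
        return f"Tier {self.tier.value}-{self.operations.value}"


example = SiteRating(Tier.III, OperationalRating.GOLD)
print(example.label())  # Tier III-Gold
```

Keeping the two values as separate fields mirrors the point that the operational score qualifies the topology rating rather than replacing it.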
As a matter of course, The Uptime Institute reserves the exclusive right to assign an operational sustainability score to a facility, just as it does for the infrastructure topology tier rating. Within the information published so far by The Uptime Institute describing this operational sustainability scoring system, no guidance is revealed for scoring beyond yes/no, other than the prioritization of which factors are seen to be more impactful or important than others. Specifics regarding how the assessment takes place are simply not published at this time.
More on the elements of Operational Sustainability
The Operational Sustainability assessment of a given facility involves discovery and evaluation in three major categories:
1. Management and Operations
2. Building Characteristics
3. Site Location
The incident database compiled by The Uptime Institute reveals that as much as 70% of availability incidents are due to human error. This is within the range of other research in this area as well (we’ve seen statistics as high as 80%). As such, perhaps the most important element of operational sustainability is that of Management and Operations.
Within the Management and Operations category, The Uptime Institute defines four areas of assessment:
1. Staffing and Organization- Referring to the proper number and qualifications of personnel comprising the operations team in the data center, as well as shift coverage and well-defined roles and responsibilities that are held in high regard by management.
2. Maintenance- Referring to the rigor of the preventative maintenance programs, housekeeping, maintenance management, service level agreements, and life-cycle planning.
3. Training- Referring to personnel training programs for policies and procedures, incident response, etc., and the source of training, including on-the-job training (OJT), vendor-delivered training, and external educational sources. The experience and competency of the staff are key to maintaining systems and components.
4. Planning, Coordination, and Management- Referring to the full scope of data center management factors, including capacity planning, operational planning, policy creation and enforcement, etc.
While there is certainly depth beyond what the Institute has published regarding this model so far, we think there are other factors that deserve mention in this category. One of these is the strength of Methods of Procedure (MOPs) and Standard Operating Procedures (SOPs). MOPs and SOPs are foundational components of data center management and the core of effective maintenance programs. They are often drawn from component vendor recommendations and construction or commissioning contractor guidance, but also include the experience and expertise of the operations and management staff of the site itself.
Governance is another area that may be implied by the Planning, Coordination, and Management topic, but we think it deserves its own mention because, among other things, it infuses the operational framework with risk management and business planning factors.
Building Characteristics is the second element included in the Institute’s model. Building Characteristics is further defined by four subcategories, as follows.
1. Building Features- The specifics around this metric are few, but it makes sense that building features be included as a category in this context, since the shape and architectural layout of the space can directly impact not only operational complexity but also the MEP topology and scale that can be installed in the building.
2. Infrastructure- Characteristics of the building itself that impact MEP infrastructure installation, maintenance, and operations.
3. Operating Conditions- The Uptime Institute’s paper describes here the notion of operating set points of infrastructure components. This is certainly an operational topic impacting availability (not to mention efficiency), but seems oddly positioned as a “building characteristic.”
4. Pre-Operational- Included here are proper commissioning and documentation for handoff to the operational team. This is certainly an important step in the timeline of data center development. It would seem, though, that an organization going to the effort and expense of certification would likely be of sufficient maturity to include commissioning and staff training as part of its operational readiness plan.
The third category of the Operational Sustainability model is Site Location. Certainly this is important in the overall risk profile of a facility and rather fundamental to tier level certification by The Uptime Institute. The Uptime Institute consolidates this topic into two areas:
1. Natural Disasters
2. Manmade Disasters
This is probably a sufficient category set for site selection, as any threat coming to mind could arguably be placed into either of these two categories.
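For readers who want the whole model in one view, the three categories and their sub-areas described in this post can be laid out as a simple nested structure. This is only our own sketch of the outline as we have summarized it. The variable and function names are hypothetical, and it deliberately carries no weights or scores, since none have been published.

```python
# Categories and sub-areas as summarized in this post (no scoring data).
OPERATIONAL_SUSTAINABILITY = {
    "Management and Operations": [
        "Staffing and Organization",
        "Maintenance",
        "Training",
        "Planning, Coordination, and Management",
    ],
    "Building Characteristics": [
        "Building Features",
        "Infrastructure",
        "Operating Conditions",
        "Pre-Operational",
    ],
    "Site Location": [
        "Natural Disasters",
        "Manmade Disasters",
    ],
}


def outline(model: dict[str, list[str]]) -> str:
    """Render the category/sub-area structure as a numbered text outline."""
    lines = []
    for number, (category, areas) in enumerate(model.items(), start=1):
        lines.append(f"{number}. {category}")
        for area in areas:
            lines.append(f"   - {area}")
    return "\n".join(lines)


print(outline(OPERATIONAL_SUSTAINABILITY))
```

A structure like this could serve as the skeleton of an internal self-assessment checklist, with the caveat that any scoring an organization layers on top would be its own and not The Uptime Institute’s.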
At the end of it all, The Uptime Institute will assess its findings for a particular site and award a Gold, Silver, or Bronze score, which is then used as a suffix to the tier rating. We encourage readers to consult The Uptime Institute’s white paper on this subject for further information, but the paper does not go into detail about how the assessment process plays out or how scoring takes place.
This guidance on operational sustainability is new from The Uptime Institute, so as of this writing, few miles have been traveled with the expanded tier standard. The release was much anticipated, though, and while not much time has passed since then, it’s surprising, at least to us, how little dialogue has resulted. As always, the guidance from The Uptime Institute is well regarded and based on solid research and thought leadership. We suppose, though, that the new expanded guidance is receiving little attention for the same reason the MEP infrastructure tier guidance is so popular.
To say it another way, the four-tier model for data center availability has gained significant mind share because it provides a (very easy) model for a business to align with a certain level of data center specifications. Whether a given business is well served by that approach alone is the subject of another discussion. However, operational considerations were noticeably absent from the four-tier model until now, which led to lengthy debate between provider and buyer about historical availability performance versus tier-model predictions. The audience that embraced the four-tier availability model probably does not internalize this operational guidance as easily. It’s an important topic, but it is perhaps beyond the “happy place” of the business audience, or at least does not offer the same immediate value as the preceding four-tier model. For the technical audience, on the other hand, it is perhaps difficult to give the new model the credit it’s due when it is presented as a sort of appendage to the tier model.
While we have very high regard for the work produced by The Uptime Institute and recognize the value it brings to many mission-critical facilities discussions, we think that a model that more holistically includes infrastructure and operations, and that is grounded in a risk management assessment, is more relevant to data center planning. We suggest the reader look at the BICSI-002 guidance, recently published and presented. What is also missing, in our opinion, is the historical availability performance of the site and its team. Then again, The Uptime Institute’s tier methodology is a predictive model, and including such factors might confound the underpinnings upon which that methodology is built, so we will leave that point to the side.
We’re interested in your thoughts on this topic, and especially if your organization is considering certification for Operational Sustainability by The Uptime Institute. It would be very interesting to hear your views on how this new extension to the model may impact your facility.