Posted by: Electric Thoughts™ | April 29, 2013

Cooling Doesn’t Manage Itself

Of the primary components driving data center operations – IT assets, power, space and cooling – the first three command the lion’s share of attention. Schneider Electric (StruxureWare), Panduit (PIM), ABB (Decathlon), Nlyte, Emerson (Trellis) and others have created superb asset and power tracking systems. Using systems like these, companies can get a good idea of where their assets are located, how to get power to them and even how to optimally manage them under changing conditions.

Less well understood and, I would argue, not understood at all, is how to get all the IT-generated heat out of the data center as efficiently as possible.

Some believe that efficient cooling can be “designed in,” as opposed to operationally managed, and that this is good enough.

On the day a new data center goes live the cooling will, no doubt, operate superbly. That is, right up until something changes – which could happen the next day, or weeks or months later. Even the most efficiently designed data centers eventually operate inefficiently. At that point, your assets are at risk and you probably won’t even know it. Changes and follow-on inefficiencies are inevitable.

As well, efficiency by design only applies to new data centers.  The vast majority of data centers operating today are aging. All of them have degraded with incremental cooling issues over time.   IT changes, infrastructure updates, failures, essentially any and all physical data center changes or incidents, affect cooling in ways that may not be detected through traditional operations or “walk around” management.

Data center managers must manage their cooling infrastructure as dynamically and closely as they do their IT assets.  The health of the cooling system directly impacts the health of those very same IT assets.

Further, cooling must be managed operationally. Beyond the cost savings of continually optimized efficiency, cooling management systems provide clearer insight into where to add capacity or redundancy, where thermal problems may develop, and where the areas of risk lie.

Data centers have grown beyond the point where they can be managed manually. It’s time to stop treating cooling as the red-headed step-child of data centers. Cooling requires the same attention and sophisticated management systems that are in common use for IT assets. There’s no time to lose.

Posted by: Electric Thoughts™ | March 12, 2013

Machine Learning

Why Machine Learning-based DCIM Systems Are Becoming Best Practice.

Here’s a conundrum.  While data center IT equipment has a lifespan of about three years, data center cooling equipment will endure about 15 years. In other words,  your data center will likely  undergo five complete IT refreshes within the lifetime of your cooling equipment – at the very least.  In reality, refreshes happen much more frequently. Racks and servers come and go, floor tiles are moved, maintenance is performed, density is changed based on containment operations – any one of which will affect the ability of the cooling system to work efficiently and effectively.

If nothing is done to re-configure cooling operations as IT changes are made, and this is typically the case, the data center develops hot and cold spots, stranded cooling capacity and wasted energy consumption.  There is also risk with every equipment refresh – particularly if the work is done manually.

There’s a better way. The ubiquitous availability of low cost sensors, in tandem with the emerging availability of machine learning technology, is leading to development of new best practices for data center cooling management. Sensor-driven machine learning software enables the impact of IT changes on cooling performance to be anticipated and more safely managed.

Data centers instrumented with sensors gather real-time data which can inform software of minute-by-minute cooling capacity changes. Machine learning software uses this information to understand the influence of each and every cooling unit on each and every rack, in real time, as IT loads change. And when loads or IT infrastructure change, the software re-learns accordingly and updates itself, ensuring that its influence predictions remain current and accurate. This ability to understand cooling influence at a granular level also enables the software to learn which cooling units are working effectively – and at expected performance levels – and which aren’t.
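To make the idea of "influence" concrete, here is a minimal sketch in Python. It is not the algorithm any particular product uses. It assumes a hypothetical log of step tests in which one cooling unit's output changes at a time, and it estimates influence as the change in rack inlet temperature per unit of output change. The names `estimate_influence` and `relearn`, the record fields and the blending weight are all illustrative assumptions.

```python
def estimate_influence(step_tests):
    """Return {(rack, unit): inlet-temperature change per unit of output change}.

    Each test is a dict (hypothetical format) with the unit that was changed,
    its output before and after, and per-rack (before, after) inlet temperatures.
    """
    influence = {}
    for test in step_tests:
        delta_output = test["output_after"] - test["output_before"]
        if delta_output == 0:
            continue  # the unit did not change, so nothing can be learned
        for rack, (before, after) in test["rack_inlet_temps"].items():
            influence[(rack, test["unit"])] = (after - before) / delta_output
    return influence


def relearn(influence, new_tests, weight=0.5):
    """Blend fresh estimates into existing ones so the model tracks changes."""
    updated = dict(influence)
    for key, value in estimate_influence(new_tests).items():
        old = updated.get(key)
        updated[key] = value if old is None else (1 - weight) * old + weight * value
    return updated


tests = [{
    "unit": "CRAC-1",
    "output_before": 50.0,
    "output_after": 70.0,
    "rack_inlet_temps": {"R1": (24.0, 22.0), "R2": (25.0, 25.0)},
}]
model = estimate_influence(tests)
```

In this toy data, raising CRAC-1's output lowers the inlet temperature at rack R1 and leaves rack R2 unchanged, so the unit has a measurable influence on R1 and none on R2. A real system would have to separate the effects of several units changing at once, which this sketch deliberately avoids.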

This understanding also illuminates, in a data-supported way, the need for targeted corrective maintenance. With a clearer understanding and visualization of cooling unit health, operators can justify the right budget to maintain equipment effectively thereby improving the overall health and reducing risk in the data center.

In one recent experience at a large US data center, machine learning software revealed that 40% of the cooling units were consuming power but not cooling. The data center operator was aware of the problem, but couldn’t convince senior management to expend budget because he could neither quantify the problem nor prove the value of, or need for, a specific expenditure to resolve it. With new and clear data in hand, the operator was able to identify the failed CRACs and present the budget required to fix or replace them.
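A simple way to picture "consuming power but not cooling" is to compare each unit's power draw with the temperature drop it produces between return air and supply air. The sketch below is illustrative only: the function name, the reading fields and both thresholds are assumptions, not values from the deployment described above.

```python
def find_ineffective_units(readings, min_power_kw=1.0, min_delta_t=2.0):
    """Return the units that draw power but barely cool the air passing through.

    readings maps a unit name to a dict with hypothetical fields:
    power_kw, return_temp and supply_temp (temperatures in degrees Celsius).
    The thresholds are placeholders to be tuned per site.
    """
    flagged = []
    for unit, r in readings.items():
        delta_t = r["return_temp"] - r["supply_temp"]
        if r["power_kw"] >= min_power_kw and delta_t < min_delta_t:
            flagged.append(unit)
    return sorted(flagged)


readings = {
    "CRAC-1": {"power_kw": 6.5, "return_temp": 29.0, "supply_temp": 18.0},
    "CRAC-2": {"power_kw": 5.8, "return_temp": 27.0, "supply_temp": 26.5},
    "CRAC-3": {"power_kw": 0.0, "return_temp": 26.0, "supply_temp": 26.0},
}
suspects = find_ineffective_units(readings)
```

Here only CRAC-2 is flagged: it draws power yet returns air almost as warm as it received. CRAC-3 is switched off, so it is not counted as a failure. A list like this, with measured numbers attached, is the kind of evidence that turns a hunch into a budget request.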

This ability to more clearly see the impact of IT changes on cooling equipment enables personnel to keep up with cooling capacity adjustment and, in most cases, eliminate the need for manual control. A reduction of the corresponding “on-the-fly, floor time corrections” also frees up operators to focus on problems that require more creativity and to more effectively manage physical changes such as floor tile adjustments.

There’s no replacement for experience-based human expertise. However, why not leverage your staff to do what they do best, and eliminate those tasks which are better served by software control? Data centers using machine learning software are undeniably more efficient and more robust. Operators can more confidently future-proof themselves against inefficiency or adverse capacity impact as conditions change. For these reasons alone, use of machine learning-based software should be considered an emerging best practice.

Posted by: Electric Thoughts™ | January 11, 2013

2012 Retrospective

It’s getting better all the time.

Despite our relentless drive to consume more and more data, driven by ever more interesting and arguably useful multimedia applications, energy consumption of data centers is growing slower than would be predicted from historical trends.

For that success, we should be proud, while remaining focused on even greater efficiency innovation.

Large companies have stepped up with powerful sustainability initiatives which impact energy use throughout their enterprise. We’ve gotten better at leveraging natural resources, like outside air to moderate data center temperatures.  We are using denser, smarter racks for space and other efficiencies. Data center cooling units are built with variable speed devices improving energy efficiency machine-by-machine. Utility companies are increasingly offering sophisticated and results-generating incentives to jump-start efficiency programs.

These and other contributing factors are making a difference, clearly proven in Jonathan Koomey’s Growth in Data Center Electricity Use 2011 report which showed a flattening, versus a lockstep correlation of energy usage to data center growth. Koomey and other analyst growth estimates projected a doubling of world data center energy usage from 2005 to 2010.  Actual growth rates were closer to 56%, a reduction that Koomey attributes both to fewer than expected server installations – and a reduced use of electricity per server.

I am proud of what our industry – and what our company – has achieved.  Consider some of this year’s highlights.

The New York Times raised the profile – and the ire  – of the data center industry calling attention to the massive energy consumed by, well, consumers.  Data center facilities and analysts alike responded with criticism, saying that the article ignored the many and significant sustainability and energy use reductions now actively in use.

Vigilent received an astounding 8 industry awards this year – recognizing our technology innovation, business success and workplace values. I’m very proud of the fact that several of these awards were presented by or achieved in partnership with our customers. For example, Vigilent and NTT won the prestigious Uptime GEIT 2012 award in the Facility Product Deployment Category. NTT Facilities with NTT Communications received the 2012 Green Grid Grand Prix award, recognizing NTT’s innovative efforts in raising the energy efficiency levels of Japan by using Vigilent and contributing DCIM tools. And Verizon, in recognition of our support for their commitment to continuing quality and service, presented us with their Supplier Recognition award in the green and sustainability category.

We moved strongly into Japanese and Canadian markets with the help of NTT Facilities and Telus, both of whom made strategic investments in Vigilent following highly successful deployments.  Premiere Silicon Valley venture firm Accel Partners became an investor early in the year.

We launched Version 5 of our intelligent energy management system adding enhanced cooling system control with Intelligent Analytics-driven trending and visualization, along with a new alarm and notification product to further reduce downtime risk.

And, perhaps most satisfyingly of all, we helped our customers avert more than a few data center failures through real-time monitoring and intercession, along with early notification of possible issues.

This year, we will reduce energy consumption by more than 72 million kWh in the US alone. And this figure grows with each new deployment. We do this profitably, and with direct contribution to our customers’ bottom line as well, through energy cost savings.

Things are getting better. And we’re just getting started.

Posted by: Electric Thoughts™ | September 28, 2012

Cooling Failures

The New York Times story “Power, Pollution, and the Internet” highlights a largely unacknowledged issue with data centers: cooling. James Glanz starts with an anecdote describing an overheating problem at a Facebook data center in the early days. The article then goes on to quote: “Data center operators live in fear of losing their jobs on a daily basis, and that’s because the business won’t back them up if there’s a failure.”

It turns out that the issue the author describes is not an isolated incident. As data centers get hotter, denser and more fragile, cooling becomes increasingly critical to reliability. Here are examples of cooling-related failures which have made the headlines in recent years.

Facebook: A BMS programming error in the outside air economizer logic at Facebook’s Prineville data center caused the outdoor air dampers to close and the spray coolers to go to 100%, which caused condensate to form inside servers, leading to power supply unit failures.

Wikipedia: A cooling failure caused servers at Wikimedia to go into automatic thermal shutdown, shutting off access to Wikipedia from European users.

Nokia: A cooling failure led to a lengthy service interruption and data loss for Nokia’s Contacts by Ovi service.

Yahoo: A single cooling unit failure resulted in locally high temperatures, which tripped the fire suppression system and shut down the remainder of the units.

Lloyds: Failure of a “server cooling system” brought down the wholesale banking division of the British financial services company Lloyds Banking Group for several hours.

Google: For their 1800-server clusters, Google estimates that “In each cluster’s first year, … there’s about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover.”

It is no surprise that data center operators live in fear.  What is surprising is that so few operators have mitigated risk through currently-available technology. It’s now possible to non-intrusively upgrade existing data centers with supervisory cooling management systems that compensate for and alert operators to cooling failures. Changes in IT load, environmental conditions, or even human error can quickly be addressed, avoiding what could quickly become an out-of-control incident that results in downtime, loss of availability, and something that’s anathema to colo operators: SLA penalties.

It’s incumbent on facilities operators and business management to evaluate and install the latest technology that puts not only operational visibility, but essential control, in their hands before the next avoidable incident occurs.

Posted by: Electric Thoughts™ | September 17, 2012

Data Center Risk

Surprising Areas of Data Center Risk and How to Proactively Manage Them

Mission critical facilities need a different level of scrutiny and control over cooling management.

It’s no surprise that cooling is critical to the security of these facilities. With requirements for 99.999% uptime and multimillion dollar facilities at risk, cooling is often the thin blue line between data safety and disaster.
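To put that uptime requirement in perspective, here is a small worked calculation in Python. It assumes the figure is a percentage availability measured over a 365-day year; the function name is illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a 365-day year


def allowed_downtime_minutes(availability_percent):
    """Return the yearly downtime budget, in minutes, for a given availability."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


five_nines_budget = allowed_downtime_minutes(99.999)
```

Under these assumptions the budget works out to roughly 5.26 minutes per year, so a single cooling incident lasting a few minutes can consume the entire annual allowance.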

And yet, many mission critical facilities use cooling control systems that were designed for comfort cooling, versus the reliable operation of hugely valuable and sensitive equipment.

When people get warm, they become uncomfortable. When IT equipment overheats, it fails – often with catastrophically expensive results.

In one recent scenario, a 6-minute chiller plant failure resulted in lost revenue and penalties totaling $14 million. In another scenario, the failure of a single CRAC unit caused temperatures to shoot up to over 100 degrees Fahrenheit in a particular zone, resulting in the failure of a storage array.

These failures result from a myriad of complex, and usually unrealized risk areas.  My recent talk at the i4Energy Seminar series hosted by the California Institute for Energy and Environment (CIEE) exposes some of these hidden risk areas and what you can do about them.

You can watch that talk here:

Posted by: Electric Thoughts™ | July 24, 2012

Cleantech Evolves

Smart Loading for the Smart Grid – New Directions in Cleantech

I recently participated in a TiE Energy Panel (The Hottest Energy Startups: Companies Changing the Energy Landscape), with colleagues from Primus Power, Power Assure, Mooreland Partners and Gen110.

The panel concurred that the notion of Cleantech – and the investment money that follows it – has shifted from a focus on energy generation to a focus on energy management.   To date, this is primarily because cheaper energy sources, hyped in early Cleantech press, haven’t materialized.  It’s hard to compete with heavily subsidized incumbent energy sources, much less build a business for what’s perceived as a commodity business.  There are exceptions, like solar energy development, but other alternative sources have languished financially despite their promise.

The investment shift toward energy management is also a result of emerging efficiency-focused technology.  Data Center Infrastructure Management or DCIM is all about smart management – with an emphasis on energy.  Gartner believes that there are some 60+ companies in this space, which is rapidly gaining acceptance as a data center requirement.

This shift is also supported by the convergence of other technology growth areas, such as big data and cloud computing, both of which play well with energy management.   As our increasingly sensor-driven environment creates more and more data – big data – its volume has surpassed the ability of humans to manage it.

And yet the availability of this data, accurate, collected in real-time, inclusive of the dimensions of time and location, represents real promise.  Availability and analysis of this information within individual corporations and perhaps shared more broadly via the cloud, will reveal continuous options for improving efficiency and will likely point to entirely new means of larger scale energy optimization through an integrated smart grid.

The days of facility operators running around with temperature guns and clipboards – although still surprisingly common today – are giving way to central computer screens with consolidated, scannable and actionable data.

This is an exciting time.  I’m all for new ideas and the creation of less expensive, less environmentally harmful ways to generate energy.  But as these alternative options evolve, I am equally excited by the strides industry has made for the smarter use of the resources we have.

The wave of next generation energy management is still rising.

Posted by: Electric Thoughts™ | May 9, 2012

Data Center Brains

If I Only Had a Brain… said the Data Center

Maintenance…

My recent blog post talked about the fact that an intelligent cooling management system reduces wear and tear on cooling equipment. It does this in part by avoiding short-cycling. Additionally, intelligent cooling improves thermal stability, which also reduces wear and tear on IT equipment.

Beyond shortening the life of equipment, undue wear and tear can cause catastrophic failures, which are always unbudgeted and expensive. Intelligent cooling management extends the life of equipment and reveals potential equipment issues before they can cause problems.

Capacity Boost…

I’ve also described how intelligent cooling management allows you to do more with less. When equipment is managed just right, and efficiency is managed moment by moment, the mixing of hot and cold air is avoided, return air temps are higher and the capacity of the cooling equipment increases. This capacity boost allows you to add more IT equipment, avoid buying more cooling equipment and ultimately avoid or postpone co-locating or building a new data center as your IT needs expand.

Adding a Smart Layer…

Intelligent cooling management can be added in a lightweight overlay to legacy cooling infrastructures.   The benefits are instantaneous.  You gain system-level coordinated control, new insights through visualization of data center floor cooling operations, and sophisticated cooling control diagnostics  –  without buying a single piece of new cooling equipment or hiring professional service oversight.  And these benefits are equal opportunity – they can be gained from old, new and multi-vendor data centers.

Every data center has untapped potential to work better and deliver more.   By giving your data center a brain, you can increase its brawn as well as its endurance.

Posted by: Electric Thoughts™ | April 2, 2012

Thermal Instability

Reducing Thermal Instability

Today, the traditional means of data center cooling is through decentralized air conditioner control. And yet, decentralized control is inherently unstable. This instability occurs because most data centers are redundantly cooled. In these scenarios, there is a lot of crosstalk between air conditioners. Crosstalk causes positive feedback.

Decentralized control leads to at least three different types of thermal instability. The first occurs when redundant air conditioners fight against one another, creating the aforementioned crosstalk. The effect of this real-time crosstalk is that temperatures cycle up and down, which can cause the rate of temperature change to exceed the limits specified by ASHRAE.

The second type is short-cycling of compressors in air conditioners that use direct expansion cooling. When these air conditioners operate at low load, the compressors will often short cycle. This causes the temperature at the rack inlet to oscillate at an unacceptably high frequency, which again can cause the temperature to exceed ASHRAE guidelines. The third type derives from the combination of the first two, in which some air conditioners overpower others and cause them to deliver warm air – versus cool air – to raised floor plenums or other areas that should receive cool air.
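Both of the first two instabilities can be spotted in logged data. The following Python sketch shows one hedged way to do it: one function measures the fastest temperature change between consecutive rack inlet samples, and the other counts compressor starts. The function names and data formats are assumptions, and the acceptable limits are deliberately left out because they should come from the applicable ASHRAE guideline and the equipment manufacturer.

```python
def max_rate_of_change(samples):
    """Return the largest absolute temperature change rate, in degrees per hour.

    samples is a list of (minutes_elapsed, temperature) pairs in time order.
    """
    worst = 0.0
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        if t1 <= t0:
            continue  # skip duplicate or out-of-order timestamps
        rate = abs(temp1 - temp0) / (t1 - t0) * 60
        worst = max(worst, rate)
    return worst


def count_compressor_starts(states):
    """Count off-to-on transitions in a time-ordered list of on/off booleans."""
    return sum(1 for prev, cur in zip(states, states[1:]) if cur and not prev)


inlet_log = [(0, 22.0), (5, 23.0), (10, 22.0)]
compressor_log = [False, True, True, False, True]
rate = max_rate_of_change(inlet_log)
starts = count_compressor_starts(compressor_log)
```

In this toy log the inlet temperature swings one degree every five minutes, which is a rate of about 12 degrees per hour, and the compressor starts twice. Comparing such figures against the site's chosen limits gives an operator an objective signal that units are fighting or short-cycling.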

An additional problem stemming from temperature instability is the increased wear and tear on air conditioning and IT equipment. For example, compressors that are continually cycling wear out faster than those that run at a steady speed. Perhaps more significantly, cycling temperatures impose a higher level of mechanical stress on electronic components – which can result in either intermittent or catastrophic failure of IT equipment.

Smart coordinated control of cooling equipment eliminates or attenuates a number of these common instabilities. This improved control can have a dramatic, positive effect on both mechanical and IT equipment life and, ultimately, on system reliability.

Posted by: Electric Thoughts™ | March 14, 2012

More Cooling

More Cooling With Less $$

My last post took a look at the maintenance savings possible through more efficient data center/facility cooling management. You can gain further savings by increasing the capacity of your existing air handling/air conditioning units. It is even possible to add IT load without buying new air conditioners or, at the least, while deferring those purchases. Here’s how.

Data centers and buildings have naturally occurring air stratification.  Many facilities deliver cool air from an under floor plenum.  As the air heats and rises, cooling air is delivered low and moved about with low velocity.  Because server racks sit on the floor, they sit in a colder area on average.  The air conditioners however, draw from higher in the room – capturing the hot air from above and delivering it, once cooled down, to the under floor plenum. This vertical stratification creates an opportunity to deliver cooler air to servers and at the same time increase cooling capacity by drawing return air from higher in the room.

However, this isn’t easy to achieve.  The problem is that uncoordinated or decentralized control of air conditioners often causes some of the units to deliver uncooled air into the under floor plenum. There, the mixing of cooled and uncooled air results in higher inlet air temperatures of servers, and ultimately lower return-air temperatures, which reduces the capacity of the cooling equipment.
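The link between return-air temperature and capacity follows from the standard sensible heat relation: cooling delivered is proportional to airflow times the temperature difference between return and supply air. The Python sketch below illustrates this with approximate air properties. The function name and the example airflow and temperatures are assumptions for illustration, and the relation treats airflow and supply temperature as fixed, which a real unit may not do.

```python
AIR_DENSITY = 1.2  # kg/m^3, approximate for air near room conditions
AIR_CP = 1.005     # kJ/(kg*K), approximate specific heat of air


def sensible_capacity_kw(airflow_m3_per_s, return_temp_c, supply_temp_c):
    """Return the sensible cooling delivered, in kW, for a given airflow."""
    delta_t = return_temp_c - supply_temp_c
    return AIR_DENSITY * airflow_m3_per_s * AIR_CP * delta_t


# Same hypothetical unit and airflow, two different return-air temperatures.
mixed_return = sensible_capacity_kw(5.0, return_temp_c=25.0, supply_temp_c=15.0)
warm_return = sensible_capacity_kw(5.0, return_temp_c=30.0, supply_temp_c=15.0)
```

With these example numbers the unit removes roughly 60 kW when mixing drags the return air down to 25 °C, and roughly 90 kW when the return air arrives at 30 °C. That gap is the capacity that mixing takes away.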

A cooling management system can establish a colder profile at the bottom of the rack and make sure that each air conditioner is actually having a cooling effect, versus working ineffectively and actually increasing heat through its operation. An intelligent cooling energy management system dynamically right-sizes air conditioning unit capacity loads, coordinating their combined operation so that all the units deliver cool air and don’t mix hot return air from some units with cold air from other units. This unit-by-unit but combined coordination squeezes the maximum efficiency out of all available units so that, even at full load, inefficiency due to mixing is avoided and significant capacity-improving benefits are gained.

Consider this example. At one company, a 40,000-square-foot data center appeared to be out of cooling capacity. After deploying an intelligent energy management system, not only did energy usage drop, but the company was able to increase its data center IT load by 40% without adding air conditioners and, in fact, after de-commissioning two existing units. As well, the energy management system maintained the desired inlet air temperatures under this higher load condition.

Consider going smarter before moving to an additional equipment purchase decision.  Savings become even larger if you consider avoided maintenance costs for new equipment, and energy reduction through more efficiently balanced capacity loads, year-over-year.

Posted by: Electric Thoughts™ | January 4, 2012

2011 Reflections

There is a saying in the MEP consulting business: “no one ever gets sued for oversizing.” That fear-driven mentality also affects the operation of mechanical systems in data centers, which accounts for why data centers are over-cooled at great expense. But few facility managers know by how much. The fact is that it has been easier – and, to date, safer – to over-cool a data center as the importance of the data it contains has increased and, with that importance, the pressure to protect it.

Last year that changed.  With new technology, facility managers know exactly how much cooling is required in a data center, at any given time. And, perhaps more importantly, technology can provide warning – and reaction time – in the rare instances when temperatures increase unexpectedly. With this technology, data center cooling can now be “dynamically right-sized.”  The risk of dynamic management can be made lower than manual operation, which is prone to human error.

In our own nod to the advantages of this technology, we re-named the company I co-founded in 2004, from Federspiel Corporation to Vigilent Corporation. As our technology increased in sophistication, we felt that our new name, denoting vigilance and intelligent oversight of facility heating and cooling operations, was more reflective of the new reality in data center cooling management. Last year, through smart, automated management of data center energy consumption, Vigilent reduced carbon emissions and energy consumption of cooling systems by 30-40%. These savings will continue year after year, benefiting not only our customers’ bottom lines, but also their corporate sustainability objectives. These savings have been accomplished while maintaining the integrity and desired temperatures of data centers of all sizes and configurations in North America and Japan.

I’m proud of what we have achieved last year.  And I’m proud of those companies who have stepped up to embrace technology that can replace fear with certainty, and waste with efficiency.
