Posted by: Electric Thoughts™ | August 6, 2013


Yes, DCIM Systems ARE like ERP Systems, Critical for Both Cost and Risk Management

Technology and manufacturing companies nearly all use sophisticated ERP systems for oversight on the myriad functions that contribute to a company’s operation.  Service companies use SAP. 

Data center managers more typically use their own experience.  With all due respect to this experience, the complexity of today’s data center has long surpassed the ability of any human, or even group of humans, to manage it for maximum safety and efficiency.

As data centers have come to acknowledge this fact, they are increasingly adopting DCIM, the data center’s answer to ERP.   The similarities between ERP systems and DCIM are striking.

Just as manufacturing and technology firms needed a system to manage the complexity of operations, data center operations have grown and matured to the point that such systems are now required as well.

Data Center Knowledge’s Jason Verge says that “… [DCIM] is being touted as the ERP for the data center; it is addressing a complicated challenge.  When a device is introduced, changes or fails, it changes the make-up of these complex facilities.”

Mark Harris of Nlyte said in a related Data Center Journal article: “DCIM was envisioned to become the ERP for IT.  It was to become the enabler for the IT organization to extend and manage their span of control, much like all other organizations (Sales, Engineering, manufacturing, Finance, etc.) had adopted over the years.”

Just like ERP systems, DCIM attempts to de-silo operations and bring visibility and management control to cost and waste, while also addressing risk concerns.  In initial DCIM deployments the focus has understandably been on asset management.  Understanding the equipment you have, and whether this equipment is appropriate for your challenges, was the right place to start. However, DCIM vendors and users quickly realized that elimination of energy waste, particularly energy wasted by unused IT assets, was another useful area of focus.  Cooling, as a resource or even as an area of waste, was a tertiary concern.  Business managers no longer have this luxury.  The cost of cooling and the risk of a cooling/heating-related data center failure are too high.  As Michelle Bailey, VP of 451 Datacenter Initiatives and Digital Infrastructure, said in a recent webinar on Next Generation Data Centers, data centers have become too big to fail.  She also said that data centers are still using imprecise measurements of accountability, which don’t match up to business goals.  Processes must be made more transparent to business managers, and metrics must be established that tie directly back to business goals.

Data center managers can and do make extremely expensive energy-related decisions in order to reduce risk.  These may not even be bad decisions.  But the issue is that, without site visibility and the transparency that Michelle suggests above, business managers don’t realize that these decisions are being made at all, or that there may be options available which, with more analysis, make more sense from a business cost and risk trade-off perspective. And, while cost is one driver of the need for management oversight, waste (and its obvious effect on cost) is another.

As an example, a facility manager may turn the chiller plant down a degree to manage his or her cost function and perception of risk control.  This action has the cost equivalent of expensing a Tesla, but likely has no visibility to management.  Nor, typically, does the facility manager realize that less expensive and even less risky alternatives exist, because he or she has never had to consider them.  Facility managers are not traditionally accountable for energy savings.  They are accountable for uptime.  This thinking is outdated.  The two are no longer mutually exclusive.  In fact they are inextricably tied.  Proactive and intelligently managed energy use saves money and reduces downtime risk by reducing the possibility of cooling failures.  If DCIM, like an ERP system, is used to understand and manage where cost – and waste – is being generated, it must specifically address and incorporate cooling infrastructure.

DCIM systems, offering granular data center information that is aggregated and analyzed for business case analysis, enable such oversight and, with it, improved operational management.



Posted by: Electric Thoughts™ | June 17, 2013

Intelligent Efficiency

Intelligent Efficiency, The Next New Thing.

Greentech Media’s senior editor Stephen Lacey reported that the convergence of the internet and distributed energy is contributing to a new economic paradigm for the 21st century.

Intelligent efficiency is the next new thing enabled by that paradigm, he says, in a special report of the same name.  He also notes that this isn’t the “stale, conservation-based energy efficiency Americans often think about.”  He says that the new thinking around energy efficiency is information-driven.  It is granular. And it empowers consumers and businesses to turn energy from a cost into an asset.

I couldn’t agree more.

Consider how this contrast in thinking alone generates possibilities for resources that have been hidden or economically unavailable until now.

Conservation-based thinking or, as I think about it in data centers, “efficiency by design or replacement,” is capital intensive.  To date, this thinking has been focused on new construction, physical infrastructure change, or equipment swap-outs.  These efforts are slow and can’t take advantage of operational variations such as the time-varying costs of energy.

Intelligent energy efficiency thinking, on the other hand, leverages newly available information enabled by networked devices and wireless sensors  to make changes primarily through software.  Intelligent energy management is non-disruptive and easier to implement.  It reduces risk by offering greater transparency.   And, most importantly, it is fast.  Obstacles to the speed of implementation – and the welcome results of improved efficiency – have been removed by technology.

Intelligence is the key factor here.  You can have an efficient system, an efficient design, but if it isn’t operated effectively, it is inherently inefficient.  For example, you may deploy one perfectly efficient machine right next to another perfectly efficient machine believing that you have installed a state-of-the-art solution.  In reality, it’s more likely that these two machines are interacting and fighting with each other – at significant energy cost.   You also need to factor in and be able to track equipment degradation as well as the risks incurred by equipment swap-outs.

You need the third element – intelligence – working in tandem with efficient equipment, to make sure that the whole system works at peak level and continues to work at peak level, regardless of the operating conditions.  This information flow must be constant.  Even the newest, most perfectly optimized data centers will inevitably change.

Kudos to Greentech Media for this outstanding white paper and for highlighting how this new thinking and the “blending of real-time communications with physical systems” is changing the game for energy efficiency.

Posted by: Electric Thoughts™ | April 29, 2013

Cooling Doesn’t Manage Itself

Of the primary components driving data center operations – IT assets, power, space and cooling – the first three command the lion’s share of attention.  Schneider Electric (StruxureWare), Panduit (PIM), ABB (Decathlon), Nlyte, Emerson (Trellis) and others have created superb asset and power tracking systems.  Using systems like these, companies can get a good idea as to where their assets are located, how to get power to them and even how to optimally manage them under changing conditions.

Less well understood and, I would argue, not understood at all, is how to get all the IT-generated heat out of the data center, and as efficiently as possible.

Some believe that efficient cooling can be “designed in,” as opposed to operationally managed, and that this is good enough.

On the day a new data center goes live the cooling will, no doubt, operate superbly.  That is, right up until something changes – which could happen the next day, or weeks or months later.  Even the most efficiently designed data centers eventually operate inefficiently. At that point, your assets are at risk and you probably won’t even know it.  Changes and follow-on inefficiencies are inevitable.

As well, efficiency by design only applies to new data centers.  The vast majority of data centers operating today are aging. All of them have degraded with incremental cooling issues over time.   IT changes, infrastructure updates, failures, essentially any and all physical data center changes or incidents, affect cooling in ways that may not be detected through traditional operations or “walk around” management.

Data center managers must manage their cooling infrastructure as dynamically and closely as they do their IT assets.  The health of the cooling system directly impacts the health of those very same IT assets.

Further, cooling must be managed operationally.  Beyond the cost savings of continually optimized efficiency, cooling management systems provide clearer insight into where to add capacity or redundancy, and into potential thermal problems and areas of risk.

Data centers have grown beyond the point where they can be managed manually.  It’s time to stop treating cooling as the red-headed step-child of data centers.  Cooling requires the same attention and sophisticated management systems that are in common use for IT assets.  There’s no time to lose.

Posted by: Electric Thoughts™ | March 12, 2013

Machine Learning

Why Machine Learning-based DCIM Systems Are Becoming Best Practice.

Here’s a conundrum.  While data center IT equipment has a lifespan of about three years, data center cooling equipment will endure about 15 years. In other words, your data center will likely undergo five complete IT refreshes within the lifetime of your cooling equipment – at the very least.  In reality, refreshes happen much more frequently. Racks and servers come and go, floor tiles are moved, maintenance is performed, density is changed based on containment operations – any one of which will affect the ability of the cooling system to work efficiently and effectively.

If nothing is done to re-configure cooling operations as IT changes are made, and this is typically the case, the data center develops hot and cold spots, stranded cooling capacity and wasted energy consumption.  There is also risk with every equipment refresh – particularly if the work is done manually.

There’s a better way. The ubiquitous availability of low cost sensors, in tandem with the emerging availability of machine learning technology, is leading to development of new best practices for data center cooling management. Sensor-driven machine learning software enables the impact of IT changes on cooling performance to be anticipated and more safely managed.

Data centers instrumented with sensors gather real-time data which can inform software of minute-by-minute cooling capacity changes.  Machine learning software uses this information to understand the influence of each and every cooling unit, on each and every rack, in real-time as IT loads change.  And when loads or IT infrastructure change, the software re-learns accordingly and updates itself, ensuring that its influence predictions remain current and accurate.  This ability to understand cooling influence at a granular level also enables the software to learn which cooling units are working effectively – and at expected performance levels – and which aren’t.
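
As a rough illustration of the re-learning idea above, the sketch below keeps a running estimate of how strongly each cooling unit influences each rack inlet temperature, and blends in new observations as conditions change. It is a deliberately simplified stand-in, not the method used by any particular product. The class name `InfluenceModel`, the unit and rack labels, and the `learning_rate` value are all hypothetical.

```python
class InfluenceModel:
    """Running estimate of how much each cooling unit affects each rack inlet.

    Assumption: influence is approximated as degrees of inlet temperature
    change per percentage point of cooling unit output change.
    """

    def __init__(self, learning_rate=0.2):
        self.learning_rate = learning_rate  # hypothetical blending weight
        self.influence = {}  # (unit, rack) -> degrees per % of output

    def observe(self, unit, output_change_pct, rack_temp_changes):
        """Blend one observed response into the running estimates."""
        if output_change_pct == 0:
            return  # no change in output, so nothing can be learned
        for rack, temp_change in rack_temp_changes.items():
            observed = temp_change / output_change_pct
            key = (unit, rack)
            if key not in self.influence:
                # First observation for this pair: take it as the estimate.
                self.influence[key] = observed
            else:
                # Later observations: move part of the way toward the new value.
                old = self.influence[key]
                self.influence[key] = old + self.learning_rate * (observed - old)

    def predict(self, unit, output_change_pct):
        """Predict rack inlet temperature changes for a planned output change."""
        return {
            rack: coeff * output_change_pct
            for (u, rack), coeff in self.influence.items()
            if u == unit
        }


model = InfluenceModel(learning_rate=0.5)
model.observe("CRAC-1", 10, {"rack-A": -1.0, "rack-B": -0.2})
model.observe("CRAC-1", 10, {"rack-A": -2.0, "rack-B": -0.2})
prediction = model.predict("CRAC-1", 20)
```

Because older observations fade as new ones arrive, an estimate of this kind can follow the room as racks are added and floor tiles are moved.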

This understanding also illuminates, in a data-supported way, the need for targeted corrective maintenance. With a clearer understanding and visualization of cooling unit health, operators can justify the right budget to maintain equipment effectively, thereby improving overall health and reducing risk in the data center.

In one recent experience at a large US data center, machine learning software revealed that 40% of the cooling units were consuming power but not cooling.  The data center operator was aware of the problem, but couldn’t convince senior management to expend budget because he couldn’t quantify the problem or prove the value of a specific expenditure to resolve it.  With new and clear data in hand, the operator was able to identify the failed CRACs and present the budget required to fix or replace them.
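
A check of this kind can be sketched in a few lines. The example below flags units that draw power while producing almost no temperature drop between return and supply air. The function name, the field names, and both thresholds are hypothetical placeholders chosen for illustration, not values taken from the deployment described above.

```python
def find_ineffective_units(readings, min_power_kw=1.0, min_delta_t=2.0):
    """Return IDs of units that draw power but barely cool the air.

    readings: dict of unit ID -> dict with 'power_kw', 'return_temp' and
    'supply_temp' (both temperatures in the same unit). The thresholds are
    illustrative placeholders, not recommended values.
    """
    flagged = []
    for unit_id, r in sorted(readings.items()):
        delta_t = r["return_temp"] - r["supply_temp"]
        if r["power_kw"] >= min_power_kw and delta_t < min_delta_t:
            flagged.append(unit_id)
    return flagged


sample = {
    "CRAC-1": {"power_kw": 6.5, "return_temp": 29.0, "supply_temp": 17.0},
    "CRAC-2": {"power_kw": 5.8, "return_temp": 27.5, "supply_temp": 27.0},
    "CRAC-3": {"power_kw": 0.0, "return_temp": 26.0, "supply_temp": 26.0},
}
ineffective = find_ineffective_units(sample)
```

In this made-up sample only CRAC-2 is flagged: it draws power but delivers air at nearly the temperature it takes in, while CRAC-3 is simply switched off.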

This ability to more clearly see the impact of IT changes on cooling equipment enables personnel to keep up with cooling capacity adjustment and, in most cases, eliminate the need for manual control.  A reduction of the corresponding “on-the-fly, floor time corrections” also frees up operators to focus on problems that require more creativity and to more effectively manage physical changes such as floor tile adjustments.

There’s no replacement for experience-based human expertise. However, why not leverage your staff to do what they do best, and eliminate those tasks which are better served by software control?  Data centers using machine learning software are undeniably more efficient and more robust.  Operators can more confidently future-proof themselves against inefficiency or adverse capacity impact as conditions change.  For these reasons alone, use of machine learning-based software should be considered an emerging best practice.

Posted by: Electric Thoughts™ | January 11, 2013

2012 Retrospective

It’s getting better all the time.

Despite our relentless drive to consume more and more data, driven by ever more interesting and arguably useful multimedia applications, energy consumption of data centers is growing more slowly than would be predicted from historical trends.

For that success, we should be proud, while remaining focused on even greater efficiency innovation.

Large companies have stepped up with powerful sustainability initiatives which impact energy use throughout their enterprise. We’ve gotten better at leveraging natural resources, like outside air to moderate data center temperatures.  We are using denser, smarter racks for space and other efficiencies. Data center cooling units are built with variable speed devices improving energy efficiency machine-by-machine. Utility companies are increasingly offering sophisticated and results-generating incentives to jump-start efficiency programs.

These and other contributing factors are making a difference, as shown in Jonathan Koomey’s Growth in Data Center Electricity Use 2011 report, which found a flattening of energy usage rather than a lockstep correlation with data center growth. Estimates by Koomey and other analysts had projected a doubling of world data center energy usage from 2005 to 2010.  Actual growth was closer to 56%, a reduction that Koomey attributes both to fewer than expected server installations and to reduced electricity use per server.

I am proud of what our industry – and what our company – has achieved.  Consider some of this year’s highlights.

The New York Times raised the profile – and the ire  – of the data center industry calling attention to the massive energy consumed by, well, consumers.  Data center facilities and analysts alike responded with criticism, saying that the article ignored the many and significant sustainability and energy use reductions now actively in use.

Vigilent received an astounding 8 industry awards this year – recognizing our technology innovation, business success and workplace values. I’m very proud of the fact that several of these awards were presented by or achieved in partnership with our customers.  For example, Vigilent and NTT won the prestigious Uptime GEIT 2012 award in the Facility Product Deployment Category.  NTT Facilities with NTT Communications received the 2012 Green Grid Grand Prix award, recognizing NTT’s innovative efforts in raising energy efficiency levels in Japan by using Vigilent and contributing DCIM tools.  And Verizon, in recognition of our support for their commitment to continuing quality and service, presented us with their Supplier Recognition award in the green and sustainability category.

We moved strongly into Japanese and Canadian markets with the help of NTT Facilities and Telus, both of whom made strategic investments in Vigilent following highly successful deployments.  Premier Silicon Valley venture firm Accel Partners became an investor early in the year.

We launched Version 5 of our intelligent energy management system adding enhanced cooling system control with Intelligent Analytics-driven trending and visualization, along with a new alarm and notification product to further reduce downtime risk.

And, perhaps most satisfyingly of all, we helped our customers avert more than a few data center failures through real-time monitoring and intercession, along with early notification of possible issues.

This year, we will reduce energy consumption by more than 72 million kWh in the US alone.  And this figure grows with each new deployment.  We do this profitably, and with direct contribution to our customers’ bottom line as well through energy cost savings.

Things are getting better. And we’re just getting started.

Posted by: Electric Thoughts™ | September 28, 2012

Cooling Failures

The New York Times story “Power, Pollution, and the Internet” highlights a largely unacknowledged issue with data centers: cooling.  James Glanz starts with an anecdote describing an overheating problem at a Facebook data center in the early days. The article then goes on to quote: “Data center operators live in fear of losing their jobs on a daily basis, and that’s because the business won’t back them up if there’s a failure.”

It turns out that the issue the author describes is not an isolated incident. As data centers get hotter, denser and more fragile, cooling becomes increasingly critical to reliability. Here are examples of cooling-related failures which have made the headlines in recent years.

Facebook: A BMS programming error in the outside air economizer logic at Facebook’s Prineville data center caused the outdoor air dampers to close and the spray coolers to go to 100%, which caused condensate to form inside servers, leading to power supply unit failures.

Wikipedia: A cooling failure caused servers at Wikimedia to go into automatic thermal shutdown, shutting off access to Wikipedia for European users.

Nokia: A cooling failure led to a lengthy service interruption and data loss for Nokia’s Contacts by Ovi service.

Yahoo: A single cooling unit failure resulted in locally high temperatures, which tripped the fire suppression system and shut down the remainder of the units.

Lloyds: Failure of a “server cooling system” brought down the wholesale banking division of the British financial services company Lloyds Banking Group for several hours.

Google: For their 1800-server clusters, Google estimates that “In each cluster’s first year, … there’s about a 50 percent chance that the cluster will overheat, taking down most of the servers in less than 5 minutes and taking 1 to 2 days to recover.”

It is no surprise that data center operators live in fear.  What is surprising is that so few operators have mitigated risk through currently-available technology. It’s now possible to non-intrusively upgrade existing data centers with supervisory cooling management systems that compensate for and alert operators to cooling failures. Changes in IT load, environmental conditions, or even human error can quickly be addressed, avoiding what could quickly become an out-of-control incident that results in downtime, loss of availability, and something that’s anathema to colo operators: SLA penalties.

It’s incumbent on facilities operators and business management to evaluate and install the latest technology that puts not only operational visibility, but essential control, in their hands before the next avoidable incident occurs.

Posted by: Electric Thoughts™ | September 17, 2012

Data Center Risk

Surprising Areas of Data Center Risk and How to Proactively Manage Them

Mission critical facilities need a different level of scrutiny and control over cooling management.

It’s no surprise that cooling is critical to the security of these facilities.  With requirements for 99.999% uptime and multimillion dollar facilities at risk, cooling is often the thin blue line between data safety and disaster.
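
To put the uptime figure in perspective, the short sketch below converts an availability percentage into a downtime budget. It assumes the requirement above means 99.999 percent availability measured over a 365-day year; the function name is hypothetical.

```python
def allowed_downtime_minutes(availability_pct, period_hours=365 * 24):
    """Minutes of downtime permitted per period at a given availability."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability_pct must be between 0 and 100")
    return (100 - availability_pct) / 100 * period_hours * 60


five_nines = allowed_downtime_minutes(99.999)   # about 5.26 minutes per year
three_nines = allowed_downtime_minutes(99.9)    # about 525.6 minutes per year
```

Under that assumption the whole annual budget is a little over five minutes, so a single cooling incident can consume all of it.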

And yet, many mission critical facilities use cooling control systems that were designed for comfort cooling, versus the reliable operation of hugely valuable and sensitive equipment.

When people get warm, they become uncomfortable. When IT equipment overheats, it fails – often with catastrophically expensive results.

In one recent scenario, a 6-minute chiller plant failure resulted in lost revenue and penalties totaling $14 million.  In another scenario, the failure of a single CRAC unit caused temperatures to shoot up to over 100 degrees Fahrenheit in a particular zone, resulting in the failure of a storage array.

These failures result from a myriad of complex and usually unrealized risk areas.  My recent talk at the i4Energy Seminar series hosted by the California Institute for Energy and Environment (CIEE) exposes some of these hidden risk areas and what you can do about them.

You can watch that talk here:

Posted by: Electric Thoughts™ | July 24, 2012

Cleantech Evolves

Smart Loading for the Smart Grid – New Directions in Cleantech

I recently participated in a TiE Energy Panel (The Hottest Energy Startups: Companies Changing the Energy Landscape), with colleagues from Primus Power, Power Assure, Mooreland Partners and Gen110.

The panel concurred that the notion of Cleantech – and the investment money that follows it – has shifted from a focus on energy generation to a focus on energy management.   To date, this is primarily because cheaper energy sources, hyped in early Cleantech press, haven’t materialized.  It’s hard to compete with heavily subsidized incumbent energy sources, much less build a business for what’s perceived as a commodity business.  There are exceptions, like solar energy development, but other alternative sources have languished financially despite their promise.

The investment shift toward energy management is also a result of emerging efficiency-focused technology.  Data Center Infrastructure Management or DCIM is all about smart management – with an emphasis on energy.  Gartner believes that there are some 60+ companies in this space, which is rapidly gaining acceptance as a data center requirement.

This shift is also supported by the convergence of other technology growth areas, such as big data and cloud computing, both of which play well with energy management.   As our increasingly sensor-driven environment creates more and more data – big data – its volume has surpassed the ability of humans to manage it.

And yet the availability of this data, accurate, collected in real-time, inclusive of the dimensions of time and location, represents real promise.  Availability and analysis of this information within individual corporations, and perhaps shared more broadly via the cloud, will reveal continuous options for improving efficiency and will likely point to entirely new means of larger scale energy optimization through an integrated smart grid.

The days of facility operators running around with temperature guns and clipboards – although still surprisingly common today – are giving way to central computer screens with consolidated, scannable and actionable data.

This is an exciting time.  I’m all for new ideas and the creation of less expensive, less environmentally harmful ways to generate energy.  But as these alternative options evolve, I am equally excited by the strides industry has made for the smarter use of the resources we have.

The wave of next generation energy management is still rising.

Posted by: Electric Thoughts™ | May 9, 2012

Data Center Brains

If I Only Had a Brain… said the Data Center


My recent blog post discussed how an intelligent cooling management system reduces wear and tear on cooling equipment.  It does this in part by avoiding short-cycling.  Additionally, intelligent cooling improves thermal stability, which in turn reduces wear and tear on IT equipment.

Beyond shortening the life of equipment, undue wear and tear can cause catastrophic failures, which are always unbudgeted and expensive.  Intelligent cooling management extends the life of equipment and reveals potential equipment issues before they can cause problems.

Capacity Boost…

I’ve also described how intelligent cooling management allows you to do more with less.  When equipment is managed just right, and efficiency is managed moment by moment, the mixing of hot and cold air is avoided, return air temps are higher and the capacity of the cooling equipment increases.  This capacity boost allows you to add more IT equipment, avoid buying or adding more cooling equipment and, ultimately, avoid or postpone co-locating or building a new data center as your IT needs expand.

Adding a Smart Layer…

Intelligent cooling management can be added as a lightweight overlay to legacy cooling infrastructures.  The benefits are instantaneous.  You gain system-level coordinated control, new insights through visualization of data center floor cooling operations, and sophisticated cooling control diagnostics – without buying a single piece of new cooling equipment or hiring professional service oversight.  And these benefits are equal opportunity – they can be gained from old, new and multi-vendor data centers.

Every data center has untapped potential to work better and deliver more.   By giving your data center a brain, you can increase its brawn as well as its endurance.

Posted by: Electric Thoughts™ | April 2, 2012

Thermal Instability

Reducing Thermal Instability

Today, the traditional means of data center cooling is through decentralized air conditioner control. And yet, decentralized control is inherently unstable. This instability occurs because most data centers are redundantly cooled. In these scenarios, there is a lot of crosstalk between air conditioners. Crosstalk causes positive feedback.

Decentralized control leads to at least three different types of thermal instability. The first is when air conditioners fight against one another, creating the aforementioned crosstalk that stems from redundant cooling. The effect of this real-time crosstalk is that temperatures cycle up and down, which can cause the rate of temperature change to exceed the limits specified by ASHRAE.

The second type is short-cycling of compressors for air conditioners which have direct expansion cooling. When these air conditioners operate at low load, the compressors will often short cycle. This causes the temperature at the rack inlet to oscillate at an unacceptably high frequency, which again can cause the temperature to exceed ASHRAE guidelines. The third type derives from the combination of points one and two, in which some air conditioners overpower others and cause them to deliver warm air – versus cool air – to raised floor plenums or other areas that should receive cool air.
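
Both symptoms can be spotted in logged data. The sketch below computes the worst-case rate of temperature change from timestamped rack inlet readings and counts compressor starts from a logged on/off history. The function names, the sample readings, and the limit value are hypothetical; the applicable ASHRAE guideline should be consulted for real limits.

```python
def max_rate_of_change(samples):
    """Largest absolute temperature change rate, in degrees per hour.

    samples: list of (minutes_since_start, temperature) in time order.
    """
    worst = 0.0
    for (t0, temp0), (t1, temp1) in zip(samples, samples[1:]):
        if t1 <= t0:
            raise ValueError("samples must be in strictly increasing time order")
        rate = abs(temp1 - temp0) / (t1 - t0) * 60
        worst = max(worst, rate)
    return worst


def count_compressor_starts(states):
    """Count off-to-on transitions in a time-ordered list of booleans."""
    return sum(1 for prev, cur in zip(states, states[1:]) if cur and not prev)


inlet = [(0, 20.0), (5, 21.0), (10, 20.5)]
rate = max_rate_of_change(inlet)

# Hypothetical limit for illustration only, not an ASHRAE figure.
RATE_LIMIT_PER_HOUR = 10.0
too_fast = rate > RATE_LIMIT_PER_HOUR

starts = count_compressor_starts([False, True, True, False, True])
```

A high start count over a short logging window is one simple indicator of short-cycling, though what counts as "high" depends on the equipment.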

An additional problem stemming from temperature instability is the increased wear and tear on air conditioning and IT equipment. For example, compressors that are continually cycling wear out faster than those that run at a steady speed. Perhaps more significantly, cycling temperatures impose a higher level of mechanical stress on electronic components – which can result in either intermittent or catastrophic failure of IT equipment.

Smart coordinated control of cooling equipment eliminates or attenuates a number of these common instabilities. This improved control can have a dramatic, positive effect on both mechanical and IT equipment life and ultimately, on system reliability.
