Huawei High-Availability UPS: Comprehensive Power Supply Protection for Cloud-Era Data Centers


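Introduction: As the information age advances, Internet data centers (IDCs) built for data hosting services, enterprise data centers built to centralize networked data, and data backup centers built for data security have all grown rapidly. Telecom operators, banking and financial institutions, governments, and large enterprises are all building large data center facilities. How to keep facilities with such highly concentrated data running safely is the first question that power supply system design must answer.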

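As is well known, nearly all operational loads in a data center are powered through UPSs, so keeping the UPS running in its safest mode is central to the safety of the data center power supply system. It is therefore essential to adopt advanced design concepts and mature, industry-proven power design techniques, to design the data center UPS power supply system for safety in a systematic way, and so to raise its availability.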

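What is availability? In its academic definition, availability is the degree to which a product is in a working or usable state when it is required to start and perform a task at any random point in time.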

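In the industry, system availability is usually expressed as a number of "nines": the proportion of time within a year that the system is online and able to do productive work. Four nines means 99.99% availability, that is, less than 53 minutes of possible downtime per year. Five nines (99.999%) means less than 5.3 minutes per year, and six nines (99.9999%) means less than 32 seconds per year. The goal of a UPS system is to raise the availability of the power supply system as far as possible and reduce the impact of problems on the utility mains.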
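The downtime figures for each number of nines follow from simple arithmetic. The sketch below is illustrative only: the function name is made up for this example, and it assumes a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def annual_downtime_minutes(nines: int) -> float:
    """Maximum yearly downtime in minutes for an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)  # e.g. 4 nines -> 0.0001
    return MINUTES_PER_YEAR * unavailability


for n in (4, 5, 6):
    minutes = annual_downtime_minutes(n)
    print(f"{n} nines: {minutes:.2f} min/year ({minutes * 60:.1f} s/year)")
```

This gives roughly 52.56 minutes, 5.26 minutes, and 31.5 seconds, which matches the rounded limits of 53 minutes, 5.3 minutes, and 32 seconds quoted above.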

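For UPS users, however, availability comes down to whether the UPS works well, whether it is easy to use, and whether its cost of use over the full life cycle is low enough. A look back at how the UPS has evolved bears this out.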

UPS history shows that availability has become the driving force behind UPS development

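The earliest UPS was the dynamic UPS, which provided uninterrupted power through mechanical energy storage and energy transfer between generators and motors. It was bulky, expensive, and extremely noisy, in effect a small power plant. It took up a great deal of floor space, was neither environmentally friendly nor easy to use, and was closer to an engineering project than a piece of equipment.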

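The line-frequency UPS that followed was smaller but still posed serious installation and transport problems. Its bulk meant it could not pass through doors, and its built-in isolation transformer made it too heavy to move by elevator, so installing one often meant knocking through walls and hoisting by crane. Maintenance was also very difficult: with so large a machine, a fault in any component required switching to the maintenance bypass for repair, which sharply raised the risk of service interruption and made the UPS the availability weak point of the power supply infrastructure.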

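The high-frequency UPS raised power density further and cut volume by 50%. It improved maintainability at the level of functional modules and shortened MTTR, so that repairs could be completed within a few hours. It weighed less than a line-frequency unit, which made installation considerably easier. Most high-frequency units also adopted a quasi-modular design, which brought further gains in maintainability. THDi could be held below 5%, markedly reducing harmonic pollution of the grid, and efficiency rose to 92-94%, an energy-saving advantage. Even so, these units could not offer the online capacity expansion and online maintenance found in DC power systems.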

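The modular UPS greatly shortened maintenance time and made online capacity expansion and online maintenance possible. Installation, transport, and maintenance are no longer weak points in power supply availability. THDi is within 5%, and efficiency rises further to above 96%. The latest generation of Huawei modular UPS reaches a power density of 320 kW per cabinet, with each 40 kW module occupying 3U, so that what used to need two cabinets now needs only one. This is a major breakthrough: the equipment footprint shrinks, and because the cabinet can stand against a wall the maintenance footprint shrinks too, so the space the customer actually uses falls by nearly 70%. When every square meter of floor space is expensive, that has great commercial value.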

Sound reliability design is the cornerstone of UPS availability

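The industry defines reliability as the probability that a product performs its intended function under specified conditions and for a specified period of time.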

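The Huawei UPS5000-E system is modeled for reliability with Markov models. Markov models of the basic structures (series, parallel, and T-out-of-S) are solved to obtain the equivalent failure rate and equivalent repair rate of each structure. Starting from the lowest-level units of the system and working upward level by level, the equivalent failure rate and equivalent repair rate of each series, parallel, or T-out-of-S module are calculated. This is repeated until the equivalent failure rate and equivalent repair rate of the whole system are obtained, and from these come system reliability metrics such as MTBF, availability, and downtime.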
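The bottom-up reduction described here can be sketched in a few lines. This is only an illustration, not Huawei's actual model: it assumes every unit fails and is repaired independently at constant rates, and it derives the equivalent rates from steady-state frequency balance. The `Block` class, the function names, and all numeric rates are hypothetical.

```python
from dataclasses import dataclass
from math import comb, prod


@dataclass(frozen=True)
class Block:
    """A two-state (up/down) Markov unit; rates are per hour."""
    failure_rate: float  # lambda
    repair_rate: float   # mu

    @property
    def availability(self) -> float:
        return self.repair_rate / (self.failure_rate + self.repair_rate)

    @property
    def unavailability(self) -> float:
        return self.failure_rate / (self.failure_rate + self.repair_rate)


def series(blocks: list[Block]) -> Block:
    """All blocks must work: any single failure takes the structure down."""
    avail = prod(b.availability for b in blocks)
    lam = sum(b.failure_rate for b in blocks)
    # In steady state the up->down frequency equals the down->up frequency.
    mu = lam * avail / (1.0 - avail)
    return Block(lam, mu)


def t_out_of_s(unit: Block, t: int, s: int) -> Block:
    """At least t of s identical units must work (t=1 is a plain parallel group)."""
    a, u = unit.availability, unit.unavailability

    def p_failed(j: int) -> float:
        # Steady-state probability that exactly j of the s units are down.
        return comb(s, j) * u**j * a ** (s - j)

    unavail = sum(p_failed(j) for j in range(s - t + 1, s + 1))
    # The structure fails from the boundary state (exactly t units working)
    # when any one of those t units fails.
    freq = p_failed(s - t) * t * unit.failure_rate
    return Block(freq / (1.0 - unavail), freq / unavail)


# Hypothetical numbers: 9 power modules of which 8 are needed, in series
# with one non-redundant part.
module = Block(failure_rate=1e-5, repair_rate=2.0)
backplane = Block(failure_rate=1e-6, repair_rate=0.25)
system = series([t_out_of_s(module, 8, 9), backplane])
print(f"system MTBF: {1.0 / system.failure_rate:,.0f} h")
print(f"system availability: {system.availability:.8f}")
```

Each call returns another `Block`, so the same two functions can be applied level by level until a single equivalent failure rate and repair rate remain for the whole system.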

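Components with a very low FIT (failure rate) value need no redundancy, while components that are more failure-prone or need frequent repair and maintenance, such as the AC-DC and DC-AC stages, are given multiple redundancy. This fine-grained reliability allocation gives the best return on reliability investment. By calculation, a single Huawei UPS5000-E system has an MTBF of up to 263,821 hours and an availability as high as 99.9999%, well above the industry average.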
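To see how the MTBF and availability figures relate, the sketch below uses the common steady-state relation availability = MTBF / (MTBF + MTTR). The article does not say that this is how its figures were derived, so the implied repair time is only an illustration under that assumption, and the function names are made up for this example.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a repairable system."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def implied_mttr_hours(mtbf_hours: float, target_availability: float) -> float:
    """Mean repair time that yields the target availability for a given MTBF."""
    return mtbf_hours * (1.0 - target_availability) / target_availability


MTBF_HOURS = 263_821   # figure quoted in the article
TARGET = 0.999999      # six nines, as quoted in the article

mttr = implied_mttr_hours(MTBF_HOURS, TARGET)
print(f"implied MTTR: {mttr * 60:.1f} minutes")
print(f"check: availability = {availability(MTBF_HOURS, mttr):.6%}")
```

Under this relation, six nines at the quoted MTBF corresponds to an average repair time of roughly a quarter of an hour, which shows why fast repair matters as much as a long MTBF.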

Maintainability is the lifeline of UPS availability

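According to statistics, UPS equipment from most suppliers must be fully powered down once a year for preventive maintenance. Depending on system configuration, this generally means 1 to 4 hours of planned downtime per year. The system design must allow all components of the power system (including UPSs and power distribution equipment) to be maintained concurrently, so that while one part of the UPS system is being serviced, the remaining UPSs can supply the load. The obvious answer is a multi-bus solution. In practice, however, this is not always the case, and many users' services do not permit maintenance of this kind. The solution is simple: as described above, analyze maintainability on the basis of the reliability allocation model, since maintainability is the lifeline of UPS availability. The analysis shows that the parts that mainly need maintenance are the power modules and the control modules. On the Huawei UPS5000-E these are all hot-swappable (on-site maintenance time under 2 minutes) and can be serviced online, which protects that lifeline.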

Ease of use takes UPS availability a step further

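In actual operation, a UPS needs no involvement from maintenance staff more than 99% of the time. Apart from the maintenance described above, people get involved during capacity expansion, installation, and routine inspection, and these are the occasions that best show whether a UPS is good and easy to use.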

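Take capacity expansion first. Expanding a conventional tower UPS is plainly difficult. Its input and output connections are fixed and self-contained, so a replacement or upgrade means end-to-end changes: adding UPS input and output circuits, changing cable sizes, rearranging the floor space for the UPS, and so on. This amounts to starting a new UPS installation project, and a harder one than a fresh installation. It takes at least 48 hours.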

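Expanding the Huawei UPS5000-E is a different matter. As long as a cabinet with sufficient capacity was chosen at the outset, an expansion can be completed within a few minutes when it is needed. This is a direct measure of how usable a UPS is.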

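Ease of use also shows in transport and handling. Tower units tend to be large and weigh several tons, which makes them hard to transport and handle, and they are often damaged in transit. Some of that damage is visible on inspection, some appears only at power-on, and some only after the unit has run for a while. Once a problem is found, the unit must be shut down for repair or replacement, which costs time and effort. The Huawei UPS5000-E uses a standard IT cabinet and a modular design, so it can be shipped with the modules removed or moved as a single unit, which greatly reduces these risks. This is another way in which the UPS is easy to use.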

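Installation is another test of how good and easy to use a UPS is. On a real site, a unit that is large and weighs several tons cannot go in the elevator, and the installation can be completed only by breaking through walls and bringing in a crane. Can the UPS instead be broken down into smaller parts and brought in through the door? It can. The Huawei UPS5000-E is built to standard cabinet dimensions and passes freely through elevator and equipment room doors, which makes installation simple and flexible.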

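Finally, once the UPS is in normal service, the user's contact with it is mostly routine inspection. What makes inspection friendly, convenient, and economical? First, the machine should have a small footprint, since the more space it takes, the more resources it wastes. With today's high rents, leasing one rack costs 60,000 to 70,000 yuan a year, and once maintenance space is included that figure must be multiplied by 3. This is where power density counts, and the Huawei UPS5000-E is a high-density design: at 40 kW per 3U it saves a great deal of floor space. Second, can the unit be maintained from the front? If servicing needs both front and rear access, maintenance takes longer and is harder, and the risk of a power interruption during maintenance rises. A third question is whether the unit can be installed against a wall, which saves maintenance space. The Huawei UPS5000-E follows these principles: it is fully front-maintainable and supports installation against a wall.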

Conclusion:

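With the explosive growth of the big data era, the availability of the power supply system matters more every day. Huawei's high-availability UPS meets this need and safeguards the safe, available operation of the power supply system.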

