Availability

Availability is the property of the elements of a distributed system being accessible and usable when needed; an availability failure is the loss or reduction of that accessibility. Proper use of Data Integrity# and Authentication Exchange# can help counter Denial of Service (DoS)#.

Some describe Availability in terms of 24/7/365, meaning the service is up 24 hours a day, 7 days a week, 365 days a year. Availability can also be expressed as a percentage in "nines"; for example, four nines means 99.990%. However, because networks, devices, and services are increasingly distributed, defects per million (DPM) is used instead to measure Availability.

| Availability | Nines | DPM | Downtime per year |
|---|---|---|---|
| 99.000% | Two Nines | 10,000 | 3 days, 15 hours, 36 minutes |
| 99.900% | Three Nines | 1,000 | 8 hours, 46 minutes |
| 99.990% | Four Nines | 100 | 53 minutes |
| 99.999% | Five Nines | 10 | 5 minutes |
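As a rough cross-check of the table, the DPM and downtime figures can be recomputed from the availability percentage alone. This is a minimal sketch that assumes a 365-day year and treats DPM as the unavailable fraction scaled to one million; the helper names `dpm` and `downtime_per_year` are hypothetical, chosen only for illustration.

```python
from datetime import timedelta

# Assumption: a 365-day year, matching the table above (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60


def dpm(availability_pct: float) -> float:
    """Defects per million: the unavailable fraction scaled to one million."""
    return (100.0 - availability_pct) / 100.0 * 1_000_000


def downtime_per_year(availability_pct: float) -> timedelta:
    """Expected downtime over a 365-day year for a given availability percentage."""
    unavailable = (100.0 - availability_pct) / 100.0
    return timedelta(minutes=unavailable * MINUTES_PER_YEAR)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        # Round DPM for display; float subtraction leaves tiny residues.
        print(f"{pct:.3f}%  DPM={dpm(pct):>8,.0f}  downtime={downtime_per_year(pct)}")
```

Under these assumptions, 99.999% works out to about 5 minutes 15 seconds per year, which the table rounds to 5 minutes.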

A highly available service prevents financial and productivity losses, reduces reactive support costs (the cost of fixing things when they break), and improves customer satisfaction and loyalty.

The availability of a service can be disrupted by operational errors, network equipment failures, software failures, and security holes. Operational errors are typically human errors resulting from poor change-management processes and a lack of training and documentation. Network equipment failures include hardware failures, power or service provider outages, overheating, and cables cut by a backhoe (a type of excavating machine, or digger). Software failures include software crashes, unsuccessful switch-overs, and latent code failures.

Design practices that can improve the availability of a service include:

  • hardware redundancy to avoid single point of failure
  • software and protocol availability features, such as Spanning Tree Protocol (STP) and Hot Standby Router Protocol (HSRP)
  • network/server redundancy
  • link/carrier availability#
  • clean implementation/cable management
  • backup power/temperature management
  • network monitoring
  • simple network design
  • change control management (testing before applying changes)
  • training
  • backup/automatic recovery
  • security posture and policies
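To illustrate why redundancy removes a single point of failure, a common back-of-the-envelope model (assumed here, not taken from the text above) multiplies availabilities for components in series and multiplies unavailabilities for redundant components in parallel. It assumes failures are independent, which a shared power feed or a single backhoe cut can violate. The function names `series` and `parallel` are hypothetical.

```python
from math import prod


def series(*avail: float) -> float:
    """Availability when every component must be up (each is a single point of failure)."""
    return prod(avail)


def parallel(*avail: float) -> float:
    """Availability when at least one redundant component must be up.

    Assumes the components fail independently of one another.
    """
    return 1.0 - prod(1.0 - a for a in avail)


if __name__ == "__main__":
    a = 0.999  # a single device at three nines
    print(f"two devices in series:   {series(a, a):.6%}")
    print(f"two devices in parallel: {parallel(a, a):.6%}")
```

In this model, two 99.9% devices in series drop to about 99.8%, while the same two devices deployed as a redundant pair reach about 99.9999%.
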
Links to this page
  • Switched LAN Architecture

    A typical Local-Area Network (LAN)# is designed as a Hierarchical Network# of multiple Switches# organized in three layers: core, distribution, and access. The access layer connects end devices and applies basic configuration and constraints to their network connections. The distribution layer is where routing policies, Virtual LAN (VLAN)#, access control, and broadcast domains are defined, and where access-layer traffic is aggregated, or funnelled. The core layer must handle large amounts of traffic with high performance, which means it needs to be highly available and redundant. To save costs, the distribution and core layers can be collapsed into a single layer.

  • Stackable Switch

    Stackable Switches# are interconnected with a backplane cable that provides high-bandwidth throughput between switches, using technologies such as Cisco StackWise. The stack operates as a single entity, which is desirable when building a highly available network.

  • Security Service

    A Security Service is a service that improves the security of data processing systems and/or information transfers. It needs to secure at least four elements: confidentiality#, authenticity#, integrity#, and availability#. It relies on implemented Security Mechanism# to fulfil its promises and prevent potential Security Attacks.

  • Network Reliability

    To achieve high reliability# for network traffic, we need to address the following issues:

#security #networking