Availability

Availability is the property of the elements of a distributed system remaining accessible and usable when needed; an availability failure or attack results in the loss or reduction of that accessibility. Proper use of Data Integrity# and Authentication Exchange# can help counter Denial of Service (DoS)#.

Some describe availability in terms of 24/7/365, meaning the service is up 24 hours a day, 7 days a week, 365 days a year. Availability is also commonly expressed as a percentage in "nines"; for example, four nines means 99.99%. However, as networks, devices, and services become increasingly distributed, defects per million (DPM) is often used instead to measure availability. DPM is the unavailable fraction scaled to one million: a 99.99% available service has (1 - 0.9999) × 1,000,000 = 100 DPM.

Availability	Nines	DPM	Downtime Per Year
99.000%	Two Nines	10,000	3 days, 15 hours, 36 minutes
99.900%	Three Nines	1,000	8 hours, 46 minutes
99.990%	Four Nines	100	53 minutes
99.999%	Five Nines	10	5 minutes
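Every column of the table can be derived from the availability percentage alone. The Python sketch below assumes a 365-day year (leap years ignored) and rounds downtime to the nearest minute; the function names are illustrative and not taken from any library.

```python
# Derive DPM and yearly downtime from an availability percentage.
# Assumption: a 365-day year (525,600 minutes), leap years ignored.

MINUTES_PER_YEAR = 365 * 24 * 60


def dpm(availability_pct: float) -> float:
    """Defects per million: the unavailable fraction scaled to 10^6."""
    return (1 - availability_pct / 100) * 1_000_000


def downtime_minutes_per_year(availability_pct: float) -> float:
    """Expected minutes of downtime in a 365-day year."""
    return (1 - availability_pct / 100) * MINUTES_PER_YEAR


def format_minutes(total: float) -> str:
    """Render minutes as 'D days, H hours, M minutes', rounded to the nearest minute."""
    minutes = round(total)
    days, rem = divmod(minutes, 24 * 60)
    hours, mins = divmod(rem, 60)
    parts = []
    if days:
        parts.append(f"{days} days")
    if hours:
        parts.append(f"{hours} hours")
    parts.append(f"{mins} minutes")
    return ", ".join(parts)


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        print(f"{pct:.3f}%  DPM={dpm(pct):,.0f}  "
              f"downtime={format_minutes(downtime_minutes_per_year(pct))}")
```

Running it reproduces the DPM and downtime columns above, which is a quick way to check figures for other targets such as 99.95%.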

A highly available service prevents financial and productivity losses, reduces reactive support costs (the cost of fixing things after they break), and improves customer satisfaction and loyalty.

The availability of a service can be disrupted by operational errors, network equipment failures, software failures, and security holes. Operational errors are typically human errors resulting from poor change-management processes and a lack of training and documentation. Network equipment failures include hardware failures, power or service provider outages, overheating, and backhoe incidents (excavating equipment cutting buried cables). Software failures include software crashes, unsuccessful switch-overs, and latent code failures.

Design practices that can improve the availability of a service include:

hardware redundancy to avoid a single point of failure
software availability features such as Spanning Tree Protocol (STP) and Hot Standby Router Protocol (HSRP)
network/server redundancy
link/carrier availability#
clean implementation/cable management
backup power/temperature management
network monitoring
simple network design
change control management (testing before applying changes)
training
backup/automatic recovery
security posture and policies
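Hardware redundancy helps because independent components rarely fail at the same time. The Python sketch below applies the standard series/parallel availability model, assuming component failures are independent and switch-over is instantaneous. Real HSRP or STP failover takes time and can itself fail (the unsuccessful switch-overs mentioned above), so treat the results as an upper bound. The 0.999 figures are hypothetical.

```python
# Series vs. parallel (redundant) availability.
# Assumptions: independent failures, instantaneous and always-successful
# switch-over. Shared power, a common software bug, or a failed
# switch-over all break this model.

from math import prod


def serial(*availabilities: float) -> float:
    """Every component on the path must be up (e.g. switch -> link -> router)."""
    return prod(availabilities)


def parallel(*availabilities: float) -> float:
    """At least one redundant component must be up (e.g. an HSRP router pair)."""
    return 1 - prod(1 - a for a in availabilities)


if __name__ == "__main__":
    router = 0.999  # hypothetical: one router at three nines
    print(f"single router:   {router:.6%}")
    print(f"redundant pair:  {parallel(router, router):.6%}")
    # Components in series are only as available as the product of their parts.
    print(f"switch + router: {serial(0.999, router):.6%}")
```

Under these assumptions a redundant pair of three-nines routers reaches six nines, while chaining two three-nines components in series drops below three nines.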

Links to this page

Switched LAN Architecture

A typical Local-Area Network (LAN)# is designed as a Hierarchical Network# of multiple Switches# in three layers: the core layer, the distribution layer, and the access layer. The access layer connects end devices and applies basic configuration and constraints to their network connections. The distribution layer is where routing policies, Virtual LAN (VLAN)#, access control, broadcast domains, and the aggregation (funnelling) of access-layer traffic are defined. The core layer must handle large amounts of traffic with high performance, so it needs to be highly available and redundant. To save cost, the distribution and core layers can be collapsed into a single layer.
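Whether a hierarchical design actually avoids single points of failure can be checked mechanically. The Python sketch below models a hypothetical topology (all device names are invented) as an undirected graph and, for each device, removes it and runs a breadth-first search to test whether the remaining devices can still reach one another. This brute-force check finds the graph's articulation points; it models device failures only, not link failures.

```python
# Hypothetical three-layer topology: two core switches, two distribution
# switches, and access switches dual-homed to both distribution switches.
# Device names are illustrative only.

from collections import deque

LINKS = [
    ("core1", "core2"),
    ("core1", "dist1"), ("core1", "dist2"),
    ("core2", "dist1"), ("core2", "dist2"),
    ("dist1", "dist2"),
    ("access1", "dist1"), ("access1", "dist2"),
    ("access2", "dist1"), ("access2", "dist2"),
]


def build_graph(links):
    """Build an undirected adjacency map from (a, b) link pairs."""
    graph = {}
    for a, b in links:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def reachable(graph, start, failed):
    """Breadth-first search from start, treating the failed node as absent."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph[node]:
            if nxt != failed and nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def single_points_of_failure(graph):
    """Devices whose failure splits the remaining devices into disconnected parts."""
    spofs = []
    for failed in graph:
        survivors = [n for n in graph if n != failed]
        if reachable(graph, survivors[0], failed) != set(survivors):
            spofs.append(failed)
    return spofs


if __name__ == "__main__":
    graph = build_graph(LINKS)
    print("dual-homed:", single_points_of_failure(graph))  # []
    # Single-home access2 by dropping its uplink to dist2.
    single = build_graph([link for link in LINKS if link != ("access2", "dist2")])
    print("single-homed:", single_points_of_failure(single))  # ['dist1']
```

In this model, dual-homing every access switch to both distribution switches is what removes the distribution layer as a single point of failure; dropping one uplink immediately makes `dist1` critical.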

Stackable Switch

Stackable Switches# are interconnected with a backplane cable that provides high-bandwidth throughput between switches, using technologies such as Cisco StackWise. The stack operates as a single logical switch, which is desirable when building a highly available network.

Security Service

A Security Service is a service that enhances the security of data processing systems and/or information transfers. It should protect at least four properties: confidentiality#, authenticity#, integrity#, and availability#. It relies on implemented Security Mechanism# to fulfil these guarantees and to counter potential Security Attacks.

Network Reliability

To achieve high reliability# for network traffic, the following issues need to be addressed: