Cloud Storage Infrastructures Report (Assessment)

Table of Contents

Brief History of Data Storage
Storage Bandwidth, IOPS and Latency
Comparison of Protocols
Benefits and Functions of a Unity 450F Flash Storage Array
Disaster Recovery
References

Brief History of Data Storage

The history of data storage devices can be briefly outlined as follows (Foote, 2017):

The late 1950s – the early 1960s – hard disk drives. These work via magnetization of the film made out of ferromagnetic materials. HDDs are still used nowadays in most computers. Although their initial capacity was small, contemporary HDDs can contain terabytes of information.
1966 – semiconductor memory chips; these stored data in small circuits, which were called memory cells. Their capacity was 2,000 bits.
1969 – floppy disks – disks that were 8 inches in size; consisted of magnetic film stored in a flexible case made of plastic. Initially, their capacity was nearly 80 kB; later, more voluminous disks were created.
1976 – a new, 5.25-inch model of floppy disks; a smaller version of the large floppy disk. The new disk’s capacity was 110 kB.
The 1980s – yet another model of a floppy disk, 3.5 inches in size. Their capacity started at 360 kB.
1980 – first optical discs. (The spelling with “c” instead of “k” refers to optical rather than magnetic storage devices.) Using the principles of optical data storage, CDs and DVDs were later developed. Their initial capacity was several hundred megabytes; contemporary optical discs can store tens of GB.
1985 – magneto-optical disks (5.25 inches and 3.5 inches). These employed optical and magnetic technology simultaneously so as to store information. Their capacity ranged from 128 MB to several GB.
2000 – flash drives. These consist of chips and transistors. The first flash drives had a minimum capacity of several hundred MB; contemporary flash drives can store hundreds of GB.
Cloud data storage – this new technology utilises remote servers for storing data that can be accessed via the Internet. The capacities of clouds are extremely large, for there can be numerous servers and storage arrays on which to keep the data.

Storage Bandwidth, IOPS and Latency

Storage Bandwidth, Storage IOPS, and Storage Latency

On the whole, storage bandwidth (which is also called storage throughput) is the maximum of data that can be transferred to or from a storage device in a unit of time (Crump, 2013). For instance, it can be measured in MB/sec, so the bandwidth of 100 MB/sec means that the storage device can transfer 100 Megabytes of information each second.

However, storage bandwidth is an omnibus notion that does not take into account several factors; in fact, it only shows the maximum amount of throughput per second. In this respect, it is important to explain the notion of IOPS (Somasundaram & Shrivastava, 2009, pp. 45-47). IOPS stands for “input/output operations per second”, and denotes the maximum number of storage transactions that can be completed each second on a given data storage device (Crump, 2013). The greater the IOPS, the more transactions can be completed per second; however, the actual data transfer rate also depends upon the size of each input or output operation. So, in general, bandwidth = average size of input or output × IOPS (Burk, 2017). Also, it should be noted that IOPS is limited by the physical characteristics of the data storage device.
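The relationship bandwidth = average I/O size × IOPS can be illustrated with a minimal Python sketch. The function name, the figures (10,000 IOPS, 4 kB and 64 kB operations) and the use of decimal units (1 MB = 1,000 kB) are assumptions chosen for illustration; they do not come from the cited sources.

```python
def bandwidth_mb_per_s(iops: float, avg_io_size_kb: float) -> float:
    """Approximate throughput in MB/s from IOPS and the average I/O size in kB."""
    # Decimal units are assumed here: 1 MB = 1000 kB.
    return iops * avg_io_size_kb / 1000.0


# The same IOPS figure gives very different throughput for different I/O sizes.
small_io = bandwidth_mb_per_s(iops=10_000, avg_io_size_kb=4)
large_io = bandwidth_mb_per_s(iops=10_000, avg_io_size_kb=64)
print(small_io, large_io)  # 40.0 640.0
```

This shows why an IOPS figure quoted without an I/O size says little about the amount of data a device can actually move.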

In addition, every input or output request will take a certain amount of time to finish; the mean amount of time to do this is called the average latency (Burk, 2017). Latency is usually measured in milliseconds (10^-3 seconds), and the lower it is, the better (Burk, 2017). In storage devices, latency depends upon the amount of time it takes the reading/writing head to find an appropriate place on the drive where the required data is stored or is to be stored (Crump, 2013). Rotational latency is equal to half of the time needed for a full rotation and, therefore, depends on the rotation speed of the drive (Somasundaram & Shrivastava, 2009, p. 46).
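The rule that rotational latency equals half of one full rotation can be worked through in a short Python sketch. The rotation speeds used below (15,000 and 7,200 rpm) are common drive speeds picked as illustrative assumptions, not values taken from the cited sources.

```python
def rotational_latency_ms(rpm: float) -> float:
    """Average rotational latency in milliseconds: half of one full rotation."""
    full_rotation_ms = 60_000.0 / rpm  # one minute is 60,000 ms
    return full_rotation_ms / 2.0


for rpm in (15_000, 7_200):
    print(rpm, "rpm ->", round(rotational_latency_ms(rpm), 2), "ms")
```

A faster-spinning drive therefore has a lower average rotational latency, which is one reason why rotation speed matters for mechanical disks.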

Thus, on the whole, IOPS shows the number of data transactions (and IOPS may depend on latency, which is a property of the hardware), but it does not take into account the amount of data transferred per transaction (Crump, 2013). Therefore, IOPS on its own is not enough to assess the rate at which a storage device can work; it is also necessary to take into account the latency, the bandwidth, and the average input/output size (Burk, 2017).

Costs, Limitations, and Administrative Controls

When it comes to meeting the workload demand placed on a given storage device, it is paramount to take into account the IOPS of that device. For instance, Somasundaram and Shrivastava (2009) provide an example in which the capacity requirements for a system of storage devices are 1.46 TB, but the maximum workload needed is estimated to be nearly 9,000 IOPS (p. 48). At the same time, a 146 GB drive may provide only 180 IOPS. In this case, it will be necessary to use 9,000/180 = 50 disks only to meet the workload demand, although the capacity demand would have been met if merely 10 disks were employed (Somasundaram & Shrivastava, 2009, p. 48). Therefore, in order to work around the physical limitations of disks that do not allow for large IOPS of a storage device, several drives can be used, but this, clearly, greatly increases the cost of the storage system.
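The sizing example above can be reproduced with a minimal Python sketch. The function name and parameter names are hypothetical, and 1.46 TB is treated as 1,460 GB (decimal units) as an assumption; the input figures themselves are those quoted from Somasundaram and Shrivastava (2009, p. 48).

```python
import math


def disks_required(capacity_needed_gb: float, iops_needed: float,
                   disk_capacity_gb: float, disk_iops: float) -> int:
    """Number of disks needed to satisfy both the capacity and the IOPS demand."""
    for_capacity = math.ceil(capacity_needed_gb / disk_capacity_gb)
    for_iops = math.ceil(iops_needed / disk_iops)
    # The stricter of the two requirements decides the size of the system.
    return max(for_capacity, for_iops)


# 1.46 TB of capacity, 9,000 IOPS of workload, 146 GB drives at 180 IOPS each.
print(disks_required(1460, 9000, 146, 180))  # 50
```

Here the workload, not the capacity, determines the number of drives: capacity alone would call for 10 disks, whereas the IOPS demand calls for 50.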

As for the bandwidth, it should be noted that in general, it is possible to reach quite high values of the bandwidth in a storage device (Somasundaram & Shrivastava, 2009). Also, when multiple storage devices have to be connected together, it may be necessary to choose between hubs and switches. For instance, fabric switches allow for using the full bandwidth between several pairs of ports, which increases the speed; however, this may be rather costly. On the other hand, the utilisation of hubs is significantly cheaper, but they only offer shared bandwidth, so hubs may be considered primarily as a low-cost solution for expanding connectivity (Somasundaram & Shrivastava, 2009, p. 124).

Finally, when it comes to latency, high latency is considered one of the main performance killers in storage devices (Poulton, 2014, p. 437). Achieving high levels of IOPS is useless unless the latency is adequate to that figure (Poulton, 2014, p. 438). Therefore, when considering the overall performance of a storage device or system, it is pivotal to ensure that the latency (and, consequently, the rotation speed) is adequate to the IOPS of that device or system.

Comparison of Protocols

SCSI

The Small Computer Systems Interface (SCSI) protocol is generally utilised by operating systems for conducting input and output operations to data storage drives and peripheral devices. SCSI is typically used to connect tape drives and HDDs, but can also be employed to connect an array of other devices such as CD drives or scanners. On the whole, SCSI is capable of connecting up to 16 devices in a single data exchange network (Search Storage, n.d.).

One of the important limitations of the SCSI protocol is the above-mentioned limit on the maximal number of devices which can be connected so as to form a single network; for instance, SAS can connect up to 65,535 devices in a single network (by using expanders), in contrast to SCSI’s mere 16 devices (Search Storage, n.d.). The performance of SCSI is also inadequate for numerous purposes, in which case it is possible to utilise iSCSI; the latter preserves the command set of SCSI by employing the method of embedding the SCSI-3 protocol over the Internet protocol suite while also offering certain advantages over SCSI (Search Storage, n.d.).

FCP

The Fibre Channel Protocol (FCP) is a SCSI interface protocol which employs a fibre channel connection, that is, a connection utilising optical fibre cables which are capable of transferring data at high speed; initially, the offered throughput was 100 MB/s, but modern variants can provide speeds of several gigabytes per second (Somasundaram & Shrivastava, 2009, p. 118). Therefore, one of the major differences between SCSI and FCP is the cables they use (traditional parallel SCSI employs dedicated copper cables rather than optical fibre).

FCP is most commonly utilised for establishing a storage network, and thanks to the high speed of transfer, it is capable of providing quick access to the data located on various devices in the network (Ozar, 2012; Somasundaram & Shrivastava, 2009). It also allows for significantly increasing performance, for it excludes the possibility of interference between storage and data traffic (Ozar, 2012). Nevertheless, the speed of the connection is still limited, for it is lower than what one can get using a single computer (Ozar, 2012). In addition, the process of establishing, configuring and troubleshooting an FCP network may be rather tedious (Ozar, 2012).

iSCSI

On the whole, the Internet Small Computer Systems Interface (iSCSI) is an IP-based networking standard which encapsulates the SCSI protocol, and which supplies block-level access to a variety of data storage devices by transferring SCSI commands through a TCP/IP network. In other words, the utilisation of iSCSI entails mapping the storage via an Internet protocol suite (Ozar, 2012). In an iSCSI network, every storage device and each of the servers connected to the network possess their own IP addresses, and a connection to a device which holds the required data is established by specifying the IP address that is associated with the needed storage device or drive. It is also noteworthy that Windows displays each of the drives connected to an iSCSI network as a separate hard drive (Ozar, 2012).

It should be observed that iSCSI (for instance, 1-gigabit iSCSI) is comparatively cheap, and it is rather easy to configure due to the fact that a 1-gigabit network switch infrastructure is usually already present when it is utilised. Nevertheless, there exist limitations to iSCSI: it is quite slow, and because of the length of time taken by operations conducted via iSCSI, it is generally not appropriate for an SQL server (Ozar, 2012). Even so, iSCSI can be effectively used for virtualisation, for the latter does not require a considerable amount of storage throughput.

NAS

NAS (network-attached storage) is a computer data storage server and file-sharing device that is capable of being attached to a local network (Somasundaram & Shrivastava, 2009, p. 149). This device supplies multiple benefits, such as the consolidation of a server (one NAS is used instead of multiple servers), and the provision of file-level data sharing and access (Somasundaram & Shrivastava, 2009). NAS is attached to computers via a network; the network protocols used in this case typically include TCP/IP with the purpose of organising data transfer, as well as NFS and CIFS for managing remote file service (Somasundaram & Shrivastava, 2009).

Generally speaking, NAS devices supply a shared file service in a standard internet protocol network; they can also consolidate a number of multi-purpose file servers (Poulton, 2014). NAS devices offer multiple benefits when compared to other storage systems. For instance, in comparison to general-purpose servers, they focus on file serving and provide comprehensive and high availability of information, increased flexibility and efficiency, as well as centralised storage and a simplified management procedure (Somasundaram & Shrivastava, 2009). However, NAS devices also have some limitations; for instance, they consume a considerable proportion of the bandwidth in the TCP/IP network. The fact that Ethernet connections are lossy also means that network congestion will occur sooner or later; this often makes the overall performance of the network critical.

FCoE

The Fibre Channel over Ethernet (FCoE) protocol is quite similar to FCP, only it utilises Ethernet cables in order to carry out the protocol; more specifically, 10-gigabit Ethernet is employed (Ozar, 2012). Generally speaking, FCoE carries out input and output operations over the network by using a block access protocol (Hogan, 2012). In addition, unlike iSCSI, FCoE does not employ the method of IP encapsulation, relying on Ethernet instead, but retaining its independence from the forwarding scheme which is used by Ethernet; however, FCoE is similar to iSCSI in other respects (Hogan, 2012; Somasundaram & Shrivastava, 2009, p. 186).

When it comes to the limitations of FCoE, it should be noted that FCoE is rather difficult to configure, and requires zoning at the FCoE switch level, after which the LUN masking process has to be carried out (Hogan, 2012). In addition, FCoE does not supply virtualised MSCS support.

As for the utilisation of FCoE, it is mainly implemented in storage area networks (SANs) in data centres due to its usefulness with respect to reducing the total amount of cabling needed in such centres (Poulton, 2014). FCoE also comes in handy for server virtualisation applications because these often need a large quantity of physical input/output connections for each of the servers connected (Somasundaram & Shrivastava, 2009).

Benefits and Functions of a Unity 450F Flash Storage Array

Principles of Functioning

On an all-flash storage array, the data is persisted in flash cells (single-level, multi-level, or, very rarely, triple-level cells), which are grouped in pages, which are then grouped in blocks. Initially, all the cells have a value of 1; it can be changed by a “program” operation (application of low voltage); however, erasure is only possible at the block level (high voltage applied to a whole block; Poulton, 2014). Flash cells eventually fail due to physical wear; therefore, information is commonly backed up on purpose on redundant memory cells that are hidden from users (Poulton, 2014).
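The program/erase asymmetry described above can be sketched as a toy Python model. The class name, the page and block sizes, and the one-bit-per-cell (single-level) simplification are assumptions made purely for illustration; real flash controllers are far more involved.

```python
from typing import List


class FlashBlock:
    """Toy model of one flash block: a list of pages, each a list of 1-bit cells."""

    def __init__(self, pages: int = 4, bits_per_page: int = 8) -> None:
        # In the erased state every cell holds 1.
        self.pages = [[1] * bits_per_page for _ in range(pages)]

    def program(self, page: int, data: List[int]) -> None:
        """Program one page; a cell may only change from 1 to 0."""
        current = self.pages[page]
        if len(data) != len(current):
            raise ValueError("data must match the page size")
        for old, new in zip(current, data):
            if old == 0 and new == 1:
                raise ValueError("a cell cannot return to 1 without a block erase")
        self.pages[page] = list(data)

    def erase(self) -> None:
        """Erase works only on the whole block and resets every cell to 1."""
        for page in self.pages:
            page[:] = [1] * len(page)


block = FlashBlock(pages=2, bits_per_page=4)
block.program(0, [1, 0, 1, 0])   # allowed: cells only move from 1 to 0
block.erase()                    # the only way to get the 1s back
```

In this model, overwriting a programmed page with arbitrary new data is rejected until the entire block has been erased, which mirrors the block-level erasure constraint noted by Poulton (2014).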

Storage arrays consist of front-end ports, processors, the cache, and the backend (Poulton, 2014). Front-end ports typically utilise FCP or Ethernet protocols (FCoE, iSCSI, SMB, or NFS); it should be noted that in order for a host to use resources from the storage array, it is necessary for that host to employ the same access protocol (Poulton, 2014). After the ports, the processors are located; these run the storage array’s firmware, deal with input/output and move it between ports and the cache. The cache, in turn, is located after the processors, and its main purpose is accelerating the performance of the array, which is paramount for mechanical disk-based arrays, and also quite important in flash storage arrays (Poulton, 2014). Finally, after the cache, the backend is to be found; in this part, additional processors may be located (or everything might be controlled by the processor from the front end), as well as the ports connecting the cache to the storage drives (Poulton, 2014).

The disks are divided into logical volumes. This is done via partitioning, that is, separating the physical volume into several regions and memorising the capacity of these areas, their locations, and the addresses of the information clusters inside it (Somasundaram & Shrivastava, 2009). The information about partitioning is stored in a special area of the physical storage drive which is labelled “a partition table”, and is accessed prior to any other part of the physical volume.

In a network, the storage array is accessed via a protocol. As has been noted, Ethernet protocols (FCoE, iSCSI, SMB, or NFS) or FCP are typically employed for this purpose (Poulton, 2014). Although it is important to take into account which protocol is currently being used, the logical partitions within the flash storage array may often be presented to the host as separate volumes for information storage (in iSCSI, for example). On the other hand, if file sharing is carried out in the network, the author of a file usually identifies what type of access other users will have to that file, and carries out the control over changes made by these users to that file (Somasundaram & Shrivastava, 2009).

Benefits

When it comes to a Unity 450F flash storage array, it should be noted that such an array may include from 6 to 250 drives, each of which has a capacity of 128 GB (Dell EMC, n.d.; Dell EMC, 2017). As a flash drive array, it permits access to the files stored in the drives at a greater speed, offering better performance than hard disk arrays (Poulton, 2014). Finally, the presence of redundant (hidden) memory helps to ensure that the data which is stored in the drive will not be lost due to the wear of flash blocks or because of a failure of a controller (Poulton, 2014).

Disaster Recovery

Principles of Disaster Recovery

The term “disaster recovery” in the context of data storage refers to the actions, processes and procedures utilised in order to allow an organisation to recover or continue using key technological systems and infrastructure after a disaster (natural or man-made) has occurred. Generally speaking, disaster recovery planning starts with an analysis of potential business impacts; at this point, it is paramount to define a recovery time objective (RTO, the maximum acceptable period of time during which technologies can be offline) and a recovery point objective (RPO, the maximum acceptable period of time during which data might be lost) (Google Cloud Platform, 2017). Next, it is recommended to create and follow a disaster recovery plan (Google Cloud Platform, 2017). Best practices include: identifying recovery goals; designing for full, rather than partial, recovery; making tasks and goals specific; introducing control measures; integrating standard security mechanisms; ensuring that the software remains currently licensed; maintaining several ways of data recovery; and regularly testing the recovery plan (Google Cloud Platform, 2017). It is also paramount to ensure that there are no SPOFs (single points of failure), i.e., no parts of the system whose failure will cause the whole system to fail.
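The role of the RTO and the RPO in planning can be illustrated with a small Python check. This is only a sketch under stated assumptions: the function and parameter names are hypothetical, and it assumes periodic backups, so that the worst-case data loss equals the interval between two backups.

```python
from datetime import timedelta


def meets_objectives(backup_interval: timedelta,
                     estimated_restore_time: timedelta,
                     rpo: timedelta,
                     rto: timedelta) -> bool:
    """Check a backup schedule and a restore estimate against the RPO and RTO."""
    # Assumption: with periodic backups, at most one interval of data is lost.
    worst_case_data_loss = backup_interval
    return worst_case_data_loss <= rpo and estimated_restore_time <= rto


# Hourly backups and a two-hour restore against a 4-hour RPO and a 3-hour RTO.
print(meets_objectives(timedelta(hours=1), timedelta(hours=2),
                       rpo=timedelta(hours=4), rto=timedelta(hours=3)))  # True
```

A plan that fails such a check needs either more frequent backups (to satisfy the RPO) or a faster recovery procedure (to satisfy the RTO).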

Protecting from SPOF: Synchronous Remote Replication

In data storage disaster recovery, it is pivotal to make sure that a business does not lose all data due to a SPOF, for instance, if all its servers are located in the same place. For this purpose, it is possible to employ a method of synchronous replication of data. The crux of this method is that all the data which is stored on a storage array is also immediately transferred to a different array that is situated in a different place so that an exact replica of the data is created in a remote location (Poulton, 2014, p. 295). Thus, in the process of synchronous replication, the data is saved on a storage array, and then also copied to the remote array; the remote array sends confirmation that the data has been received, and only upon receiving this confirmation is the write considered finished (Poulton, 2014). One of the major advantages of this method is that it entails zero data loss; however, the need to wait for a confirmation from a remote array means a considerable drop in the performance of the storage array (Poulton, 2014).
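The write path of synchronous replication can be sketched in Python as follows. The class and function names are hypothetical, the arrays are modelled as in-memory dictionaries, and a `reachable` flag stands in for the network link; none of this reflects a real array's interface.

```python
class StorageArray:
    """Toy storage array that keeps blocks in a dictionary."""

    def __init__(self, name: str, reachable: bool = True) -> None:
        self.name = name
        self.reachable = reachable
        self.blocks = {}

    def write(self, address: int, data: bytes) -> bool:
        """Store the data and return True as an acknowledgement."""
        if not self.reachable:
            return False
        self.blocks[address] = data
        return True


def synchronous_write(primary: StorageArray, remote: StorageArray,
                      address: int, data: bytes) -> bool:
    """Report success only once both arrays have acknowledged the write."""
    if not primary.write(address, data):
        return False
    # The host is not told "done" until the remote site confirms receipt.
    return remote.write(address, data)


primary = StorageArray("primary")
remote = StorageArray("remote")
print(synchronous_write(primary, remote, 0, b"payload"))  # True
```

Because the acknowledgement to the host waits for the remote array, every write pays the round trip to the remote site, which is the performance cost mentioned above. The sketch does not roll back the primary copy when the remote write fails; a real array would have to handle that case.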

Protecting from SPOF: Local Replication

The method of local replication refers to the creation of a backup copy of the data in the same array or data centre so that if the data is destroyed in the primary storage location, the exact replica of the data would be saved in the target LUN, that is, in a reserve location (Somasundaram & Shrivastava, 2009, pp. 283-284). Local replicas can be utilised as an alternate source of backup, for data migration, as a testing platform, and so on. These replicas can prevent the loss of data in case of failure of the main server or array (Somasundaram & Shrivastava, 2009).

There are numerous methods for local replication. For instance, host-based local replication can be employed. In this approach, file systems or logical volume managers (LVMs) carry out the process of local replication (Somasundaram & Shrivastava, 2009). These LVMs create and control logical volumes at the host level; logical volumes are mapped to two different physical partitions of the physical storage drive; data from both volumes can be accessed independently of one another (Somasundaram & Shrivastava, 2009). This allows for preserving information from a logical volume if one of the physical volumes in which the information from that logical volume is stored suffers from damage or loss of data for any reason.
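The idea of a logical volume mapped to two physical partitions can be sketched in Python. The class and method names are hypothetical, and the two partitions are modelled as plain dictionaries; this is a toy illustration of mirroring, not the behaviour of any particular LVM.

```python
class MirroredLogicalVolume:
    """Toy logical volume whose blocks are kept on two physical partitions."""

    def __init__(self) -> None:
        # Each dictionary stands in for one physical partition.
        self.partitions = [{}, {}]

    def write(self, block: int, data: bytes) -> None:
        """Write the block to every partition so that both hold a full copy."""
        for partition in self.partitions:
            partition[block] = data

    def read(self, block: int) -> bytes:
        """Read from the first partition that still holds the block."""
        for partition in self.partitions:
            if block in partition:
                return partition[block]
        raise KeyError(block)

    def fail_partition(self, index: int) -> None:
        """Simulate the loss of all data on one partition."""
        self.partitions[index].clear()


volume = MirroredLogicalVolume()
volume.write(7, b"records")
volume.fail_partition(0)
print(volume.read(7))  # the copy on the second partition is still readable
```

Since each partition holds a complete copy, the data survives the loss of either one, which is the protection described in the paragraph above.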

References

Burk, C. (2017). Storage performance: IOPS, latency and throughput. Web.

Crump, G. (2013). What is Latency? And how is it different from IOPS? Web.

Dell EMC. (n.d.). Dell EMC Unity 450F all-flash storage. Web.

Dell EMC. (2017). Dell EMC Unity all-flash storage. Web.

Foote, K. D. (2017). A brief history of data storage. Web.

Google Cloud Platform. (2017). How to design a disaster recovery plan. Web.

Hogan, C. (2012). Storage protocol comparison white paper: Technical marketing documentation, v 1.0 / Updated April 2012. Web.

Ozar, B. (2012). Storage protocol basics: iSCSI, NFS, Fibre Channel, and FCoE. Web.

Poulton, N. (2014). Data storage networking. Indianapolis, IN: John Wiley & Sons.

Search Storage. (n.d.). SCSI (Small Computer System Interface). Web.

Somasundaram, G., & Shrivastava, A. (Eds.). (2009). Information storage and management: Storing, managing, and protecting digital information. Indianapolis, IN: Wiley Publishing.