NJIT eTD: The New Jersey Institute of Technology's electronic Theses & Dissertations
Title:
Data allocation in disk arrays with multiple raid levels
Author:
Xu, Jun
Document Type:
Dissertation
Department:
Department of Computer Science
Degree:
Doctor of Philosophy
Major:
Computer Science
Advisory Committee:
Thomasian, Alexander
Nassimi, David
Mili, Ali
Borcea, Cristian
Yang, Jian
Thesis Date:
2008, May
Keywords:
RAID
Data allocation
Availability:
Unrestricted
Abstract:

There has been an explosion in the amount of generated data, which has to be stored reliably because it is not easily reproducible. Some datasets require frequent read and write access. like online transaction processing applications. Others just need to be stored safely and read once in a while, as in data mining. This different access requirements can be solved by using the RAID (redundant array of inexpensive disks) paradigm. i.e., RAIDi for the first situation and RAID5 for the second situation. Furthermore rather than providing two disk arrays with RAID 1 and RAID5 capabilities, a controller can be postulated to emulate both. It is referred as a heterogeneous disk array (HDA).

Dedicating a subset of disks to RAID 1 results in poor disk utilization, since RAIDi vs RAID5 capacity and bandwidth requirements are not known a priori. Balancing disk loads when disk space is shared among allocation requests, referred to as virtual arrays - VAs poses a difficult problem. RAIDi disk arrays have a higher access rate per gigabyte than RAID5 disk arrays. Allocating more VAs while keeping disk utilizations balanced and within acceptable bounds is the goal of this study.

Given its size and access rate a VA's width or the number of its Virtual Disks -VDs is determined. VDs allocations on physical disks using vector-packing heuristics, with disk capacity and bandwidth as the two dimensions are shown to be the best. An allocation is acceptable if it does riot exceed the disk capacity and overload disks even in the presence of disk failures. When disk bandwidth rather than capacity is the bottleneck, the clustered RAID paradigm is applied, which offers a tradeoff between disk space and bandwidth.

Another scenario is also considered where the RAID level is determined by a classification algorithm utilizing the access characteristics of the VA, i.e., fractions of small versus large access and the fraction of write versus read accesses.

The effect of RAID 1 organization on its reliability and performance is studied too. The effect of disk failures on the X-code two disk failure tolerant array is analyzed and it is shown that the load across disks is highly unbalanced unless in an NxN array groups of N stripes are randomly rotated.

Complete Thesis:
njit-etd2008-071 (178 pages ~ 6,674 KB pdf)
Feedback:
Please complete this Feedback Form to inform us about your experience using this website. It will assist us in better serving your information needs in the future. Thank You!
Created October 9, 2008
To view these documents you will need the Acrobat Reader Plug-in. If you do not have it you can download it free from