cs161 2004 Lecture 8: RAID

Lab 2: Questions?
  Read the source
  Collaboration - don't show anyone your source
  socketpair()
  Subprocesses for reading
    how do they pass the data back? mmap()? piece by piece?
    multiple outstanding requests? how many subprocs?
    (a sketch of the socketpair()/fork()/waitpid() pattern is at the end
    of these notes)
  fix open/stat? fix the supplied code?
  memory / fd rules
  exec/fork/threads? waitpid()?

Questions?

The rename() mystery
  Applications want the "all or nothing" property, not the "old or new"
  (a sketch of the write-then-rename() pattern is at the end of these notes)

Why are we reading this?
  Not just to understand RAID (it's really not that complicated)
  To understand how to attack bottlenecks
  As a case study in an evolving design

So these guys invented RAID, huh?
  No, people were already selling them.
  Good salesmen: they just categorized and codified it.

Amdahl's Law: if you speed up a subsystem that is responsible for only a
fraction of your total time, the speedup applies only to that fraction.
(worked example at the end of these notes)

Level 0: aggregate volumes (no redundancy)
  Pro: multiple outstanding reads & writes
  Con: decreased MTTF (MTTF(disk) / # of disks)
  (striping arithmetic sketch at the end of these notes)

Basic idea to fix reliability: put the disks in groups and add some check
disks to each group. As long as only one disk per group dies at a time,
no data is lost.
  D   = total DATA disks
  G   = DATA disks per group
  C   = CHECK disks per group
  n_G = number of groups

What's the new MTTF?
  MTTF(RAID) = MTTF(Level 0) * 1/prob(second failure during MTTR)
  prob(second failure during MTTR) = MTTR / (MTTF(disk)/(G+C-1))
  (worked example at the end of these notes)

Throughout, the numbers assume: D=100, G=10, MTTF(disk)=30,000hr (3.5yr),
MTTR=1hr.

Workloads:
  Supercomputing: throughput
  Transaction processing: individual small I/Os

Level 1: 500yr
  Mirroring
  Performance is shown as a ratio to what one disk would have done
  Examples: Reads = 2D/S, Writes = D/S
  What is S? (the slowdown from having to wait for the worst of the arms)
  Why is it not shown for small reads/writes?

Level 2: G=10: 50yr; G=25: 12yr
  Hamming code across the disks of a group: G=10,C=4; G=25,C=5
  (note: Hamming CORRECTS, parity only detects)
  Identifies the failing disk, and corrects it
  Large reads/writes = D
  Small reads/writes: divide by G (dismal)

Level 3: 90yr, 40yr (same formula as before; up b/c fewer total disks)
  Assumes identifying the failed disk is easy (the controller reports it),
  so parity alone can correct
  1 check disk per group
  Performance is "the same", but really it improves per disk (so, per $)

Level 4: interleave blocks, not bytes, so a single disk can return a
whole sector on its own
  What did we fix? small reads
  Remaining problem: every write still hits the one check disk

    D1  D2  D3  D4   C
    1d               1c
        2d           1c
            3d       1c
                4d   1c
    5d               2c
        6d           2c
            7d       2c
                8d   2c

Level 5: distribute the check blocks across all the disks
  What did we fix? small writes (the check disk is no longer a bottleneck)
  (parity-update sketch at the end of these notes)

    D1  D2  D3  D4  D5
    1d              1c
        2d          1c
            3d      1c
                4d  1c
    5d          2c
        6d      2c
            7d  2c
                2c  8d

Hardware vs software RAID
  hardware can do the XOR work in parallel (avoids "copying" through the CPU)
  hardware can put every disk behind the RAID / software must boot from somewhere
  bus bandwidth?
  hardware can send only the logical data over the bus (the check traffic
    stays on the "magic bus" inside the RAID box)
  software can keep check data in memory, avoiding a re-read
  so you really want hardware with a cache, located near the disks

Problems?
  Disk failures are not independent
    Identical disks?
    Somehow I still hear story after story about RAID arrays being corrupted
  What about reconstruction time? (this affects MTTR!)
  What about performance during reconstruction?

Conclusions?
  For a small number of disks, RAID 0 seems fine (you still have to back up)
  For 10s of disks: RAID 1 if you can afford it, else RAID 5
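
Sketches and worked examples

Lab 2: a minimal sketch of the socketpair()/fork()/waitpid() pattern for
having a subprocess do work and pass the data back. The message and
structure are made up for illustration; this is not the lab's actual
interface.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int
    main(void)
    {
        int sv[2];

        /* A connected pair of sockets: sv[0] for the parent,
         * sv[1] for the child. */
        if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
            perror("socketpair");
            exit(1);
        }

        pid_t pid = fork();
        if (pid < 0) {
            perror("fork");
            exit(1);
        }

        if (pid == 0) {                /* child: does the work */
            close(sv[0]);
            const char *msg = "data produced by the subprocess\n";
            write(sv[1], msg, strlen(msg));
            close(sv[1]);
            _exit(0);
        }

        close(sv[1]);                  /* parent: collect the results */
        char buf[512];
        ssize_t n;
        while ((n = read(sv[0], buf, sizeof buf)) > 0)
            fwrite(buf, 1, n, stdout);
        close(sv[0]);

        int status;
        waitpid(pid, &status, 0);      /* reap the child */
        return 0;
    }

Closing the unused ends matters: the parent's read() only returns EOF
once every descriptor for sv[1] is closed, which is one of the fd rules
mentioned above.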
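
The rename() mystery: the usual way an application gets "all or nothing"
is to write a complete new copy of the file and then rename() it over the
old name, since rename() replaces the destination atomically. A minimal
sketch; the path handling and names are illustrative only.

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    /* Replace the contents of 'path' atomically: a reader sees either
     * the complete old file or the complete new file, never a mix. */
    int
    update_file(const char *path, const char *data, size_t len)
    {
        char tmp[1024];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        /* fsync() forces the new bytes to disk before rename() makes
         * them visible. */
        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        if (close(fd) < 0) {
            unlink(tmp);
            return -1;
        }
        return rename(tmp, path);      /* the atomic commit point */
    }

    int
    main(void)
    {
        const char *text = "new contents\n";
        return update_file("demo.txt", text, strlen(text)) < 0;
    }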
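
Amdahl's Law as an equation (the example numbers here are mine, not the
paper's): if a subsystem accounts for a fraction f of total time and you
speed it up by a factor s,

    overall speedup = 1 / ((1-f) + f/s)

E.g. f=0.1, s=10: striping a workload that is only 10% disk-bound across
ten disks gives 1/(0.9 + 0.01) ~= 1.1x overall. Speeding up the disks
only pays off when f is large, i.e. when the disks really are the
bottleneck.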
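
Level 0: a sketch of the usual block-to-disk striping arithmetic (NDISKS
and the printout are made up for illustration):

    #include <stdio.h>

    #define NDISKS 4    /* hypothetical array size */

    /* Consecutive logical blocks go to consecutive disks, so up to
     * NDISKS requests can be outstanding at once -- the Level 0 "pro". */
    int
    main(void)
    {
        for (long b = 0; b < 8; b++)
            printf("logical block %ld -> disk %ld, offset %ld\n",
                   b, b % NDISKS, b / NDISKS);
        return 0;
    }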
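
A worked check of the MTTF formula, for Level 1 mirroring under the
standard assumptions (my arithmetic, matching the 500yr figure above).
Mirroring is G=1, C=1, so D=100 means n_G=100 groups and 200 disks total:

    MTTF(Level 0, 200 disks) = 30,000hr / 200 = 150hr
    prob(mirror dies during MTTR) = 1hr / (30,000hr / (G+C-1)) = 1/30,000
    MTTF(RAID) = 150hr * 30,000 = 4.5 million hr ~= 500yr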
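
Levels 3-5: the check block is just the XOR of the data blocks in its
stripe, which is why a small write can be done as a read-modify-write of
one data block plus one check block (and why, in Level 4, every write
lands on the same check disk). A sketch; the block size is arbitrary:

    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK 512   /* hypothetical block size in bytes */

    /* new parity = old parity XOR old data XOR new data.
     * Small-write cost: read old data + old parity, write new data +
     * new parity -- 4 disk I/Os, two of them on the check block's disk. */
    void
    update_parity(const uint8_t old_data[BLOCK],
                  const uint8_t new_data[BLOCK],
                  uint8_t parity[BLOCK])
    {
        for (size_t i = 0; i < BLOCK; i++)
            parity[i] ^= old_data[i] ^ new_data[i];
    }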