# Storage + B-Trees View as a PDF

# Question 1

Please provide short answers to the following questions:

- Why might we want to store relations in separate files?
- Why might we want to store all relations, or even an entire database, in the same file?
- When might the leaf nodes of a B+-tree file organization lose sequentiality? Suggest how the file organization may be reorganized to restore sequentiality.
- What are the three parts of a slotted page structure header?
- Why does the data and keys in a slotted page block grow from either end?
- How does columnstore make file compression easier? What feature of column stores makes interacting with single tuples expensive?

# Question 2

## 2.1 Storage

Consider the following schema for an online game distributor:

CREATE TABLE games ( gameId INT PRIMARY KEY, gameTitle VARCHAR(48), distributorId INT, price FLOAT );

You can assume the following:

- The table contains 1024 tuples in total.
`INT`

s are 8 bytes long,`VARCHAR(48)`

s are 48 bytes long, and`FLOAT`

s are 4 bytes long.- All attributes of a tuple are stored in contiguous space within the same disk block.
- A disk block size is 512 bytes.
- The disk on average performs the sequential read at 1ms per disk block and the random read at 10ms per disk block.

- What is the maximum number of tuples that can be stored in a disk block?
- What is the minimum number of disk blocks that need to be allocated to store all tuples in the table?
- What is the minimum time to read all tuples (in no particular order), assuming that the minimum number of disk blocks are allocated?

## 2.2 Indexing

*Note: we've removed extraneous information from this problem.*

Suppose that a secondary index of B+Tree is created on the `distributorId`

column. Assume the following:

- The tree has an order of 5 (up to 4 keys, 5 children) and each leaf node is 60% loaded.
- You can assume distributorId is a candidate key with unique values for simplicity.

- How many leaf nodes does this secondary index require?
- How many disk reads (including index href=" and tuple retrieval) in the worst case are required to find a tuple by its distributorId? Hint: find the height of the tree.

# Question 3

## 3.1 Insertion

Starting with the given B tree (degree = 3), insert all of the following keys in the given order and show the resulting tree at the end. In the case of equal keys, insert to the left.

[11, 15, 3, 18, 20, 4, 6, 18, 8, 3, 11]

## 3.2 Deletion

Starting with the given B tree (degree = 4), remove all of the following keys in the given order and show the resulting tree at the end. (11pts)

[15, 11, 18, 3, 8, 5, 3, 7]

# Question 4

## 4.1

Consider the B-tree in 3.2 before we removed the entries. Suppose we want to retrieve all values between 7 and 19 that reside in a leaf node (assume that values in internal nodes only exist for organization purposes; they don't represent real tuples). Assuming our B-tree edges only point downward (i.e. we cannot traverse back up an edge), what is the least number of tree href="es it would take to perform this href="?

## 4.2

We've decided we need a faster way to href=" for a RANGE of records! Suppose we connected each leaf node in our B-Tree with a pointer to the leaf node directly to the right of it. How many tree href="es will our query now take and in which order will we visit all nodes?

## 4.3

Alternatively, instead of using the solution we proposed in 4.2, we could make all edges bidirectional. This allows us to visit neighbouring nodes by traversing up to an ancestor node, then traversing back down. Is it better to make every branch a doubly-linked pointer, or is our linked list solution better? Why?

# Feedback

As this is a new course, we appreciate any feedback you have for us! If you enjoyed this assignment, hated this assignment, or have other thoughts to share, please do so here!