# Question 1

Consider the following relation:

employee_id employee_name job_title years_exp department
1 Nick HTA 1 CS
2 Desmond HTA 2 CS
3 Stan Professor 30 CS
4 Junchi UTA 1 CS
5 Huiyuan UTA 2 CS

Given the above relation, provide the following:

1. A maximal super key (a super key with the largest possible size).
2. All possible candidate keys.
3. Which candidate key should we choose as our primary key (justify your response).

# Question 2

Consider the following relational schema:

student (student_id, name, age, year)
student_account (student_id, tuition, course)


Use relational algebra to answer the following queries:

1. Find names of students who have a tuition charge greater than or equal to \$10,000.
2. Find student_ids of students whose age is over 20, graduate in 2025, and will take the "Databases" course
3. Find all of pairs of ids of students who have the same name.
4. Find all of the students that pay more in tuition than another student with the same name as them.
5. Find the sum of tuition paid for each school year.

# Question 3

The division ($$\div$$) relational operator identifies attribute values from a relation that can be paired with allof the values from the other relation. One way to think about it is that it is the inverse of the Cartesian Product ($$\times$$) relational operator: given relations $$R$$ and $$S$$ , $$(R \times S) \div R = S$$ . Another way to think about it is that the tuples returned are the elements of $$R$$ that are associated with every element of $$S$$ . For another definition, see page 61 in the textbook.

Given this operator, consider the following relational schema:

student (student_id, name, age, year)
course (course_id, course_name, department)
takes (student_id, course_id)


Use relational algebra to answer the following queries:

1. Which courses were taken by every student?
2. Which courses were taking by every student graduating in 2023?
3. Which students took every course in the CS department?
4. Which students took every course in the CS department except for "Databases"? Note that you should not return any students that have taken "Databases".

# Question 4

Consider the following selected tables from some schema:

Bees:

name cluster_no job
Barry Benson 24601 7
Buzzwell 24601 7
Lou Lo Duca 42 7
Bob Bumble 42 7
Janet Benson 24601 8

Clusters:

cluster_no established
42 2147483647
47 1630026000
24601 1658797200

In this question, we will explore how we can leverage properties of relational algebra to be more efficient.

1. For each relational query, write out the intermediate tables after each calculation step.
1. $$\sigma_{cluster\_no=24601}(\sigma_{job=7}(bees)) \bowtie clusters$$
2. $$\sigma_{cluster\_no=24601}(\sigma_{job=7}(bees \bowtie clusters))$$
2. How do the final tables for both queries compare?
3. Sum the number of rows for each query that you counted in (2). Which query required you to write the least number of rows along the way in total?
4. What qualities about your "least-expensive" query made it select less rows in total? Which operators were located where? Is there something particularly special about that?

## Note

For any tabular data in this assignment, you may want to consider using an online table tool to help you save time when you type up your answers.

# Feedback

As this is a new course, we appreciate any feedback you have for us! If you enjoyed this assignment, hated this assignment, or have other thoughts to share, please do so here!