Relational Algebra View as a PDF

Question 1

Consider the following relation:

employee_id	employee_name	job_title	years_exp	department
1	Nick	HTA	1	CS
2	Desmond	HTA	2	CS
3	Stan	Professor	30	CS
4	Junchi	UTA	1	CS
5	Huiyuan	UTA	2	CS

Given the above relation, provide the following:

A maximal super key (a super key with the largest possible size).
All possible candidate keys.
Which candidate key should we choose as our primary key (justify your response).

Question 2

Consider the following relational schema:

student (student_id, name, age, year)
student_account (student_id, tuition, course)

Use relational algebra to answer the following queries:

Find names of students who have a tuition charge greater than or equal to $10,000.
Find student_ids of students whose age is over 20, graduate in 2025, and will take the "Databases" course
Find all of pairs of ids of students who have the same name.
Find all of the students that pay more in tuition than another student with the same name as them.
Find the sum of tuition paid for each school year.

Question 3

The division ($\div$) relational operator identifies attribute values from a relation that can be paired with allof the values from the other relation. One way to think about it is that it is the inverse of the Cartesian Product ($\times$) relational operator: given relations $R$ and $S$ , $(R \times S) \div R = S$ . Another way to think about it is that the tuples returned are the elements of $R$ that are associated with every element of $S$ . For another definition, see page 61 in the textbook.

Given this operator, consider the following relational schema:

student (student_id, name, age, year)
course (course_id, course_name, department)
takes (student_id, course_id)

Use relational algebra to answer the following queries:

Which courses were taken by every student?
Which courses were taking by every student graduating in 2023?
Which students took every course in the CS department?
Which students took every course in the CS department except for "Databases"? Note that you should not return any students that have taken "Databases".

Question 4

Consider the following selected tables from some schema:

Bees:

name	cluster_no	job
Barry Benson	24601	7
Adam Flayman	47	7
Buzzwell	24601	7
Lou Lo Duca	42	7
Bob Bumble	42	7
Janet Benson	24601	8

Clusters:

cluster_no	established
42	2147483647
47	1630026000
24601	1658797200

In this question, we will explore how we can leverage properties of relational algebra to be more efficient.

For each relational query, write out the intermediate tables after each calculation step.
1. $\sigma_{cluster\_no=24601}(\sigma_{job=7}(bees)) \bowtie clusters$
2. $\sigma_{cluster\_no=24601}(\sigma_{job=7}(bees \bowtie clusters))$
How do the final tables for both queries compare?
Sum the number of rows for each query that you counted in (2). Which query required you to write the least number of rows along the way in total?
What qualities about your "least-expensive" query made it select less rows in total? Which operators were located where? Is there something particularly special about that?

Note

For any tabular data in this assignment, you may want to consider using an online table tool to help you save time when you type up your answers.

Feedback

As this is a new course, we appreciate any feedback you have for us! If you enjoyed this assignment, hated this assignment, or have other thoughts to share, please do so here!