Due at 9:00pm on 12/6/2018.

Starter Files

Download lab13.zip. Inside the archive, you will find starter files for the questions in this lab, along with a copy of the OK autograder.


When you are done, submit the lab by uploading the lab13.py file to okpy.org. You may submit more than once before the deadline; only the final submission will be graded.

  • To receive credit for this lab, you must complete Questions 2, 3, 4, 5, and 6 in lab13.sql (NOT lab13.py) and submit through OK.
  • The rest are questions that are considered extra practice. They can be found in the lab13_extra.sql file. It is recommended that you complete these problems on your own time.


If you haven't set up SQLite on your computer yet, please refer to the second half of Lab 12 for instructions on how to install and run the software needed to execute SQL files on your computer

Getting to Know Last Semester's Students

Last semester, Berkeley Computer Science students were asked to complete a brief online survey through Google Forms, which involved relatively random but fun questions. In this lab, we will interact with the results of the survey by using SQL queries to see if we can find interesting trends in the data.

First, take a look at data.sql and examine the tables defined in it. Note their structures. There are two tables you will be working with:

  • students: The main results of the survey. Each column represents a different question from the survey, except for the first column, which is the time of when the result was submitted. This time is a unique identifier for each of the rows in the table.

    Column Name Question
    time The unique timestamp that identifies the submission
    number What's your favorite number between 1 and 100?
    color What is your favorite color?
    seven Choose the number 7 below.
    • 7
    • You are not the boss of me!
    • I do what I want.
    • I'm a rebel
    • Choose this option instead.
    • YOLO!
    song If you could listen to only one of these songs for the rest of your life, which would it be?
    • "Hotline Bling" by Drake
    • "I Want It That Way" by Backstreet Boys
    • "Shake It Off" by Taylor Swift
    • "Baby" by Justin Bieber
    • "Sandstorm" by Darude
    • "Hello" by Adele
    • "Thinking Out Loud" by Ed Sheeran
    date Pick a day of the year!
    pet If you could have any animal in the world as a pet, what would it be?
    denero Choose your favorite photo of John Denero! (Options shown under Question 1)
    smallest Try to guess the smallest unique positive INTEGER that anyone will put!

  • checkboxes: The results from the survey in which students could select more than one option from the numbers listed, which ranged from 1 to 10 and included 2015, 9000, and 9001. Each row has a time (which is again a unique identifier) and has the value 'True' if the student selected the column or 'False' if the student did not. The column names in this table are the following strings, referring to each possible number: '0', '1', '2', '4', '5', '6', '7', '8', '9', '10', '2015', '9000', '9001'.

A time in students matches up with a time in checkboxes. For example, the row with time "11/11/2015 9:54:03" in students matches up with the row with time "11/11/2015 9:54:03" in checkboxes. These entries come from the same Google form submission and thus belong to the same student. We used time to uniquely identify each student, rather than using their name or email.

Note: If you are looking for your personal response within the data, you may have noticed that some of your answers are slightly different from what you inputted. In order to make SQLite accept our data, and to optimize for as many matches as possible during our joins, we did the following things to clean up the data:

  • number and smallest: If you did not input a number, we put the number -1 in as a placeholder.
  • color and pet: We converted all the strings to be completely lowercase.

You will write all of your solutions in the starter file provided. As with other labs, you can test your solutions with OK. In addition, you can use either of the following commands. You may need to refer to the Usage section to find the appropriate command for your OS:

sqlite3 < lab13.sql
sqlite3 --init lab13.sql

Question 1: What Would SQL print?

First, load the tables into sqlite3. If you're on Windows or Mac OS X (Mavericks or older), use the following command:

$ ./sqlite3 --init lab13.sql

If you're on Ubuntu or Mac OS X (Yosemite or newer), use:

$ sqlite3 --init lab13.sql

Before we start, inspect the schema of the tables that we've created for you:

sqlite> .schema

A schema tells you the name of each of our tables and their attributes. In general, you can think of a schema as a map that describes the logical entities and relationships of a database. Just as the outline of a book tells a reader the order and category in which content is organized, a schema details the organizational hierarchy of information within a database.

Let's also take a look at some of the entries in our table. There are a lot of entries though, so let's just output the first 20:

sqlite> SELECT * FROM students LIMIT 20;

If you're curious about some of the answers students put into the Google form, open up data.sql in your favorite text editor and take a look!

For each of the SQL queries below, decide to yourself and/or your partner what the query is looking for, then try running the query yourself and see!

sqlite> SELECT * FROM students; -- This is a comment. * is shorthand for all columns!
selects all records from students;
sqlite> SELECT color FROM students WHERE number = 16;
selects the color from students who said their favorite number was 16;
sqlite> SELECT song, pet FROM students WHERE color = "blue" AND date = "12/25";
selects the song and pet from students who said their favorite color was blue and picked December 25th;

Question 2: Obedience

To warm-up, let's ask a simple question related to our data: Is there a correlation between whether students do as they're told and their favorite image of John DeNero?


Write a SQL query to create a table that contains the columns seven (this column representing "obedience") and denero (the image students selected) from the students table.

You should get the following output:

sqlite> SELECT * FROM obedience LIMIT 10;
7|Image 1
7|Image 2
Choose this option instead.|Image 2
7|Image 5
Choose this option instead.|Image 5
7|Image 5
7|Image 2
7|Image 4
I'm a rebel|Image 4
YOLO!|Image 5
CREATE TABLE obedience as
SELECT seven, denero FROM students;

Use OK to test your code:

python3 ok -q obedience --local

Question 3: Go Bears! (And Dogs?)

Now that we have learned how to select columns from a SQL table, let's filter the results to see some more interesting results!

It turns out that Computer Science students have a lot of school spirit: the most popular favorite color was 'blue'. You would think that this school spirit would carry over to the pet answer, and everyone would want a pet bear! Unfortunately, this was not the case, and the majority of students opted to have a pet 'dog' instead. That is the more sensible choice, I suppose...

Write a SQL query to create a table that contains both the column color and the column pet, using the keyword WHERE to restrict the answers to the most popular results of color being 'blue' and pet being 'dog'.

You should get the following output:

sqlite> SELECT * FROM blue_dog;
CREATE TABLE blue_dog as
SELECT color, pet FROM students WHERE color = 'blue' AND pet = 'dog';

Use OK to test your code:

python3 ok -q bluedog --local

Question 4: The Smallest Unique Integer

Who successfully managed to guess the smallest unique integer value? Let's find out!

Unfortunately we have not learned how to do aggregations, which can help us count the number of times a specific value was selected, in SQL just yet. As such, we can only hand inspect our data to determine it. However, an anonymous elf has informed us that the smallest unique value is greater than 6!

Write a SQL query with the columns time and smallest to try to determine what the smallest integer value is. In order to make it easier for us to inspect these values, use ORDER BY to sort the numerical values, and LIMIT your result to the first 20 values that are greater than the number 6.

The first 5 lines of your output should look like this:

sqlite> SELECT * FROM smallest_int LIMIT 5;
11/11/2015 10:01:03|7
11/11/2015 13:53:36|7
11/11/2015 14:52:07|7
11/11/2015 15:36:00|7
11/11/2015 15:46:03|7
CREATE TABLE smallest_int as
SELECT time, smallest FROM students WHERE smallest > 6 ORDER BY smallest LIMIT 20;

Use OK to test your code:

python3 ok -q smallest-int --local

After you've successfully passed the OK test, actually take a look at the table smallest_int that you just created and find the smallest unique integer value!

To do this, try the following:

$ sqlite3 --init lab13.sql
sqlite> SELECT * FROM smallest_int; -- No LIMIT this time!


We can use joins to include rows from another table that satisfy the where predicate. Joins can either be on different tables, or the same table if we include an alias. Here we are referencing the football table twice, once as the alias a and once as the alias b.

sqlite> SELECT a.Berkeley - b.Berkeley, a.Stanford - b.Stanford, a.Year, b.Year
...>        FROM Football as a, Football as b WHERE a.Year > b.Year;

What is this query asking for?

You may notice that it does not seem like we actually performed any operations to do the join. However, the join is implicit in the fact that we listed more than one table after the FROM. In this example, we joined the table Football with itself and gave each instance of the table an alias, a and b so that we could distinctly refer to each table's attributes and perform selections and comparisons on them, such as a.Year > b.Year.

One way to think of a join is that it produces a cross-product between the two tables by matching each row from the first table with every other row in the second table, which creates a new, larger joined table.

Here's an illustration of what happened in the joining process during the above query.


From here, the select statement examines the joined table and selects the values it desires: a.Berkeley - b.Berkeley and a.Stanford - b.Stanford but only from the rows WHERE a.Year > b.Year. This prevents duplicate results from appearing in our output!

Question 5: Sevens

Let's take a look at data from both of our tables, students and checkboxes, to find out if students that really like the number 7 also chose '7' for the obedience question. Specifically, we want to look at the students that fulfill the below conditions and see if they also chose '7' in the question that asked students to choose the number 7 (column seven in students).

  • reported that their favorite number (column number in students) was 7
  • have 'True' in column '7' in checkboxes, meaning they checked the number 7 during the survey

In order to examine rows from both the students and the checkboxes table, we will need to perform a join.

How would you specify the WHERE clause to make the SELECT statement only consider rows in the joined table whose values all correspond to the same student? If you find that your output is massive and overwhelming, then you are probably missing the necessary condition in your WHERE clause to ensure this.

Note: The columns in the checkboxes table are strings with the associated number, so you must put quotes around the column name to refer to it. For example if you alias the table as a, to get the column to see if a student checked 9001, you must write a.'9001'.

Write a SQL query to create a table with just the column seven from students, filtering first for students who said their favorite number (column number) was 7 in the students table and who checked the box for seven (column '7') in the checkboxes table.

You should get the following output:

sqlite> SELECT * FROM sevens;
Choose this option instead.
I do what I want.
Choose this option instead.
You are not the boss of me!
Choose this option instead.
Choose this option instead.
You are not the boss of me!
CREATE TABLE sevens as
SELECT s.seven FROM students as s, checkboxes as c WHERE s.number = 7 AND c.'7' = 'True' AND s.time = c.time;

Use OK to test your code:

python3 ok -q sevens --local

Question 6: Matchmaker, Matchmaker

Did you study Computer Science with the hope of finding your soul mate? Well you're in luck! With all this data in hand, it's easy for us to find your perfect match. If two students want the same pet and have the same taste in music, they are clearly meant to be together! In order to provide some more information for the potential lovebirds to converse about, let's include the favorite colors of the two individuals as well!

In order to match up students, you will have to do a join on the students table with itself. When you do a join, SQLite will match every single row with every single other row, so make sure you do not match anyone with themselves, or match any given pair twice!

Hint: You may want to enforce a sort of "ordering" on the column time (which is a unique identifier) from your joined tables, in order to do the above correctly.

Write a SQL query to create a table that has 4 columns:

  • The shared preferred pet of the couple
  • The shared favorite song of the couple
  • The favorite color of the first person
  • The favorite color of the second person

You should get the following output:

sqlite> SELECT * FROM matchmaker LIMIT 20;
dog|Shake It Off|blue|pink
dog|Shake It Off|blue|blue
dog|Shake It Off|blue|blue
dog|Shake It Off|blue|red
dog|Shake It Off|blue|orange
dog|Shake It Off|blue|blue
dog|Shake It Off|blue|blue
dog|Shake It Off|blue|blue
john denero|Hotline Bling|teal|orange
john denero|Hotline Bling|teal|seafoam
dragon|Sandstorm|dark blue|blue
dragon|Sandstorm|dark blue|blu
dog|Hotline Bling|grey|red
dog|Hotline Bling|grey|green
CREATE TABLE matchmaker as
SELECT a.pet, a.song, a.color, b.color FROM students AS a, students AS b WHERE a.time < b.time AND a.pet = b.pet AND a.song = b.song;

Use OK to test your code:

python3 ok -q matchmaker --local

Extra Questions

The following questions are for extra practice — they can be found in the lab13_extra.sql file. It is recommended that you complete these problems as well, but you do not need to turn them in for credit.

The COUNT aggregate

Recall how finding the smallest integer anyone chose was rather painful, because we could not simply count up how many times each integer was chosen by anyone.

Bring in SQL aggregation, which is commonly used to aggregate values in order to answer these types of questions!

In order to perform SQL aggregation, we need to group rows in our table by one or more attributes. Once we have groups, we can aggregate over the groups in our table and find things like the maximum value (MAX), the minimum value (MIN), the number of rows in the group (COUNT), the average over all of the values (AVG), and more! SELECT statements that use aggregation are marked by two things: an aggregate function (MAX, MIN, COUNT, AVG, etc.) and a GROUP BY clause. For example:

sqlite> SELECT song, MAX(number) FROM students GROUP BY song;
Hotline Bling|100
Shake It Off|98
That Way|100
Thinking Out Loud|99

This SELECT statement groups all of the rows in our table students by song. Then, within each group, we perform aggregation by MAXing over the attribute number. By selecting song and MAX(number), we then can see the highest number any student chose for any given song.

Question 7: The Smallest Unique Integer (Part 2)

Now, let's revisit the previous problem of finding the smallest integer that anyone chose, and take a closer look at the COUNT aggregate.

Just like MAX above, we can select a COUNT of some attribute in our query after grouping the rows in our table by some attribute.

Write a SQL query that uses the COUNT aggregate to create a table that pairs the attribute smallest with the number of times it was chosen by a student (this is the aggregation part). In order to cut out the people who chose not to respond, and the sneaky cheaters that tried to put small non-integer values, limit your results to the number 1 and greater!

Hint: You may find that there isn't a particular attribute you should have to perform the COUNT aggregation over. If you are only interested in counting the number of rows in a group, you can just say COUNT(*).

Hint: Think about what attribute you need to GROUP BY.

After you've defined your table, you should get something like:

sqlite> SELECT * FROM smallest_int_count LIMIT 10;
CREATE TABLE smallest_int_count as
SELECT smallest, COUNT(*) FROM students WHERE smallest >= 1 GROUP BY smallest;

Use OK to test your code:

python3 ok -q smallest-int-count --local

It looks like the number 9 only had one person choose it! Were you the lucky student that put it down?