Lab 12

In today’s lab, you have the chance to learn how to use some Python packages that might be useful in the future!

We’ll start with a brief explanation of what a package is and how to install one in PyCharm. Then, the rest of the lab will consist of a number of explanations and practice problems meant to familiarize you with numpy and matplotlib, which are popular Python packages.

What’s a package?

First, some terminology:

There are hundreds of thousands of Python packages available online. Some are so commonly used that you’ll find them in almost every large-scale Python application; others serve highly specific purposes.

PyCharm makes it very easy to download and use packages in your projects.

On a Mac computer: go to PyCharm --> Preferences --> Project Interpreter and press the + button.

On a Windows computer: go to File --> Settings --> Project --> Project Interpreter and press the + button.

This will bring up a list of common packages; search for the one that you want and press “Install Package”.

Section 1: Numpy

Numpy is a widely used library for manipulating numbers, vectors, and matrices (many other libraries are built on top of Numpy). Numpy has great support for manipulating arrays with an arbitrary number of dimensions. We’ll go through a couple of examples of problems where Numpy might be helpful. Note that a lot of this can be done in plain Python, without Numpy, but Numpy tends to make math-related computations easier and more efficient.

It might be helpful to look at documentation here. And in general, googling the name of a function should turn up some helpful documentation.

Getting started

Once you’ve installed Numpy, include import numpy as np at the top of your Python file.

Part 1

Numpy works with arrays, which are essentially lists that can have one, two, or more dimensions. Let’s get our feet wet by creating some Numpy arrays. For each of these problems, assign the array to a variable and then print the array (no need to write functions):

  1. Make the array [0, 1, 2] with np.array
  2. Make the array [0, 1, 2] with np.arange
  3. Make the array [0, 0, 0] with np.zeros
  4. Make the matrix [[0, 1], [2, 3]] with np.array
  5. Make the matrix [[0, 1], [2, 3]] with np.arange and np.reshape.
    Note that the ‘shape’ of an array is essentially its dimensions. The above matrix has two rows and two columns, so its shape is (2, 2).
  6. Print the shape of the matrix from the previous part with .shape.
    Note that if v is an array, v.shape will return a tuple that contains its shape.
  7. Print the number of rows in the matrix from part 5 with .shape.
    Note that the number of rows is the first element of the shape tuple.
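
The calls named in this list can be sketched on different values than the problems ask for, so the exercises above are still yours to solve. The variable names here are illustrative only.

```python
import numpy as np

# Build a 1-D array from a Python list.
evens = np.array([2, 4, 6, 8])

# np.arange(stop) counts from 0 up to, but not including, stop.
counting = np.arange(6)            # values 0, 1, 2, 3, 4, 5

# np.zeros(n) gives n floating-point zeros.
blank = np.zeros(4)

# reshape arranges the same six values as 2 rows by 3 columns.
grid = counting.reshape(2, 3)      # same as np.reshape(counting, (2, 3))

print(grid)
print(grid.shape)                  # (2, 3)
print(grid.shape[0])               # 2, the number of rows
```

Note that reshape only works when the new shape holds exactly as many elements as the original array.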

Note: For these last three functions, we’re relying on you to use the numpy documentation to figure out what parameters each of these functions take in! Ask a TA if you get stuck.

  8. Use np.linspace to produce an array of evenly spaced numbers (“linear spacing”). Use the documentation to figure out what parameters you should pass into np.linspace to produce [0, 0.5, 1, 1.5, 2, 2.5, 3] and then make your own array using a different spacing.
  9. Use np.random.rand to make an array of random numbers that has 3 rows and 4 columns.
  10. Oops! Turns out we wanted an array of random numbers that has 4 rows and 3 columns! Use .transpose to flip the array from question 9 above. Ask a TA if you are unsure about what this function does.
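
As a rough sketch of the three functions above, again on different values than the problems use (the variable names are made up for illustration):

```python
import numpy as np

# np.linspace(start, stop, num) returns num evenly spaced values,
# including both endpoints.
quarters = np.linspace(0, 1, 5)    # values 0, 0.25, 0.5, 0.75, 1

# np.random.rand(rows, cols) fills an array with random floats in [0, 1).
noise = np.random.rand(2, 5)

# transpose swaps rows and columns, so a (2, 5) array becomes (5, 2).
flipped = noise.transpose()        # noise.T is a common shorthand

print(quarters)
print(noise.shape, flipped.shape)  # (2, 5) (5, 2)
```

Unlike np.arange, np.linspace takes the number of points rather than the step size, which is why the stop value is included.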

Part 2

Great, now let’s use some Numpy functions to manipulate arrays. Write a couple example vectors (these can be small) and print the result of each of these computations. Again, no need to write functions here.

  1. Take the average of elements in a vector (np.average)
  2. Add two vectors element-wise, which means that each element in the result vector is the sum of the corresponding elements in the two vectors.
    Note that you can add vectors element-wise with ‘+’!
  3. Take the sum of elements in a vector (np.sum)
  4. Take the square root of a number (np.sqrt)
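
A minimal sketch of these four computations, using two small example vectors chosen here for illustration:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

mean_a = np.average(a)   # (1 + 2 + 3) / 3
total = a + b            # element-wise: a[i] + b[i] for every i
sum_b = np.sum(b)        # 10 + 20 + 30
root = np.sqrt(16)

print(mean_a)            # 2.0
print(total)             # [11. 22. 33.]
print(sum_b)             # 60.0
print(root)              # 4.0
```

np.sqrt also accepts a whole array, in which case it takes the square root of each element.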

Part 3

Nancy Drew is on her last case. She received a tip that someone was counterfeiting money and needs to figure out which bills are real and which are fake. It turns out that the counterfeiter didn’t do a great job with the serial numbers on the bills, making the numbers either way too low or way too high. Nancy needs to log the serial numbers of some bills and pick out those with serial numbers that appear to be outliers, since these are more likely to be the fakes!

Nancy needs you to tie everything together and write a function that does a substantial mathematical computation: finding outliers!

One simple way of finding outliers is to mark each element that’s more than two standard deviations from the mean as an outlier. Write a function that takes in a one-dimensional numpy array and returns a list of outliers using this method.

np.abs might be helpful. You can also use the first formula for standard deviation in this Wikipedia article.

Also make sure to verify that your function works by inputting an array that clearly has a couple outliers.
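
One possible shape for such a function is sketched below; try writing your own before reading it. It assumes the population form of the standard deviation (dividing by the number of elements), which is also what np.std computes by default. The name find_outliers and the sample serial numbers are made up for illustration.

```python
import numpy as np

def find_outliers(values):
    """Return the elements more than two standard deviations from the mean."""
    mean = np.average(values)
    # Population standard deviation: root of the mean squared deviation.
    std = np.sqrt(np.average((values - mean) ** 2))
    # Boolean mask: True wherever the distance from the mean exceeds 2 * std.
    is_outlier = np.abs(values - mean) > 2 * std
    # Indexing with the mask keeps only the True positions.
    return values[is_outlier].tolist()

# Made-up serial numbers with one value that is clearly too high.
serials = np.array([100, 102, 98, 101, 99, 103, 97, 100, 500])
print(find_outliers(serials))  # [500]
```

Keep in mind that in a very small array a single extreme value inflates the standard deviation itself, so test with enough ordinary values around the outliers.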

If you’re excited about learning more about Numpy, we recommend the following tutorials:

There are also some Numpy image manipulation tutorials linked in the Pillow section of the lab.

Section 2: Matplotlib

Setup

Matplotlib is a Python package used for plotting! It gives access to the same kind of plotting functions we used in Pyret, except with more power (since we’re in Python). Now that you know how to use lists, plotting will be much more flexible (before we were stuck with tables).

Get started with matplotlib by installing it in PyCharm (the same process you used for numpy), then importing it with the following lines:

import matplotlib.pyplot as plt
import numpy as np
import math
import csv

This imports the pyplot module from the matplotlib package, and renames it plt so you don’t have to write matplotlib.pyplot every time you want to plot something.

We are also importing numpy (and calling it np), as well as math (to help us do mathy calculations) and csv (to help us deal with csv data files).

Throughout these questions, you will need to use the online documentation for matplotlib. Try to figure out how to Google the approaches to these questions (for example, “how to plot a line in matplotlib”). The matplotlib official documentation and stackoverflow are both great sources.

Here’s a page to get started: https://matplotlib.org/api/_as_gen/matplotlib.pyplot.plot.html

Part 1: Simple Plotting

Sherlock is working on a logo for his new cyber-detective business. He likes things simple, so he wants you to make a (hollow) red rectangle with a (hollow) green triangle inside of it.

Using the plt.plot() function (imported above), make Sherlock his business card logo. Show the plot by using plt.show()!

You can use plt.plot multiple times to build up a plot with multiple lines before using plt.show(), but once you use plt.show(), the plot will be erased and plt.plot starts from scratch.
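
To illustrate how several plt.plot calls build up one picture, here is a small sketch that draws a different shape than the logo (a blue diamond with an orange line across it). The coordinates are arbitrary.

```python
import matplotlib.pyplot as plt

plt.figure()  # start from a fresh, empty figure

# A closed shape is a line whose last point repeats the first one.
diamond_x = [0, 1, 0, -1, 0]
diamond_y = [1, 0, -1, 0, 1]
plt.plot(diamond_x, diamond_y, color="blue")

# A second plt.plot call draws onto the same figure.
plt.plot([-0.5, 0.5], [0, 0], color="orange")

# plt.show()  # uncomment to open the plot window
```

plt.plot takes a list of x-coordinates and a list of y-coordinates and connects the points in order, so the order of the points determines the outline.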

Part 2: Timeseries and Scatter Plotting

Part 2a: Timeseries

The plt.plot function is great for looking at trends over time. Download this csv file and load it using the csv module (which you can figure out by Googling), or by using tools from numpy (look up “loading a text file in numpy”).

This dataset gives the day of the year and the number of hours of daylight (for Providence RI). For example, day 350 has 9.054 hours of daylight, whereas day 173 has 15.014 hours.

  1. Plot hours per day versus day.
  2. Give the plot a title and informative labels on the axes.
  3. Show Sherlock the plot using plt.show()

NOTE: Be careful about types! You cannot plot a list of strings.
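
The type conversion can be sketched as follows. This assumes the file has two columns (day, then hours) and no header row; check the real file before relying on that. The in-memory sample holds only the two rows quoted above, and the filename in the comment is hypothetical.

```python
import csv
import io

import matplotlib.pyplot as plt

# Stand-in for the downloaded file, using the two rows quoted in the text.
# With a real file, use open("daylight.csv", newline="") instead
# (hypothetical filename).
sample = io.StringIO("350,9.054\n173,15.014\n")

days = []
hours = []
for row in csv.reader(sample):
    # csv.reader yields lists of strings, so convert before plotting.
    days.append(int(row[0]))
    hours.append(float(row[1]))

plt.figure()
plt.plot(days, hours)
plt.title("Hours of daylight in Providence, RI")
plt.xlabel("Day of year")
plt.ylabel("Hours of daylight")
# plt.show()  # uncomment to open the plot window
```

If the real file does have a header row, skip it (for example with next(reader)) before converting, since int("day") would raise a ValueError.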

Part 2b: Marathon plot

The plt.scatter function is good for looking at trends in observations.

Download the data from the 2017 Boston Marathon here. It contains one row for each runner, with the first column being age (in years) and the second being how long it took that runner to complete the marathon (in minutes).

  1. Again use csv to load the data, then plt.scatter() to plot it. Put a title and labels on the axes. Is the data surprising to you, or what you expected?

Note: You may want to pass s=2 to plt.scatter so the data is more viewable: plt.scatter(..., ..., s=2)

  2. Use StackOverflow to figure out how to fit a line of best fit to this scatter plot!
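
One of several ways to get a line of best fit is np.polyfit with degree 1, sketched here on a few synthetic points (these are not marathon data):

```python
import numpy as np
import matplotlib.pyplot as plt

# Synthetic points scattered around the line y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])

# A degree-1 fit returns the least-squares slope and intercept.
slope, intercept = np.polyfit(x, y, 1)

plt.figure()
plt.scatter(x, y, s=20)
# Draw the fitted line over the same x values.
plt.plot(x, slope * x + intercept, color="red")
# plt.show()  # uncomment to open the plot window
```

For this sample the fitted slope comes out close to 2 and the intercept close to 1, as expected from how the points were chosen.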

Part 2c: Something fishy…

With your new scatter plotting skills, let’s try graphing something a little more complex.

Download this data from NOAA. This data contains commercial harvest data of three tuna species (Pacific Bluefin, Skipjack and Bigeye). The first column is the year, the second, third and fourth are the total pounds of each respective species harvested that year.

  1. Just like before, use csv to load the data, then plot the harvest data for each species on a single graph. Give your graph a title and labels for each line. Do you notice any trends in the data?

If you’re not sure how to plot multiple datasets on a single graph, try Google!
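
As a starting point, here is a sketch of two labeled lines on one graph. The numbers and series names are made up for illustration and are not NOAA values.

```python
import matplotlib.pyplot as plt

# Made-up numbers for illustration only.
years = [2001, 2002, 2003, 2004]
series_a = [10, 12, 9, 14]
series_b = [5, 7, 8, 6]

plt.figure()
# Each call adds one line; label is the text the legend shows for it.
plt.plot(years, series_a, label="Series A")
plt.plot(years, series_b, label="Series B")
plt.title("Two series on one graph")
plt.xlabel("Year")
plt.legend()  # builds the legend from the labels given above
# plt.show()  # uncomment to open the plot window
```

The same pattern extends to three or more series: one plt.plot call per series, all before plt.show().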

Part 3: Math Plotting

Part 3a: Mathy Math Fun Time Surprise

Try to use numpy arrays for this plotting. matplotlib accepts numpy arrays as input to pyplot.plot.

  1. Plot these functions on top of each other for x between 0 and 5.

  2. Make a legend so it’s clear which function is which. You should only need plt.plot, not any of matplotlib's other functions.

Note: Python’s ** operator may be useful; it is the exponentiation operator.

>>> 2 ** 3
8
>>> 4 ** 0.5 # a ** 0.5 is the same as the square root of a
2.0

HINT 1: You cannot plot a function directly in most programming languages; a domain like “x between 0 and 5” contains infinitely many points, so the function cannot be applied to every one of them. Instead, choose a finite range of x values, then apply the function to each of those values. For example, let x be every number from 0 to 5 with a step size of 0.05.

HINT 2: You may find map helpful for these problems.
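
The sampling idea from HINT 1 can be sketched like this. The two functions plotted, x ** 2 and x ** 0.5, are stand-ins chosen for illustration and are not necessarily the ones you are asked to plot.

```python
import numpy as np
import matplotlib.pyplot as plt

# Sample the domain: 0, 0.05, 0.10, ... up to (but not including) 5.
x = np.arange(0, 5, 0.05)

# Arithmetic on a numpy array applies to every element at once,
# so no explicit loop or map is needed.
squares = x ** 2
roots = x ** 0.5

plt.figure()
plt.plot(x, squares, label="x ** 2")
plt.plot(x, roots, label="x ** 0.5")
plt.legend()
# plt.show()  # uncomment to open the plot window
```

With plain Python lists instead of arrays, the equivalent of squares would be list(map(lambda v: v ** 2, xs)), which is where HINT 2 comes in.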

Part 3b: Subplots

Look up matplotlib subplots and use them to arrange the first four plots from part 3 in a 2 by 2 grid. Be sure to put a title on each subplot so that you can tell which function is which.

plt.tight_layout() might be helpful for getting everything to look pretty.
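
A minimal sketch of a 2 by 2 grid using plt.subplot, with four simple stand-in functions (swap in the ones from the previous part):

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 5, 101)

plt.figure()

# plt.subplot(rows, cols, index) selects one cell of the grid;
# index counts from 1, left to right and then top to bottom.
plt.subplot(2, 2, 1)
plt.plot(x, x)
plt.title("x")

plt.subplot(2, 2, 2)
plt.plot(x, x ** 2)
plt.title("x ** 2")

plt.subplot(2, 2, 3)
plt.plot(x, x ** 3)
plt.title("x ** 3")

plt.subplot(2, 2, 4)
plt.plot(x, x ** 0.5)
plt.title("x ** 0.5")

# Adjust spacing so titles and tick labels do not overlap.
plt.tight_layout()
# plt.show()  # uncomment to open the plot window
```

After each plt.subplot call, plt.plot and plt.title apply to the cell that was just selected.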

Part 4: Tea time (A bit mathy)

Newton’s law of cooling states that an object cools faster the colder its surroundings (and vice versa). So if you put an ice cube in a pot of boiling water, it’ll melt faster than if you put it into a cup of tap water. But a few seconds later, once the temperature of the ice cube has increased a bit, it’ll warm less quickly than before.

In any given second, you can write the change in temperature of an object (ΔT(t)) as some constant (which depends on the material of the object) multiplied by the difference between the current temperature of the object (T(t)) and its surroundings.

$$\Delta T_{obj}(t) = -c\,\left(T_{obj}(t) - T_{surroundings}\right)$$

$T_{obj}$ is written as a function of $t$ since the temperature of the object relies on the time elapsed.

Here’s how you would want to write this programmatically:

$$T_{new} = T_{old} - c\,\left(T_{old} - T_{surroundings}\right)$$

OK, now the question:

Sherlock has a cup of tea that starts off at 95 degrees Celsius, and he wants to drink it. He wants to see the trajectory of the temperature and to know the temperature after 3 minutes (180 seconds) have elapsed.

It’s summer time, so things are pretty toasty. Set $T_{surroundings} = 40$. The constant $c$ for tea has been experimentally found to be 0.02.

First, sketch out what you think the temperature of the tea will look like over the first 3 minutes. Then, use matplotlib to plot it, so Sherlock can determine whether the tea is cool enough to drink or not.

Remember, you will need a list of x-coordinates and a list of y-coordinates to make the plot. The x-coordinates should correspond to the seconds elapsed, and the y-coordinates to the temperature of the tea at each second.
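
To check your understanding of the update rule (new temperature = old temperature minus c times the difference between the old temperature and the surroundings), here is a sketch of just the first two seconds. The function name cooling_step is made up for illustration; extending this to 180 seconds and plotting the result is the exercise.

```python
def cooling_step(t_old, t_surroundings, c):
    """Advance the temperature by one second using the update rule."""
    return t_old - c * (t_old - t_surroundings)

# First second: 95 - 0.02 * (95 - 40) = 95 - 1.1, about 93.9
after_1s = cooling_step(95, 40, 0.02)

# Second second starts from the new temperature:
# 93.9 - 0.02 * (93.9 - 40) = 93.9 - 1.078, about 92.822
after_2s = cooling_step(after_1s, 40, 0.02)

print(after_1s, after_2s)
```

Notice that the second drop (about 1.078 degrees) is smaller than the first (1.1 degrees): the tea cools more slowly as it approaches the temperature of its surroundings.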

If you’re excited about Matplotlib, check out the following links:

So long, farewell

As this is the final lab, all of us at the detective agency want to bid you all adieu. We have learned so much from y’all, and hopefully, you have learned during your time at the agency as well! We are confident that you are all ready to take on the world.

As the curiosity and mysteries never cease, neither shall you.
– Douglas’s Detective Agency, 2019

You are all stars. Don’t ever let the fire in you stop burning.
– Beep and Boop, 2018