Activity 3-2

Plotting Data

In this activity, you'll be learning to visualize your data using the plotting library Plotly. Plotly is very powerful; this activity will be a basic introduction, but if you're interested in doing more complicated visualizations, we'll be going more in depth at the end of the semester.

Task 1

To get set up, we'll first need to get the Plotly library. You should be able to do this by opening up your terminal or command line and typing:

conda install plotly

Conda is a Python package manager included with Anaconda. It finds Python modules that are not currently installed on our computer and installs them. This command should show a list of new packages to install and additional packages to update, then ask you to confirm the installation. Press Enter to let Anaconda install these packages.

The installation should then print out a bunch of messages, ending with something along the lines of Linking packages COMPLETE. If you instead see an error message or red text, please call over a TA.

Download this stencil code and rename it with your name. It already provides a main function and a couple of lists of values for us to plot.

We'll start by plotting a bar chart of Kung Fu Panda character heights. Define a function called create_bar_chart() and call it from your main function.

Task 2

In order to create basic graphs in Plotly, we need to import Plotly graph objects: the set of functions that lets us create the types of data that Plotly can display. Place the following import statement at the top of your program:

import plotly.graph_objs as go

You can use the as keyword to rename any module on import. The syntax as go means that we're renaming the module from its original, unwieldy title plotly.graph_objs to go.
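
The same aliasing works for any module. As a small illustration, the sketch below uses the standard library's statistics module, chosen only as an example; neither it nor the made-up values are part of the stencil.

```python
# Import the standard-library statistics module under a shorter name.
import statistics as stats

heights = [1.2, 1.5, 1.8]  # made-up values for illustration

# The module is now reached through its alias, stats.
average_height = stats.mean(heights)
print(average_height)
```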

Before creating the actual plot, we first must create a Layout object, which allows us to set a title for the entire graph, among other features. Place the following code in your bar chart function, but replace the title with something that succinctly describes the plot. Also replace the yaxis title value with an appropriate descriptor for the y-axis values.

layout = go.Layout(title="NAME OF CHART GOES HERE", yaxis={'title': "Y-AXIS LABEL GOES HERE"})

Next, we create a graph object for the bars themselves. go.Bar takes in the x and y values as parameters. The x-axis is always the horizontal axis and the y-axis is the vertical axis. Put the following code in your program, replacing x_axis_values with the character names list and the y_axis_values with character height.

bars = go.Bar(x=x_axis_values, y=y_axis_values)

To put the bars and the layout together, we create a third graph object, a Figure object. Note that there are square brackets around the bars object. The data for a figure is a list of graph objects: in this case, a one-item list. If we wanted to plot the characters' ages in the same plot, we would need to create another go.Bar object and place it in the data argument list.

fig = go.Figure(data=[bars], layout=layout)

Task 3

Now that we have created a figure, we can use Plotly's functions to display it. Add the following import statement to the top of your code:

from plotly.offline import plot

This sets us up to use Plotly's offline option, which lets you save plots to an HTML file without setting up an account at plot.ly. To actually plot the data, we simply call plot(fig), passing the figure we defined above as its argument.

Like print(), plot() is called for its effect: it displays the chart immediately, so its result does not need to be assigned to a variable.

plot(fig)

Run your program. An HTML file should open with your new plot!

Task 3

In this task, we'll use Pandas to import a csv and plot it with Plotly. Download this csv file. This is a file containing blockbuster films, their budgets, grosses, and a column describing whether the films passed the Bechdel test. As described in Wikipedia, "this test asks whether a work of fiction features at least two women or girls who talk to each other about something other than a man or boy." It is a very low bar for diversity in film casting.

Create a new function called create_scatter_line_plot and, in this function, read the csv file using Pandas as you did in activity 3-1. Remember to call this new function from main().
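
As a reminder of the reading step, here is a minimal sketch. To stay self-contained it reads a few rows from a string; in your program you would pass the path of the downloaded csv file to pd.read_csv instead. The column names follow the ones used later in this activity, and every value in the rows is invented.

```python
import io

import pandas as pd

# Stand-in for the downloaded file; these rows are invented for illustration.
csv_text = """title,binary,budget_2013$,intgross_2013$
Film A,PASS,1000000,5000000
Film B,FAIL,2000000,3000000
"""

# pd.read_csv accepts a file path or, as here, a file-like object.
data_frame = pd.read_csv(io.StringIO(csv_text))
print(data_frame.columns.tolist())
```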

Now let's make a scatter plot of film grosses versus film budgets (that is, film budgets on the x axis). This process is exactly the same as plotting a bar chart except that it uses a different type of graph object. Instead of creating bars, we use go.Scatter.

Place the following line of code in your new function and replace x_values and y_values with the columns from the DataFrame you want to use in your scatter chart. Here the x values should be the adjusted 2013 budget ("budget_2013$") and the y values should be the international adjusted gross ("intgross_2013$").

scatter = go.Scatter(x=x_values, y=y_values, mode="markers")

Here the parameter mode tells the scatter object how to plot our data. We can choose "markers", "lines", or "lines+markers" to plot accordingly.

Create a layout and a figure object like we did with the bar chart. Importantly, your Layout object should also include an xaxis parameter to specify the label of the x-axis. It requires a dictionary, just like the yaxis parameter we used with the bar chart. Then run your program to display your new plot.

Task 4

You can display multiple series of data on the same set of axes by simply creating different graph objects and then combining them when you create the Figure object. For this task, you will create two Scatter objects: the first should contain the x and y values for the movies that passed the Bechdel test, and the second should contain the values for movies that did not.

Remember, to create a new pandas DataFrame that consists of only the rows for movies that passed, you can write:

data_passed = data_frame[data_frame["binary"] == "PASS"]

Additionally, create a DataFrame object that only contains the rows that did not pass.
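
A minimal sketch of both filters on invented rows is shown here. The label "FAIL" is an assumption about what the file contains; comparing with != avoids depending on it, since it keeps every row whose value is anything other than "PASS".

```python
import pandas as pd

# Invented rows for illustration; "FAIL" is an assumed label.
data_frame = pd.DataFrame({
    "title": ["Film A", "Film B", "Film C"],
    "binary": ["PASS", "FAIL", "PASS"],
})

# Rows where the comparison is True are kept.
data_passed = data_frame[data_frame["binary"] == "PASS"]
# != keeps the rows that the == filter left out.
data_not_passed = data_frame[data_frame["binary"] != "PASS"]

print(len(data_passed), len(data_not_passed))
```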

When placing multiple series on the same chart, it is important to give the graph objects a name, which you can do by specifying the name parameter. This will be the label shown in the plot legend. For example, you could give the movies that passed the Bechdel test the name "Passed", as shown in the code below. Also be sure to replace the x_values and y_values objects with the columns from the filtered DataFrame objects that you just created.

passed = go.Scatter(x=x_values, y=y_values, mode="markers", name="Passed")

This will automatically place the name of the series in a legend when we plot the data. Create a similar Scatter object for the films that did not pass.

Creating the Figure object now requires passing both Scatter objects in the data list, like this:

fig = go.Figure(data=[passed, not_passed], layout=layout)

Re-create your plot with both of these series of data plotted.

Task 5

You'll notice that hovering your mouse over distinct data points provides information about individual points. However, for this data, the budget and gross aren't terribly helpful. We'd like to include the name of the film!

Adding meaningful text labels just requires adding a text parameter with a collection of strings that has one entry per data point. In the code below, replace film_titles with the 'title' column from your DataFrame object.

passed = go.Scatter(x=x_values, y=y_values,
    mode="markers", name="Passed", text=film_titles)

Run your program to see your plot, now with useful hover labels!

Task 6

We can modify the size and color of the markers used in the scatter plot by providing an additional marker parameter when creating the Scatter object.

Using the code below, replace the variable marker_size with a numeric value. Something small like 1 or 2 will make individual markers harder to see but will clear up dense clusters. A larger value like 10 to 20 will make individual markers much larger but make the clustering issue worse.

For coloring the markers, Plotly recognizes any named HTML color. Replace the marker_color variable with a color of your choice.

passed = go.Scatter(x=x_values, y=y_values,
    mode="markers", name="Passed", text=film_titles,
    marker={'size': marker_size, 'color': marker_color})

Run your program to see your updated results.

If you have extra time

Work through some of the following examples at plot.ly. When working through these examples, change iplot to plot and modify the imports so that you're working in offline mode.


Once you're done, please check off your lab with a TA or share your file with cs0030handin@gmail.com by midnight, 4/13.