CSCI 0050: Python Notes, 7/31/19

From office hours: defying expectations

We'll start by defining a small program that doesn't work how we want.

from dataclasses import dataclass
from typing import List
from datetime import date

L = [1, 2, 3, 4]

for e in L:
    if e == 2 or e == 3:
        L.remove(e)

# expect [1, 4]
print(L)

[1, 3, 4]

Huh... we get [1, 3, 4], but we thought we wrote code that removes both 2 and 3!

We can wrap this weird behavior in a function to make it easier to test and to see what is going on. Students had a couple of hypotheses about what was happening:

The loop stops at the first instance of true (due to or?)
3 gets skipped because the list shifts after .remove

from dataclasses import dataclass
from typing import List
from datetime import date

#L = [1, 2, 3, 4]

def mystery(L: List):
    for e in L:
        if e == 2 or e == 3:
            L.remove(e)
    return L

# expect [1, 4]
print(mystery([1, 2, 3, 4]))

[1, 3, 4]

We first tried writing some test cases:

print(mystery([1, 2, 3, 4]))
print(mystery([1, 3, 2, 4]))
print(mystery([1, 2, 3, 4, 3, 2, 1, 3]))

[1, 3, 4]
[1, 2, 4]
[1, 4, 2, 1, 3]

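These results point toward the second hypothesis: each time an element is removed, the element right after it is skipped. To see why, picture the list [1, 2, 3, 4] laid out in memory:

Python Memory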

label	slot
1021	`1`
1022	`2`
1023	`3`
1024	`4`

Now when we run remove, we could imagine it looking something like:

Python Memory

label	slot
1021	`1`
1022
1023	`3`
1024	`4`

But Python can't leave the list like this! We need to fix up our list by shifting things up.

Python Memory

label	slot
1021	`1`
1022	`3`
1023	`4`
1024

However, the for loop has already visited location 1022 and moves on to 1023 next, so the 3 gets skipped over.
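To watch the skip happen, here is a small sketch that runs the same loop as mystery but records every position the loop visits. The name trace_removal is our own, made up for this illustration, and it works on a copy so the original list is left alone.

```python
def trace_removal(items):
    """Run the same loop as mystery, recording each (index, element) visited."""
    items = list(items)  # work on a copy so the caller's list is untouched
    visited = []
    for index, e in enumerate(items):
        visited.append((index, e))
        if e == 2 or e == 3:
            items.remove(e)  # everything after e shifts one slot to the left
    return visited, items

visited, result = trace_removal([1, 2, 3, 4])
print(visited)  # [(0, 1), (1, 2), (2, 4)]
print(result)   # [1, 3, 4]
```

The loop never sees the 3: by the time it reaches index 2, the 4 has already shifted into that slot.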

Tl;dr: Don't modify a list while a for loop is iterating over it. To get around this, build up a new list instead.
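One way to build up a new list might look like the sketch below. The function name remove_2_and_3 is ours, chosen for this example; the idea is just to append the elements we want to keep rather than removing the ones we don't.

```python
from typing import List

def remove_2_and_3(L: List[int]) -> List[int]:
    """Return a new list with every 2 and 3 left out; L itself is not changed."""
    kept = []
    for e in L:
        if e != 2 and e != 3:
            kept.append(e)
    return kept

print(remove_2_and_3([1, 2, 3, 4]))              # [1, 4]
print(remove_2_and_3([1, 3, 2, 4]))              # [1, 4]
print(remove_2_and_3([1, 2, 3, 4, 3, 2, 1, 3]))  # [1, 4, 1]
```

Because the loop only reads from L and only writes to kept, nothing shifts underneath it.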

Gathering votes from the prompt

Suppose we want to write code that gathers votes entered at a prompt while the program is running. We first need to introduce a couple of new ideas that will help with this. The first of these is input().

input() shows a prompt and waits for the user to type something. Whatever they type is returned as a string.

input("enter a number \n")

enter a number 
56

'56'

This creates a problem if we want to do math with user input. The expression below raises a TypeError because it tries to add a number to a string.

4 + input("enter a number \n")

enter a number 
56

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-6-51f426980be2> in <module>
----> 1 4 + input("enter a number \n")

TypeError: unsupported operand type(s) for +: 'int' and 'str'

We can get around this using type conversions. You can use a type name as a function to convert a value from one type to another. See below:

4 + int(input("enter a number \n"))

enter a number 
56

60
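A few more conversions, as a sketch. The helper to_int_or_none is a made-up name for this example; it shows one possible way to cope with input that isn't a number at all, since int() raises a ValueError in that case.

```python
print(int("56") + 4)       # 60
print(float("0.5") + 1)    # 1.5
print(str(60) + " votes")  # 60 votes

def to_int_or_none(text):
    """Hypothetical helper: convert text to an int, or return None if it isn't one."""
    try:
        return int(text)
    except ValueError:
        # int("abc") raises ValueError, so we land here for non-numeric text
        return None

print(to_int_or_none("56"))   # 56
print(to_int_or_none("abc"))  # None
```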

Alright. We know how to gather one input from a user, but our hypothetical wanted to collect and collate votes from a number of different users. When you ask people for repeated input, you generally give them a way to stop. Below, the user types the string done to signal that they are finished.

votes = []

def cast_vote():
    """record vote and repeat, unless user enters 'done'"""
    v = input("Enter your vote: \n")
    if v != "done":
        votes.append(v)
        cast_vote()
        
cast_vote()
print(votes)

Enter your vote: 
A
Enter your vote: 
A
Enter your vote: 
B
Enter your vote: 
C
Enter your vote: 
done
['A', 'A', 'B', 'C']

This is all stuff we've seen before, using a repeated call to the function. The difference here is that we needed some special signal to indicate "we're done with data input".

However, this is not how a Python programmer would write it. Python programmers would rather use something similar to a for loop. We want this program to run for a while before it terminates (see what we did there?)

Programming with uncertainty: the `while` loop

If we don't know the length of our data up front, the while loop is a good choice: it lets us get the behavior above in a more elegant way. We can do this like so:

votes2 = []

v = input("Enter your vote: \n")
while v != "done":
    votes2.append(v)
    v = input("Enter your vote: \n")
    
print(votes2)

Enter your vote: 
A
Enter your vote: 
A
Enter your vote: 
B
Enter your vote: 
C
Enter your vote: 
done
['A', 'A', 'B', 'C']
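The same loop can be wrapped in a function so that it is easier to test. This is only a sketch: gather_votes and its read_line parameter are names we made up here. By default read_line is input, but we can pass in a stand-in that hands back canned answers instead of waiting for the keyboard.

```python
def gather_votes(read_line=input):
    """Collect votes until 'done' is entered; read_line is called like input()."""
    votes = []
    v = read_line("Enter your vote: \n")
    while v != "done":
        votes.append(v)
        v = read_line("Enter your vote: \n")
    return votes

# canned answers standing in for a user at the keyboard
answers = iter(["A", "A", "B", "C", "done"])
print(gather_votes(lambda prompt: next(answers)))  # ['A', 'A', 'B', 'C']
```

Calling gather_votes() with no argument behaves like the loop above and prompts the user.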

Getting data into Python: using `csv.reader`

In the real world, people will not give you nicely formatted data. You'll likely have to deal with .csv files. To work with this kind of data, you'll need to be able to read it into your code.

Roughly what we want to do is tell Python to feed us rows one at a time so we can construct data as we want it. We can use this to illustrate the differences between for and while loops:

for loops: used when you have a known and fixed amount of data
while loops: used when the size of data can't be known up front

In the case of a .csv file, there is a predictable amount of data: a given .csv has a specific number of rows, and the reader hands them to us one at a time until they run out. We will therefore be using for loops to read in a .csv file.

The .csv file we are using in these notes can be found here. Make sure you put this file in the same folder as your code.

from dataclasses import dataclass
from datetime import date
import csv

@dataclass
class Reading:
    type: str
    when: date
    level: float
    location: str
        
# open csv file
with open("weather.csv", newline='') as csvfile:
    weatherreader = csv.reader(csvfile, dialect='excel', delimiter=",")
    # convert each row to a Reading
    for row in weatherreader:
        print(row)

['\ufeffDate', 'Place', 'Type', 'Data']
['1/6/16', 'Houston', 'ozone', '1']
['1/7/16', 'Houston', 'ozone', '0.07']
['1/8/16', 'Houston', 'ozone', '0.5']
['1/9/16', 'Houston', 'ozone', '0.23']
['1/10/16', 'Houston', 'ozone', '0.6']
['1/11/16', 'Houston', 'ozone', '0.42']
['1/12/16', 'Houston', 'ozone', '0.3']
['1/15/16', 'Houston', 'ozone', '0.32']
['1/16/16', 'Houston', 'ozone', '0.35']
['1/17/16', 'Houston', 'ozone', '0.2']
['1/18/16', 'Houston', 'ozone', '0.08']
['1/19/16', 'Houston', 'ozone', '0.09']
['2/1/16', 'Houston', 'ozone', '0.4']
['2/3/16', 'Houston', 'ozone', '0.42']
['2/4/16', 'Houston', 'ozone', '0.43']
['2/5/16', 'Houston', 'ozone', '0.53']

The above illustrates a proof of concept--that Python recognizes the file and can show us the data contained within. We now need to do some cleanup. Chiefly this includes:

Get rid of header row
Convert strings with slashes to dates

from dataclasses import dataclass
from datetime import date
import csv

@dataclass
class Reading:
    type: str
    when: date
    level: float
    location: str
        
# need a list to store readings in
readings = []
        
# open csv file
with open("weather.csv", newline='') as csvfile:
    weatherreader = csv.reader(csvfile, dialect='excel', delimiter=",")
    # convert each row to a Reading
    for row in weatherreader:
        # we're putting off doing that date for now, just set something up to test
        readings.append(Reading(row[2], date(2000, 1, 2), row[3], row[1]))
  
# print out first 2 elements to look
print(readings[0])
print(readings[1])

Reading(type='Type', when=datetime.date(2000, 1, 2), level='Data', location='Place')
Reading(type='ozone', when=datetime.date(2000, 1, 2), level='1', location='Houston')

This illustrates one of the downsides of Python not checking type annotations when the program runs. We declared level as a float, but it is being read in as a string! We can fix this by converting with float(). However, float() will fail on "Data" (from the header row). We can get rid of the header by skipping over it with the line next(weatherreader).

from dataclasses import dataclass
from datetime import date
import csv

@dataclass
class Reading:
    type: str
    when: date
    level: float
    location: str
        
# need a list to store readings in
readings = []
        
# open csv file
with open("weather.csv", newline='') as csvfile:
    weatherreader = csv.reader(csvfile, dialect='excel', delimiter=",")
    # skip over header row
    next(weatherreader)
    # convert each row to a Reading
    for row in weatherreader:
        # we're putting off doing that date for now, just set something up to test
        readings.append(Reading(row[2], date(2000, 1, 2), float(row[3]), row[1]))
  
# print out first 2 elements to look
print(readings[0])
print(readings[1])

Reading(type='ozone', when=datetime.date(2000, 1, 2), level=1.0, location='Houston')
Reading(type='ozone', when=datetime.date(2000, 1, 2), level=0.07, location='Houston')

Now we can focus on converting the date value. The date function needs its input in the form date(2016, 1, 18); however, our data provides dates as 1/18/16. We can pull the pieces apart using .split() which, given a character, splits a string at each occurrence of that character. An example would be:

"1/5/16".split("/")

['1', '5', '16']
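Putting split together with int() and date(), the conversion might be packaged as a small helper. This is a sketch: parse_date is a name we made up, and it assumes every two-digit year belongs to the 2000s. For comparison, the standard library's datetime.strptime can do the same job from a format string.

```python
from datetime import date, datetime

def parse_date(text):
    """Convert 'M/D/YY' text to a date, assuming the year is 20YY."""
    month, day, year = text.split("/")
    return date(int("20" + year), int(month), int(day))

print(parse_date("1/18/16"))  # 2016-01-18

# standard-library alternative: %m = month, %d = day, %y = two-digit year
print(datetime.strptime("1/18/16", "%m/%d/%y").date())  # 2016-01-18
```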

from dataclasses import dataclass
from datetime import date
import csv

@dataclass
class Reading:
    type: str
    when: date
    level: float
    location: str
        
# need a list to store readings in
readings = []
        
# open csv file
with open("weather.csv", newline='') as csvfile:
    weatherreader = csv.reader(csvfile, dialect='excel', delimiter=",")
    # skip over header row
    next(weatherreader)
    # convert each row to a Reading
    for row in weatherreader:
        # convert dates
        date_list = row[0].split("/")
        new_date = date(int("20" + date_list[2]), int(date_list[0]), int(date_list[1]))
        # build the Reading using the converted date
        readings.append(Reading(row[2], new_date, float(row[3]), row[1]))
  
# print out first 2 elements to look
print(readings[0])
print(readings[1])

Reading(type='ozone', when=datetime.date(2016, 1, 6), level=1.0, location='Houston')
Reading(type='ozone', when=datetime.date(2016, 1, 7), level=0.07, location='Houston')
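If you want to try the conversion without having weather.csv on hand, here is a sketch that feeds csv.reader an in-memory file built with io.StringIO. The function name read_readings and the two sample rows are ours (the rows are copied from the output above); csv.reader accepts any object that yields lines, so the same function should also work on a file opened with open().

```python
import csv
import io
from dataclasses import dataclass
from datetime import date

@dataclass
class Reading:
    type: str
    when: date
    level: float
    location: str

def read_readings(csvfile):
    """Turn the rows of an open csv file into a list of Reading values."""
    reader = csv.reader(csvfile)
    next(reader)  # skip over header row
    readings = []
    for row in reader:
        month, day, year = row[0].split("/")
        when = date(int("20" + year), int(month), int(day))
        readings.append(Reading(row[2], when, float(row[3]), row[1]))
    return readings

# in-memory stand-in for weather.csv
sample = io.StringIO(
    "Date,Place,Type,Data\n"
    "1/6/16,Houston,ozone,1\n"
    "1/7/16,Houston,ozone,0.07\n"
)
for r in read_readings(sample):
    print(r)
```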

Now, we should also know how to write out a .csv file. We'll use our list of votes from before.

# we will continue this function tomorrow
votes = ["a", "b", "a", "a", "c", "d", "b"]

with open('election.csv', 'w', newline='') as csvfile:
    # 'election.csv' is the name of the new file
    # 'w' indicates we will be writing to the file
    # newline='' lets the csv module control line endings itself
    voteswriter = csv.writer(csvfile, delimiter = ',')
    for cand in list(set(votes)):
        # list(set(some_list)) == L.distinct(some_list) from Pyret
        voteswriter.writerow(cand, votes_for(cand, voteslist))

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-30-2fec7cd0c8aa> in <module>
      7     voteswriter = csv.writer(csvfile, delimiter = ',')
      8     for cand in list(set(votes)):
----> 9         voteswriter.writerow(cand, votes_for(cand, voteslist))

NameError: name 'votes_for' is not defined
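The cell above fails because votes_for has not been written yet, and class will pick this up next time. In the meantime, here is one guess at how it could be finished; this is our sketch, not necessarily the version we'll write in class. Two assumptions to flag: the body of votes_for is made up, and write_tally is a hypothetical wrapper. Note also that writerow takes a single row (a list), so the candidate and the count go inside one list, and that we pass votes itself since voteslist is not defined anywhere.

```python
import csv
import io

def votes_for(cand, votes_list):
    """Hypothetical helper: count how many times cand appears in votes_list."""
    count = 0
    for v in votes_list:
        if v == cand:
            count += 1
    return count

def write_tally(csvfile, votes_list):
    """Write one row per distinct candidate: name, number of votes."""
    voteswriter = csv.writer(csvfile, delimiter=",")
    # sorted(...) just makes the row order predictable
    for cand in sorted(set(votes_list)):
        voteswriter.writerow([cand, votes_for(cand, votes_list)])

votes = ["a", "b", "a", "a", "c", "d", "b"]

# write into an in-memory file so we can look at the result right away
buffer = io.StringIO()
write_tally(buffer, votes)
print(buffer.getvalue())
```

To produce an actual file, the same function could be called inside with open('election.csv', 'w', newline='') as csvfile, passing csvfile in place of buffer.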
