Week 2 Python and Predictive Analytics

Brain Crunching

Week 2 of my MicroMasters in Predictive Analytics for Business Applications, and we move on to using Python for predictive analytics. It begins with some basic reminders of Python, then follows up with some utterly brain-crunching summary statistics calculations. It’s been a while since I really looked at Python for anything at all, and my knowledge runs more to “I can use it a wee bit” than “I can programme the crap out of it”, which is really where I’m heading with this course.

It’s been a really interesting run through, though. This week has all been about learning, so I’ve been doing some activities and exercises to test things out. Once I have a correct solution, the system shows the solution coded by the lecturers, which has been interesting to see: although on occasion we used the same method, much of the time I had coded things a little more longhand.

My college Principal, Jackie, loaned me the book Coders: Who They Are, What They Think and How They Are Changing Our World by Clive Thompson, which I am slowly getting through. It turns out I’m a bit of a slow reader when trying to take everything in. The book has some sections about code and making code efficient, and I think they resonated with me this week, as sometimes I have made my code go the long way around to get to something that can be done much quicker and more simply in less code.

I was a little concerned about this and discussed it with those teaching the course, but they were happy with the correct output and reminded me that the long way round is often easier to read and understand for someone coming behind you, so I felt less bad.

The following are a couple of examples of how my code was a little more longhand.

Create a Definition that will accept your name and calculate the length of your name in characters. … My Code:

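first_name = ""
last_name = ""

def get_length_name(first_name, last_name):
    # Running count of the characters seen so far
    length = 0

    # Add 1 for every letter in the first name
    for letters in first_name:
        length = length + 1

    # Then do the same for the last name
    for letters in last_name:
        length = length + 1

    return length

output = get_length_name('Tom','Thomson')

print(output)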

You can see from my code that I was looping through the first name, adding 1 to the length variable for each letter, then looping through the last name and doing the same. I went for this way of doing things as the lessons before had been looking at for and while loops.

The Taught Solution

first_name = ""
last_name = ""

def get_length_name(first_name, last_name):
    length = 0

    # len() returns the number of characters in a string
    length = len(first_name)+len(last_name)

    return length

output = get_length_name('Tom','Thomson')

print(output)

This solution simply uses the built-in len() function to get the length of the first name and the last name and adds them together. A really simple solution for something I really overcomplicated.
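As a quick sanity check, the two approaches can be put side by side to confirm they give the same answer. This is only a minimal sketch: the function names count_with_loop and count_with_len are made up for illustration and are not from the course.

```python
def count_with_loop(first_name, last_name):
    """Longhand version: add 1 for every character seen."""
    length = 0
    for _ in first_name + last_name:
        length += 1
    return length


def count_with_len(first_name, last_name):
    """Short version: let the built-in len() do the counting."""
    return len(first_name) + len(last_name)


print(count_with_loop("Tom", "Thomson"))  # 10
print(count_with_len("Tom", "Thomson"))   # 10
```

Both versions count every character, so spaces or hyphens in a name would be counted too.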

This one is a little more complicated: look at two dictionaries, one mapping each course to the organiser who teaches it and the other mapping each course to how many points it is worth, then calculate the total number of points taught by an organiser. My Code:

courses = {"Awesomeness": "Jackie", "Mega Awesomeness" : "Jackie", "Basic Awesomeness" : "Simon"}
points = {"Awesomeness": 15, "Mega Awesomeness": 10, "Basic Awesomeness":5}

def find_points_for_organizer(organizer, courses, points):
    total = 0
    # Outer loop: find each course taught by the requested organizer
    for course, organiser in courses.items():
        if organizer == organiser:
            # Inner loop: search the points dictionary for that course
            for ref, point in points.items():
                if ref == course:
                    total = total+point
    return total

output = find_points_for_organizer('Jackie', courses, points)

print(output)

My code first loops through the course and organiser dictionary to find the course(s) that the organiser teaches. For each match, it runs a second for loop over the course and points dictionary and adds the points for that course to the total variable.

The Taught Solution

courses = {"Awesomeness": "Jackie", "Mega Awesomeness" : "Jackie", "Basic Awesomeness" : "Simon"}
points = {"Awesomeness": 15, "Mega Awesomeness": 10, "Basic Awesomeness":5}

def find_points_for_organizer(organizer, courses, points):
    total = 0
    for course, organizer_c in courses.items():
        if organizer_c == organizer:
            # Look the points up directly by course name
            total += points[course]
    return total

output = find_points_for_organizer('Jackie', courses, points)

print(output)

This taught solution does the same thing as my code but looks a lot cleaner, using a single for loop and an if statement, then looking the points up directly with points[course] instead of looping a second time. I also found the += operator odd compared with the total = total + points[course] that I would have used, although the two do the same thing here. A nice learning curve though.
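To convince myself that += really is just shorthand here, and to see how far the idea of "less code" can go, here is a small sketch. The function name total_points_for is hypothetical and the sum() version is an alternative of my own choosing, not the course's solution.

```python
courses = {"Awesomeness": "Jackie", "Mega Awesomeness": "Jackie", "Basic Awesomeness": "Simon"}
points = {"Awesomeness": 15, "Mega Awesomeness": 10, "Basic Awesomeness": 5}

# For numbers, these two lines have the same effect on total
total = 0
total = total + 15
total += 10
print(total)  # 25


def total_points_for(organizer, courses, points):
    """Hypothetical one-line variant: sum the points of matching courses."""
    return sum(points[course] for course, who in courses.items() if who == organizer)


print(total_points_for("Jackie", courses, points))  # 25
print(total_points_for("Simon", courses, points))   # 5
```

Whether the one-liner is easier to read than the explicit loop is a matter of taste, which is much the same point the lecturers made about longhand code.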

Summary Statistics

There was also a bit of in-depth summary statistics in Week 2, looking at the mean, median, interquartile range and so on. Some of these are new to me and I found them a bit difficult to understand; I’m never a fan of using formulae to express these things.

Calculation for Mean (or average)

For example, the formula to calculate the mean (or average to us lay folk) of a field or variable makes me want to cry every time I look at it, whereas if someone says “calculate the average” I understand it much more easily.
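For reference, the standard textbook formula for the arithmetic mean is written out here. The symbols are the usual ones and may differ from those used in the course material.

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```

Read aloud, that is simply: add up all n values x_1 to x_n, then divide by n.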

The median was one I was already familiar with calculating. I am not going to show the formula here as it really is horrific to look at.
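In code, both are much less scary than the formulae. The sketch below calculates the mean and median by hand and then checks them against Python's built-in statistics module. The list of values is made up purely for illustration.

```python
from statistics import mean, median

# Illustrative values only, not course data
values = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]

# Mean: add everything up and divide by how many values there are
mean_by_hand = sum(values) / len(values)

# Median: sort, then take the middle value
# (or the average of the two middle values when the count is even)
ordered = sorted(values)
middle = len(ordered) // 2
if len(ordered) % 2 == 0:
    median_by_hand = (ordered[middle - 1] + ordered[middle]) / 2
else:
    median_by_hand = ordered[middle]

print(mean_by_hand, mean(values))      # 11.5 11.5
print(median_by_hand, median(values))  # 10.0 10.0
```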

Interquartile Range

The interquartile range (IQR) is the one I got really confused with. I know that calculating the median (which is the 0.50 version of one of these) is pretty easy and I thought the quartiles would be too, but I kept getting the wrong answer as I was trying to take the point 25% or 75% of the way through the list, which isn’t actually the correct way to do it.

The Statistics How To site shows ways of calculating statistics manually or through code on different platforms, and I found its explanation (for a data set with an even number of values) so much easier to understand and work through.

How to Calculate Interquartile Range in a data set of even numbers.

My way of calculating this was to look at the points 25% and 75% of the way through the list. For me the 25% point fell between 5 and 7 (so 6) and the 75% point fell between 15 and 16 (so 15.5), making an IQR of 9.5 and thus giving me an incorrect figure.
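The approach that finally made sense to me can be sketched in code: sort the values, split them into a lower half and an upper half, and take the median of each half as Q1 and Q3. This is a minimal sketch under a couple of assumptions: the function name iqr_median_of_halves is made up, and the values are illustrative ones picked to line up with the figures quoted above, not taken from the course material.

```python
from statistics import median


def iqr_median_of_halves(values):
    """Return (Q1, Q3, IQR) using the median of each half of the sorted data."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]               # bottom half
    upper = data[len(data) - half:]   # top half (skips the middle value if the count is odd)
    q1 = median(lower)
    q3 = median(upper)
    return q1, q3, q3 - q1


# Illustrative values only
data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]
q1, q3, iqr = iqr_median_of_halves(data)
print(q1, q3, iqr)  # 7 16 9
```

With these values the halves are 3, 5, 7, 8, 9 and 11, 15, 16, 20, 21, so Q1 is 7 and Q3 is 16, giving an IQR of 9 instead of the 9.5 I got by averaging neighbouring values.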

Just an FYI: I found the method described for calculating quartiles and IQR in Excel didn’t work as expected and I got a different answer, so be careful when using it to calculate IQR.
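One possible explanation, and it is only a guess since I didn't note which Excel function was involved, is that there are several accepted conventions for quartiles and many tools interpolate between neighbouring values. Python's own statistics.quantiles function (Python 3.8 or later) offers two such methods, and on the illustrative values below they disagree with each other.

```python
from statistics import quantiles

# Illustrative values only, not course data
data = [3, 5, 7, 8, 9, 11, 15, 16, 20, 21]

# Each call returns the three cut points [Q1, Q2, Q3]
inclusive = quantiles(data, n=4, method="inclusive")
exclusive = quantiles(data, n=4, method="exclusive")

print(inclusive, inclusive[2] - inclusive[0])  # [7.25, 10.0, 15.75] 8.5
print(exclusive, exclusive[2] - exclusive[0])  # [6.5, 10.0, 17.0] 10.5
```

All the methods agree on the median (10.0) but give different quartiles, so it is worth checking which convention a tool uses before comparing IQR figures.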

It just shows that it pays to do some research around a subject, especially when you don’t fully understand the way it is being demonstrated. Finding a different explanation of the same solution really does help and, just like the code in Python, there are always different ways of doing things.

Until Next week ….
