Week 1 Introduction To Predictive Modelling




The advantages of working for an educator are numerous, and one of them is most definitely the willingness to support continuing professional development. With that, a huge thanks from me to the whole senior team at West Lothian College for supporting me in beginning this Micromasters in Predictive Analytics for Business Applications.

The college is looking to leverage predictive analytics in the future to support students, and prospective students, in getting the best out of their time at the college, and it is a definite move forward into analytics that I personally want to see.

The course is a year long, made up of four six-week blocks followed by a six-week final project, and counts as 30 credits towards a Masters degree at the University of Edinburgh.

I have no doubt that it’s going to be a challenge, but one that I can rise to, and I am really looking forward to getting through it. My plan is to blog some of what is going on as I go along, as long as I have enough time.

Week One

Week one of the Introduction to Predictive Analytics began with some discussion about what predictive analytics is and how it could be used, followed by an overview of the types of models that can be used.

The first discussion asked what you would collect, and why, for an example of predicting visitor demand at Edinburgh Castle. My answers are below.

What information do we need to find?

Date, time, name, age, home address / where the visitor is from, reason for visit, who they are travelling with (e.g. on their own, a couple, friends, family group, school group), planned or ad hoc visit, visitor experience rating (did you have a good time?), weather conditions, what they bought.

Where will you collect this information?

There are several places this could be collected: online from visitors buying tickets in advance, at the ticket office when buying in advance, visitor questionnaires (less likely to give a full picture, though), and possibly an exit questionnaire (which would show whether they had a good time; those who do tend to tell more people).

What will we predict exactly?

We would attempt to predict the types and number of people who would come depending on many factors (weather, date, time of day, whether it is a national holiday in other countries/areas, the size of groups), and from that the number of tour guides that may be needed, how many staff to have on duty in the castle shop, and what items to stock in the shop.

How should we interpret the outcome?

We are looking to build a profile of the visitors and turn that into a predictive model of the future, then compare the model's predictions to the actual figures to find out how accurate it is. We also need to look at the data to find correlations between factors (when it’s raining, do we sell more umbrellas?) and decide which factors have the biggest effect on visitor numbers.
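
As a rough sketch of those two checks in Python (the course language), the snippet below measures how strongly two factors move together and how far a set of predictions is from what actually happened. All of the figures are invented for illustration and are not real castle data; Pearson correlation and mean absolute error are just one reasonable choice of measures, not necessarily the ones the course uses.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def mean_absolute_error(actual, predicted):
    """Average size of the gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Invented illustration data, one entry per day (assumption, not real figures).
rainfall_mm = [0, 2, 10, 0, 5, 12, 1]
umbrellas_sold = [1, 3, 14, 0, 6, 15, 2]
actual_visitors = [5200, 4800, 3100, 5500, 4000, 2900, 5000]
predicted_visitors = [5000, 4700, 3400, 5600, 4200, 3000, 4900]

rain_umbrella_r = pearson(rainfall_mm, umbrellas_sold)
visitor_mae = mean_absolute_error(actual_visitors, predicted_visitors)
print(round(rain_umbrella_r, 2), round(visitor_mae, 1))
```

A correlation close to 1 would suggest rain and umbrella sales rise together, and the error figure says how many visitors the model is out by on a typical day.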

Most importantly, how can we use it for marketing purposes?

We would have an idea of when visitors from specific countries normally visit (for example, during local holidays in those countries), so we could target those visitors arriving at airports (maybe even in a relevant language) and target advertising at the correct demographics, possibly including social media channels.

It would also allow offers to be made at times predicted to have lower visitor numbers.

It would also allow marketing of products that generally sell well at specific times or dates, to increase sales revenue.

I received some good feedback for this, and it has made me ask some questions of my own: how much data would be needed to predict these things, and how would you decide which factors have the biggest effect?

Next up was a literature review, then research on examples of how the three main model types (classification, regression, time series analysis) are used in real life. My answers are below.


Classification

  • Whether a music album will be positively or negatively received by the public by analysing the press reviews
  • Whether an individual is a low, medium or high credit risk
  • Predictions on whether a loyalty card will increase a user's spending in store or not


Regression

  • Whether someone is likely to buy online based on how easy the product search is in comparison to the delivery cost
  • Attempting to predict the winner of the US presidential election
  • Whether a whisky will taste good or not, based on flavours

Time series analysis

  • Predict ticket sales demand for an annual festival (e.g. Glastonbury)
  • Predict sales for a flight from when it is released to the time of the actual flight, based on the same flight in previous years etc.
  • Predict the number of people who will view a regular or special-occasion TV show

This led to a lot of thoughts for me about which of these could be used in my day job. I really need to keep track of how I think these things could be applied at the college as I go along. That will allow me to produce documents at work to offer these options to our senior team and look at ways in which they can be implemented and be of practical use at the college.

I’ve come up with some examples of how this could be used at the college:

(Please note these are just me dumping thoughts from my brain and not things that are being implemented; they would require a lot of thought and discussion first.)


Classification

Whether a student is at risk of not completing their year at college. This would use multiple factors and could be used to give a student a risk rating of Low, Medium or High, allowing the college to intervene early in a student’s career with us and make best use of the support for learning resources that are available.
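
A minimal sketch of what such a risk rating might look like in Python follows. It uses a small k-nearest-neighbours vote, which is just one simple classification technique chosen for illustration, and the two factors (attendance rate and share of assessments submitted) are assumptions rather than factors the college has identified. Every record is invented.

```python
from collections import Counter
from math import dist

# Invented historic records (assumption, not college data):
# (attendance rate, share of assessments submitted) -> risk rating.
history = [
    ((0.95, 0.90), "Low"),
    ((0.90, 1.00), "Low"),
    ((0.85, 0.80), "Low"),
    ((0.70, 0.60), "Medium"),
    ((0.65, 0.70), "Medium"),
    ((0.75, 0.50), "Medium"),
    ((0.40, 0.30), "High"),
    ((0.30, 0.50), "High"),
    ((0.50, 0.20), "High"),
]


def predict_risk(student, k=3):
    """Return the most common rating among the k most similar past students."""
    nearest = sorted(history, key=lambda record: dist(record[0], student))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


print(predict_risk((0.92, 0.95)))
print(predict_risk((0.35, 0.35)))
```

A real model would need far more records and carefully chosen factors, but the shape is the same: past students with known outcomes are used to label a new student.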


Regression

This could look at predicting the overall number of successful candidates by examining factors such as age, gender, whether they receive bursary support and how much, the deprivation index of their home area, and other factors that historic data shows to correlate with success.
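
To make the idea of regression concrete, here is a small sketch that fits a straight line through one factor using ordinary least squares. It predicts a completion percentage rather than a head count to keep the numbers small, uses only a single factor (weekly bursary amount), and all of the data is invented; a real model would combine several factors.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x for one factor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Invented illustration data (assumption): weekly bursary in pounds for a
# group of students, and the percentage of that group who completed.
bursary = [0, 20, 40, 60, 80]
completion_pct = [62, 66, 71, 73, 78]

intercept, slope = fit_line(bursary, completion_pct)

# Estimate the completion percentage for a group receiving 50 pounds a week.
estimate = intercept + slope * 50
print(round(estimate, 1))
```

The slope says how much the predicted completion percentage changes for each extra pound of bursary in this made-up data, which is the kind of "which factor matters most" question raised earlier.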

Time series analysis

This could look at predicting the number of online applications for courses that will be received, based on historical information by month, week, day, hour and occasion (e.g. the day the courses go live, exam results day), which can be used to predict how many resources are needed for interviews etc.
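
One very simple way to sketch this in Python is a seasonal average: forecast each month as the mean of the same month in previous years. This is a deliberately basic stand-in for proper time series methods, and the monthly application counts below are invented (including the spike in month 8, imagined as an exam results month).

```python
def seasonal_forecast(past_years):
    """Forecast each period as the mean of the same period across past years."""
    return [sum(values) / len(values) for values in zip(*past_years)]


# Invented illustration data (assumption): online applications per month,
# one list of 12 months for each of three past years.
applications_by_month = [
    [40, 55, 90, 70, 60, 80, 65, 210, 120, 50, 35, 20],
    [45, 60, 100, 75, 65, 85, 70, 230, 130, 55, 40, 25],
    [50, 65, 110, 80, 70, 90, 75, 250, 140, 60, 45, 30],
]

forecast = seasonal_forecast(applications_by_month)

# Find the month expected to be busiest, to plan interview resources around it.
busiest = max(range(12), key=lambda month: forecast[month])
print(busiest + 1, forecast[busiest])
```

The same function works for weeks, days or hours, as long as each past period is lined up in the same order.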

It has certainly been an interesting first week and already my brain is being filled with new knowledge and potential thoughts for implementation.

Wish me luck as I progress through this: I have 4 multiple choice assessments and 2 coding assessments (in Python) over the next 5 and a bit weeks.

Week 2 is Python and Predictive Modelling and I’m really looking forward to getting started with that.
