Studying Data Science: edX MS Course on Data Sci and Machine Learning (part 2)

My progress on the “Data Science and Machine Learning Essentials” course has been slow over the past few weeks, as some personal issues did not allow me to be as active as I would have hoped. However, I did complete the first module, which is essentially the foundation of the theory of data science and machine learning. It defines the basic algorithms and methodologies used in the field, including regression, recommendation, clustering and classification. In addition, I created the first sample experiment on the Azure Machine Learning platform and explored some parts of the next two modules.

Since the course is offered by Microsoft, the theory is implemented on the Azure Machine Learning platform. You can sign into the platform with any existing Microsoft account, or with your business email if you are using Microsoft Cloud. Below is the startup page of the Azure platform once you have logged in:

Microsoft Azure Machine Learning login page

The experiment option allows you to create a workflow which pulls the data into a dataset and then runs it through the selected model. In addition, the feature most applicable to anyone trying to develop custom modules is the integration of code, which can be written in R, Python or SQL. What I really liked about the platform is the simple interface, which allows you to drag and drop modules into the workflow space and run the algorithm. You can see this below.

Azure ML experiment sample

Due to the start of “The Analytics Edge” on 12th April, I will unfortunately have to leave another MOOC incomplete. However, I have looked through all the material, and I really like both the pace and the content of the presentations. Module 2 discusses data acquisition, sampling and cleansing. It introduces methods for getting data into Azure (ranging from reading in a CSV file to connecting to web services). This is followed by a very interesting presentation on using R vs Python. Data types are presented and grouped into continuous and categorical (or discrete), and quantizing is introduced, i.e. transforming continuous data into categorical data (for example, stating a salary range instead of someone's exact salary). Module 3 covers visualization, exploring the ggplot2 package in R and the pandas and matplotlib packages in Python. Module 4 covers regression, classification and unsupervised learning, whilst the final module is on recommenders and publishing your work via Azure.
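To make the quantizing idea from Module 2 more concrete, here is a minimal sketch in plain Python of turning exact salaries into salary ranges. It is my own illustration rather than anything taken from the course material: the bin edges, the labels and the `quantize` helper are all hypothetical and chosen only for the example.

```python
from bisect import bisect_right

# Hypothetical bin edges and labels, chosen only for illustration.
# A salary equal to an edge falls into the higher range.
EDGES = [20000, 40000, 60000]
LABELS = ["under 20k", "20k-40k", "40k-60k", "60k and over"]


def quantize(salary):
    """Map an exact (continuous) salary to the label of its range."""
    # bisect_right counts how many edges are <= salary,
    # which is exactly the index of the matching label.
    return LABELS[bisect_right(EDGES, salary)]


salaries = [18500, 20000, 39999, 72000]
bands = [quantize(s) for s in salaries]
print(bands)
```

The continuous values are replaced by a small set of categories, which is what allows them to be treated as categorical data later on. If you are already working with pandas, its `cut` function should do a similar job on a whole column at once.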

My final note on the course is that it is definitely recommended and worth the effort. Since this is a self-paced course I can afford to pause it and return later on, while “The Analytics Edge” is a course with tight deadlines running for a specific time period.

This version of the course ended on 31st March, which means you can still review the classes but cannot register for a verified certificate. However, there should be an updated version appearing some time in the near future as is the usual case with self-paced courses on edX. The main reason for this “sudden” retirement of a few courses on edX is a change in policy regarding the certificates. You can find more details here; the essence is that honor code certificates (unpaid certificates for auditing the course) are not being offered anymore.

Michael Lazarou
Michael Lazarou manages revenue assurance and fraud at Epic, a Cypriot telco, having joined their RA function in March 2011. His background includes a double major in Computer Science and Economics, as well as an MBA. Before being lured into the exciting world of telecoms he worked as a software developer.

Michael is interested in gaining a better understanding of different aspects of RA and data analysis. He shares his insights on the training courses he participates in with Commsrisk. Michael's accumulated experience of online training also led him to volunteer for the role of Coordinator of the RAG Learning online education platform.