The piecemeal review of the Udacity Data Analyst Nanodegree (DAND) continues with project 4, which focuses on exploratory data analysis (EDA), an approach associated with mathematician John Tukey. The lectures in this module are presented by Facebook data scientists, and in the introduction they define EDA as “understanding data using visualization and statistical tools”. EDA is our “initial interaction with data”, during which we “test our intuitions about the data set and develop new intuitions”.

For the quizzes and project of this module we use R and RStudio, a GUI for R. The lectures are essentially split into an introduction to R, followed by three sections that guide you through exploring data with one, two and multiple variables.

The lecture videos are interspersed with short insights from Facebook data scientists and case studies of their own work. An overview of various plots and visualization packages in R is presented, along with techniques for gaining a better perspective on the data. Correlation is discussed and visualized using scatterplots.

In the project you are required to choose a dataset and explore it using R and the ggplot2 package. The result has to be presented as an R Markdown (.Rmd) file.

At this point in the nanodegree I am a bit behind schedule. I have until the end of June to complete three more projects in order to “graduate” and receive 50% of my fees back. Project 3 turned out to be more challenging than expected: converting the data from OSM (XML) to CSV and then importing it into the database ran into many small issues, which together were enough to cause a delay. These difficulties led me to the Udacity forums, where mentors provide assistance. Their feedback arrives within a few hours and has always been accurate and on target. This, along with many additional online searches, helped me make progress.

I would also like to briefly share how project submission works. Each project is reviewed against a given rubric. Reviewers provide detailed feedback and comments, noting the points that need improvement, and students can resubmit their project until it meets the requirements. Students in turn rate their reviewers out of five stars and can add comments.

Apart from the structure and the lectures, Udacity’s chief advantages over other MOOC platforms lie in the forums with their mentors and in the project reviewers. This is what sets Udacity apart, and if you decide on a nanodegree you should take advantage of these features.

Now I had better get back to class…

Michael Lazarou
Michael Lazarou has worked as a Revenue Assurance Analyst for MTN Cyprus since March 2011. His background includes a double major in Computer Science and Economics, as well as an MBA. Before being lured into the exciting world of telecoms he worked as a software developer.

Michael is interested in gaining a better understanding of different aspects of RA and data analysis. He shares his insights on the training courses he takes with Commsrisk.