Managing revenue assurance and fraud for MTN Cyprus does not leave me much time for extracurricular study, but last month I completed the Udacity Data Analyst Nanodegree (DAND). As mentioned in my previous post, Udacity has modified this program, splitting it into two terms and reducing the number of projects required to graduate. I had started the previous version of the program in the summer of 2016 but stopped in March 2017 due to the cost (both financial and in terms of time). In October I restarted with the new version, going directly to Term 2, i.e. “Advanced Data Analysis” (with sponsorship from my company), and graduated last month.
I will be using the Class Central template for reviewing this course.
In an earlier post I mentioned why I was so keen on taking Udacity’s program. My goal when deciding to take this course was to gain a deeper understanding of data analysis and improve my knowledge of analytics. Although I knew that there was very little SQL in this course, which is what I work with daily, I felt that the data science field, which has become so popular lately, was worth exploring in more depth. Given my background in programming and working with databases, I felt this was a good path to follow.
As mentioned above, this course used to have a different structure: it was a 12-month program with 9 projects. Now it consists of two terms, and you can skip the first if you feel confident enough. The cost used to be USD200 per month, whilst now there is a fixed price for each term: USD499 and USD699 for terms 1 and 2 respectively.
The lecturers in these courses are usually data scientists and analysts from partner companies, e.g. Facebook.
My background in computer science, programming and SQL scripting definitely helped me in this course. If you have little or no prior experience, I recommend starting from term 1 and first getting a good feel for R and Python.
What classes have you taken that prepared you for it?
Any previous programming experience, or a background in data analysis or statistics, will help you in this course. After all, it is aimed at this group of people, but also at people who want to switch careers (who will obviously need to begin from term 1). I had also completed a Data Analysis course on edX that used R exclusively for its projects.
In order to graduate you have to complete all projects.
The projects required for graduation in DAND Term 2 are:
- Test a perceptual phenomenon: In this project you must use statistics to describe data collected while investigating the Stroop effect. I received credit for this project from my previous enrollment. This is a simple project to introduce you to the topic and to get a feel for the environment. You only have to answer some questions using descriptive statistics and hypothesis testing. No specific tools are required to complete the work.
- Exploratory data analysis: Choosing from a list of data sets, you have to perform exploratory data analysis using R. The process has to be documented which means writing your thoughts and adding visualizations along the way. A final section will draw attention to the main findings and most important aspects of the data. Keep in mind, this is raw data which might be messy and dirty. This means you need to explore the data bearing in mind some of it might be clutter that has to be removed from the core analysis. Finally, a short reflection on the process has to be submitted in addition to the workings.
- Wrangle and analyze data: This was the most challenging project. It was broader in scope and required more work to get to the final result than any of the other projects. The project requires (as copied from the instructions):
  - Data wrangling, which consists of:
    - Gathering data
    - Assessing data
    - Cleaning data
  - Storing, analyzing, and visualizing your wrangled data
  - Reporting on 1) your data wrangling efforts and 2) your data analyses and visualizations
Data was gathered in three ways: first, downloading a list of @WeRateDogs tweets in CSV format; second, using the Python requests library to get a TSV (tab-separated values) file with image predictions of the dog breed for each tweet in the first data set; and finally, querying the Twitter API using a Python library to download each tweet’s JSON data, which includes the favorite and retweet counts.
This data then had to be assessed programmatically in Python (note: the new version of DAND used Python 3, while the previous version used Python 2). The requirement is to identify eight quality issues and two tidiness issues. The data then has to be cleaned, combined and stored. So you can join your list of tweets to the image predictions data and then to the data downloaded from the API to get the favorite and retweet counts. Analysis and visualizations follow. Two reports have to be provided: one documenting the wrangling process, and one documenting your insights along with at least one visualization, written as an external document (e.g. a blog post).
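The combine step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's actual solution: the three in-memory strings stand in for the real CSV, TSV and JSON sources, all values are made up, and the column names (`tweet_id`, `text`, `breed`) are hypothetical. It uses only the standard library; in the course itself a data-frame library would be the more usual tool.

```python
import csv
import io
import json

# Tiny in-memory stand-ins for the three sources (all values are made up).
ARCHIVE_CSV = "tweet_id,text\n1,Good dog 12/10\n2,Very fluffy 13/10\n3,Not a dog 9/10\n"
PREDICTIONS_TSV = "tweet_id\tbreed\n1\tpug\n2\tsamoyed\n"
API_JSON_LINES = (
    '{"id": 1, "favorite_count": 120, "retweet_count": 30}\n'
    '{"id": 2, "favorite_count": 450, "retweet_count": 95}\n'
    '{"id": 3, "favorite_count": 10, "retweet_count": 1}\n'
)


def load_delimited(text, delimiter):
    """Parse delimited text into a dict of rows keyed by integer tweet_id."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return {int(row["tweet_id"]): row for row in reader}


def load_json_lines(text):
    """Parse one JSON object per line into a dict keyed by the tweet's id."""
    records = (json.loads(line) for line in text.splitlines() if line.strip())
    return {record["id"]: record for record in records}


def join_sources(archive, predictions, api_data):
    """Inner join: keep only tweets that appear in all three sources."""
    combined = []
    for tweet_id in sorted(archive):
        if tweet_id in predictions and tweet_id in api_data:
            combined.append({
                "tweet_id": tweet_id,
                "text": archive[tweet_id]["text"],
                "breed": predictions[tweet_id]["breed"],
                "favorite_count": api_data[tweet_id]["favorite_count"],
                "retweet_count": api_data[tweet_id]["retweet_count"],
            })
    return combined


archive = load_delimited(ARCHIVE_CSV, ",")
predictions = load_delimited(PREDICTIONS_TSV, "\t")
api_data = load_json_lines(API_JSON_LINES)
combined = join_sources(archive, predictions, api_data)

for row in combined:
    print(row)
```

Tweet 3 has no image prediction, so the inner join drops it. Deciding whether to drop or keep such rows is exactly the kind of judgment the cleaning step asks you to document.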
Just gathering the data is quite involved because of the requirement to use the Twitter API to get the third data set. Twitter limits the number of API requests you can make within each 15-minute window, which has to be taken into account. In all honesty, a project that requires more from you also stays with you longer, i.e. you learn more. I assume that, depending on your background, each project can seem more or less demanding. The exploratory data analysis is also demanding if you have no R experience; it helped that I had taken some previous courses in R.
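To give an idea of what working around the rate limit looks like, here is a minimal sketch. It makes no real API calls: `fetch_tweet` is a hypothetical function you would supply yourself, `requests_per_window` is an assumed quota that depends on the endpoint, and the `sleep` parameter is only there so the example can run without actually waiting. Client libraries may offer their own way of handling this, so treat it as an illustration of the idea only.

```python
import time

WINDOW_SECONDS = 15 * 60  # length of one rate-limit window


def fetch_all(tweet_ids, fetch_tweet, requests_per_window, sleep=time.sleep):
    """Fetch every tweet, pausing for a full window once the quota is used up."""
    results = {}
    failed = []
    used = 0
    for tweet_id in tweet_ids:
        if used == requests_per_window:
            sleep(WINDOW_SECONDS)  # wait for the quota to reset
            used = 0
        used += 1
        try:
            results[tweet_id] = fetch_tweet(tweet_id)
        except LookupError:
            # e.g. a tweet that no longer exists; record it and move on
            failed.append(tweet_id)
    return results, failed


def fake_fetch(tweet_id):
    """Hypothetical stand-in for a real API call (tweet 3 is 'deleted')."""
    if tweet_id == 3:
        raise LookupError("tweet not found")
    return {"id": tweet_id, "favorite_count": 0, "retweet_count": 0}


pauses = []  # collects the requested pauses instead of really sleeping
results, failed = fetch_all([1, 2, 3, 4, 5], fake_fetch,
                            requests_per_window=2, sleep=pauses.append)
print(sorted(results), failed, pauses)
```

Keeping a list of the IDs that failed is useful later, because those tweets will be missing when you join the three data sets.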
- Data storytelling: In this project you have to select a data set from a provided list and create data visualizations in Tableau. This includes exploring the data visually and finally creating a story. You can view the final result of my project here. This is a relatively simple project where the goal is to get acquainted with Tableau and data visualization tools. If you do not have experience with Tableau or similar visualization tools you need some time to familiarize yourself, but overall the task is straightforward.
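Going back to the first project in the list: since the Stroop data records the same participants under two conditions, one natural choice for the hypothesis test is a paired t-test. The sketch below is an assumption on my part about how it could be done, not the required method. The timings are made-up numbers for illustration only, and the critical value is the standard two-tailed figure for 7 degrees of freedom at the 5% level.

```python
import math
import statistics

# Made-up completion times in seconds for 8 hypothetical participants.
congruent = [12.1, 16.8, 9.6, 8.6, 14.5, 12.2, 14.7, 8.9]
incongruent = [19.3, 18.7, 21.2, 15.7, 18.2, 20.9, 22.8, 15.1]

# Paired design: analyse the per-participant differences.
differences = [slow - fast for fast, slow in zip(congruent, incongruent)]
n = len(differences)

mean_diff = statistics.mean(differences)
sd_diff = statistics.stdev(differences)  # sample standard deviation
t_stat = mean_diff / (sd_diff / math.sqrt(n))

T_CRITICAL = 2.365  # two-tailed, alpha = 0.05, df = n - 1 = 7
reject_null = abs(t_stat) > T_CRITICAL

print(f"mean difference: {mean_diff:.2f} s, t = {t_stat:.2f}, df = {n - 1}")
print("reject the null hypothesis" if reject_null else "fail to reject the null hypothesis")
```

The null hypothesis here is that the mean difference between the two conditions is zero; with real data you would also report the descriptive statistics for each condition.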
There is no additional grading apart from passing the projects. Each project is submitted and you will receive a review based on a given set of parameters. You receive feedback and comments on your work and can re-submit until your project meets the standard.
How hard was this class to pass?
Personally I did not complete all the coursework, going straight to the projects and referring to the lectures when required. I would skim over some of the notes and lectures to get the basic idea and then take on the project. I have always felt that doing something and forcing yourself to complete a task is more beneficial. This is one of the reasons I preferred the Udacity program: the projects, and the fact that you receive feedback on each one. Overall this is a big commitment if you are going to listen through each lecture. The projects are also long and complicated, so you do need to set time aside. I would not advise taking this course if you cannot work on it for at least one to two hours daily. I personally did less, but I have a background in the subject matter and skipped a lot of the lectures.
I recommend the Data Analyst Nanodegree, provided this is something you really want to do. It requires commitment and patience to complete, but is a great foundation in data analysis. Even if you are working with SQL and not Python or R, the concepts are helpful, as is the methodology used to explore and analyse data. I believe it would be more beneficial for someone starting out in their career or someone wanting to move from another field into data analysis.
Were you successful in completing the course?
My goal is not to proceed with AI or machine learning which for many is the next step, but to improve on data analysis. So I will continue with some courses I have found on edX, such as:
Let me reiterate that my end goal is to improve and get a deeper understanding of specific topics. This is also achieved by reading articles online, solving specific problems at work, and talking to others. I look at courses as something to add to my skills and knowledge, or even simply to have in mind. We can boost not only our level of skills but also our level of confidence by showing ourselves what can be done, and then doing it.
Final note: if you would like to read another review of the DAND you can check David Venturi’s article here. He is now a content developer at Udacity.