Happy New Buzzword: Why I Study Data Science

Big data. Data Science. Analytics. Buzzwords galore. Even the Dilbert cartoon in the previous link has been repeated so often it’s become part of the hype, along with the host of articles on the theme: “Big data? Big deal.”

Data science is the interdisciplinary field that primarily uses computer science and statistics to analyze data and extract information. This information can take the form of a report or presentation, but it might also be a prediction, a study of patterns, or an instance of machine learning (algorithms that “learn”). The skills required to be a “proper” data scientist are diverse: a foundation in programming and mathematics, plus the ability to communicate and present information. As an indicator of how hot data analytics/science jobs are at the moment, you can check which jobs currently command the highest salaries, according to the World Economic Forum.

Analytics is hot not only in business but also in sports. In basketball – the king of sport :) – the field of analytics is known as APBRmetrics, and working for a team analysing data and basketball statistics requires a skillset as complete as any demanding job in the “real” corporate world. Gregg Popovich, one of the greatest and most respected coaches in the NBA, remarks succinctly: “I look at the analytics. Some of it is very worthwhile. Some of it is superfluous poppycock.” I should mention that he is also one of the most outspoken and direct coaches.

The same goes for the training and courses available for big data/analytics/data science. For the past year I’ve looked into several courses, or tracks leading to certifications of various kinds. As I have noted in various posts, there is a lot of depth and quality in the courses if you have the time to look into them. However, training is not a one-size-fits-all deal. I might gain from a statistics course on any MOOC website, but what is my end goal? And, above all, what will give me the greatest return for the little time I have to put in?

Therefore, beginning with the end in sight, there will be no more experimentation during 2016. There is no need for it anymore. Reviewing stats or learning R for the sake of it is not valuable if I cannot transfer those skills to where I work or to meaningful projects. After looking around and taking some courses I have decided that in 2016 I will complete the Data Analyst nanodegree on Udacity.

In line with this, I have dropped the courses I had planned to take on edX. The only course I might take from there, time permitting, is “The Analytics Edge”. There are a small number of unrelated courses that I might take, but these have to do with professional development on a more general level and would only serve as a break from the “serious” stuff.

I have registered to take the introductory courses on Udacity before I actually register for the nanodegree track. These include: Intro to Statistics, Intro to Descriptive Statistics, Intro to Inferential Statistics, Intro to Computer Science, Intro to Data Science, and Intro to Data Analysis. For most of these I know the theory but I am re-taking them in order to review or learn a new tool. For example, Intro to Computer Science uses Python and since it is at a beginner level I can go through it quickly while I learn the basics of Python. You can take the courses for free without being part of the nanodegree track but you will not have access to one-on-one coaching. Once I complete these courses (hopefully by the end of March) I will register for the nanodegree (200 US dollars per month – 50 percent back on completion of the nanodegree). I intend to complete the nanodegree by the end of the year.

The main reason for choosing the Udacity nanodegree, despite the high cost in terms of time and effort, is that the projects are a better match for my learning style and current needs. I also wanted to learn Python in the context of data analysis while completing actual projects, and not simply by running some commands in the editor. Completing the projects provides a good portfolio to showcase, but also forces you to immediately apply the knowledge you have gained. The UI of the courses is straightforward and interactive; once a video is done the next “slide” is a quiz you complete online, right there and then. I feel, and hope, that the high investment will also result in a high return.

Having said all this, I will explain why Data Science excites me, and it is not just because it is trendy. My university education included a double major in Computer Science and Economics, as well as an MBA. Data science feels like the perfect marriage of stats/econometrics and programming, while the analysis of the results and reporting of findings is another challenge I find intriguing. The process is also very similar to that followed in RAFM (revenue assurance and fraud management) departments in telecoms, where operational units are in charge of developing controls and scripts, whilst a different function is responsible for the design of controls and reporting of findings.

So far I have completed the first lesson in the Intro to Computer Science course, which covers string manipulation. The plan is to complete Computer Science and Data Analysis first, then the three statistics courses, and finally the Data Science course. By the end of March (or on the first good offer from Udacity) I will register for the nanodegree. I will be writing about my journey to complete the coursework and occasionally updating you on offerings from other MOOC websites.

Michael Lazarou
Michael Lazarou has worked as a Revenue Assurance Analyst for MTN Cyprus since March 2011. His background includes a double major in Computer Science and Economics, as well as an MBA. Before being lured into the exciting world of telecoms he worked as a software developer.

Michael is interested in gaining a better understanding of different aspects of RA and data analysis. He shares his insights on the training courses he participates in with Commsrisk.