In the past two weeks I started the “Intro to Computer Science” course on Udacity. As noted in my previous post, I already have programming experience, but I took this course to learn the basics of Python (logo pictured above) as a first step on the path to the Data Analyst nanodegree. Python is one of the tools (along with R) used for data analysis. It has a number of packages/libraries that aid in analysing data, and my goal is to learn the basics in order to prepare for the projects in the nanodegree. At the moment I am putting myself through a bootcamp before taking on the actual nanodegree.
While I was on the third module of the Udacity course, however, I started a second Python-related course on edX: Introduction to Python for Data Science. I put my participation in the first course on hold as I found it quite basic. If you have never programmed before, it is an excellent introduction. The Udacity user interface is excellent; once the “lecture” ends, the programming practice questions are done interactively in an online code editor. I highly recommend this course to anyone new to programming. Personally, I will still skim the remainder of the course, but diving into the detail is not of value to me.
The second course, which is hosted on edX, is provided by Microsoft in cooperation with DataCamp. It is better tailored to my needs and is organised very efficiently. Each lecture ends by linking to the interactive code sessions on DataCamp. Again, this is an excellent UI which lets you run code online without installing anything locally. I have covered the first four sections: variables and types, lists, functions/methods, and the NumPy package. I am currently finishing the fifth, which covers the basics of matplotlib, the package for creating visualisations with Python. There are six sections in total – the final one is on the pandas data analysis library – and the course ends on the 19th of February. I found the lectures quick and to the point. Along with the practice questions on DataCamp, they were ideal for my requirements: a quick intro to the Python programming language with an emphasis on Data Science. The course does not assume prior programming knowledge, but having it will allow you to move quickly through the material and get the information you need to build upon.
In addition, I started reading “Doing Data Science: Straight Talk from the Frontline”, a book that begins by questioning whether “Data Science” is in fact a new field or simply a set of fancy statistical tools hyped up by the media. Essentially, the author concludes that “this is something new” but that “it’s being paraded around as a magic bullet, raising unrealistic expectations that will surely be disappointed”. Before the discussion turns to the most basic step in any data analysis (i.e. exploratory data analysis: gaining an intuition about the data by generating basic statistics for it), data scientists are described as individuals who “possess a practical knowledge of tools and materials, coupled with a theoretical understanding of what’s possible”. The book is based on a class taught at Columbia University which aimed to explore whether Data Science is indeed a new field (both in academia and in industry), as well as to define it and remind us of the foundations it is built on: statistics and computer science. Once I finish the book I will review it as a whole.
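To make the idea of exploratory data analysis a little more concrete, here is a minimal sketch of what “generating basic statistics” can look like in Python. It is my own illustration, not an example from the book: the `page_views` numbers are invented, the `summarize` helper is a hypothetical name, and it uses only the standard library's `statistics` module rather than NumPy or pandas.

```python
import statistics

# Hypothetical sample: daily page views for a small blog (invented data).
page_views = [120, 135, 128, 150, 610, 142, 138]


def summarize(values):
    """Return a dict of basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


summary = summarize(page_views)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Even on this tiny sample, the mean comes out well above the median, which hints that a single unusually large value (the 610) is pulling the average up. Noticing that kind of thing is the intuition that basic statistics are meant to provide before any deeper analysis.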
To conclude, I have found the above-mentioned courses on Udacity and edX to be of high quality and value, but you need to pick and choose where to focus based on your experience and needs.