In this post I will give a quick update on my current studies and share some thoughts on another topic I have come across recently: data ethics and using data for good.
Over the past two weeks I’ve continued the Intro to Statistics course on Udacity, reviewing probability theory. In addition, I’ve started the Intro to Data Science course, which begins by defining data science and introducing some basic data structures from the Python packages pandas and numpy.
One of the definitions given for a data scientist comes from Josh Wills and is well worth remembering…
Data Scientist (n.): Person who is better at statistics than any software engineer and better at software engineering than any statistician.
— Josh Wills (@josh_wills) May 3, 2012
The Intro to Data Science course does not demand any programming knowledge (although such knowledge is helpful). All of the code is pre-written, and you only run it to view the results. That makes it different from other courses, which require you to sit down and actually work on code or other exercises. So far, the course has explored data structures in Python that can be used to analyse data. Upcoming lessons cover data wrangling (manipulation of raw data), visualizations and MapReduce (processing and generating large data sets in parallel on distributed systems).
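To make the MapReduce idea a little more concrete, here is a minimal single-machine sketch of the classic word-count example. It is not taken from the course material, and the function names (map_phase, shuffle, reduce_phase, word_count) are my own hypothetical choices. A real framework would run the map and reduce steps in parallel across many machines; this sketch only illustrates the three stages.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group the emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {
        word: reduce(lambda a, b: a + b, counts)
        for word, counts in grouped.items()
    }


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


if __name__ == "__main__":
    docs = ["data for good", "good data ethics"]
    print(word_count(docs))
```

Because each document is mapped independently and each word is reduced independently, both steps could in principle be spread across many workers, which is what makes the pattern suitable for large data sets.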
In this post I also want to discuss the ways that data is being used or could be used. The motivation for looking into this topic in more detail is the upcoming edX course on Data Science Ethics, which begins on May 1st. Privacy and security, both personal and corporate, as well as government spying, are of concern to all of us. At the latest RSA Conference several opinions were voiced regarding the US government’s request to unlock the iPhone of one of the San Bernardino killers. As with most things, it’s a matter of balancing freedom and rights against safety and security threats. And, as always, the question is whom you can trust and whether there is a system in place to check for abuse. There is a series of TED talks on “The Dark Side of Data”, and the first is appropriately titled “Your Phone Company Is Watching”, for all of us in telecoms.
There are also a number of companies and institutions that are using data for good, in areas ranging from cultural issues to service projects and the analysis of medical data. I’ve noted Data & Society, DataKind and the Center for Data Science and Public Policy at the University of Chicago. These institutions either research the use of data for societal good or actively take part in projects that improve data and analysis for social causes. There are also a number of Kaggle competitions oriented towards good social causes (e.g. from medical institutions) and even a similar competition website, DrivenData, dedicated to competitions “to save the world”.
In addition to the Data Science Ethics course on edX, I will also mention the Applied Cryptography course on Udacity for anyone who might be interested in something more hardcore and concrete on the subject.
The issues of data privacy and security have come up on Commsrisk numerous times. My belief is that people who work with data and networks have a responsibility to behave ethically. This means being responsible with the data we are privy to, as well as implementing techniques that safeguard data in transit and the rights of private individuals.