Data Analytics (DA) and Machine Learning (ML) are structured, quantitative approaches to answering difficult questions about datasets. The promise of DA and ML is twofold: that they can uncover insights about the world far more complex than those humans can find, and that those insights will be free of human bias. This essay focuses on the second promise, the supposed total objectivity of DA/ML. It has long been recognized that the outcomes of DA/ML can vary significantly depending on the choice of methodology, which already undermines claims of objectivity. Lately, however, a more fundamental problem has emerged: the data used for DA/ML often contains human biases, and DA/ML performed on such data replicates them.
Computers are reasonably good at analyzing large datasets, but there is one class of problem where they need a bit of help from puny humans: high-dimensional datasets. By “high-dimensional” we mean “wide”, as in lots of columns. With wide data, it is very hard to spot commonalities across many of those columns. For example, if we have data from a large number of sensors, each of which has something to say about what is going on, it is very hard to detect what those readings have in common when a particular type of event occurs.
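To make “wide” concrete, here is a minimal sketch built on synthetic data. The sensor count, noise level and event shift are made-up illustrative values, not taken from any real dataset. Each row is one time step, each column one sensor, and an event nudges every sensor slightly upward. Thresholding any single column barely beats a coin flip, while the same weak signal stands out once all the columns are considered together. In this toy case the shared pattern is known in advance, so a plain average is enough; with real wide data nobody knows which columns move together, which is what makes the problem hard.

```python
import random

# Hypothetical setup: 100 sensors (columns), 2,000 time steps (rows).
# During an "event", every sensor reads SHIFT higher than usual; each
# reading also carries independent Gaussian noise with standard deviation 1.
N_SENSORS = 100
N_ROWS = 2000
SHIFT = 0.3

rng = random.Random(42)  # fixed seed so the sketch is reproducible

rows = []    # each row is a list of N_SENSORS readings
labels = []  # True if the row was recorded during an event
for _ in range(N_ROWS):
    event = rng.random() < 0.5
    base = SHIFT if event else 0.0
    rows.append([base + rng.gauss(0.0, 1.0) for _ in range(N_SENSORS)])
    labels.append(event)


def accuracy(scores, labels, threshold):
    """Fraction of rows where 'score > threshold' matches the event label."""
    hits = sum((s > threshold) == lab for s, lab in zip(scores, labels))
    return hits / len(labels)


threshold = SHIFT / 2  # halfway between the non-event and event means

# One column at a time: sensor 0 on its own.
single = [row[0] for row in rows]
# All columns at once: the mean of every sensor reading in the row.
pooled = [sum(row) / len(row) for row in rows]

acc_single = accuracy(single, labels, threshold)
acc_pooled = accuracy(pooled, labels, threshold)
print(f"one sensor:  {acc_single:.2f}")
print(f"all sensors: {acc_pooled:.2f}")
```

With these assumed numbers, a single sensor separates event rows from normal rows only slightly better than chance, because its shift of 0.3 is small next to its noise of 1; averaging 100 independent sensors shrinks the noise tenfold while leaving the shift intact.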