28 July 2012
Machine Learning and Public Policy
The realm of policy and politics is often considered distant from the latest findings in computer science. This is no longer true. We are now experiencing exciting developments in the subset of artificial intelligence called “machine learning”, the engineering of algorithms aimed at identifying patterns in large and complex datasets. For instance, anti-spam filters are machine learning algorithms trained to identify and filter out emails that can be labeled as malicious or trivial. High-quality results are achievable because machine learning software is first trained with “examples” in order to learn how to identify specific characteristics in the underlying data; it is then exposed to real data, with the aim of labeling them and separating out those with the desired qualities.
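The train-then-label cycle described above can be illustrated with a minimal sketch of a spam filter. It assumes a naive Bayes classifier, one common choice for this task, though real filters are far more elaborate. The function names and the six training messages are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def train(examples):
    """Count word frequencies per label from (text, label) pairs."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Return the label with the highest naive Bayes log-probability."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_examples = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        # Prior: fraction of training examples carrying this label.
        score = math.log(label_counts[label] / total_examples)
        total_words = sum(word_counts[label].values())
        for word in tokenize(text):
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            count = word_counts[label][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical, hand-made training examples (not real data).
TRAINING_EXAMPLES = [
    ("win a free prize now", "spam"),
    ("cheap pills free offer", "spam"),
    ("claim your free money", "spam"),
    ("meeting agenda for monday", "ham"),
    ("draft of the policy report", "ham"),
    ("lunch on monday with the team", "ham"),
]

word_counts, label_counts = train(TRAINING_EXAMPLES)
print(classify("free prize money", word_counts, label_counts))
print(classify("policy meeting on monday", word_counts, label_counts))
```

With these toy examples the first message is scored as spam and the second as legitimate mail ("ham"), because their words appear more often in the corresponding training messages.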
There are different paradigms of machine learning, such as Bayesian networks and reinforcement learning, and some of them mimic biological processes, as in the case of neural networks and genetic algorithms. The applications are broad, from speech recognition to game playing, and the policy implications are huge.
One of the frontiers of this field is “unsupervised learning”, the development of algorithms able to deliver good results without prior exposure to labeled training examples. A few weeks ago, in June 2012, there was a major development in this discipline: a group of researchers based at Google and Stanford University achieved a strong advance in “deep learning” techniques, creating a neural network able to identify high-level features in a very complex dataset. Specifically, they built a large-scale neural network with 16,000 processors and 1 billion connections that was capable of identifying very complex structures, such as cat profiles, in YouTube videos.
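To give a feel for what learning without labels means, here is a minimal sketch of k-means clustering, a much simpler unsupervised method than the deep neural network described above (it is not the technique used by the Google and Stanford researchers). The data points and the starting centroids are hypothetical values chosen for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Group 2-D points around the given starting centroids (Lloyd's algorithm)."""
    centroids = list(centroids)
    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for i in range(len(centroids)):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(c) / len(members) for c in zip(*members))
    return centroids, assignments

# Hypothetical unlabeled data: two loose groups of points, with no labels given.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Assumption: start from two of the data points; real uses often pick at random.
centroids, assignments = kmeans(POINTS, [POINTS[0], POINTS[3]])
print(assignments)
print(centroids)
```

The algorithm is never told which group a point belongs to; it discovers the two groups from the distances between the points alone.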
The evolution of machine learning is coupled with the broader “big data” revolution that is now affecting business and academia. Harvard University has recently announced a new Master in Computational Science and Engineering, Stanford University offers a certificate in Data Science, and there are also new tailored programs, such as the joint Ph.D. program in Machine Learning and Public Policy at Carnegie Mellon.
Most of the applications of these techniques for government and policy are yet to be discovered, and many of them will be in the realm of national security and military science. For instance, intelligence, counterintelligence, antiterrorism and cryptography will be enriched with new methods to extract information and meaningful patterns from an increasing amount of data collected from both open and confidential sources. Smart algorithms will be able to analyze log files and find cyber-intrusions, detect fraudulent financial transactions and spot potential pandemics at a very early stage. In the governmental area, data from public administrations, environmental agencies and municipalities could be processed in smart ways, making it easier to design high-impact, tailored public policies based on empirical evidence.
It is highly probable that machine learning tools will have an increasing impact, for two reasons: scientific research, which is making algorithms more powerful and efficient, and the growth in available computational power, driven by technological trends such as Moore’s law and the large data centers with parallel computing that form the technological base of Google, Amazon and other internet businesses. Moreover, the widespread diffusion of sensors and input devices is going to expand dramatically the amount of available data.
Our world of chaotic and unstructured data may change soon: the ability of computer algorithms to extract meaning from such data is already affecting the way in which we live and work.