Machine Learning: The Basics

As machine learning (sometimes loosely called data mining) becomes a more integral part of technology everywhere, it will become increasingly important for lawyers and businesspeople to understand how it works. Machine learning is a subfield of AI that encompasses the creation, study, and use of techniques that allow computers to process new information, learn from it, and then use that learning to perform some task. The most important machine learning concept to understand, and the focus of this post, is the distinction between supervised and unsupervised learning, the two main ways in which systems are trained. There is an enormous diversity of specific algorithms and techniques used in machine learning, but nearly all fit within one of these two methods.

The goal of supervised learning is to create a system that can successfully predict or classify input data; how those predictions or classifications are later used is irrelevant to the learning itself. To develop a supervised learning scheme, you need a sample dataset that is representative of the universe, i.e. the data that the system will have to read after development is finished. This dataset is divided into a training set and a testing set. The division can be made in a number of different ways, and the two sets do not have to be of equal size, as long as each is representative of both the full sample dataset and the universe. There are a number of supervised learning algorithms; many of the most popular use Bayesian statistics, neural networks, or decision trees.

When conducting supervised learning, the testing set is held in reserve while the training...
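
The training/testing division described above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the function name `split_dataset`, the 25% test fraction, the fixed seed, and the toy even/odd dataset are all assumptions made up for this example. Shuffling before splitting is one simple way to keep each part roughly representative of the full sample.

```python
import random


def split_dataset(samples, test_fraction=0.25, seed=0):
    """Shuffle a copy of `samples` and return (training_set, testing_set).

    Hypothetical helper for illustration; the fraction and seed are
    arbitrary choices, not values taken from the post.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_test = max(1, round(len(shuffled) * test_fraction))
    # The first n_test shuffled items become the testing set; the rest train.
    return shuffled[n_test:], shuffled[:n_test]


# Made-up labelled sample: (input, label) pairs.
dataset = [(x, "even" if x % 2 == 0 else "odd") for x in range(20)]

training_set, testing_set = split_dataset(dataset, test_fraction=0.25)
print(len(training_set), len(testing_set))
```

With 20 samples and a test fraction of 0.25, this gives 15 training items and 5 testing items. The two sets are unequal in size, which is fine as long as each still reflects the full sample.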