Machine learning (ML) is a field of Artificial Intelligence (AI) that allows computer programs to learn from data and construct decision models without being explicitly programmed.
The picture above compares classical programming with machine learning. In classical programming, a computer can only do what we explicitly program it to do, whereas with machine learning it can gain knowledge from previous data.
For example, social media apps use machine learning to create your feed based on your preferences.
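The contrast between the two approaches can be sketched in a few lines of Python. This is only an illustration: the temperature data, the hand-picked threshold of 30 and the function names are all made up for the example, and the "learning" step is a deliberately simple threshold search rather than a real ML algorithm.

```python
# Classical programming: the rule is written by the programmer.
def is_hot_rule(temperature_c):
    return temperature_c > 30  # threshold chosen by hand (hypothetical)


# Machine learning: the rule is derived from labeled examples.
def learn_threshold(temperatures, labels):
    """Return the threshold that misclassifies the fewest examples."""
    candidates = sorted(temperatures)
    best_threshold, best_errors = None, len(labels) + 1
    for i in range(len(candidates) - 1):
        # Try the midpoint between two neighboring observations.
        threshold = (candidates[i] + candidates[i + 1]) / 2
        errors = sum((t > threshold) != y for t, y in zip(temperatures, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


# Made-up data: temperature in Celsius and whether people called it "hot".
temperatures = [12, 18, 22, 25, 29, 31, 35]
labels = [False, False, False, False, True, True, True]

threshold = learn_threshold(temperatures, labels)
print("hand-written rule says 29 is hot:", is_hot_rule(29))
print("learned rule says 29 is hot:", 29 > threshold)
```

The hand-written rule never changes unless someone edits the code, while the learned threshold moves whenever the examples change.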
There are three types of machine learning:
- supervised learning
- unsupervised learning
- reinforcement learning
Supervised machine learning uses labeled datasets to predict a target y from features X. Let's define these terms:
- a dataset: a collection of data
- a target: The variable that we want to predict (output). We call all the entries of the target column labels.
- features: All the variables other than the target (input)
- labeled dataset: a dataset that contains both inputs and outputs, meaning the target variable is already known
Both features and target variables may be of different data types:
- numerical: values in ℝ (real numbers)
- integer: values in ℤ (whole numbers)
- categorical: values in a finite set
- binary: values in {0,1}
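As a rough illustration of the four data types listed above, here is how one row of a dataset might look in Python. The record, its field names and the set of heating types are hypothetical and chosen only for the example.

```python
# The finite set of allowed values for the categorical variable (hypothetical).
HEATING_TYPES = {"gas", "electric", "wood"}

# One made-up row of a housing dataset.
record = {
    "area_m2": 84.5,     # numerical: a real number
    "rooms": 3,          # integer: a whole number
    "heating": "gas",    # categorical: one value from a finite set
    "has_garden": 1,     # binary: either 0 or 1
}

for name, value in record.items():
    print(name, "=", value, "(" + type(value).__name__ + ")")
```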
Example: the Iris dataset is a labeled dataset that has 4 numerical features (sepal length, sepal width, petal length and petal width) and a target variable that has three classes (setosa, versicolor, and virginica).
Read more about the Iris dataset
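To make the X and y notation concrete, the sketch below lays out a few Iris-style rows as plain Python lists. The three rows are illustrative values in the style of the dataset, not a full copy of it; each row of `X` holds the four features, and the entry of `y` at the same position holds its label.

```python
feature_names = ["sepal length", "sepal width", "petal length", "petal width"]

# Features: one row per flower, one column per feature (illustrative values).
X = [
    [5.1, 3.5, 1.4, 0.2],
    [7.0, 3.2, 4.7, 1.4],
    [6.3, 3.3, 6.0, 2.5],
]

# Target: one label per row of X.
y = ["setosa", "versicolor", "virginica"]

for row, label in zip(X, y):
    print(dict(zip(feature_names, row)), "->", label)
```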
There are two types of supervised learning:
- classification: the target variable is discrete (it takes a limited number of values, such as class names)
- regression: the target variable is continuous (for example, a real-valued numerical target)
Read more about classification and regression
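The difference between the two tasks shows up in what the model returns. The sketch below uses two deliberately simple techniques picked for illustration: a 1-nearest-neighbor classifier that returns a class label, and a least-squares line fit that returns a number. The data points and function names are made up for the example.

```python
import math


def predict_class(X_train, y_train, x_new):
    """1-nearest-neighbor: return the label of the closest training point."""
    distances = [math.dist(x, x_new) for x in X_train]
    return y_train[distances.index(min(distances))]


def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    return slope, mean_y - slope * mean_x


# Classification: the target is discrete (made-up petal length and width).
X_train = [[1.4, 0.2], [1.5, 0.3], [4.7, 1.4], [4.5, 1.5]]
y_train = ["setosa", "setosa", "versicolor", "versicolor"]
predicted_label = predict_class(X_train, y_train, [1.6, 0.2])

# Regression: the target is continuous (made-up points lying on y = 2x + 1).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
predicted_value = slope * 5.0 + intercept

print("classification output:", predicted_label)
print("regression output:", predicted_value)
```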
Unsupervised machine learning learns from unlabeled data. Unlabeled data has no known outcome: we only have the input, and the target is unknown. Unsupervised machine learning is therefore about finding structure and patterns in unlabeled data.
The picture below is for further clarification
Unsupervised machine learning can be divided into clustering, density estimation and dimensionality reduction.
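Clustering is the easiest of these three to sketch. The example below is a minimal k-means on one-dimensional points, written in plain Python for illustration; the points and the initial centers are made up, and a real project would more likely rely on a library implementation. Note that no labels are passed in: the groups are found from the inputs alone.

```python
def k_means_1d(points, centers, iterations=10):
    """Cluster 1-D points around the given initial centers (plain k-means)."""
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster
        # (an empty cluster keeps its previous center).
        centers = [
            sum(c) / len(c) if c else centers[i]
            for i, c in enumerate(clusters)
        ]
    return centers, clusters


# Unlabeled, made-up data: only inputs, no target column.
points = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = k_means_1d(points, centers=[0.0, 6.0])

print("cluster centers:", centers)
print("clusters:", clusters)
```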
That being said, what's the difference between supervised and unsupervised learning?
The table below summarizes the main differences:
supervised learning | unsupervised learning |
---|---|
input data is labeled | input data is unlabeled |
uses training dataset | uses just input dataset |
used for prediction | used for analysis |
classification and regression | clustering, density estimation and dimensionality reduction |