## Glossary

This post collects common terms, along with their explanations, that I will be using in my other articles.

## Principal Component Analysis

Principal Component Analysis (PCA) is a technique used to reduce the dimensions of a dataset, that is, the number of features (fields) it contains, without compromising accuracy too much. Basically, we try to emphasize variation and bring out strong patterns in a dataset. This is done by combining a group of possibly related features into a single new feature called a ‘principal component’.

As an example, consider a set of features about houses.

- It may have combinable features, like the length and breadth of a plot of land. These can be combined into “area”.
- It may list similar data multiple times, like area in square meters as well as square feet.
- Even without clear correlations, we can often combine features and receive a satisfactory result that needs less space and makes algorithms process this data faster.

### Working

In 2 dimensions, PCA combines the two features by finding a new axis (a 1-dimensional direction) and projecting the data points onto it. In higher dimensions, PCA reduces the data from a ‘space’ of n dimensions to a ‘subspace’ of k dimensions (k < n).

- First, we standardize the dataset: subtract each feature's mean so that every feature is centered at 0, and scale each feature (commonly by dividing by its standard deviation) so that features measured in different units contribute comparably.
- The subspace is chosen to maximize the variance of the projected data. Because the data is centered, this means the projected points should, on average, lie as far from the origin of the subspace as possible.
- This can be achieved by choosing the new axes along eigenvectors of the covariance matrix. The covariance matrix describes the ‘spread’ of the data: its diagonal entries are the variances of the individual features, and its off-diagonal entries show how pairs of features change with respect to each other, which together give the shape of the data. Each eigenvalue of this matrix equals the variance of the data along the corresponding eigenvector, and the projections onto different eigenvectors are uncorrelated.
- Since we want to keep as much variance as possible and we only need k vectors, we choose the k eigenvectors corresponding to the k largest eigenvalues.
- Now that the new vectors are obtained, multiply the standardized data by them to create the reduced set of values.
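The steps above can be sketched in a few lines of NumPy. This is a minimal illustration rather than a production implementation: the function name `pca` and the synthetic house data (length, breadth, and area in two units) are assumptions made up for this example, and standardization here means subtracting the mean and dividing by the standard deviation.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X (samples x features) onto the top-k principal components."""
    # Standardize: zero mean and unit variance for every feature.
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)
    # Covariance matrix of the features (features x features).
    cov = np.cov(X_std, rowvar=False)
    # eigh is intended for symmetric matrices such as a covariance matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(cov)
    # Indices of the k largest eigenvalues, largest first.
    order = np.argsort(eigenvalues)[::-1][:k]
    components = eigenvectors[:, order]
    # Project the standardized data onto the chosen eigenvectors.
    return X_std @ components, eigenvalues[order]

# Hypothetical house data: length, breadth, area in m^2, area in ft^2.
rng = np.random.default_rng(0)
length = rng.uniform(10, 50, size=200)
breadth = rng.uniform(10, 50, size=200)
area_m2 = length * breadth
area_ft2 = area_m2 * 10.7639
X = np.column_stack([length, breadth, area_m2, area_ft2])

reduced, variances = pca(X, k=2)
print(reduced.shape)  # (200, 2)
print(variances)      # variance captured by each kept component
```

Because the two area columns carry the same information, one of the four eigenvalues is close to zero, which is the kind of redundancy PCA is meant to remove.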

## AutoEncoders

## Regularization

The dataset used in most machine learning problems consists of two things: **pattern + noise**. The job of the machine learning model is to learn the pattern in the data and ignore the noise. If it also learns the noise, it is **overfitting**. Let’s take an example of *pattern + noise* in a house pricing dataset. The features in this dataset can be the number of rooms, area, location, etc., and based on these features we can estimate the price of a house. But not all houses with the same features have the same price. This variation in price among houses with the same features is called *noise*.

Our model must learn only the pattern (which a simpler model can capture) and ignore the noise (which usually takes a more complex model, such as a higher-order polynomial, to fit). To make the model ignore the noise, we need a mechanism that penalizes it every time it fits the noise while training. This mechanism of penalizing the model whenever it chooses a higher-order polynomial that gives only an insignificant reduction in error is called **regularization**.
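As a rough sketch of the idea, the example below fits a degree-9 polynomial to noisy data that actually follows a straight line, once without a penalty and once with one. It assumes an L2 (ridge) penalty on the coefficients, which is only one common form of regularization; the helper name `fit_polynomial`, the penalty strength `lam`, and the synthetic data are all made up for illustration.

```python
import numpy as np

def fit_polynomial(x, y, degree, lam):
    """Least-squares polynomial fit with an L2 penalty of strength lam."""
    # Columns are x^0, x^1, ..., x^degree.
    A = np.vander(x, degree + 1, increasing=True)
    # Penalize every coefficient except the intercept.
    penalty = lam * np.eye(degree + 1)
    penalty[0, 0] = 0.0
    # Solve the regularized normal equations (A^T A + penalty) w = A^T y.
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

rng = np.random.default_rng(0)
x = np.linspace(-1, 1, 20)
# Pattern (a straight line) plus noise.
y = 2.0 * x + 1.0 + rng.normal(scale=0.2, size=x.size)

w_plain = fit_polynomial(x, y, degree=9, lam=0.0)  # no regularization
w_ridge = fit_polynomial(x, y, degree=9, lam=1.0)  # with regularization

# Size of the non-intercept coefficients in each fit.
print(np.linalg.norm(w_plain[1:]))
print(np.linalg.norm(w_ridge[1:]))
```

The penalized fit ends up with smaller coefficients overall, so it has less freedom to bend toward individual noisy points.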

## Latent Space

## Random Variables

### 1. Continuous Random Variable

A continuous random variable can take any value within a range, so it has infinitely many possible values. Usually, when the distribution of a continuous variable is plotted, it forms a continuous curve in the graph.

For example, the weight of a person, the height of a person, the temperature of water, etc. can each take any real value within some range.

### 2. Discrete Random Variable

A discrete random variable can take only a countable set of distinct values, such as whole numbers. Usually, histograms are used to plot discrete variables.

For example, the number of candies in a jar, the number of days in a week, etc. can be represented using non-negative integers.
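To make the distinction concrete, here is a small sketch that samples one variable of each kind with NumPy. The choice of a normal distribution for heights and a Poisson distribution for candy counts, and all of the parameters, are assumptions picked purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Continuous: heights in cm, drawn from a normal distribution (hypothetical parameters).
heights = rng.normal(loc=170.0, scale=10.0, size=1000)

# Discrete: number of candies in a jar, drawn from a Poisson distribution (hypothetical mean).
candies = rng.poisson(lam=30, size=1000)

# Continuous samples are real numbers and almost never repeat;
# discrete samples are integers and repeat often.
print(heights.dtype, len(np.unique(heights)))
print(candies.dtype, len(np.unique(candies)))

# Counting how often each integer value occurs gives the bars of a histogram.
counts = np.bincount(candies)
```

The last line shows why a histogram suits discrete data: each possible value gets its own count.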

## Knowledge Base

A knowledge base is a large collection of curated knowledge, such as Freebase, YAGO, or CYC. The term knowledge base is also applied when this knowledge is automatically constructed, such as with NELL. Most knowledge bases include subject-verb-object triples automatically extracted from large text corpora. We formally define a knowledge base as a collection of triples, (e_s, r, e_t), where each triple expresses some relation r between a source entity e_s and a target entity e_t. As stated above, the relations r could be from an underlying ontology (such as /TYPE/OBJECT/TYPE, from Freebase, or CONCEPT

Referred From: Doctoral Thesis of Matthew Gardner
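The triple definition above maps naturally onto a small data structure. The following is only a sketch: the `Triple` class, the `targets` helper, and the entity and relation names are hypothetical and are not taken from Freebase, NELL, or the thesis.

```python
from collections import defaultdict
from typing import NamedTuple

class Triple(NamedTuple):
    source: str    # source entity
    relation: str  # relation r
    target: str    # target entity

# Hypothetical triples, for illustration only.
kb = [
    Triple("Paris", "capital_of", "France"),
    Triple("France", "located_in", "Europe"),
    Triple("Berlin", "capital_of", "Germany"),
]

# Index the triples by (source, relation) for quick lookup.
index = defaultdict(set)
for triple in kb:
    index[(triple.source, triple.relation)].add(triple.target)

def targets(source, relation):
    """Return every target entity linked to source by relation."""
    return index.get((source, relation), set())

print(targets("Paris", "capital_of"))  # {'France'}
```

Storing the knowledge base as a flat list of triples keeps it simple, while the index is one possible way to answer "which entities does this relation lead to?" without scanning the whole list.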