14 Jul 2018

Many deep learning libraries like TensorFlow use graphs to represent the computations involved in neural networks. Not only are these graphs used to compute predictions for a given input to the network but they are also used to backpropagate gradients during the training phase. The main advantage of this graph representation is that each computation can be encapsulated as a node on the graph that only cares about its input and output. This level of abstraction gives you the flexibility to build neural networks of (nearly) arbitrary sizes and shapes (eg. MLPs, CNNs, RNNs, etc.). This blog post will implement a very basic version of a computational graph engine.

22 May 2017

This post is about the Gaussian mixture model which is a generative probabilistic model with hidden variables and the EM algorithm which is the algorithm used to compute the maximum likelihood estimate of its parameters.

06 Sep 2015

Presently, most deep neural networks are trained using GPUs due to the enormous number of parallel computations that they can perform. Without the speed-ups provided by GPUs, deep neural networks could take days or even weeks to train on a single machine. However, using GPUs can be prohitive for several reasons

18 May 2015

In the previous post the concept of word vectors was explained as was the derivation of the *skip-gram* model. In this post we will explore the other *Word2Vec* model - the *continuous bag-of-words* (CBOW) model. If you understand the *skip-gram* model then the *CBOW* model should be quite straight-forward because in many ways they are mirror images of each other. For instance, if you look at the model diagram

12 Apr 2015

In many natural language processing tasks, words are often represented by their *tf-idf* scores. While these scores give us some idea of a wordâ€™s relative importance in a document, they do not give us any insight into its semantic meaning. *Word2Vec* is the name given to a class of neural network models that, given an unlabelled training corpus, produce a vector for each word in the corpus that encodes its semantic information. These vectors are usefull for two main reasons.