Alex Minnaar

Distributed Online Latent Dirichlet Allocation with Apache Spark

In the past, I studied the online LDA algorithm from Hoffman et al. in some depth, which resulted in this blog post. Before we go further, I will give a general description of how the algorithm works. In online LDA, minibatches of documents are processed sequentially to update a global topic/word matrix that defines the topics learned so far. The processing consists of two steps:

Deep Learning Basics: Neural Networks, Backpropagation and Stochastic Gradient Descent

In the last couple of years, Deep Learning has received a great deal of press. This press is not without warrant: Deep Learning has produced state-of-the-art results in many computer vision and speech processing tasks. However, I believe that the press has given people the impression that Deep Learning is some kind of impenetrable, esoteric field that can only be understood by academics. In this blog post I want to try to erase that impression and provide a practical overview of some of Deep Learning’s basic concepts.

Building a Distributed Binary Search Tree with Akka

In this blog post I will describe an interesting Akka mini-project that I came across, which helped me gain a deeper understanding of Akka’s asynchronous actor model. In this project we use Akka to build a distributed binary search tree in which each node is an actor, making it a completely asynchronous, concurrent, and distributed version of the traditional data structure. But before we get into the Akka stuff, it would be helpful to remind ourselves of some of the basic properties of a binary search tree.

Introduction to the Multithreading Problem and the Akka Actor Solution

Nowadays, computers have multiple execution cores, meaning that they can execute multiple tasks at the same time rather than sequentially. This can make programs much faster, but it also presents some new problems. The term multithreading refers to multiple threads executing code in the same program simultaneously. The inherent problem with multithreading is that although each thread acts independently, their memory is shared. It is therefore possible for one thread to change a shared memory value without the other threads knowing, which can create problems. Let’s use a bank account as an example. Consider the following code that implements a bank account with deposit and withdraw methods.
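A minimal sketch of such an account in Scala, with no synchronization at all. The class name, method names, and integer balance are illustrative assumptions, not code taken from the post itself.

```scala
// Illustrative sketch only: names and types are assumptions.
// Nothing here is synchronized, so the class is NOT thread-safe.
class BankAccount(initialBalance: Int) {
  // Shared mutable state, visible to every thread holding a reference.
  private var balance: Int = initialBalance

  // Read-modify-write: the read and the write are separate steps,
  // so two concurrent deposits can overwrite each other.
  def deposit(amount: Int): Unit = {
    require(amount > 0, "deposit must be positive")
    balance = balance + amount
  }

  // Check-then-act: another thread may change `balance` between the
  // comparison and the assignment below.
  def withdraw(amount: Int): Int = {
    require(amount > 0, "withdrawal must be positive")
    if (amount > balance) throw new IllegalStateException("insufficient funds")
    balance = balance - amount
    balance
  }

  def currentBalance: Int = balance
}

object BankAccountDemo {
  def main(args: Array[String]): Unit = {
    val account = new BankAccount(100)
    account.deposit(50)
    account.withdraw(30)
    println(account.currentBalance) // single-threaded: prints 120
  }
}
```

On a single thread this behaves as expected. With two threads, suppose both call `withdraw(80)` on a balance of 100: each can pass the `amount > balance` check before either subtracts, so both withdrawals succeed and the final balance is wrong. This is the kind of problem the shared-memory description above refers to.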

ScalaNER: A Scala Wrapper for the Stanford NER Tool with Some Added Features

The Stanford NER (named entity recognizer) tool is a widely used, general-purpose named entity recognition tool that Stanford has made available as part of its CoreNLP Java library. It performs named entity recognition via a CRF-based sequence model that is known to give near state-of-the-art results, which makes it a popular choice among open-source NER tools.