Alex Minnaar

Reinforcement Learning Notes Part 3: Temporal Difference Learning

Temporal difference learning shares many of the benefits of both dynamic programming methods and Monte Carlo methods without many their disadvantages. Like dynamic programming methods, policy evaluation can be updated at each time step but unlike dynamic programming you do not need a model of the environment. Like Monte Carlo methods, you do not need a model of the environemt but unlike Monte Carlo methods you do not need to wait til the end of an episode to make a policy evaluation update. All three of these methods use the same policy iteration strategy which iterates between policy evaluation (different for each method) and policy improvement (in a greedy fashion for each method).

Reinforcement Learning Notes Part 2: Monte Carlo Methods

In the last reinforcement learning blog post we covered dynamic programming methods. In this blog post we will cover Monte Carlo (MC) methods. The biggest difference between these two methods is that dynamic programming methods assume a complete knowledge of the environement (via a MDP), but Monte Carlo methods do not. Instead, with Monte Carlo methods, knowledge of the environment is learned through experience. Another significant difference is that Monte Carlo methods can only learn from episodic tasks i.e. ones that start and terminate.

Reinforcement Learning Notes Part 1: Dynamic Programming

This series of blog posts is intended to be a collection of short, concise, cheat-sheet-like notes on different topics relating to reinforcement learning. This first one will cover dynamic programming methods applied to reinforcement learning.

Calling CUDA from Python to Speed Up Linear Algebra

Numpy is the go-to library for linear algebra computations in python. It is a highly optimized library that uses BLAS as well as SIMD vectorization resulting in very fast computations. Having said that, there are times when it is preferable to perform linear algebra computations on the GPU i.e. using CUDA’s cuBLAS linear algebra library. For example, the linear algebra computations associated with training large deep neural networks are commonly performed on GPU. In cases like these, the vectors and matrices are so large that the parallelization offerred by GPUs allows them to outperform linear algebra libraries like numpy.

A CUDA Implementation of the K-Means Clustering Algorithm

This blog post will cover a CUDA C implementation of the K-means clustering algorithm. K-means clustering is a hard clustering algorithm which means that each datapoint is assigned to one cluster (rather than multiple clusters with different probabilities). The algorithm starts with random cluster assignments and iterates between two steps