Baxter Eaves
2020-01-21
[news]

# Announcing rv 0.8.0

## A rust crate for building probabilistic programming tools

RV is a statistical toolbox for the Rust programming language designed for our work in probabilistic AI and machine learning. In this post I'll discuss the history of RV and the Rust programming language at Redpoll, RV's design principles, and walk through a few simple examples.

# Rust at Redpoll

At Redpoll, we use Rust for everything in the back end including our AI and services. Safety is our selling point and what's not to love about a safe, performant modern language with a great type system? Well, there's the common problem with all young languages: lack of libraries. We're a research organization, so we don't use much off-the-shelf machine learning code in our product, but still, coming from a python background it sure hurts not to be able to drop scipy and numpy into a project for easy access to scientific computing. What hurt most for the type of machine learning we do was the lack of statistical tools. If I was going to use Rust I'd have to write my own.

# RV

RV — which stands for Random Variables — is a crate for random variables in rust. It's designed for developing probabilistic programming tools. This means that it needs to be flexible, general, and to simplify common statistical pattens.

## Probability distributions

The first ingredient of any probabilistic programming toolbox is probability distributions. In rv 0.8.0 we've implemented 30 distributions. Distributions have different functionality depending on which traits they have implemented. At the minimum, each distribution must implement the Rv trait, which allows you to do the bare essentials to be useful in Monte Carlo. Let's create a standard Gaussian (Normal) distribution.

use rv::traits::Rv;
use rv::dist::Gaussian;

// Standard Normal distribution with mean 0 and stddev 1
let gauss = Gaussian::standard();


We can draw values with the draw method,

let mut rng = rand::thread_rng();

let x: f64 = gauss.draw(&mut rng);


and we can compute likelihoods with the f method.

let fx = gauss.f(&x);
println!("f({}) = {}", x, fx);

// -- Output --
// f(-0.41559754609242056) = 0.3659351330603808


The Rv trait has a sample method, which draws many values into a vector; and a ln_f method, which is the log likelihood.

We can also compute the distribution's descriptive statistics like mean, median, mode, etc:

use rv::traits::{Mean, Median, Mode, Variance};

let mean: f64 = gauss.mean().unwrap();
let median: f64 = gauss.median().unwrap();
let mode: f64 = gauss.mode().unwrap();
let variance: f64 = gauss.variance().unwrap();

println!("{} mean: {}", gauss, mean);
println!("{} median: {}", gauss, median);
println!("{} mode: {}", gauss, mode);
println!("{} variance: {}", gauss, variance);

// -- Output --
// N(μ: 0, σ: 1) mean: 0
// N(μ: 0, σ: 1) median: 0
// N(μ: 0, σ: 1) mode: 0
// N(μ: 0, σ: 1) variance: 1


You'll notice a lot of unwrapping. These methods return Option because some of these quantities are not always defined. For example the variance of an inverse gamma distribution is not always defined.

use rv::dist::InvGamma;

let ig_var = InvGamma::new(3.0, 1.0).unwrap();

// The inverse Gamma variance is undefined if the first parameter, shape, is
// less than 2.
let ig_no_var = InvGamma::new(1.0, 1.0).unwrap();

assert!(ig_no_var.variance().is_none());
assert!(ig_var.variance().is_some());


## Conjugate analysis

We have built in a few examples of conjugate priors to enable simple conjugate analysis. For example, if we wanted to infer the unknown weight of a coin given a set of observations (flips) we could model the flipping of the coin as a Bernoulli process and our prior belief about the weights of coins as a Beta distribution. Beta is conjugate to Bernoulli, which means that the posterior distribution is also Beta. So we can make posterior inferences about the coin weight without fussing with a Bernoulli distribution at all.

In rv, we can implement the ConjugatePrior trait for Beta. The ConjugatePrior trait looks like this

pub trait ConjugatePrior<X, Fx>: Rv<Fx>
where
Fx: Rv<X> + HasSuffStat<X>,
{
// ...
}


where X is the type of data and Fx is the likelihood we're implementing the conjugate prior for. We see that the prior must implement Rv<Fx>, and Fx must itself implement Rv<X> (be a likelihood of X), and implement HasSuffStat<X>, which we'll discuss more later.

ConjugatePrior<bool, Bernoulli> and Rv<Bernoulli> are implemented for Beta. That's right, you can sample Bernoulli distributions from a Beta distribution.

// Beta(1, 1), which is uniform over (0, 1)
let beta = Beta::uniform();
let berns: Vec<Bernoulli> = beta.sample(10, &mut rng);


In a conjugate analysis, we pack the data into an enum DataOrSuffStat because, for many distributions, we don't want or need to store the raw data, we can store a summary or sufficient statistic that is much more compact. This is why we need to implement HasSuffStat<X> for Fx. The DataOrSuffStat enum also tells the prior what type of likelihood these data are meant for. In this case, we have bool data for a Bernoulli likelihood.

use rv::dist::{Beta, Bernoulli};
use rv::data::DataOrSuffStat;
use rv::traits::ConjugatePrior;

let flips = vec![true, false, false, false];
let xs: DataOrSuffStat<bool, Bernoulli> = DataOrSuffStat::Data(&flips);


Now we initialize the Beta prior and simply ask it for the posterior distribution given the observations.

let prior = Beta::jeffreys();
let posterior: Beta = prior.posterior(&xs);
println!("Coin weight Posterior: {}", posterior);

let ml_weight: f64 = posterior.mode().unwrap();
println!("Coin weight ML: {}", ml_weight);

let var_weight: f64 = posterior.variance().unwrap();
println!("Coin weight variance: {}", var_weight);

// -- Output --
// Coin weight Posterior: Beta(α: 1.5, β: 3.5)
// Coin weight ML: 0.16666666666666666
// Coin weight variance: 0.035


## Inference on Custom Types

Some distributions, namely those with support for categorical data, support user-defined data types by way of traits that convert those types to and from a primitive data type. For example, instead of representing coin flips as bool, we can go ahead and define a Coin enum. We will need to implement the Booleable trait, which requires that our type implements Copy.

use rv::data::Booleable;

// Explicitly represent each variant as a u8
#[repr(u8)]
#[derive(Clone, Copy, PartialEq)]
enum Coin {
Tails,
}

impl Booleable for Coin {
fn from_bool(heads: bool) -> Coin {
} else {
Coin::Tails
}
}

fn try_into_bool(self) -> Option<bool> {
}
}


Now we can re-write the above example more explicitly using Coin

let flips = vec![Coin::Heads, Coin::Tails, Coin::Tails, Coin::Tails];

let prior = Beta::jeffreys();

let xs: DataOrSuffStat<Coin, Bernoulli> = DataOrSuffStat::Data(&flips);

let posterior: Beta = prior.posterior(&xs);
println!("Coin weight Posterior: {}", posterior);

let ml_weight: f64 = posterior.mode().unwrap();
println!("Coin weight ML: {}", ml_weight);

let var_weight: f64 = posterior.variance().unwrap();
println!("Coin weight variance: {}", var_weight);

// -- Output --
// Coin weight Posterior: Beta(α: 1.5, β: 3.5)
// Coin weight ML: 0.16666666666666666
// Coin weight variance: 0.035


## Other Stuff

### Information theory

use rv::traits::{Entropy, KlDivergence};

// This is continuous (differential) entropy
let h = gauss.entropy();
println!("{} entropy: {}", gauss, h);

// KL Divergence
let other = Gaussian::new(2.0, 1.0).unwrap();
let kl = gauss.kl(&other);
println!("KL({} | {}): {}", gauss, other, kl);

// -- Output --
// N(μ: 0, σ: 1) entropy: 1.4189385332046727
// KL(N(μ: 0, σ: 1) | N(μ: 2, σ: 1)): 2


### Errors

Constructors validate input unless they are marked with _unchecked.

use rv::dist::GaussianError;

match Gaussian::new(0.0, -3.0) {
Err(GaussianError::SigmaTooLowError) => println!("Expected error"),
Err(_) => panic!("Unexpected error"),
Ok(_) => panic!("There should have been an error"),
}

// -- Output --
// Expected error


Unchecked constructors let whatever you want through (quickly). This could lead to panics or undefined behavior if you can't guarantee that the inputs are valid.

let gauss = Gaussian::new_unchecked(0.0, -3.0);

// panics here because the PDF will try to compute ln(-3.0)
let ln_fx = gauss.ln_f(&0.0);


# Building bigger things

These are all very simple examples but they hopefully give you a sense of the building blocks of RV. In the examples directory, we have built an infinite Gaussian mixture model (an unsupervised clustering and density estimation algorithm), in about 100 line of code.