by MostlyStatic

Monday, July 31, 2017

Random MATLAB musings

Even though 2006 was a while ago, it still seems unlikely that software as mainstream as MATLAB would have shipped in that year with a bug in the default pseudorandom number generator. And yet, that’s exactly what happened [1]. You can still reproduce this bug (in MATLAB 2017a) by rewinding to the old defaults and plotting the output of the rand function.

rand('state', 0);
Z = rand(28, 100000);
condition = Z(1, :) < 1/4;
scatter(Z(16, condition), Z(28, condition), '.');


RNG FAIL [1]


The variable Z should have independent entries, but conditioning on the first row induces this ugly dependence between the 16th and 28th rows. The bug is long since squashed, but if you aren’t controlling the environment your software is run on, you shoud add rng('default'); to the top of your code. That will ‘fix’ your random stream in the unlikely event that your software is either run with an ancient version of MATLAB, or run just after some old legacy code that messes with the random stream.

rng('default'); also means that your code should produce the same output each time it’s run. That’s useful when scientific rigour is required, or if you’re hitting a bug triggered by the random stream, which often happens with Markov chain Monte Carlo (MCMC) for example.

If you want independent runs (i.e., random restarts in MCMC), you could require that your user provides the seeds. But if you don’t want to bug your users for seeds, how can you make sure the seeds are random? K&R suggests something like srand((unsigned)time(NULL)); [2]. Converting this C code to MATLAB (or any other language) can lead to dependence between runs if you launch a bunch of jobs all at once, and that is not acceptable. One option to get around that (which works on Linux/MacOS/cygwin, etc ...) is to initialise from the hardware random data device, like so:

rng('default');
fp = fopen('/dev/random', 'rb');
seed = fread(fp, 1, 'uint32');
fclose(fp);
rng(seed); % Set the seed for MATLAB's SRNG.

I usually let the user provide an optional argument seed to my main function, and then hit the hardware random device for the seed if they don’t provide it. Make sure to tell the user what seed was used, so they can sub it in to reproduce their result:

rng('default');
if ~exist('seed', 'var')
    fp = fopen('/dev/random', 'rb');
    seed = fread(fp, 1, 'uint32');
    fclose(fp);
end
rng(seed);
fprintf(1, 'Using seed: %ld\n', seed);

The hardware random device should be enough, but if you want overkill, use a genuine random number generated by the radioactive decay of a Cæsium-137 isotope. You can get that from this more-than-a-little-jank looking setup in Lignieres, Switzerland.

The hotbits setup


You need an API key to request these Cæsium-137 random numbers, which you can get here. Once you have your API key, sub it in for 'HB1XXXXXX' in this following code:

rng('default');
command = ['curl -s ' ...
  '"https://www.fourmilab.ch/cgi-bin/Hotbits' ...
  '?nbytes=4' ...
  '&fmt=hex' ...
  '&apikey=HB1XXXXXX"' ...
  ' | grep -A1 "<pre>"' ...
  ' | tail -n 1'];
[~, seed] = system(command);
seed = hex2dec(strtrim(seed));
rng(seed);
fprintf(1, 'Using seed: %ld\n', seed);

All of this reproducibility suff is contingent on your code not being multithreaded. If you’re threading with parfor for example, you won’t get reproducible results even if you control the seed (the order of the threads is not guaranteed and not governed by your seed). You can recover reproducibility by spamming rng calls everywhere, which works great for MCMC algorithms that can thread within each MCMC iteration. You also have to use the 'twister' random number generator, as 'default' lets parfor override the rng calls (in MATLAB 2017a at least). Here's some skeleton code for reproducible parfor:

seed = 2666;
rng(seed, 'twister');
seeds = randi(2^32 - 1, n + 1, 'uint32');
seeds(:) = seeds(randperm(n + 1));
% A 
parfor i = 1:n
  rng(seeds(i), 'twister');
  % B
end
rng(seeds(n + 1), 'twister');
% C

Your code is % A, % B and % C.

Happy hacking!

[1] P. Savicky. 2006. A strong nonrandom pattern in MATLAB default random number generator. Tehcnical Computing Prague.
[2] B. Kernighan and D. Ritchie. 1987. The C Programming Language, Second Edition. Prentice Hall.