Mike Croucher
MathWorks
Twitter: @walkingrandomly
“all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel”
This 2003 trial, done in Kenya, found that deworming whole schools improved children’s health, school performance, and school attendance.
In 2013, the data was reanalysed independently using new computer programs
Many mistakes found.
Software is critical to our research but is treated as a third-class citizen
Assume Croucher's law is true
Adapt our working practices
Open package foo. Click, Click, drag, Click, Click, Click, Right-Click, Save, 'results.csv'.
Load into Excel. Click, drag, generate graph, right click, save, 'pretty-graph.png'
I analysed my data in foo using the bar analysis. Here's a graph of the results.
aka 'learn to program'
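The click-driven workflow above leaves no record of what was done. A minimal sketch of the scripted alternative, using only the Python standard library. The function names, the summary statistics, and the data values are illustrative assumptions, not taken from the talk:

```python
import csv
import statistics

def the_analysis(values):
    """Hypothetical analysis: summarise a list of numeric measurements."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }

def save_results(results, path):
    """Write the summary to a CSV file so the step can be repeated exactly."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["statistic", "value"])
        for name, value in results.items():
            writer.writerow([name, value])

my_data = [2.0, 4.0, 4.0, 5.0]  # made-up example values
results = the_analysis(my_data)
```

Calling save_results(results, "results.csv") would then replace the manual Save step, and the script itself documents every choice that was made.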
The Ideal
Results = TheAnalysis(MyData)
Reality
It is knowledge transfer
It is the foundation of reproducible computational research
(Partial) Solutions
Write code in a (very) high-level language
(Best Practices for Scientific Computing, PLOS Biology, Wilson et al.)
(Partial) Solutions

If you can't be fully open, be as open as possible within your organisation
Nothing else contains the information required to fully reproduce your work.
We use K-means in Python with 50 clusters and K-means++ initialisation
No need to share code. It's 2 lines. Trivial!
Also took me 2 lines of Python
Several differences
We used different libraries
Imagine how many gotchas there might be here
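One way to see why a one-sentence description is not enough: even a toy k-means has settings that the sentence leaves out. The sketch below is a tiny 1-D k-means with plain random initialisation (deliberately not K-means++, and not any library's implementation). The function name, the synthetic data, and the parameter values are all assumptions made for illustration:

```python
import random

def kmeans_1d(points, k, n_iter, seed):
    """Toy 1-D k-means; returns the sorted cluster centres."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # random initialisation (not K-means++)
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centres[i]))
            clusters[nearest].append(p)
        # Keep the old centre if a cluster ends up empty
        centres = [sum(c) / len(c) if c else centres[i]
                   for i, c in enumerate(clusters)]
    return sorted(centres)

# Synthetic data: three made-up groups of 30 points each
data_rng = random.Random(42)
data = [data_rng.gauss(mu, 1.0) for mu in (0.0, 5.0, 10.0) for _ in range(30)]

run_a = kmeans_1d(data, k=3, n_iter=20, seed=1)
run_b = kmeans_1d(data, k=3, n_iter=20, seed=2)
run_a_again = kmeans_1d(data, k=3, n_iter=20, seed=1)
```

Only the repeated seed is guaranteed to reproduce the same centres. Whether run_a and run_b agree depends on the data, the seed, and the iteration count, none of which appear in a sentence like "K-means with 50 clusters".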
(Partial) Solutions
They say: It's too much extra work
Do work on file1, file2 and file6
git add file1 file2 file6
git commit -m "Description of why you modified those files"
git push origin master
Speak to IT about installing an in-house GitLab instance
source: https://twitter.com/bobearth/status/571154995506122755
(Partial) Solutions
Someone sends you this
Your experience
Think of all those constantly shifting dependencies
...and control it with Conda
Install Miniconda from https://repo.continuum.io/miniconda/
You are told it works using scikit-learn 0.17
conda create --name pca_project python=3.5 scikit-learn=0.17 jupyter
conda activate pca_project
jupyter notebook
Set up the exact environment I used
git clone https://github.com/mikecroucher/pca_demo
cd pca_demo
conda env create -f environment.yml
conda activate old_scikit
jupyter notebook
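As a complement to the pinned Conda environment, a notebook can also record which versions it actually ran with. A minimal sketch, assuming Python 3.8 or later (for importlib.metadata, so newer than the 3.5 pinned above); the function name report_versions is hypothetical:

```python
import platform
from importlib import metadata  # standard library from Python 3.8

def report_versions(packages):
    """Return the Python version and the installed version of each package."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions

versions = report_versions(["scikit-learn", "jupyter"])
```

Printing this dictionary in the first cell of a notebook makes a version mismatch visible before any results are compared.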
(Partial) Solutions
Doesn't have to understand your research
Remit: tell me where I could do better
Problem 1: Get the code running on THEIR machine
(Partial) Solutions
Traditional reports are just advertisements
A Literate computing document IS the research
(Partial) Solutions
$ nosetests ./unittests.py
..............................
----------------------------------------------------------------------
Ran 30 tests in 0.152s
OK
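For readers who have not seen what sits behind a run like the one above, here is a minimal sketch of a test file using the standard-library unittest module. The function rescale and both test cases are hypothetical examples, not the tests from the talk:

```python
import unittest

def rescale(values):
    """Hypothetical function under test: rescale values linearly to [0, 1]."""
    lo, hi = min(values), max(values)
    if lo == hi:
        raise ValueError("cannot rescale constant data")
    return [(v - lo) / (hi - lo) for v in values]

class TestRescale(unittest.TestCase):
    def test_endpoints(self):
        # Smallest value maps to 0, largest to 1, midpoint to 0.5
        self.assertEqual(rescale([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input_rejected(self):
        # Constant data would divide by zero, so it should be refused
        with self.assertRaises(ValueError):
            rescale([3, 3, 3])
```

Saved in a file, tests like these can be run with python -m unittest, and each passing test prints one dot.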
(Partial) Solutions
Easy!
h = sqrt(x*x + y*y)
So why is the hypot function in math.h, Python and MATLAB?
max = maximum(|x|, |y|)
min = minimum(|x|, |y|)
r = min / max
return max*sqrt(1 + r*r)
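The difference between the two formulas shows up as soon as the inputs are large. A small Python sketch of both, assuming IEEE double-precision floats; the names naive_hypot and scaled_hypot are made up here, and the guard for zero inputs is an addition the pseudocode above does not include:

```python
import math

def naive_hypot(x, y):
    """Textbook formula: x*x can overflow even when the result is representable."""
    return math.sqrt(x * x + y * y)

def scaled_hypot(x, y):
    """Rescaled formula: divide by the larger magnitude before squaring."""
    big = max(abs(x), abs(y))
    small = min(abs(x), abs(y))
    if big == 0.0:
        return 0.0  # avoid 0/0 when both inputs are zero
    r = small / big
    return big * math.sqrt(1.0 + r * r)

x = y = 1e200
overflowed = naive_hypot(x, y)   # x*x exceeds the double range and becomes inf
rescued = scaled_hypot(x, y)     # stays finite, close to math.hypot(x, y)
```

This rescaling fixes only the overflow case. Production implementations handle further edge cases, which is part of why they run to far more than one line.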
openlibm - 132 lines of code
https://github.com/JuliaMath/openlibm/blob/master/src/e_hypot.c
10,000+ times speed difference between worst and best of the same algorithm
Which algorithms interest you the most?
Tell me at:
(Partial) Solutions
No!