## Is Your Research Software Correct?

Mike Croucher

EPSRC Research Software Engineering Fellow

www.walkingrandomly.com
@walkingrandomly
M.Croucher@Sheffield.ac.uk
Sheffield Research Software Engineering

## but wrong

Mike Konczal
“all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel”

## What were the real errors?

• They used Excel (subject to debate)
• They didn't share their code and data (Vital!)

This 2003 trial, done in Kenya, found that deworming whole schools improved children’s health, school performance, and school attendance.

In 2013, the data was reanalysed independently using new computer programs

Many mistakes found.

# Croucher's law

## You are no different!

### What you did

Open package foo. Click, Click, drag, Click, Click, Click, Right-Click, Save, 'results.csv'.

Load into Excel. Click, drag, generate graph, right click, save, 'pretty-graph.png'

### What you said

I analysed my data in foo using the bar analysis. Here's a graph of the results.

# Automate

aka 'learn to program'

The Ideal

Results = TheAnalysis(MyData)
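
A minimal sketch of what that ideal could look like in Python. The names `the_analysis` and `load_data`, the `value` column and the inline sample data are all hypothetical, chosen only for illustration; a real analysis would read its own data file.

```python
import csv
import io
import statistics


def load_data(csv_text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def the_analysis(rows):
    """Hypothetical analysis: summarise the 'value' column of the data."""
    values = [float(row["value"]) for row in rows]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Inline sample data stands in for a real data file (an assumption of this sketch).
SAMPLE_CSV = "sample,value\na,1.0\nb,2.0\nc,3.0\nd,4.0\n"

my_data = load_data(SAMPLE_CSV)
results = the_analysis(my_data)
print(results)
```

Every step from raw data to result is now written down, so it can be rerun, reviewed and corrected, unlike a sequence of clicks.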

Reality

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)

Write code in a (very) high-level language

• Python
• MATLAB
• R
• Mathematica
• Julia

## Why high-level languages?

"Programmers write roughly the same number of lines of code per unit time regardless of the language they use" (Best Practices for Scientific Computing, PLOS Biology, Wilson et al.)

1. Computer time is cheap. Programmer time is expensive.
2. We all have supercomputers now!
3. Ensure it's correct, then worry about speed.
4. If it's slow: Use a profiler to find the hot spots.
5. Call a Research Software Engineer (RSE) to help with the slow bits
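
Step 4 might look like this in Python, using the standard-library `cProfile` and `pstats` modules. The functions `slow_sum_of_squares` and `analysis` are hypothetical stand-ins for your own code, written to be deliberately naive so that there is a hot spot to find.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately naive loop: the hypothetical hot spot."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def analysis():
    """Hypothetical analysis that calls the slow routine several times."""
    return [slow_sum_of_squares(50_000) for _ in range(20)]


# Profile only the call we care about.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Sort by cumulative time and show the five most expensive entries.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

The report lists where the time actually goes, which is often not where you would have guessed. Only the functions at the top of that list are worth optimising.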

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language

Two facts that, combined, worry me:

Scientists typically spend 30% or more of their time developing software

90% or more of them are primarily self-taught

Hannay JE, Langtangen HP, MacLeod C, Pfahl D, Singer J, et al. (2009) How do scientists develop and use scientific software? In: Proceedings Second International Workshop on Software Engineering for Computational Science and Engineering. pp. 1–8. doi:10.1109/SECSE.2009.5069155.

Prabhu P, Jablin TB, Raman A, Zhang Y, Huang J, et al. (2011) A survey of the practice of computational science. In: Proceedings 24th ACM/IEEE Conference on High Performance Computing, Networking, Storage and Analysis. pp. 19:1–19:12. doi:10.1145/2063348.2063374.

## Get some training

Just enough Software Engineering to Perform

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language
• Get some training

# Is this familiar?

• code_ver1.m
• code_ver1b_BROKEN.m
• code_ver1b_BROKEN_Working_march20.m
• code_ver1b_BROKEN_Working_march20_Bobs_mods_ForMike.m

## True Story

• Me: Can I see the code please?
• Them: I'll just get the changes from Bob folded in and email it
• Me: Shouldn't we be using version control?
• Them: No need - it's overkill. We don't have a VC problem.
• Me: The code you sent me doesn't work
• Them: Sorry. I sent the wrong version.

## Which version control system should you use?

I like and use 'git' but use whatever your colleagues are using.

## The version control life cycle

1. git? No thanks, I'm scared!
2. well this is handy.
3. we're not using git? I'm scared!

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language
• Get some training
• Use version control

## Get a code buddy (Maybe an RSE!)

Doesn't have to understand your research

Remit: Tell me where I could do better.

Problem 1: Get the code running on THEIR machine
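
One thing that may help with Problem 1 is sending your code buddy a record of the environment the code last ran in. The sketch below uses only the Python standard library. The function name `describe_environment` and the package list are hypothetical; substitute the packages your own code imports.

```python
import platform
from importlib import metadata


def describe_environment(packages):
    """Return the Python version, platform and installed package versions."""
    info = {
        "python": platform.python_version(),
        "platform": platform.platform(),
    }
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report the gap instead of failing: a missing package is useful information.
            info[name] = "not installed"
    return info


# Hypothetical package list for illustration only.
environment = describe_environment(["numpy", "pandas"])
for key, value in environment.items():
    print(f"{key}: {value}")
```

Comparing this output between your machine and theirs is a quick first check when the code runs for you but not for them.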

## Research Software Engineer (RSE)

All good universities have an RSE team you can consult.

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language
• Get some training
• Use version control
• Get a code buddy (Maybe an RSE!)

## You've come so far...

• You can get your results by entering one command
• Your code buddy has seen your code -> Show it to the world

## Benefits

• It's the right thing to do
• Others will use, debug and enhance your work
• Others will reproduce and cite your work
• More opportunities to collaborate

## Literate computing

A literate computing document IS the research

## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language
• Get some training
• Use version control
• Get a code buddy (Maybe an RSE!)
• Share your code and data openly
• Use literate computing technologies

## Write tests

• Every decent language has a testing framework
• Learn how to use it (Software Carpentry)
• Tests give you confidence to make changes

    $ nosetests ./unittests.py
    ..............................
    ----------------------------------------------------------------------
    Ran 30 tests in 0.152s

    OK
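
A minimal sketch of what such tests can look like, using Python's built-in `unittest` framework. The function `mean` is a hypothetical stand-in for a piece of your analysis code, and the three test cases are illustrative, not a complete test suite.

```python
import unittest


def mean(values):
    """Hypothetical function under test: arithmetic mean of a sequence."""
    if not values:
        raise ValueError("mean() of an empty sequence")
    return sum(values) / len(values)


class TestMean(unittest.TestCase):
    def test_known_answer(self):
        # A case small enough to check by hand.
        self.assertEqual(mean([1, 2, 3, 4]), 2.5)

    def test_single_value(self):
        self.assertEqual(mean([7]), 7)

    def test_empty_input_raises(self):
        # Bad input should fail loudly, not return a silent wrong answer.
        with self.assertRaises(ValueError):
            mean([])


# Run the tests programmatically so the sketch works when pasted into a
# script or notebook; a test runner would discover TestMean on its own.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMean)
outcome = unittest.TextTestRunner(verbosity=0).run(suite)
```

Start with cases you can verify by hand, then add a test each time you find a bug so that it cannot quietly return.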


## Problem

### I am an idiot and will make mistakes

(Partial) Solutions

• Automate (aka learn to program)
• Write code in a (very) high-level language
• Get some training
• Use version control
• Get a code buddy (Maybe an RSE!)
• Share your code and data openly
• Use literate computing technologies
• Write tests

## Code citation

We all cite papers -- papers contain ideas

## Code citation

Code is the implementation of those ideas

No!

## You are not alone!

Research Software Engineer