Is Your Research Software Correct?

Mike Croucher

EPSRC Research Software Engineering Fellow

www.walkingrandomly.com
@walkingrandomly
M.Croucher@Sheffield.ac.uk
Sheffield Research Software Engineering

Imagine...

Your results are amazing!

but wrong

Mike Konczal
“all I can hope is that future historians note that one of the core empirical points providing the intellectual foundation for the global move to austerity in the early 2010s was based on someone accidentally not updating a row formula in Excel”

What were the real errors in the Reinhart-Rogoff analysis?

  • They used Excel (subject to debate)
  • They didn't share their code and data (Vital!)
  • They didn't use good software practice

Further examples

We have a problem!

Croucher's law

I can be an idiot and WILL make mistakes.

You are no different!

Your Analysis?

What you did

Open package foo. Click, Click, drag, Click, Click, Click, Right-Click, Save, 'results.csv'.

Load into Excel. Click, drag, generate graph, right-click, save, 'pretty-graph.png'.

Your Analysis?

What you said

I analysed my data in foo using the bar analysis. Here's a graph of the results.

How reproducible is a mouse click?

Automate

aka 'learn to program'

The Ideal

Results = TheAnalysis(MyData)
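To make the ideal concrete, here is a minimal Python sketch of the click-by-click workflow above written down as code. It assumes the data is a CSV file with `group` and `value` columns. `the_analysis` is a hypothetical stand-in for "the bar analysis" (it computes a per-group count, mean and standard deviation), and the script and file names other than `results.csv` are assumptions too. Once written, the whole analysis reruns with one command.

```python
import csv
import statistics
import sys
from collections import defaultdict


def the_analysis(my_data):
    """Hypothetical stand-in for 'the bar analysis'.

    Takes rows like {'group': 'a', 'value': '1.5'} and returns one summary
    row per group. It is a pure function: same data in, same results out.
    """
    groups = defaultdict(list)
    for row in my_data:
        groups[row["group"]].append(float(row["value"]))

    results = []
    for name in sorted(groups):
        values = groups[name]
        # statistics.stdev needs at least two points; report 0.0 otherwise
        spread = statistics.stdev(values) if len(values) > 1 else 0.0
        results.append({"group": name, "n": len(values),
                        "mean": statistics.mean(values), "stdev": spread})
    return results


def main(data_path, results_path="results.csv"):
    # Load the raw data (replaces "Open package foo. Click, Click, ...")
    with open(data_path, newline="") as f:
        my_data = list(csv.DictReader(f))

    results = the_analysis(my_data)

    # Save the results (replaces "Right-Click, Save, 'results.csv'")
    with open(results_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["group", "n", "mean", "stdev"])
        writer.writeheader()
        writer.writerows(results)


if __name__ == "__main__":
    if len(sys.argv) > 1:
        main(sys.argv[1])  # e.g. python analysis.py my_data.csv
    else:
        print("usage: python analysis.py DATA.csv")
```

Producing 'pretty-graph.png' would be one more function using a plotting library such as matplotlib; it is left out here to keep the sketch free of dependencies.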

Reality

Write code in a (very) high-level language

Some suggested languages

  • Python
  • MATLAB
  • R
  • Mathematica
  • Julia
  • GAP

Why high level languages?

"Programmers write roughly the same number of lines of code per unit time regardless of the language they use" (Wilson et al., "Best Practices for Scientific Computing", PLOS Biology, 2014)
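If lines per hour are roughly fixed, then the more each line says, the more science gets done per hour (and, arguably, the fewer lines there are to get wrong). As a rough illustration, here are two ways to compute the mean of the values above a threshold. Both are written in Python, so the "low-level" version only imitates the style of a language like C or Fortran; the function names are hypothetical.

```python
import statistics


def mean_above_low_level(values, threshold):
    # Low-level style: manage the index, counter and running total by hand
    total = 0.0
    count = 0
    i = 0
    while i < len(values):
        if values[i] > threshold:
            total += values[i]
            count += 1
        i += 1
    if count == 0:
        raise ValueError("no values above threshold")
    return total / count


def mean_above_high_level(values, threshold):
    # High-level style: state what you want and let the library do the
    # bookkeeping (statistics.mean raises StatisticsError, a ValueError,
    # if nothing is above the threshold)
    return statistics.mean(v for v in values if v > threshold)
```

Both return the same answer, but the second leaves far fewer places for an off-by-one or a forgotten counter to hide.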
Two facts that, combined, worry me:

Scientists typically spend 30% or more of their time developing software

90% or more of them are primarily self-taught

Hannay JE, Langtangen HP, MacLeod C, Pfahl D, Singer J, et al. (2009) How do scientists develop and use scientific software? In: Proceedings Second International Workshop on Software Engineering for Computational Science and Engineering. pp. 1–8. doi:10.1109/SECSE.2009.5069155.

Prabhu P, Jablin TB, Raman A, Zhang Y, Huang J, et al. (2011) A survey of the practice of computational science. In: Proceedings 24th ACM/IEEE Conference on High Performance Computing, Networking, Storage and Analysis. pp. 19:1–19:12. doi:10.1145/2063348.2063374.

Get some training

Just enough Software Engineering to Perform

Use version control

Is this familiar?

  • code_ver1.m
  • code_ver1b_BROKEN.m
  • code_ver1b_BROKEN_Working_march20.m
  • code_ver1b_BROKEN_Working_march20_Bobs_mods_ForMike.m

Which version did the results come from?

What broke the code?

Taking you back to your happy place
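Version control can only answer "Which version did the results come from?" if the version is recorded next to the results. A minimal Python sketch, assuming the analysis runs inside a git working copy: it asks git for the current commit hash with `git rev-parse HEAD` and saves it alongside the results. The function names and the JSON layout are illustrative choices, not a standard.

```python
import json
import subprocess
from datetime import datetime, timezone


def current_commit(repo_dir="."):
    """Return the git commit hash checked out in repo_dir, or 'unknown'.

    Falls back to 'unknown' if git is not installed or repo_dir is not a
    repository, so the analysis still runs but is visibly untracked.
    """
    try:
        out = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=repo_dir, capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return "unknown"
    return out.stdout.strip()


def save_with_provenance(results, path, repo_dir="."):
    # Store the results together with the code version that produced them
    record = {
        "commit": current_commit(repo_dir),
        "created": datetime.now(timezone.utc).isoformat(),
        "results": results,
    }
    with open(path, "w") as f:
        json.dump(record, f, indent=2)
    return record
```

This does not notice uncommitted edits; checking that `git status --porcelain` prints nothing before running is one way to close that gap.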

True Story

  • Me: Can I see the code please?
  • Them: I'll just get the changes from Bob folded in and email it
  • Me: Shouldn't we be using version control?
  • Them: No need - it's overkill. We don't have a version control problem.
  • Me: The code you sent me doesn't work
  • Them: Sorry. I sent the wrong version.

Afraid to change your code?

Write tests

  • Every decent language has a testing framework
  • You write additional code that checks your code gives the answers you expect
  • Tests give you confidence to make changes
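A minimal sketch using Python's built-in `unittest` framework (`pytest` is a popular alternative). The function under test, `normalise`, is hypothetical. Each test pins down an answer you expect: one worked out by hand, one property that should always hold, and one error case.

```python
import unittest


def normalise(values):
    """Scale values so they sum to 1 (hypothetical function under test)."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalise values that sum to zero")
    return [v / total for v in values]


class TestNormalise(unittest.TestCase):
    def test_known_answer(self):
        # A case small enough to check by hand
        self.assertEqual(normalise([1, 1, 2]), [0.25, 0.25, 0.5])

    def test_sums_to_one(self):
        # A property that should hold for any valid input
        result = normalise([0.3, 2.7, 11.0, 5.5])
        self.assertAlmostEqual(sum(result), 1.0)

    def test_zero_total_is_an_error(self):
        # Bad input should fail loudly rather than return nonsense
        with self.assertRaises(ValueError):
            normalise([0, 0, 0])
```

Saved as, say, `test_normalise.py`, the tests run with `python -m unittest test_normalise`. Rerun them after every change: if one fails, you know straight away that the change broke something.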

Literate computing

Traditional academic papers are just advertisements

A literate computing document IS the research

Literate computing technologies

Share your code and data openly

You've come so far...

  • You can get your results by entering one command
  • Your work is beautiful -> show it to the world
  • Your code is in git -> push it to a public GitHub repository

Benefits

  • It's the right thing to do
  • Others will use, debug and enhance your work
  • Others will reproduce and cite your work
  • More opportunities to collaborate

Problem

I am an idiot and will make mistakes

(Partial) Solutions

  • Automate (aka learn to program)
  • Write code in a (very) high-level language
  • Get some training
  • Use version control
  • Get a code buddy (Maybe an RSE!)
  • Share your code and data openly
  • Use literate computing technologies
  • Write tests
  • Cite code

Is this enough?

No!

You are not alone!

Research Software Engineer

Full version of this talk: http://mikecroucher.github.io/MLPM_talk/