Open Data Science Initiative at Sheffield

Mike Croucher

EPSRC Research Software Engineering Fellow

www.walkingrandomly.com
@walkingrandomly
M.Croucher@Sheffield.ac.uk
Sheffield Open Data Science Initiative
Research Software Engineering at Sheffield

The idea of open data science is to:

  1. Make new analysis methodologies available as widely and rapidly as possible with as few conditions on their use as possible.
  2. Educate our commercial, scientific and medical partners in the use of these latest methodologies.
  3. Act to achieve a balance between data sharing for societal benefit and the right of an individual to own their data.

How do we achieve this?

The Data Hide

Open Data Science meetups at Sheffield

Education

Software

2 Research Software Engineering Fellows in Sheffield

First Fellowship of its kind

Hiring soon for the OpenDreamKit post!

Jupyter notebook and High Performance Computing

Teaching Support

Jupyter notebook and SageMathCloud

A new way to teach computation

Online demo:

https://cloud.sagemath.com/projects/3999d983-e8aa-463c-869a-60500d95d82c/files/KronigPenney.ipynb

Jupyter: What is it?

  • Open
  • Text, maths, results and code combined
  • Interactive research papers
  • Interactive lecture notes
  • Conversion to .pdf, .html, etc. is trivial
  • Pervasive computation
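As a rough illustration of the "Open" and "Text, maths, results and code combined" points: a notebook file is plain JSON. The sketch below builds a minimal two-cell notebook using only Python's standard library. The cell contents are made up for illustration, and the field layout is assumed to follow version 4 of the notebook format.

```python
import json

def make_notebook():
    """Return a minimal notebook as a plain dict (assumed nbformat 4 layout)."""
    markdown_cell = {
        "cell_type": "markdown",
        "metadata": {},
        # Text and maths live side by side in a markdown cell.
        "source": ["# Euler's identity\n", "$e^{i\\pi} + 1 = 0$"],
    }
    code_cell = {
        "cell_type": "code",
        "execution_count": None,  # None means the cell has not been run yet
        "metadata": {},
        "outputs": [],            # results are stored here after execution
        "source": ["import cmath\n", "cmath.exp(1j * cmath.pi) + 1"],
    }
    return {
        "cells": [markdown_cell, code_cell],
        "metadata": {},
        "nbformat": 4,
        "nbformat_minor": 4,
    }

notebook = make_notebook()
notebook_json = json.dumps(notebook, indent=1)

# Show which kinds of cell the document contains.
print([cell["cell_type"] for cell in notebook["cells"]])
```

Saving `notebook_json` to a file with an .ipynb extension should give something Jupyter can open, which is part of what makes the format easy to convert, archive and inspect with ordinary tools.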

Jupyter: In use at Sheffield

Some benefits of Jupyter notebook

  • Frictionless code execution
  • Free
  • Runs on anything from a Raspberry Pi to a supercomputer
  • All operating systems
  • Students can work anywhere, on any device, locally or in the cloud

Collaboration between lecturer and RSE

  • Marta Milo (lecturer)
  • Mike Croucher (Research Software Engineer)

Bioinformatics for Biomedical Science

  • Undergraduate course
  • We used R in the notebook (the R kernel was very new at the time)
  • Teach students who've never coded before
  • Zero to Bioinformatics hero in 6 weeks!
  • Managed desktop hell

The solution

SageMathCloud Demo

https://cloud.sagemath.com/projects

SageMathCloud benefits

  • Jupyter with R, Python, Julia and Sage
  • Linux terminal access
  • Easy course administration
  • Students only need a browser
  • Open Source
  • Superb support
  • Automatic back-ups
  • Inexpensive
  • Students keep their work after the course

Collaborative editing of notebooks

Like Google Docs for the notebook!

We ran the entire course in SageMathCloud

  • No need for Blackboard
  • No need for managed desktop

SageMathCloud usage

https://github.com/sagemathinc/smc/wiki/Teaching

The Future?

  • Fully interactive, computable lecture notes for the entire syllabus?
  • e.g. Fourier series
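To give a flavour of what a computable lecture note on Fourier series might contain, here is a minimal sketch (not taken from any existing course material). It assumes the standard square-wave series f(x) = (4/π) Σ sin((2k+1)x)/(2k+1), and the function name `square_wave_partial_sum` is purely illustrative.

```python
import math

def square_wave_partial_sum(x, n_terms):
    """Sum the first n_terms odd harmonics of the square-wave Fourier series."""
    total = 0.0
    for k in range(n_terms):
        harmonic = 2 * k + 1  # only odd harmonics contribute
        total += math.sin(harmonic * x) / harmonic
    return 4.0 / math.pi * total

# At x = pi/2 the square wave itself equals 1, so the partial sums
# should approach 1 as more terms are included.
x = math.pi / 2
for n_terms in (1, 5, 50, 500):
    approx = square_wave_partial_sum(x, n_terms)
    print(f"{n_terms:4d} terms: {approx:.4f}")
```

In a notebook, students could change the number of terms or the evaluation point and re-run the cell, seeing the convergence for themselves rather than reading about it.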

Next steps

  • SageMathCloud seminar for lecturers
  • Jupyter on Iceberg (Sheffield HPC)