STFC School on Data Intensive Science 2020
from
Monday, 12 October 2020 (09:00)
to
Friday, 16 October 2020 (16:00)
Monday, 12 October 2020
09:00
Welcome and Introduction
-
Philip James
Carsten Welsch (University of Liverpool)
09:00 - 09:15
09:15
No Silver Bullet -- Pitfalls and Limitations in Machine Learning
-
Kurt Rinnert
09:15 - 10:00
10:00
Break
10:00 - 10:30
10:30
10:30 - 12:30
Contributions
10:30
HEP NN training
-
Gregor Kasieczka
10:30
Scikit/Keras
-
Adrian Bevan
10:30
Introduction to ML
-
Meirin Oan Evans
12:30
Break
12:30 - 13:30
13:30
continued
13:30 - 15:30
Contributions
13:30
Scikit/Keras
-
Adrian Bevan
13:30
Introduction to ML
-
Meirin Oan Evans
13:30
HEP NN training
-
Gregor Kasieczka
Lisa Benato
15:30
Break
15:30 - 15:45
15:45
Session 1 + 2
15:45 - 17:00
Tuesday, 13 October 2020
09:00
Human-aware AI
-
Paulo Lisboa
09:00 - 10:00
Machine learning is often synonymous with predictive models of exceptional accuracy. In classification they are commonly evaluated with summary measures of predictive performance, but is this enough to validate a complex algorithm? Non-linear models will exploit any artefacts in the data, which can result in high-performing models that are completely spurious. Examples of this will be shown. This leads on to the need for a clear ontology of model interpretability, for model design and usability testing. This reinforces the emerging paradigm of AI not as a stand-alone oracle but as an interactive tool to generate insights by querying the data, sometimes called xAI or AI2.0: AI with a person in the loop. In this talk, Professor Paulo Lisboa will describe how probabilistic machine learning models can be presented as similarity networks and how SVMs and neural networks generate simpler and transparent models, including globally accurate representations with nomograms. Perhaps surprisingly, this can buck the accuracy/interpretability trade-off by producing self-explaining neural networks that outperform black-box models and match deep learning. The dependence on the main predictive variables will be made explicit for a range of benchmark data sets commonly used in the machine learning literature.
Paulo Lisboa is Professor in Applied Mathematics at Liverpool John Moores University, UK, and Project Director for LCR Activate, an ERDF-funded £5m project to accelerate the development of SMEs in the Digital Creative Sectors in the Liverpool City Region. His research focus is advanced data analysis for decision support, in particular with applications to personalised medicine and public health. His research group on data science has developed rigorous methodologies to make machine learning models interpretable by end users.
10:00
Break
10:00 - 10:30
10:30
10:30 - 12:30
Contributions
10:30
Preparation of large datasets for machine learning
-
Isabell Melzer-Pellmann
10:30
Big Data Python ecosystem for HEP
-
Eduardo Rodrigues
10:30
Demystifying "Big Data"
-
Andy Newsam
12:30
Break
12:30 - 13:30
13:30
continued
13:30 - 15:30
Contributions
13:30
Preparation of large datasets for machine learning
-
Isabell Melzer-Pellmann
13:30
Big Data Python ecosystem for HEP
-
Eduardo Rodrigues
13:30
Demystifying "Big Data"
-
Andy Newsam
15:30
Break
15:30 - 15:45
15:45
Session 3 + 4
15:45 - 17:00
19:00
19:00 - 20:00
Wednesday, 14 October 2020
09:00
Making data work for you
-
Louise Butcher
09:00 - 10:00
Explore how data-driven technologies can improve productivity and strengthen competitive advantage, along with some practical tips on getting data ready to make it useable and useful. This talk will look at established data tools and techniques, and explore how they have been used in practice to solve business problems, including: a brief introduction to the Hartree Centre; the data science process in theory and reality; data gathering: collection bias and open data; how to make your data work for you; structuring data and creating value.
Dr Louise Butcher is a Senior Data Scientist at the STFC Hartree Centre. Although she works on all areas of data science and machine learning, Louise has a particular interest in the analysis of geospatial data, including both satellite data and GPS/sensors. Projects have included analysing energy use for South West Water; analysing patient needs on discharge for Liverpool NHS Clinical Commissioning Group; and improving GPS filters for mobile phone tracking for Glow Media. As part of a varied career to date, Louise previously worked at the University of Manchester on computer vision and face recognition, and founded a spin-out company to exploit the technology.
10:00
Break
10:00 - 10:30
10:30
10:30 - 12:30
Contributions
10:30
Virtual universes vs. the real thing
-
Andreea Font
Ian McCarthy
10:30
Git Demystified
-
Mark Dawson
12:30
Free afternoon
12:30 - 16:00
20:00
20:00 - 22:00
Thursday, 15 October 2020
09:00
09:00 - 10:00
Contributions
09:00
Case Study 1
-
Edward Jones
09:20
Case Study 2
-
Paul Graham
09:40
Case Study 3
-
Sunil Mistry
10:00
Break
10:00 - 10:30
10:30
Project Management
10:30 - 12:30
12:30
Break
12:30 - 13:30
13:30
International Collaboration
13:30 - 15:30
15:30
Break
15:30 - 16:00
16:00
16:00 - 18:00
18:20
Online Escape Room
18:20 - 20:00
Friday, 16 October 2020
09:00
09:00 - 12:15
12:15
A student's placement experience
-
Alexander Hill
12:15 - 12:30
12:30
Break
12:30 - 13:00
13:00
13:00 - 13:30