Date: Tuesday 8th October 2024 – 15:00 (Europe/London)
Speaker: Dr Mike Walmsley, Dunlap Fellow at the University of Toronto
Abstract
Neural scaling laws suggest that larger datasets enable larger models, which then act as generalisable “foundations” for fine-tuning on downstream tasks. But what does this mean for practitioners working with images and tasks that are wildly different to those found in generic pretraining datasets (ImageNet, JFT, etc.)? This talk will cover the development of foundation models in astronomy, applying the principles from computer science and adapting them for this new context. I will show how foundation models give astronomers new tools (similarity search, anomaly search, segmentation, etc.). I will focus on the combination of foundation models with citizen science, particularly the Galaxy Zoo (galaxyzoo.org) project, which recruits tens of thousands of volunteers to annotate millions of galaxy images. What should Galaxy Zoo look like in a world with ever-more-capable models? I'll highlight our steps towards live volunteer-AI collaboration for fine-tuning new models in weeks rather than years.
Biography
Dr Mike Walmsley works on applying deep learning research breakthroughs to astrophysics. He focuses on combining crowdsourcing and deep learning to do better science than with either alone. Much of this work is as Technical Lead for the citizen science project Galaxy Zoo. You can read more about his research projects here.
He is currently a Dunlap Fellow at the University of Toronto. He was previously a postdoc (PDRA) at the University of Manchester, supervised by Anna Scaife and working on the foundation model experiments that led to his Fellowship. He did his DPhil at Oxford, where he developed a state-of-the-art active learning approach to classify millions of galaxies with crowdsourcing and Bayesian convolutional neural networks (see this official TensorFlow blog). He also set a new benchmark for detecting merging galaxies. Before that, he worked at fintech startup Cytora using machine learning and messy open data to price insurance.