{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "
\n", "

Big Data Python ecosystem for HEP analysis

\n", "

Eduardo Rodrigues
University of Liverpool

\n", "\n", "

STFC School on Data Intensive Science 2024, Liverpool, 14-19 July 2024

\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Abstract\n", "\n", "Data analysis in High Energy Physics (HEP) has evolved considerably in recent years, with \"Big Data\" tools being ever more used.\n", "Python as a programming language for analysis work is established and a HEP-specific ecosystem connecting well with the wider scientific Python ecosystem\n", "is both mature at this point and under continuous development.\n", "I will discuss HEP data as Big Data, Python and its analysis ecosystem provided by various community domain-specific projects.\n", "I will dwell in particular on the Scikit-HEP project, which I started in late 2016 with a few colleagues from various backgrounds and domains of expertise.\n", "It is now part of the official software stack of the experiments ATLAS, Belle II, CMS, KM3NeT and HCb.\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **Aside intro - \"PyHEP community projects\"**\n", "\n", "A series of Python projects and software libraries have seen the light in the recent years, where by *projects* I select endeavours that provide one or more Python libraries *with a community around it*. 
\n", "Popular projects of this kind are `Coffea`, `ComPWA`, `GooFit`, `Scikit-HEP` and `zfit`.\n", "\n", "**Scikit-HEP** is:\n", "- The one I co-founded in late 2016 with a few colleagues, and hence the one I am most closely involved with.\n", "- The oldest of these projects.\n", "- The one providing the most libraries.\n", "- The project on which most other \"Big Data projects\" depend (they depend on at least one of the Scikit-HEP libraries).\n", "\n", "For these reasons I will be presenting some of the Scikit-HEP packages.\n", "\n", "---" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "## **The Scikit-HEP project**\n", "\n", "The scientific Python ecosystem can be organised, schematically, as a layered set of ever more specialised libraries and packages, from foundational and key libraries such as NumPy, Pandas and matplotlib, to domain-specific projects. In HEP we now also have our own \"ecosystem shell\". From a Scikit-HEP centric perspective, this ecosystem looks as follows:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "Note that several packages or projects are part of the \"grand PyHEP ecosystem\". That's the case of `Coffea`, `GooFit` and `zfit`." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "### **Project topics and packages**\n", "\n", "Very many topics are addressed within the project!\n", "- Data manipulation and interoperability\n", "- Data aggregation and histogramming\n", "- Modeling and fitting\n", "- Statistics\n", "- Visualisation\n", "- HEP-specific utilities e.g. to deal with particles and decays\n", "- Simulation\n", "- Interoperability with HEP-specific libraries" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Here is an overview of the Scikit-HEP packages that are most popular and/or most actively used and maintained:" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "fragment" } }, "source": [ "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "A \"whetting your appetite\" mini gallery ...:\n", "\n", "\n", " \n", " \n", "\n", "
\n", "\n", "\n", " \n", " \n", "\n", "
" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### **For reference - some useful links collected in one place:**\n", "\n", "- Website: https://scikit-hep.org/\n", "- GitHub: https://github.com/scikit-hep/" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "---\n", "\n", "### **How to explore these lectures**\n", "\n", "- **The notebooks are topical and self-consistent for you to run though them at your own pace and leisure.\n", "Run what topics sound appealing to you ...**\n", "\n", "- You liked these tutorials? Consider dropping a line and/or giving the GitHub repository a star, as that's a trivial way to convey positive feedback." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "#### **The scikit-hep metapackage**\n", "\n", "The project has a special package, `scikit-hep`, which is a *metapackage*. Unlike all others, which target specific topics, this metapackage simply provides an easy way to have a compatible set of project packages installed via a simple `conda install scikit-hep` (or `pip install scikit-hep`) command.\n", "\n", "The Scikit-HEP packages used in these notebooks are in fact installed via the metapackage. 
It is trivial to check the available versions:" ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "slideshow": { "slide_type": "subslide" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "System:\n", " python: 3.12.4 | packaged by conda-forge | (main, Jun 17 2024, 10:04:44) [MSC v.1940 64 bit (AMD64)]\n", "executable: C:\\home\\sw\\anaconda3\\envs\\STFC_DIS_2024\\python.exe\n", " machine: Windows-10-10.0.19045-SP0\n", "\n", "Python dependencies:\n", "setuptools: 70.3.0\n", " pip: 24.0\n", " numpy: 1.26.4\n", " scipy: 1.14.0\n", " pandas: 2.2.2\n", "matplotlib: 3.9.1\n", "\n", "Scikit-HEP package version and dependencies:\n", " awkward: 2.6.6\n", "boost_histogram: 1.4.1\n", " decaylanguage: 0.18.3\n", " hepstats: 0.8.1\n", " hepunits: 2.3.4\n", " hist: 2.7.3\n", " histoprint: 2.4.0\n", " iminuit: 2.26.0\n", " mplhep: 0.3.50\n", " particle: 0.24.0\n", " pylhe: 0.8.0\n", " resample: 1.10.0\n", " skhep: None\n", " uproot: 5.3.10\n", " vector: 1.4.1\n" ] } ], "source": [ "import skhep\n", "skhep.show_versions()" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "subslide" } }, "source": [ "
\n", "THANK YOU\n", "\n", "to Hans Dembinski, Henry Schreiner, Jim Pivarski, Jonas Eschle and others for knowingly (or unknowingly) providing material and/or inspiration for these tutorial notebooks!\n", "
" ] } ], "metadata": { "celltoolbar": "Slideshow", "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.4" } }, "nbformat": 4, "nbformat_minor": 4 }