September 05, 2023

Pangeo and openEO offer a training session that will guide you through the Pangeo (http://pangeo.io/) and openEO (https://openeo.org/) ecosystems for developing efficient Big Earth science data pipelines. The emphasis will be on how the two ecosystems complement each other, with the goal of teaching attendees how to fully exploit both frameworks to run complex data workflows. Attendees will learn about open, reproducible, and scalable Earth science. The workshop is designed to help anyone interested in starting their journey with Pangeo and openEO while avoiding common pitfalls.

All the Python packages used during this training are open source. The sample datasets used in the tutorial are EO datasets that are freely available to everyone and can also be used for real scientific analysis.
The training material is open source too (CC-BY-4).
The workshop aims to empower attendees to learn new skills and build confidence in using them in their work. The tutorial is a work-along session with hands-on exercises to check attendees' understanding. There will be multiple opportunities to ask questions and hold discussions with the Pangeo and openEO communities.

Expected audience:

This workshop assumes prior knowledge of the Python programming language and the basics of Xarray. We recommend that learners with no prior knowledge of Python or Xarray get familiar with them beforehand, for instance using the Software Carpentry training material (https://swcarpentry.github.io/python-novice-gapminder/), Project Pythia (https://foundations.projectpythia.org/core/xarray.html), the xarray tutorial (https://tutorial.xarray.dev), or the Pangeo Galaxy Training material (https://training.galaxyproject.org/training-material/topics/climate/tutorials/pangeo-notebook/tutorial.html).

Agenda:
9:00 Welcome (5 minutes)
9:05 Introduction and Motivation (15 minutes)

Part-1: Pangeo
9:20 Overview of the Pangeo ecosystem (20 minutes)
9:40 Understanding Xarray to avoid common pitfalls (30 minutes)
10:10 Interactive Visualization with hvPlot (20 minutes)
10:30 Break (30 minutes)

Part-2: openEO (Intro Session)

11:00 Getting started with openEO (15 minutes) – EODC
• Intro presentation and questionnaire on who is in the audience

11:15 Finding data, running your first process graphs, and differences from client-side processing – EODC / EURAC
• Logging attendees in and showing them around
• Notebook showing how to search for data and find metadata
• Different clients (R, Python, and Web Editor)
• Aggregate temporal period, CORINE Land Cover – change detection, tone mapping
• Local / client-side processing

12:10 Integrate custom code into your workflow using User Defined Functions (30 minutes) – Sinergise / VITO
• Take over with the logged-in attendees, showing them how to create a workflow and introduce custom code into it
• Show the viewing functionality in the Web Editor

12:30 Lunch

Part-3: Unlocking the Power of Space Data with Pangeo & openEO
14:00 Understanding what openEO does best and how to exploit it to easily streamline your data analysis (25 minutes)
14:25 Scaling with openEO (25 minutes)
14:50 Understanding when and how to exploit Pangeo to customise your algorithm and analyse multiple data sources (20 minutes)
15:10 Introduction to chunking (20 minutes)
15:30 Break
16:00 Scaling with Dask (30 minutes)
16:30 Cloud-friendly access to archival data with kerchunk (25 minutes)
16:55 Create Analysis Ready Cloud Optimised (ARCO) data (25 minutes)
17:20 Common workflow that combines the best of the two “worlds” (30 minutes)
17:50 Wrap-up and feedback survey (10 minutes)