Apache Spark: first steps
ESCAPE data science summer school 2021: Apache Spark
Welcome to the Apache Spark lecture! Please read the instructions on this page carefully before starting the lecture, as they contain the materials and important information needed to set it up.
Session 1
I will review the landscape of cluster computing by addressing some of the most pressing questions today: what is cluster computing? What does it mean to work in a distributed environment? What data and computing challenges is the scientific community facing nowadays, and how can we tackle them? Useful concepts such as functional programming and implicit parallelisation will be discussed. I will also introduce Apache Spark, a cluster computing framework for analysing large datasets that has proved successful in industry. I will focus specifically on the Apache Spark SQL module and the DataFrames API, and we will start practicing through a series of simple exercises.
Session 2
In this session, we will use the Apache Spark Python API (PySpark) and learn through concrete examples how to interface with popular scientific libraries (NumPy, Pandas, …). We will also see how to test and debug code written with Spark, and how to integrate it into a Continuous Integration pipeline.
Session 3
For the last session, we will finish with concrete applications in the domain of astronomy: catalog and image manipulation, machine learning and, if time permits, streaming data.