training

Guaranteed start

NL/EN

Apache Spark for Data Engineers Masterclass

Deepen your knowledge of Apache Spark to optimize your data workflow.

18 May 2026

- Veenendaal / Remote

2 days

€ 1610 (excl. VAT)

Description

In this course you learn techniques and best practices for optimizing Apache Spark applications. You study the architectural elements of Spark and work with the Spark UI. You identify and address common performance problems caused by shuffles and skew. You also learn advanced optimization strategies for join, union, and merge operations, data formats, caching mechanisms, garbage collector settings, data partitioning, bucketing, and Delta Lake optimizations. Finally, you explore regular maintenance tasks for Spark applications and learn how to adjust Spark session configurations for optimal performance.

Learning objectives

- Describe the architecture of a Spark application. (Bloom level: Remember)
- Explain the structure and functionality of the Spark UI. (Bloom level: Understand)
- Predict common performance issues caused by shuffling and data skew. (Bloom level: Apply)
- Optimize join, union, and merge operations in Spark. (Bloom level: Analyze)
- Change the data format for optimal performance. (Bloom level: Apply)
- Implement caching mechanisms and garbage collector settings for enhanced performance. (Bloom level: Apply)
- Use data partitioning and bucketing in Spark workloads. (Bloom level: Apply)
- Apply Delta Lake optimizations for better performance in Spark. (Bloom level: Apply)
- Describe regular maintenance tasks for Spark applications. (Bloom level: Understand)
- Customize Spark session configurations for optimal performance. (Bloom level: Apply)

The learning objectives above are classified using Bloom's taxonomy.

Prerequisites

Python
Apache Spark fundamentals

Topics

Introduction to Spark Architecture and Ecosystem
Understanding the Spark UI
Common Performance Issues in Spark
Optimizing Data Operations in Spark
Data Formats and Performance
Caching and Garbage Collection in Spark
Data Partitioning and Bucketing
Delta Lake Optimizations
Maintenance of Spark Applications
Customizing Spark Session Configurations

Introduction to Spark Architecture and Ecosystem

Overview of Spark architecture
Key components: Driver, Executors, Cluster Manager
The ecosystem: JVM, Kubernetes, YARN, HDFS, Hive Metastore

Understanding the Spark UI

Structure of the Spark UI
Functionality of different tabs (Jobs, Stages, Storage, Environment, Executors)
Monitoring and diagnosing Spark applications

Common Performance Issues in Spark

Shuffles and Data Skew
Sorting
Narrow and Wide transformations
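
The shuffle and skew topics above can be illustrated without a cluster. The sketch below is plain Python, not Spark code. It assumes a hypothetical four-partition shuffle that routes each record by a CRC32 hash of its key, shows how a single hot key overloads one partition, and then applies key salting to spread that key out. The names `partition_of`, `partition_sizes`, `salt`, and the sample data are invented for this illustration.

```python
import zlib
from collections import Counter

NUM_PARTITIONS = 4  # hypothetical shuffle partition count


def partition_of(key: str) -> int:
    """Deterministic hash partitioner (CRC32 is a stand-in, not Spark's hash)."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


def partition_sizes(keys):
    """Count how many records land in each partition."""
    counts = Counter(partition_of(k) for k in keys)
    return [counts.get(p, 0) for p in range(NUM_PARTITIONS)]


def salt(keys, buckets):
    """Append a rotating salt suffix so one hot key becomes several keys."""
    return [f"{k}#{i % buckets}" for i, k in enumerate(keys)]


# Made-up data: one hot key dominates the dataset.
keys = ["hot"] * 9_000 + [f"user{i}" for i in range(1_000)]

skewed = partition_sizes(keys)
salted = partition_sizes(salt(keys, buckets=16))

print("skewed:", skewed, "max:", max(skewed))
print("salted:", salted, "max:", max(salted))
```

In a real salted join the other side of the join has to be replicated once per salt value so that matching rows still meet; that step is left out here to keep the sketch short.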

Optimizing Data Operations in Spark

Join operations: broadcast joins, shuffle joins
Union and merge operations
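
To make the difference between the join strategies listed above more concrete, here is a plain-Python sketch of the idea behind a broadcast (map-side hash) join. The small table is turned into an in-memory lookup that every partition of the large table probes locally, so the large table does not need to be shuffled. This illustrates the concept only and is not Spark's implementation; the function name `broadcast_hash_join` and the table contents are invented for the example.

```python
from collections import defaultdict


def broadcast_hash_join(large_partitions, small_rows):
    """Inner-join each large partition against a shared lookup of the small table."""
    # Build phase: index the small table by key once (the "broadcast" copy).
    lookup = defaultdict(list)
    for key, value in small_rows:
        lookup[key].append(value)

    # Probe phase: each partition is joined locally; no rows move between partitions.
    joined = []
    for partition in large_partitions:
        out = []
        for key, value in partition:
            for small_value in lookup.get(key, []):
                out.append((key, value, small_value))
        joined.append(out)
    return joined


# Made-up data: orders split over two partitions, plus a small country table.
orders = [
    [("NL", 10), ("DE", 20)],
    [("NL", 30), ("FR", 40)],
]
countries = [("NL", "Netherlands"), ("DE", "Germany")]

result = broadcast_hash_join(orders, countries)
print(result)
```

In PySpark the same intent is usually expressed with the `broadcast` hint from `pyspark.sql.functions`, or left to the optimizer through the `spark.sql.autoBroadcastJoinThreshold` setting.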

Data Formats and Performance

Common data formats such as JSON, CSV, and Parquet
Impact of data format on performance
Making optimal use of data formats for Spark applications

Caching and Garbage Collection in Spark

Caching mechanisms in Spark (cache(), persist())
Data persistence
Garbage collection settings and their impact on performance

Data Partitioning and Bucketing

Partitioning strategies and impact in Spark
Bucketing techniques and their benefits

Delta Lake Optimizations

Introduction to Delta Lake
Performance optimization in Delta Lake
Delta Lake housekeeping

Maintenance of Spark Applications

Regular maintenance tasks for Spark applications
Monitoring and diagnostics tools

Customizing Spark Session Configurations

Spark session configurations and their impact on performance
Common Spark session parameters
Customizing configurations for specific workloads
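
As a rough illustration of workload-specific configuration, the sketch below keeps a handful of Spark settings in a Python mapping and renders them as `spark-submit` arguments. The configuration keys are existing Spark properties, but the values, the name `BATCH_ETL_CONF`, and the helper `to_submit_args` are assumptions made for this example rather than recommendations from the course.

```python
# Illustrative values only; sensible settings depend on the cluster and workload.
BATCH_ETL_CONF = {
    "spark.sql.shuffle.partitions": "400",
    "spark.sql.adaptive.enabled": "true",
    "spark.sql.adaptive.skewJoin.enabled": "true",
    "spark.sql.autoBroadcastJoinThreshold": str(50 * 1024 * 1024),  # 50 MiB in bytes
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
}


def to_submit_args(conf):
    """Render a config mapping as spark-submit `--conf key=value` arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


submit_args = to_submit_args(BATCH_ETL_CONF)
print(" ".join(submit_args))
```

With PySpark installed, the same mapping could instead be applied while building a session by calling `.config(key, value)` on `SparkSession.builder` for each entry.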

Schedule

Dates	Duration	Location
18 May 2026 to 19 May 2026 (guaranteed start)	2 days	Veenendaal / Remote (hybrid training that can be attended remotely)

Need in-company training or personal advice?

Our training advisors are happy to help with personal advice or with arranging an in-company training within your organization.

Prerequisite training courses

training - ASF, Guaranteed start, NL/EN

Apache Spark Fundamentals

Learn to process data with PySpark on Apache Spark

2 days
€ 1610
Classroom
11 May 2026

Databases
Cloud
AI/Machine Learning

training - PYTHONDEV, Guaranteed start, NL/EN

Python Fundamentals

Build a solid foundation for developing software in Python

3 days
€ 2175
Classroom
12 November 2025

Python

"This training was directly applicable to the project"

Course participant

Highly rated
Practice-oriented training courses
Certified trainers
In-house instructors

Trainers

Douwe van den Berg

Josquin Booij

Related training courses

Microsoft Fabric Data Engineer (DP-700)
