Course (NL/EN). Class is guaranteed to run.

Apache Spark Fundamentals

Get started processing data with Apache Spark and PySpark

Next start date: September 8, 2025, Utrecht / Remote

Duration: 2 days

Price: 1530 EUR (excl. VAT)

Description

With the rise of cloud computing, distributed storage and (big) data processing, many organisations are starting to use Apache Spark for their data processes. Whether it is for data science, data analysis or data engineering, Apache Spark can be the right tool for the job. It is also a foundation of Azure Synapse Analytics, Microsoft Fabric and Databricks.

This training walks you through the fundamentals of working with Apache Spark, starting with what it is and how it works. You will then go on to read, transform and write data using PySpark.

Finally, to make sure your code can be used safely in production, the course places extra focus on development best practices.

Prior Knowledge

Python Development

Subjects

1: About Spark

What is Spark, where did it come from and why was it created? And how does it work?

Lessons

History of Apache Spark
Technical Architecture (Driver, Cluster Manager, Executors)
RDDs and DataFrames
PySpark
Benefits of using Spark
Running Spark locally
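The driver/executor split listed above can be pictured with a plain-Python analogy. This is not Spark code: in the hypothetical `run_job` below, a "driver" splits the input into partitions, a thread pool stands in for the executors that each process one partition per task, and the driver combines the partial results. In a real cluster the cluster manager starts executor processes, usually on other machines, which a thread pool only imitates.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """Driver side: cut the input into roughly equal chunks (partitions)."""
    size = max(1, -(-len(data) // num_partitions))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def count_words(partition):
    """Executor side: one task processes one partition independently."""
    return sum(len(line.split()) for line in partition)


def run_job(data, num_partitions=4, num_executors=2):
    """Hypothetical 'driver': plan tasks, run them on workers, combine results."""
    partitions = split_into_partitions(data, num_partitions)
    # The thread pool plays the role of the executors
    with ThreadPoolExecutor(max_workers=num_executors) as executors:
        partial_counts = list(executors.map(count_words, partitions))
    # Partial results travel back to the driver, which combines them
    return sum(partial_counts)


lines = ["spark runs on a cluster", "the driver plans the work", "executors do the work"]
print(run_job(lines))  # prints 14
```

The point of the analogy is that each task only sees its own partition, which is what lets Spark spread the same work over many machines.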

After completing this module, students will be able to:

Explain how Spark works

2: Reading Data

To work with data, we first need to retrieve it from wherever it is located. This is done through spark.read.
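`spark.read` accepts path patterns as well as literal paths, for example `spark.read.csv("data/2025-0[1-3]/*.csv", header=True)`. Spark resolves such patterns as Hadoop-style globs (`*`, `?`, `[...]`, `{a,b}`) rather than full regular expressions. The plain-Python sketch below (not Spark code) shows which files a pattern like that would select. It matches one path segment at a time so that `*` does not cross a `/`, as in a glob. The file list and the helper `matches_glob` are made up for illustration, and Python's `fnmatch` does not support the `{a,b}` alternation.

```python
from fnmatch import fnmatchcase


def matches_glob(path, pattern):
    """Return True if `path` matches `pattern`, one path segment at a time.

    Splitting on "/" keeps `*` and `?` from matching across directories,
    mimicking glob semantics. Brace alternation like {a,b} is NOT supported.
    """
    path_parts = path.split("/")
    pattern_parts = pattern.split("/")
    if len(path_parts) != len(pattern_parts):
        return False
    return all(fnmatchcase(part, pat) for part, pat in zip(path_parts, pattern_parts))


# Hypothetical files in a landing folder
files = [
    "data/2025-01/sales.csv",
    "data/2025-02/sales.csv",
    "data/2025-04/sales.csv",
    "data/2025-02/archive/old.csv",
    "data/2025-03/readme.txt",
]

pattern = "data/2025-0[1-3]/*.csv"
selected = [f for f in files if matches_glob(f, pattern)]
print(selected)  # ['data/2025-01/sales.csv', 'data/2025-02/sales.csv']
```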

Lessons

spark.read
Read options
Read modes
Using regex in the filepath(s)
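Spark's CSV and JSON readers take a `mode` option with the values `PERMISSIVE` (the default), `DROPMALFORMED` and `FAILFAST`, for example `spark.read.option("mode", "DROPMALFORMED").csv(path)`. As a rough model of what those modes mean, the plain-Python sketch below parses CSV lines against an expected column count and treats a line with the wrong number of fields as malformed. It is a simplification, not Spark's implementation: real PERMISSIVE mode sets unparsable fields to null and can also keep the raw text in a corrupt-record column. The function name `read_rows` is hypothetical.

```python
import csv

VALID_MODES = ("PERMISSIVE", "DROPMALFORMED", "FAILFAST")


def read_rows(lines, num_columns, mode="PERMISSIVE"):
    """Simplified model of Spark's CSV read modes (not Spark's actual code).

    A line is treated as malformed when its field count differs from num_columns.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"Unknown mode: {mode}")
    rows = []
    for record in csv.reader(lines):
        if len(record) == num_columns:
            rows.append(record)
        elif mode == "PERMISSIVE":
            # Keep the row: pad missing fields with None, drop extra fields
            rows.append(record[:num_columns] + [None] * (num_columns - len(record)))
        elif mode == "DROPMALFORMED":
            continue  # silently skip the bad record
        else:  # FAILFAST
            raise ValueError(f"Malformed record: {record!r}")
    return rows


lines = ["1,Alice,NL", "2,Bob", "3,Carol,BE,extra"]
print(read_rows(lines, 3, "PERMISSIVE"))
# [['1', 'Alice', 'NL'], ['2', 'Bob', None], ['3', 'Carol', 'BE']]
print(read_rows(lines, 3, "DROPMALFORMED"))
# [['1', 'Alice', 'NL']]
```

With `"FAILFAST"` the same input raises an error on the second line, which is usually what you want in a production pipeline where silently losing or padding rows would go unnoticed.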

Lab

Read your first files in Spark

After completing this module, students will be able to:

Read data using PySpark

3: Transforming Data

After retrieving our data, we need to transform it. Operations such as joining, filtering, grouping, aggregating, splitting and renaming are needed in most data pipelines. How do they work in Spark?
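A useful mental model for why some of these operations cost more than others: a filter is a narrow transformation that can be applied to each partition on its own, while grouping is a broad (more commonly called wide) transformation that needs all rows with the same key in the same partition, which forces a shuffle. The plain-Python sketch below models partitions as lists. The helper names are hypothetical, and the key hash is a simple stand-in for the one Spark actually uses.

```python
from collections import defaultdict

# Input already split into partitions, as Spark would hold it across executors
partitions = [
    [("NL", 10), ("BE", 5), ("NL", 7)],
    [("DE", 3), ("NL", 1)],
    [("BE", 2), ("DE", 8)],
]


def narrow_filter(parts, min_amount):
    """Narrow: each output partition depends on exactly one input partition."""
    return [[row for row in part if row[1] >= min_amount] for part in parts]


def shuffle_by_key(parts, num_output):
    """Wide: rows move between partitions so each key ends up in one place."""
    out = [[] for _ in range(num_output)]
    for part in parts:
        for key, value in part:
            # Stand-in hash; deterministic, unlike Python's salted hash() for str
            out[sum(map(ord, key)) % num_output].append((key, value))
    return out


def sum_per_key(parts):
    """After the shuffle, each partition can be aggregated independently."""
    result = {}
    for part in parts:
        totals = defaultdict(int)
        for key, value in part:
            totals[key] += value
        result.update(totals)  # safe: a key lives in exactly one partition
    return result


filtered = narrow_filter(partitions, min_amount=3)
totals = sum_per_key(shuffle_by_key(filtered, num_output=2))
print(sorted(totals.items()))  # [('BE', 5), ('DE', 11), ('NL', 17)]
```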

Lessons

Filtering
Narrow and wide transformations
Column operations
JSON transformations
Window functions
UDFs and lambdas
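Window functions compute a value for each row from a group of related rows without collapsing those rows, unlike a `groupBy` aggregation. In PySpark a running total per customer would be written along the lines of `F.sum("amount").over(Window.partitionBy("customer").orderBy("date"))`. The plain-Python sketch below computes the same result for a small, made-up list of orders so the semantics are visible. It assumes no two orders of the same customer share a date: Spark's default frame for an ordered window treats rows with equal ordering values as peers and gives them the same total, which this sketch does not model. The helper `running_total` is hypothetical.

```python
from itertools import groupby
from operator import itemgetter

orders = [
    {"customer": "a", "date": "2025-01-03", "amount": 20},
    {"customer": "b", "date": "2025-01-01", "amount": 5},
    {"customer": "a", "date": "2025-01-01", "amount": 10},
    {"customer": "b", "date": "2025-01-02", "amount": 7},
    {"customer": "a", "date": "2025-01-02", "amount": 15},
]


def running_total(rows, partition_key, order_key, value_key):
    """Add a running_total column per partition, like sum(...).over(window)."""
    # Sort by partition, then by the order column, mirroring partitionBy + orderBy
    ordered = sorted(rows, key=itemgetter(partition_key, order_key))
    result = []
    for _, group in groupby(ordered, key=itemgetter(partition_key)):
        total = 0
        for row in group:
            total += row[value_key]
            # Every input row is kept; only a new column is added
            result.append({**row, "running_total": total})
    return result


for row in running_total(orders, "customer", "date", "amount"):
    print(row["customer"], row["date"], row["running_total"])
# a 2025-01-01 10
# a 2025-01-02 25
# a 2025-01-03 45
# b 2025-01-01 5
# b 2025-01-02 12
```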

Lab

Perform transformations with PySpark

After completing this module, students will be able to:

Transform data using PySpark

4: Writing Data

After completing the necessary transformations in memory, it is time to write our data to its target location. This may sound like a simple operation, but there are things to consider, such as file formats and partitioning.
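With `df.write.partitionBy("year", "month").parquet(path)`, Spark writes one subdirectory per distinct combination of partition values, named in the Hive style `year=2025/month=9`, and leaves the partition columns out of the data files themselves. The plain-Python sketch below only computes that directory layout for a few made-up rows, to show how the choice of partition columns determines the number of directories. The function name `partition_layout` is hypothetical.

```python
from collections import Counter

rows = [
    {"year": 2025, "month": 8, "order_id": 1},
    {"year": 2025, "month": 9, "order_id": 2},
    {"year": 2025, "month": 9, "order_id": 3},
    {"year": 2024, "month": 12, "order_id": 4},
]


def partition_layout(rows, partition_cols):
    """Count rows per Hive-style directory such as 'year=2025/month=9'."""
    layout = Counter()
    for row in rows:
        directory = "/".join(f"{col}={row[col]}" for col in partition_cols)
        layout[directory] += 1
    return layout


for directory, count in sorted(partition_layout(rows, ["year", "month"]).items()):
    print(directory, count)
# year=2024/month=12 1
# year=2025/month=8 1
# year=2025/month=9 2
```

Partitioning the same rows by `order_id` would produce one directory per order, which illustrates why a high-cardinality column is usually a poor partition key.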

Lessons

Common file formats
Apache Parquet
Delta Lake
Data partitioning
Bucketing
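Bucketing, as in `df.write.bucketBy(8, "customer_id").saveAsTable("orders")`, assigns every row to one of a fixed number of buckets by hashing the bucket column and taking the result modulo the bucket count, so rows with the same key always land in the same bucket. When two tables are bucketed the same way on a join key, a later join on that key can skip the shuffle. Note that `bucketBy` works together with `saveAsTable`, not with a plain path-based save. The sketch below uses `zlib.crc32` as a stand-in for Spark's Murmur3-based hash, so its bucket numbers will not match Spark's; only the principle carries over. `bucket_for` and `bucketize` are hypothetical helpers.

```python
import zlib
from collections import defaultdict


def bucket_for(key, num_buckets):
    """Bucket id = hash(key) mod num_buckets (crc32 stands in for Murmur3)."""
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, key_col, num_buckets):
    """Group rows into buckets by their key column."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[bucket_for(row[key_col], num_buckets)].append(row)
    return buckets


orders = [{"customer_id": c, "amount": a} for c, a in [(1, 10), (2, 5), (1, 7), (3, 2), (2, 1)]]
customers = [{"customer_id": c, "name": n} for c, n in [(1, "Ann"), (2, "Bo"), (3, "Cy")]]

order_buckets = bucketize(orders, "customer_id", 4)
customer_buckets = bucketize(customers, "customer_id", 4)

# Same key column and bucket count on both sides: bucket i of orders only
# needs bucket i of customers, so no rows have to move between buckets
joined = []
for b, order_rows in order_buckets.items():
    names = {c["customer_id"]: c["name"] for c in customer_buckets.get(b, [])}
    joined.extend((o["customer_id"], names[o["customer_id"]], o["amount"]) for o in order_rows)

print(sorted(joined))
# [(1, 'Ann', 7), (1, 'Ann', 10), (2, 'Bo', 1), (2, 'Bo', 5), (3, 'Cy', 2)]
```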

Lab

Write partitioned and bucketed data with PySpark

After completing this module, students will be able to:

Write data using PySpark

5: Development Best Practices

All we need to do with data is read, transform and write it. But the code we use to do that needs to be maintained, and for that we need development best practices. Some of them are general; others are specific to Apache Spark.
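As an illustration of modularization, logging, error handling and testing working together, the sketch below keeps a cleaning step in a small, pure function that logs what it drops and fails loudly on input of the wrong type, and covers it with `unittest`. It uses plain Python dicts instead of a Spark DataFrame so it runs without a cluster; in PySpark the same idea usually means a function that takes and returns a DataFrame and is tested against a local SparkSession. The function `clean_orders` and its cleaning rules are hypothetical.

```python
import logging
import unittest

logger = logging.getLogger("pipeline.clean")


def clean_orders(rows):
    """Drop rows without an order_id and normalise country codes.

    Pure function: no I/O, so it can be tested in isolation.
    """
    if not isinstance(rows, list):
        raise TypeError(f"Expected a list of rows, got {type(rows).__name__}")
    cleaned = []
    for row in rows:
        if row.get("order_id") is None:
            logger.warning("Dropping row without order_id: %r", row)
            continue
        country = (row.get("country") or "").strip().upper()
        cleaned.append({**row, "country": country})
    logger.info("Kept %d of %d rows", len(cleaned), len(rows))
    return cleaned


class CleanOrdersTest(unittest.TestCase):
    def test_drops_rows_without_id(self):
        rows = [{"order_id": 1, "country": " nl "}, {"order_id": None, "country": "be"}]
        self.assertEqual(clean_orders(rows), [{"order_id": 1, "country": "NL"}])

    def test_rejects_wrong_input_type(self):
        with self.assertRaises(TypeError):
            clean_orders("not a list")
```

Saved as a module, the tests can be run with `python -m unittest` and the function imported from production code, which is the notebook-to-Python-file split the lessons below refer to.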

Lessons

Notebooks for development, Python files for production
Modularization
Logging
Error Handling
Testing
Continuous Integration

Lab

Read, clean, transform and write data, applying development best practices to produce production-ready code

After completing this module, students will be able to:

Write PySpark code following development best practices

Schedule

Start date	End date	Duration	Location	Status
September 8, 2025	September 9, 2025	2 days	Utrecht / Remote (hybrid training, can be followed remotely)	Class is guaranteed to run
November 13, 2025	November 14, 2025	2 days	Veenendaal / Remote (hybrid training, can be followed remotely)	Class is guaranteed to run

All courses can also be conducted within your organization as customized or in-company training.

Our training advisors are happy to give you personal advice or to help you arrange in-company training within your organization.

Trainers

Douwe van den Berg

Hello! I am Douwe van den Berg, trainer at the Knowledge Center of Info Support. In 2017, I started providing training in the field of data and artificial intelligence. Over time, my role has expanded to include responsibility for our curriculum in this area and contributing to future developments within our Data&AI technology area. I find it incredibly valuable that within Info Support, and especially within the technology area, I can exchange ideas with colleagues who are working on exciting projects for our clients. Through this exchange of knowledge and experience, we all improve, including our training programs. Clearly, I find happiness in data and in the solutions we can build on it. These can be as complex as you like, but I am equally delighted by a simple graph that reveals seasonal effects, for example. The most rewarding aspect, however, is when the participants in my training have truly learned something and can move forward with the knowledge I have given them. I provide training in SQL, SQL Server, Python, data modeling, Power BI, Azure data and AI solutions, machine learning, Databricks and Spark. Outside of my work at Info Support, I enjoy cycling and playing board and card games with my friends.

Prior knowledge courses

Course PYTHONDEV (NL/EN)

Essentials of Python Development

Attain a solid foundation of Python for developing software solutions

3 days
€ 2070
Classroom
September 17, 2025

Python

Follow-up courses

Course DP700 (NL/EN)

Microsoft Fabric Data Engineer (DP-700)

Learn methods and practices to implement data engineering solutions by using Microsoft Fabric

4 days
€ 3060
Classroom
September 30, 2025

Cloud

Course SPARKADV (NL/EN), class is guaranteed to run

Advanced Apache Spark for Data Engineers

Get a deeper understanding of Apache Spark in order to optimize your data workflow.

2 days
€ 1530
Classroom
September 29, 2025

Cloud

"Trainer who knows his profession!"

Marc

Highly rated
Practice-oriented training
Certified trainers
In-house trainers
