Apache Spark Development training course

A lightning-fast unified analytics engine for big data and machine learning

4.8 out of 5 average


Our Apache Spark training course provides you with a solid technical introduction to the Spark architecture and how Spark works.

You will learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level constructs that provide a simpler and more capable interface, including Spark SQL and DataFrames.

Next on 20th Jun 2022

JBI training course London UK

  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Become familiar with basic installation/setup/layout of Spark
  • Use Spark for interactive and ad-hoc operations
  • Use DataSet/DataFrame/Spark SQL to efficiently process structured data
  • Understand the basics of RDDs (Resilient Distributed Datasets), data partitioning, pipelining and computations
  • Understand performance implications and optimisations when using Spark
  • Understand Spark's data caching and usage
  • Become familiar with Spark Graph Processing and SparkML machine learning

Module 1 - Introduction to Spark - Getting started

  1. What is Spark and what is its purpose?
  2. Overview, Motivations, Spark Systems
  3. Spark Ecosystem
  4. Spark vs. Hadoop
  5. Typical Spark Deployment and Usage Environments
  6. Components of the Spark unified stack
  7. Resilient Distributed Dataset (RDD)
  8. Downloading and installing Spark standalone
  9. Python overview
  10. Launching and using the Python shell 

Module 2 - Resilient Distributed Dataset and DataFrames

  1. Understand how to create parallelized collections and external datasets
  2. Work with Resilient Distributed Dataset (RDD) operations
  3. Utilize shared variables and key-value pairs
  4. RDD Concepts, Partitions, Lifecycle, Lazy Evaluation
  5. Working with RDDs - Creating and Transforming (map, filter, etc.)
  6. Caching - Concepts, Storage Type, Guidelines
  7. DataFrames and DataSets - Introduction and Usage
  8. Creating and Using a DataSet
  9. Working with JSON
  10. Using the DataSet DSL
  11. Using SQL with Spark
  12. Data Formats
  13. Optimizations: Catalyst and Tungsten
  14. DataSets vs. DataFrames vs. RDDs
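
The lazy evaluation mentioned in item 4 can be illustrated with a small sketch. This is plain Python, not Spark code: `LazyDataset` is a hypothetical toy class invented for this illustration, and it only mimics the idea that transformations (map, filter) are recorded while an action (collect) triggers the actual work. Real RDDs add partitioning, distribution and fault tolerance on top of this idea.

```python
from typing import Callable, Iterable, Iterator, List, Optional


class LazyDataset:
    """Toy stand-in for an RDD (hypothetical, not part of Spark)."""

    def __init__(self, source: Iterable,
                 steps: Optional[List[Callable[[Iterator], Iterator]]] = None):
        self._source = source
        self._steps = list(steps or [])

    def map(self, fn: Callable) -> "LazyDataset":
        # Transformation: records the step and returns a new dataset; nothing runs yet.
        return LazyDataset(self._source,
                           self._steps + [lambda it: (fn(x) for x in it)])

    def filter(self, pred: Callable) -> "LazyDataset":
        # Transformation: also only recorded.
        return LazyDataset(self._source,
                           self._steps + [lambda it: (x for x in it if pred(x))])

    def collect(self) -> list:
        # Action: chains the recorded steps and pulls every element through them.
        it = iter(self._source)
        for step in self._steps:
            it = step(it)
        return list(it)


calls = []  # records each element the mapped function actually sees


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 2)
calls_before = len(calls)  # still 0: no transformation has executed
result = ds.collect()      # the action runs the whole pipeline
```

Because each element flows through all recorded steps in one pass, the sketch also hints at why chained transformations can be pipelined without materialising intermediate results.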

Module 3 - Spark application programming

  1. Understand the purpose and usage of the SparkContext
  2. Initialize Spark with the Python programming language
  3. Describe and run some Spark examples
  4. Pass functions to Spark
  5. Create and run a Spark standalone application
  6. Submit applications to the cluster
  7. Overview, Basic Driver Code, SparkConf
  8. Creating and Using a SparkContext/SparkSession
  9. Building and Running Applications
  10. Application Lifecycle
  11. Cluster Managers
  12. Logging and Debugging

Module 4 - Introduction to Spark libraries

  1. Understand and use the various Spark libraries

Module 5 - Spark configuration, monitoring and tuning

  1. Understand components of the Spark cluster
  2. Configure Spark by modifying Spark properties, environment variables, or logging properties
  3. Monitor Spark using the web UIs, metrics, and external instrumentation
  4. Understand performance tuning considerations
  5. The Spark UI
  6. Narrow vs. Wide Dependencies
  7. Minimizing Data Processing and Shuffling
  8. Caching - Concepts, Storage Type, Guidelines
  9. Using Caching
  10. Using Broadcast Variables and Accumulators

Module 6 - Spark Streaming (optional)

  1. Overview and Streaming Basics
  2. Structured Streaming
  3. DStreams (Discretized Streams)
  4. Architecture, Stateless, Stateful, and Windowed Transformations
  5. Spark Streaming API
  6. Programming and Transformations

Python or Java/Scala developers who need to learn how to develop Big Data and ML solutions with Apache Spark.

"Good introduction to Apache Spark. The trainer was great at talking us through the information, specifically optimisation methods. He spoke slowly and concisely which really got his points across. He effectively tailored the course to our specifications which we also appreciated."

RL, Financial Crime Technologist, Apache Spark, April 2021

+44 (0)20 8446 7555



JB International Training Ltd  -  Company number 08458005

Registered address Wohl Enterprise Hub 2B Redbourne Avenue London N3 2BS

