Apache Spark Development training course

A lightning-fast unified analytics engine for big data and machine learning

NEXT COURSE 9 December (2 days, £1495 + VAT)

JBI training course London UK

  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Be familiar with the basic installation, setup and layout of Spark
  • Use Spark for interactive and ad-hoc operations
  • Use Dataset/DataFrame/Spark SQL to efficiently process structured data
  • Understand the basics of RDDs (Resilient Distributed Datasets), data partitioning, pipelining and computations
  • Understand Spark's data caching and its usage
  • Understand performance implications and optimizations when using Spark
  • Be familiar with Spark graph processing and Spark ML (MLlib) machine learning

FULL COURSE DETAILS

Our Apache Spark training course provides students with a solid technical introduction to the Spark architecture and how Spark works.

Attendees learn the basic building blocks of Spark, including RDDs and the distributed compute engine, as well as higher-level constructs that provide a simpler and more capable interface, including Spark SQL and DataFrames.



Python, Java or Scala developers who need to learn how to develop big data and ML solutions with Apache Spark




Module 1 - Introduction to Spark - Getting started

  1. What is Spark and what is its purpose?
  2. Overview, Motivations, Spark Systems
  3. Spark Ecosystem
  4. Spark vs. Hadoop
  5. Typical Spark Deployment and Usage Environments
  6. Components of the Spark unified stack
  7. Resilient Distributed Dataset (RDD)
  8. Downloading and installing Spark standalone
  9. Python overview
  10. Launching and using the Python shell

Module 2 - Resilient Distributed Dataset and DataFrames

  1. Understand how to create parallelized collections and external datasets
  2. Work with Resilient Distributed Dataset (RDD) operations
  3. Utilize shared variables and key-value pairs
  4. RDD Concepts, Partitions, Lifecycle, Lazy Evaluation
  5. Working with RDDs - Creating and Transforming (map, filter, etc.)
  6. Caching - Concepts, Storage Type, Guidelines
  7. DataFrames and Spark SQL - Introduction and Usage
  8. Creating and Using a Dataset
  9. Working with JSON
  10. Using the Dataset DSL
  11. Using SQL with Spark
  12. Data Formats
  13. Optimizations: Catalyst and Tungsten
  14. Datasets vs. DataFrames vs. RDDs

Module 3 - Spark application programming

  1. Understand the purpose and usage of the SparkContext
  2. Initialize Spark with the Python programming language
  3. Describe and run some Spark examples
  4. Pass functions to Spark
  5. Create and run a Spark standalone application
  6. Submit applications to the cluster
  7. Overview, Basic Driver Code, SparkConf
  8. Creating and Using a SparkContext/SparkSession
  9. Building and Running Applications
  10. Application Lifecycle
  11. Cluster Managers
  12. Logging and Debugging

Module 4 - Introduction to Spark libraries

  1. Understand and use the various Spark libraries

Module 5 - Spark configuration, monitoring and tuning

  1. Understand components of the Spark cluster
  2. Configure Spark by modifying Spark properties, environment variables or logging properties
  3. Monitor Spark using the web UIs, metrics, and external instrumentation
  4. Understand performance tuning considerations
  5. The Spark UI
  6. Narrow vs. Wide Dependencies
  7. Minimizing Data Processing and Shuffling
  8. Caching - Concepts, Storage Type, Guidelines
  9. Using Caching
  10. Using Broadcast Variables and Accumulators

Module 6 - Spark Streaming (optional)

  1. Overview and Streaming Basics
  2. Structured Streaming
  3. DStreams (Discretized Streams)
  4. Architecture, Stateless, Stateful, and Windowed Transformations
  5. Spark Streaming API
  6. Programming and Transformations

CONTACT
0800 028 6400

enquiries@jbinternational.co.uk

JB International Training Ltd  -  Company number 08458005

Registered address 1345 High Road, London, N20 9HR