Platform for starters in the geo sector

200 courses, 20 online supports, 60 MOOCs, 20 professional teachers / tutors from various countries

Course: Pyspark with Apache


The Pyspark with Apache course lasts 2 days. Send us an information request and we will inform you about the costs.

Geo-ICT Training Center, The Netherlands, offers substantial group discounts for students from abroad.

Apache Spark is a powerful, open-source processing engine for Big Data that is often run on a Hadoop cluster. With Spark it is possible to process datasets that differ in nature and source. The biggest advantages of Apache Spark are speed, ease of use, the combination of SQL, streaming and complex analytics, and the fact that Spark can run anywhere. With the Python API (PySpark), all kinds of operations can easily be carried out in Spark. Some basic knowledge of Python is desirable but not required.

Content of Course: Pyspark with Apache

The course starts with installation, followed by an introduction to the framework. You will then learn how to work with RDDs (resilient distributed datasets) and HDFS. We also discuss parallel processing and building Spark applications. Finally, you will learn more about Spark Streaming, Spark algorithms and improving the performance of the framework. In this Spark training course you will learn how to develop Spark applications using Python, including how to test and deploy them to a cluster and how to monitor that cluster afterwards.
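To give an impression of what working with RDDs and parallel processing looks like in Python, here is a minimal word-count sketch. It is an illustration only, not taken from the course material: the function names `tokenize` and `run_local_word_count` and the application name are made up for this example, and it assumes PySpark is installed and runs on a local master with two worker threads rather than on a real cluster.

```python
import re
from operator import add


def tokenize(line):
    """Split one line of text into lowercase words (plain Python, no Spark needed)."""
    return re.findall(r"[a-z0-9']+", line.lower())


def run_local_word_count(lines, partitions=2):
    """Count words with a Spark RDD on a local master.

    Hypothetical helper for illustration; assumes PySpark is installed.
    """
    # Imported here so that tokenize() stays usable without Spark.
    from pyspark import SparkConf, SparkContext

    conf = SparkConf().setMaster("local[2]").setAppName("WordCountSketch")
    sc = SparkContext.getOrCreate(conf)
    try:
        counts = (
            sc.parallelize(lines, partitions)  # distribute the lines over partitions
            .flatMap(tokenize)                 # one record per word
            .map(lambda word: (word, 1))       # pair each word with a count of 1
            .reduceByKey(add)                  # sum the counts per word, in parallel
        )
        # collect() brings the (word, count) pairs back to the driver.
        return dict(counts.collect())
    finally:
        sc.stop()
```

Calling `run_local_word_count(["Spark is fast", "Spark is simple"])` would count "spark" and "is" twice and "fast" and "simple" once. On a cluster, the input would typically be read with `sc.textFile(...)` from an HDFS path instead of being passed in with `parallelize`.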

Learning objectives during this course:
