Apache Spark and Scala Training and Certification

  • Course Duration: 50 Hrs.
  • Course Mode: Instructor-Led Training
  • Course Fee: ₹ 8,100

About The Course

AI Council certification and training will let you create Spark applications using Scala programming. The course makes clear the difference between Spark and Hadoop, and covers Spark customization using Scala. The learning leads to creating high-speed processing applications using Spark RDDs. The course is designed as per the Cloudera Hadoop and Spark Developer Certification Exam (CCA175) requirements. The complete Spark ecosystem, consisting of Spark RDD, Spark SQL, Spark MLlib and Spark Streaming, will be covered along with the Scala programming language, HDFS, Sqoop, Flume, Spark GraphX and messaging systems such as Kafka.

Key Features

Instructor-led training

Highly interactive instructor-led training

Free lifetime access to recorded classes

Get lifetime access to all recorded classes in your profile

Regular assignments and assessments

Real-time projects after every module

Lifetime accessibility

Lifetime access and free upgrade to the latest version

3 Years of technical support

Lifetime 24/7 technical support and query resolution

Globally Recognized Certification

Get global industry-recognized certifications


  • Programming concept of Apache Spark and Scala
  • Implementations of Scala
  • Cluster implementation
  • Spark applications using Python, Java and Scala
  • RDD operations
  • Defining Spark streaming
  • Project development using Scala

Mode of Learning and Duration

  • Weekdays – 5 to 6 weeks
  • Weekend – 7 to 8 weeks
  • FastTrack – 3 to 4 weeks


Course Agenda

Scala Course Agenda

  • Introducing Scala and deployment of Scala for Big Data applications and Apache Spark analytics
  • Scala REPL and Lazy Values
  • Control Structures in Scala
  • Directed Acyclic Graph (DAG)
  • First Spark Application Using SBT/Eclipse
  • Spark Web UI and Spark in Hadoop Ecosystem.
  • The importance of Scala
  • The concept of REPL (Read Evaluate Print Loop)
  • Deep dive into Scala pattern matching
  • Type inference, higher-order functions, currying, traits
  • Application space and Scala for data analysis
  • Learning about the Scala Interpreter
  • Static object timer in Scala and testing string equality in Scala
  • Implicit classes in Scala
  • The concept of currying in Scala and various classes in Scala
  • Functional Programming
  • Higher Order Functions
  • Anonymous Functions
  • Learning about the Classes concept
  • Getters and Setters
  • Custom Getters and Setters
  • Properties with only Getters
  • Understanding the constructor overloading
  • Various abstract classes and the hierarchy types in Scala
  • The concept of object equality and the val and var methods in Scala
  • Understanding sealed traits, wild, constructor, tuple, variable pattern and constant pattern
  • Understanding traits in Scala
  • The advantages of traits
  • Linearization of traits, the Java equivalent, and avoiding of boilerplate code
  • Implementation of traits in Scala and Java and handling of multiple traits extending
  • Introduction to Scala collections
  • Classification of collections
  • The difference between Iterator and Iterable in Scala and example of list sequence in Scala
  • The two types of collections in Scala
  • Mutable and Immutable collections
  • Understanding lists and arrays in Scala
  • The list buffer and array buffer
  • Queues in Scala, the double-ended queue (Deque), and Stacks
  • Sets, Maps and Tuples in Scala
  • Introduction to Scala packages and imports
  • The selective imports
  • The Scala test classes
  • Introduction to JUnit test class
  • JUnit interface via JUnit 3 suite for Scala test
  • Packaging of Scala applications in Directory Structure and examples of Spark Split and Spark Scala
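
Several of the Scala concepts listed above (pattern matching on sealed traits, higher-order functions, and currying) can be seen in a few lines. A minimal sketch, with made-up names and values:

```scala
// Toy examples of core Scala concepts from the agenda above.
object ScalaBasics {
  // Pattern matching with a sealed trait: the compiler checks exhaustiveness.
  sealed trait Shape
  case class Circle(r: Double) extends Shape
  case class Rect(w: Double, h: Double) extends Shape

  def area(s: Shape): Double = s match {
    case Circle(r)  => math.Pi * r * r
    case Rect(w, h) => w * h
  }

  // A higher-order function: takes another function and returns a new one.
  def twice(f: Int => Int): Int => Int = x => f(f(x))

  // Currying: a function with multiple parameter lists can be partially applied.
  def add(a: Int)(b: Int): Int = a + b

  def main(args: Array[String]): Unit = {
    println(area(Rect(3, 4)))         // 12.0
    println(twice(_ + 3)(10))         // 16
    val add5: Int => Int = add(5)     // partially applied, via eta expansion
    println(add5(2))                  // 7
  }
}
```

These are exactly the building blocks (sealed traits, case classes, function values) that the Spark API itself is written in terms of.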

Apache Spark

  • What is Big Data?
  • Big Data Customer Scenarios
  • Limitations and Solutions of Existing Data Analytics
  • Architecture with Uber Use Case
  • How Does Hadoop Solve the Big Data Problem?
  • What is Hadoop?
  • Hadoop’s Key Characteristics
  • Hadoop Ecosystem and HDFS
  • Hadoop Core Components
  • Rack Awareness and Block Replication
  • YARN and its Advantage
  • Hadoop Cluster and its Architecture
  • Hadoop: Different Cluster Modes
  • Big Data Analytics with Batch & Real-time Processing
  • Why Is Spark Needed?
  • What Is Spark?
  • How Does Spark Differ from Other Frameworks?
  • Introduction to Spark and how Spark overcomes the drawbacks of MapReduce
  • Understanding in-memory MapReduce, interactive operations on MapReduce
  • Spark stack, fine vs. coarse-grained update
  • Spark stack and Spark Hadoop YARN
  • HDFS Revision
  • YARN Revision
  • The overview of Spark and how it is better than Hadoop
  • Deploying Spark without Hadoop
  • Spark history server and Cloudera distribution
  • Spark installation guide and Spark configuration
  • Memory management
  • Executor memory vs. driver memory and working with Spark Shell
  • The concept of resilient distributed datasets (RDD)
  • Learning to do functional programming in Spark and the architecture of Spark
  • Spark RDD and creating RDDs
  • RDD partitioning, operations, and transformation in RDD
  • Deep dive into Spark RDDs
  • The RDD general operations, a read-only partitioned collection of records, using the concept of RDD for faster and efficient data processing
  • RDD actions: collect, count, collectAsMap and saveAsTextFile; pair RDD functions and aggregating data with pair RDDs
  • Understanding the concept of Key-Value pair in RDDs
  • Learning how Spark makes MapReduce operations faster
  • Various operations of RDD
  • MapReduce interactive operations
  • Fine and coarse-grained update and Spark stack
  • Comparing the Spark applications with Spark Shell
  • Creating a Spark application using Scala or Java
  • Deploying a Spark application
  • Scala built application
  • Creation of mutable list
  • Set and set operations, list, tuple, concatenating list
  • Creating application using SBT
  • Deploying application using Maven
  • The web user interface of Spark application, a real-world example of Spark and configuring of Spark
  • Learning about Spark parallel processing
  • Deploying on a cluster
  • Introduction to Spark partitions
  • File-based partitioning of RDDs
  • Understanding of HDFS and data locality
  • Mastering the technique of parallel operations
  • Comparing repartition and coalesce and RDD actions
  • The execution flow in Spark, understanding the RDD persistence overview
  • Spark execution flow, and Spark terminology
  • Distributed shared memory vs. RDD and RDD limitations
  • Spark shell arguments and Distributed persistence
  • RDD lineage
  • Key-value pair operations and implicit conversions such as countByKey, reduceByKey, sortByKey and aggregateByKey
  • Introduction to Machine Learning
  • Types of Machine Learning, introduction to MLlib
  • Various ML algorithms supported by MLlib
  • Linear Regression and Logistic Regression
  • Decision Tree and Random Forest
  • K-means clustering techniques, building a Recommendation Engine
  • Why Kafka and What is Kafka
  • Kafka architecture
  • Kafka workflow
  • Configuring Kafka cluster
  • Basic operations
  • Kafka monitoring tools
  • Integrating Apache Flume and Apache Kafka
  • Introduction to Spark Streaming
  • Features of Spark Streaming
  • Spark Streaming workflow
  • Initializing StreamingContext
  • Discretized Streams (DStreams)
  • Input DStreams and Receivers, transformations on DStreams
  • Output Operations on DStreams
  • Windowed Operators and why they are useful
  • Important Windowed Operators and Stateful Operators
  • Introduction to various variables in Spark like shared variables and broadcast variables
  • Learning about accumulators
  • The common performance issues and troubleshooting the performance problems
  • Learning about Spark SQL
  • The context of SQL in Spark for providing structured data processing
  • JSON support in Spark SQL
  • Working with XML data
  • Parquet files
  • Creating Hive context and writing Data Frame to Hive
  • Reading JDBC files and understanding the Data Frames in Spark
  • Creating Data Frames
  • Manual inferring of schema
  • Working with CSV files and reading JDBC tables, Data Frame to JDBC
  • User-defined functions in Spark SQL
  • Shared variables and accumulators
  • Learning to query and transform data in Data Frames
  • How Data Frame provides the benefit of both Spark RDD and Spark SQL and deploying
  • Hive on Spark as the execution engine
  • Learning about the scheduling and partitioning in Spark
  • Hash partition and range partition
  • Scheduling within and around applications
  • Static partitioning and dynamic sharing
  • Fair scheduling and Map partition with index, the Zip, GroupByKey
  • Spark master high availability
  • Standby masters with ZooKeeper
  • Single-node Recovery with Local File System and Higher-Order Functions
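
The RDD transformations and actions listed above (map, flatMap, filter, reduceByKey) have direct analogues on plain Scala collections, which is part of why Spark feels natural from Scala. Below is a local word-count sketch that mirrors the classic Spark example without needing a cluster; with a real SparkContext you would start from `sc.textFile(...)` instead of an in-memory `Seq`, and use `reduceByKey(_ + _)` where the local version uses `groupBy`:

```scala
// Local stand-in for the classic Spark word count, using Scala collections
// whose operations mirror the RDD API covered in the agenda above.
object LocalWordCount {
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(_.split("\\s+"))         // like rdd.flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)               // like rdd.filter(_.nonEmpty)
      .map(w => (w.toLowerCase, 1))     // like rdd.map(w => (w, 1))
      .groupBy(_._1)                    // reduceByKey(_ + _) in Spark
      .map { case (w, pairs) => (w, pairs.map(_._2).sum) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("Spark makes MapReduce operations faster",
                    "Spark RDD operations")
    println(wordCount(lines))
  }
}
```

The key conceptual difference is that on an RDD these calls build a lazy lineage graph that is only executed when an action (collect, count, saveAsTextFile) runs, whereas the collection version evaluates eagerly.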



Industry: - Digital news portal

Problem Statement: - Provide personalized news pages to web visitors on Bing or Yahoo

Platforms like Yahoo or Bing work to provide a highly personalized experience to users based on their likes and dislikes. For example, Yahoo uses a machine-learning algorithm running on Spark to figure out what individual users are interested in reading, and to categorize news articles so that it can determine which kinds of users would be interested in reading them. Doing this took about 120 lines of Spark ML code written in Scala. We will have our learners develop such algorithms to make them business-ready.

Industry: - Entertainment

Problem Statement: - Develop a movie recommendation system for users

Here we will deploy Apache Spark to recommend movies to users. This project will let you gain hands-on experience with Spark's machine-learning library, MLlib, by deploying collaborative filtering, clustering, regression, and dimensionality reduction. By the end you will also have experience with streaming data, sampling, testing and statistics.
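
In the project itself, MLlib's ALS-based collaborative filtering does the heavy lifting at scale. As a plain-Scala illustration of the underlying idea (all movie names and ratings below are made up), here is a toy user-based recommender that uses cosine similarity between rating vectors:

```scala
// Toy user-based collaborative filtering: find the most similar user by
// cosine similarity, then recommend their movies that we haven't rated.
// MLlib's ALS replaces this with scalable matrix factorization.
object ToyRecommender {
  type Ratings = Map[String, Double] // movie -> rating

  def cosine(a: Ratings, b: Ratings): Double = {
    val common = a.keySet.intersect(b.keySet)
    if (common.isEmpty) 0.0
    else {
      val dot = common.toSeq.map(m => a(m) * b(m)).sum
      val na  = math.sqrt(a.values.map(x => x * x).sum)
      val nb  = math.sqrt(b.values.map(x => x * x).sum)
      dot / (na * nb)
    }
  }

  // Movies the most similar user rated, that `user` has not seen,
  // ordered by that user's rating (highest first).
  def recommend(user: Ratings, others: Seq[Ratings]): Seq[String] = {
    val best = others.maxBy(cosine(user, _))
    best.keySet.diff(user.keySet).toSeq.sortBy(m => -best(m))
  }
}
```

In MLlib the equivalent step would be training an ALS model on (user, movie, rating) triples and calling its recommendation methods, which the project walks through on a real dataset.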

Industry: - Data analysis

Problem Statement: - Develop an interactive real-time analytics tool for users

In addition to its bundle of features, Apache Spark also supports interactive analysis. It processes exploratory queries without sampling, which results in faster processing, and its API is easy to use for interactive data analysis in Scala. MapReduce was made to handle batch processing, and SQL-on-Hadoop engines are usually slow, so if you have live data, identification queries can run much faster on Spark. With Structured Streaming, web analytics can be performed by allowing clients to run user-friendly queries against visitor data.

Industry: - Miscellaneous

Problem Statement: - Use the Wikipedia data set for data exploration

This project will give you hands-on experience with Spark SQL. You will implement and practise it by combining it with ETL applications, real-time data analysis, batch analysis, machine learning, visualizations and graph processing. You can use a Twitter or Wikipedia data set.



Career Support

We have a dedicated team that takes care of our learners' learning objectives.


There is no prerequisite if you are enrolling for the Master's Course, as everything starts from scratch. Whether you are a working IT professional or a fresher, you will find the course well planned and designed to accommodate trainees from various professional backgrounds.

AI Council offers 24/7 query resolution: you can raise a ticket with a dedicated support team and expect a reply within 24 hours. Email support can resolve most queries, but if a query remains unresolved we can schedule a one-on-one session with our instructor or dedicated team. You can also contact our support team after completing the training. There is no limit on the number of tickets you can raise.
AI Council provides two different modes of training: you can choose instructor-led training or learning with pre-recorded video on demand. We also offer faculty development programs for colleges and schools, as well as corporate training for organizations and companies to enhance and update the technical skills of their employees. We have highly qualified trainers who have worked in the training industry for a very long time and have delivered sessions and training for top colleges, schools and companies.
We provide 24/7 assistance for the ease of the student. Any query can be raised through the interface itself or communicated through email. If someone faces difficulties with the methods mentioned above, we can arrange a one-on-one session with the trainer to help with any learning difficulties. You can raise queries throughout the training period as well as after the completion of the training.
AI Council offers you the latest, most appropriate and, most importantly, real-world projects throughout your training period. This helps students gain industry-level experience and turn their learning into solutions by building projects. Each training module has tasks or projects designed for students so that you can evaluate what you have learned. You will work on projects related to different industries such as marketing, e-commerce, automation and sales.
Yes, we do provide job assistance so that a learner can apply for a job directly after completing the training. We have tie-ups with companies, so when required we refer our students to those companies for interviews. Our team will help you build a good resume and will train you for your job interviews.
After the successful completion of the training program and the submission of assignments, quizzes and projects, you must secure at least a B grade in the qualifying exam; the AI Council certificate will then be awarded to you. Every certificate carries a unique number through which it can be verified on our site.
To be very professional and transparent: no, we do not guarantee a job. Job assistance helps provide you an opportunity to land your dream job, but selection depends entirely on the candidate's performance in the interview and the requirements of the recruiter.
Most of our programs offer both modes of training, i.e. instructor-led and self-paced, and you can choose either mode depending on your work schedule. While registering for a course you will be asked to submit your preferred mode. If a course is not offered in both modes, you can check which mode the training runs in and register for that one. If you feel you need a different training mode, you can contact our team.
Yes, you can definitely opt for multiple courses at a time. We provide flexible timings, so if you wish to learn different topics while continuing with your hectic daily schedule, our course timings and modes will help you carry on learning.
Whenever you enroll in any of our courses, we will send a notification to your contact details. You will be given a unique registration ID, and after successful enrollment all of your courses will be added to your account profile on our website. AI Council provides lifetime access to course content whenever needed.
A capstone project is the outcome of the cumulative learning across the academic program. It is the final project that represents your knowledge and effort in your field of study. It can be chosen by the mentor or by the students, who then come up with a solution.
Yes, to obtain the diploma program certificate you have to submit the capstone project.