Apache Spark 2.0 with Java - Learn Spark from a Big Data Guru
Learn to analyze large data sets with Apache Spark through 10+ hands-on examples. Take your big data skills to the next level.
4.49 (3327 reviews)

23 252
students
3.5 hours
content
Sep 2018
last update
$13.99
regular price
What you will learn
An overview of the architecture of Apache Spark.
Work with Apache Spark's primary abstraction, resilient distributed datasets (RDDs), to process and analyze large data sets.
Develop Apache Spark 2.0 applications using RDD transformations and actions and Spark SQL.
Scale up Spark applications on a Hadoop YARN cluster through Amazon's Elastic MapReduce service.
Analyze structured and semi-structured data using Datasets and DataFrames, and develop a thorough understanding of Spark SQL.
Share information across different nodes on an Apache Spark cluster using broadcast variables and accumulators.
Advanced techniques to optimize and tune Apache Spark jobs by partitioning, caching and persisting RDDs.
Best practices of working with Apache Spark in the field.
Comidoc Review
Our Verdict
This course has helped many students grasp the basics of Apache Spark through solid theoretical foundations paired with hands-on examples. Although some aspects, such as RDDs, may feel outdated due to technology advancements, the curriculum remains relevant as a firm introduction. It also provides a wealth of information on scaling Spark applications and using SQL with DataFrames. For beginners looking to build their skills in handling big data from scratch, this course offers an affordable starting point.
What We Liked
- Comprehensive coverage of Apache Spark's core concepts and features, with extensive hands-on examples of analyzing large data sets.
- Detailed explanations of working with Resilient Distributed Datasets (RDDs), DataFrames, and Spark SQL to process structured and semi-structured data.
- Optimization techniques for fine-tuning and scaling up Apache Spark jobs with YARN clusters and Amazon's Elastic MapReduce service.
Potential Drawbacks
- Lacks coverage of new features in version 3.1, and some content might seem dated, especially around RDD usage.
- Few practical exercises in the SQL section to consolidate understanding.
- The author's presentation is delivered via text-to-speech, which can feel impersonal for learners who prefer human interaction.
- A few students have reported unanswered questions and lack of real-world scenario implementations.
1328642
udemy ID
22/08/2017
course created date
21/08/2019
course indexed date
Bot
course submitted by