Taming Big Data with Apache Spark and Python - Hands On!
PySpark tutorial with 40+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
4.53 (17193 reviews)

108 923
students
9 hours
content
May 2025
last update
$124.99
regular price
What you will learn
Use DataFrames and Structured Streaming in Spark 3
Use the MLlib machine learning library to answer common data mining questions
Understand how Spark Streaming lets you process continuous streams of data in real time
Frame big data analysis problems as Spark problems
Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN
Install and run Apache Spark on a desktop computer or on a cluster
Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs
Implement iterative algorithms such as breadth-first search using Spark
Understand how Spark SQL lets you work with structured data
Tune and troubleshoot large jobs running on a cluster
Share information between nodes on a Spark cluster using broadcast variables and accumulators
Understand how the GraphX library helps with network analysis problems
Comidoc Review
Our Verdict
Taming Big Data with Apache Spark and Python - Hands On! gives students a solid introduction to analyzing large data sets with PySpark's essential features. Despite minor issues, such as dated content, the course covers a broad range of topics and keeps the learning hands-on and engaging. Learners should still expect to consult supplementary resources to fully understand some of the more complex concepts and applications.
What We Liked
- Comprehensive coverage of key topics like DataFrames, Structured Streaming, MLlib, Spark SQL, and GraphX
- Over 40 hands-on examples that let learners build practical skills in analyzing large data sets with Python
- In-depth exploration of installing, running, and tuning Apache Spark on both desktop computers and Hadoop clusters
- Instructor's pleasant voice and pace of presentation facilitate learning
Potential Drawbacks
- Dated content: the course material hasn't been updated since 2025, and features available in Spark v3.5+, such as the pandas API on Spark, aren't discussed
- Insufficient depth in explanations of theory, particularly around distributed execution flow and specific algorithms, and in development best practices for choosing APIs and data types
- Lackluster machine learning (ML) examples that do not convincingly demonstrate Spark ML's power or value proposition
- Instructions for setting the Python environment variable are occasionally hard to follow and could be outlined more clearly
622414
udemy ID
25/09/2015
course created date
07/08/2019
course indexed date
Bot
course submitted by