Taming Big Data with Apache Spark and Python - Hands On!

PySpark tutorial with 40+ hands-on examples of analyzing large data sets on your desktop or on Hadoop with Python!
Rating: 4.53 (17193 reviews)
Platform: Udemy
Language: English
Category: Data Science
Students: 108 923
Content: 9 hours
Last update: May 2025
Regular price: $124.99

What you will learn

Use DataFrames and Structured Streaming in Spark 3

Use the MLlib machine learning library to answer common data mining questions

Understand how Spark Streaming lets you process continuous streams of data in real time

Frame big data analysis problems as Spark problems

Use Amazon's Elastic MapReduce service to run your job on a cluster with Hadoop YARN

Install and run Apache Spark on a desktop computer or on a cluster

Use Spark's Resilient Distributed Datasets to process and analyze large data sets across many CPUs

Implement iterative algorithms such as breadth-first search using Spark

Understand how Spark SQL lets you work with structured data

Tune and troubleshoot large jobs running on a cluster

Share information between nodes on a Spark cluster using broadcast variables and accumulators

Understand how the GraphX library helps with network analysis problems

Comidoc Review

Our Verdict

Taming Big Data with Apache Spark and Python - Hands On! gives students a solid introduction to analyzing large data sets with PySpark. Despite minor issues, such as outdated content, the course covers a wide range of topics and suits learners who want hands-on familiarity with Spark. Prospective learners should expect to consult supplementary materials to fully understand some of the more complex concepts and applications.

What We Liked

  • Comprehensive coverage of key topics like DataFrames, Structured Streaming, MLlib, Spark SQL, and GraphX
  • Includes over 40 hands-on examples, allowing learners to build practical skills in analyzing large data sets with Python
  • In-depth exploration of installing, running, and tuning Apache Spark on both desktop computers and Hadoop clusters
  • Instructor's pleasant voice and pace of presentation facilitate learning

Potential Drawbacks

  • Outdated content as the course material hasn't been updated since 2025; specifically, new features in Spark v3.5+ like Pandas-on-Spark aren't discussed
  • Insufficient depth in explanations of theory, particularly around distributed execution flow, specific algorithms, and development best practices for the various APIs and data types
  • Lackluster machine learning (ML) examples that do not convincingly demonstrate Spark ML's power or value proposition
  • Instructions for setting the Python environment variable are occasionally hard to follow and could be outlined more clearly

Udemy ID: 622414
Course created date: 25/09/2015
Course indexed date: 07/08/2019
Course submitted by: Bot