Spark and Python for Big Data with PySpark
Learn how to use Spark with Python, including Spark Streaming, Machine Learning, Spark 2.0 DataFrames and more!
4.51 (25460 reviews)

143 605
students
10.5 hours
content
May 2020
last update
$174.99
regular price
What you will learn
Use Python and Spark together to analyze Big Data
Learn how to use the new Spark 2.0 DataFrame Syntax
Work on Consulting Projects that mimic real world situations!
Classify Customer Churn with Logistic Regression
Use Spark with Random Forests for Classification
Learn how to use Spark's Gradient Boosted Trees
Use Spark's MLlib to create Powerful Machine Learning Models
Learn about the Databricks Platform!
Get set up on Amazon Web Services EC2 for Big Data Analysis
Learn how to use AWS Elastic MapReduce Service!
Learn how to leverage the power of Linux with a Spark Environment!
Create a Spam filter using Spark and Natural Language Processing!
Use Spark Streaming to Analyze Tweets in Real Time!
Comidoc Review
Our Verdict
Overall, this course is a great starting point for learning PySpark, with in-depth hands-on examples and practical projects. However, be prepared for outdated content, particularly around certain installations and APIs, where you may need external resources for up-to-date information. The course also emphasizes machine learning over core Spark concepts, so it may not match what some learners expect, which reduces its overall value.
What We Liked
- Comprehensive coverage of PySpark, including data manipulation and machine learning techniques
- Hands-on examples and practical projects that are useful for beginners
- Detailed explanations of concepts with a step-by-step approach
- Instructor goes the extra mile to ensure learners do not feel lost
Potential Drawbacks
- Outdated content, particularly on setting up AWS EC2 and Databricks, using the Twitter API for streaming, and working with DataFrames
- Lack of focus on core Spark concepts like master node and worker nodes
- Insufficient coverage of data pre-processing steps, and reliance on complementary courses for topics such as RDDs and log files
- Fast pace may make it challenging to keep up and fully grasp the content
980798
udemy ID
10/10/2016
course created date
07/08/2019
course indexed date
Bot
course submitted by