Get hands-on with Python and PySpark to build your first data pipeline. In this video I walk you through how to read, transform, and write the NYC Taxi dataset, which can be found on Databricks or Azure Synapse, or downloaded from the web to wherever you run Apache Spark.
Once you have watched and followed along with this tutorial, find a free dataset and try writing your own PySpark application. Pro tip: search for the Spark equivalent of functions you use in other programming languages (including SQL). Many of them exist in the pyspark.sql.functions module.
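If you want a starting point for your own application, here is a minimal sketch of a read, transform, write pipeline. It is not the code from the video. The file paths are hypothetical placeholders, and the column names (tpep_pickup_datetime, tpep_dropoff_datetime, trip_distance, fare_amount) are assumptions based on the yellow taxi trip files, so adjust them to match your copy of the data.

```python
# Hypothetical locations; point these at wherever your copy of the data lives.
INPUT_PATH = "data/nyc_taxi/yellow_tripdata.parquet"
OUTPUT_PATH = "output/nyc_taxi_daily"


def transform(trips):
    """Filter out zero-distance trips and summarize trips per pickup date."""
    # Imported inside the function so this file can be loaded without Spark installed.
    from pyspark.sql import functions as F

    return (
        trips
        # Column names below are assumed; check them with trips.printSchema().
        .filter(F.col("trip_distance") > 0)
        .withColumn("pickup_date", F.to_date("tpep_pickup_datetime"))
        .withColumn(
            "trip_minutes",
            (F.unix_timestamp("tpep_dropoff_datetime")
             - F.unix_timestamp("tpep_pickup_datetime")) / 60,
        )
        .groupBy("pickup_date")
        .agg(
            F.count("*").alias("trip_count"),
            F.round(F.avg("trip_minutes"), 2).alias("avg_trip_minutes"),
            F.round(F.sum("fare_amount"), 2).alias("total_fare"),
        )
    )


def run_pipeline(input_path=INPUT_PATH, output_path=OUTPUT_PATH):
    """Read the trip data, apply the transform, and write the result as Parquet."""
    from pyspark.sql import SparkSession

    # In a Databricks or Synapse notebook, getOrCreate() returns the existing session.
    spark = SparkSession.builder.appName("nyc-taxi-pipeline").getOrCreate()

    trips = spark.read.parquet(input_path)
    daily = transform(trips)
    daily.write.mode("overwrite").parquet(output_path)
```

Call run_pipeline() from a notebook cell or a local script once the paths point at real data. The functions used here (to_date, unix_timestamp, round, avg, sum, count) are examples of the SQL-style functions you can find in pyspark.sql.functions.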
All thoughts and opinions are my own.
For links to the code and more information on this course, you can visit my website: https://dustinvannoy.com/2023/05/01/f...
More from Dustin:
Website: https://dustinvannoy.com
LinkedIn: / dustinvannoy
Github: https://github.com/datakickstart
CHAPTERS
00:00 Intro
00:58 Python key syntax
14:32 PySpark data pipeline (notebook)
31:12 PySpark locally
36:27 Outro