Open Source Projects

PySpark

Best Practices for Unit Testing PySpark

June 2024

Pandas on Spark: Simplicity of Pandas with Efficiency of Spark

June 2024

Best Features of Delta Lake: Love Your Open Tables

June 2024

Building Lakehouse on Delta Lake

September 2023

Why Delta Lake is the best storage format for pandas analyses

June 2023

5 Reasons Parquet Files Are Better Than CSV for Data Analyses

October 2021

Optimizing Delta / Parquet Data Lakes

October 2019

Optimizing Delta Parquet Data Lakes for Apache Spark

April 2019

Matthew Powers is a Staff Developer Advocate at Databricks.

He focuses on blogging, social media, coding, and community development for Spark, Delta Lake, and Unity Catalog.

He tries to teach in an easily digestible manner and to focus on core concepts.

He likes separating usage guides from theory, so learners who just want to get their job done are not bogged down by the theory.
