Open Source Projects
PySpark
Scala Spark
Style guides
Ruby / Rails
Blogs
Documentation contributions
Talks
Best Practices for Unit Testing PySpark
June 2024
Pandas on Spark: Simplicity of Pandas with Efficiency of Spark
June 2024
Best Features of Delta Lake: Love Your Open Tables
June 2024
Building Lakehouse on Delta Lake
September 2023
Why Delta Lake is the best storage format for pandas analyses
June 2023
5 Reasons Parquet Files Are Better Than CSV for Data Analyses
October 2021
Optimizing Delta / Parquet Data Lakes
October 2019
Optimizing Delta Parquet Data Lakes for Apache Spark
April 2019
Matthew Powers Bio
Short
Matthew Powers is a Staff Developer Advocate at Databricks.
He focuses on blogging, social media, coding, and community development for Spark, Delta Lake, and Unity Catalog.
He tries to teach in an easily digestible manner and to focus on core concepts.
He likes separating usage guides from theory, so learners who just want to get their job done are not bogged down by the theory.
Full
TODO