Open Source Projects

PySpark

quinn
chispa
mack
ceja
beavis
farsante
unicron
eren

Scala Spark

Style guides

Ruby / Rails

Blogs

Documentation contributions

Talks

Best Practices for Unit Testing PySpark

June 2024

Pandas on Spark: Simplicity of Pandas with Efficiency of Spark

June 2024

Best Features of Delta Lake: Love Your Open Tables

June 2024

Building Lakehouse on Delta Lake

September 2023

Why Delta Lake is the best storage format for pandas analyses

June 2023

5 Reasons Parquet Files Are Better Than CSV for Data Analyses

October 2021

Optimizing Delta / Parquet Data Lakes

October 2019

Optimizing Delta Parquet Data Lakes for Apache Spark

April 2019

Matthew Powers Bio

Short

Matthew Powers is a Staff Developer Advocate at Databricks.

He focuses on blogging, social media, coding, and community development for Spark, Delta Lake, and Unity Catalog.

He tries to teach in an easily digestible manner and focuses on core concepts.

He likes separating usage guides from theory, so learners who just want to get their job done are not bogged down by the theory.

Full

TODO