The Dataset Digest
Believe it, or else

How to Tidy Data for Storage and Save Tables: A Quick Guide to Data Organization Best Practices
A comprehensive guide to organizing messy data for reliable analysis. Learn the fundamental principles of tidy data structure, strategic table naming conventions, essential documentation practices, and why spreadsheets create dangerous data integrity problems.
August 12, 2025
Should you be using DuckLake?
An introduction to DuckLake. Learn how it delivers faster queries, time travel, and schema evolution while keeping data in Parquet, and which trade-offs and best-fit scenarios to expect.
June 14, 2025
The Hidden Cost of Scattered Flat Files
An in-depth look at spreadsheet sprawl: the chaos of scattered CSV and Excel files, and why a lightweight flat-file data catalog can restore trust, version control, and discoverability without forcing teams into a full database or data lake.
May 7, 2025
JSON, CSV, and Parquet: Guardians of Data
An exploration of why CSV, JSON, and Parquet remain essential in modern data work. Despite newer, high-performance formats, these widely supported standards offer portability, simplicity, and collaboration benefits that let smaller teams stay competitive without enterprise-level infrastructure.
February 9, 2025