The data lake landscape is undergoing a fundamental transformation. Traditional Hive tables are giving way to a new generation of open table formats—Apache Iceberg, Apache Hudi, Delta Lake, and emerging contenders like DuckLake—each promising to solve the inherent challenges of managing massive datasets at scale.
But which format fits your architecture? This session cuts through the marketing noise to deliver practical insights for data architects and engineers navigating this critical decision. We’ll explore how these formats tackle schema evolution, time travel, ACID transactions, and metadata management differently, and what these differences mean for your data platform’s performance, reliability, and total cost of ownership.
Drawing from real-world implementations, you’ll discover the hidden complexities, unexpected benefits, and common pitfalls of each approach. Whether you’re modernizing legacy Hive infrastructure, building greenfield data lakes, or evaluating lakehouse architectures, you’ll leave with a clear framework for choosing and implementing the right open table format for your specific use case—and the confidence to justify that decision to stakeholders.
Highlights: