Data Lakes on AWS
Designing a cost-effective lakehouse with S3, Glue and Athena
By Houssam Kodad
One-time purchase
€29.95
VAT included where applicable
- Instant download after purchase
- Readable on any device
- Free updates to this edition
- Secure checkout
About this book
What's inside
AWS gives you a hundred ways to build a data platform and very little guidance on which to pick. This book lays out an opinionated, cost-effective lakehouse on S3, Glue and Athena, with Lake Formation for governance and open table formats for reliability. You'll learn the partitioning, file-format and security decisions that determine whether your lake stays fast and cheap or quietly becomes a swamp.
What you'll learn
Skills you'll walk away with
- Lay out S3 for performance, cost and lifecycle
- Catalog data with Glue and crawlers you can trust
- Query at scale with Athena and partition projection
- Adopt open table formats like Apache Iceberg
- Govern access with Lake Formation and IAM
- Build ETL with Glue jobs and orchestration
- Keep query and storage costs under control
Table of contents
9 chapters
01
A Lakehouse on AWS
- Lake vs warehouse vs lakehouse
- The S3-Glue-Athena core
- Where Redshift fits
02
Designing S3 Storage
- Bucket and prefix layout
- Partitioning strategies
- Storage classes and lifecycle
03
File Formats and Compression
- Parquet and columnar layout
- File sizing and small-file pain
- Compaction strategies
04
The Glue Data Catalog
- Databases, tables and schemas
- Crawlers vs explicit schemas
- Schema evolution
05
Querying with Athena
- SQL over the lake
- Partition projection
- Cost and performance tuning
06
Open Table Formats
- Why Iceberg matters
- Upserts, deletes and time travel
- Migration considerations
07
ETL with Glue
- Glue jobs and bookmarks
- Spark on Glue
- Orchestration with Step Functions
08
Governance with Lake Formation
- Fine-grained access control
- Row and column security
- Cross-account sharing
09
Cost and Operations
- Athena and storage cost levers
- Monitoring and logging
- A reliability checklist
This is the full chapter list, exactly what you'll receive in the PDF.