Sarcouncil Journal of Multidisciplinary
An Open access peer reviewed international Journal
Publication Frequency- Monthly
Publisher Name- SARC Publisher
ISSN Online- 2945-3445
Country of origin- PHILIPPINES
Frequency- 3.6
Language- English
Keywords
- Social sciences, Medical sciences, Engineering, Biology
Editors

Dr Hazim Abdul-Rahman
Associate Editor
Sarcouncil Journal of Applied Sciences

Entessar Al Jbawi
Associate Editor
Sarcouncil Journal of Multidisciplinary

Rishabh Rajesh Shanbhag
Associate Editor
Sarcouncil Journal of Engineering and Computer Sciences

Dr Md. Rezowan ur Rahman
Associate Editor
Sarcouncil Journal of Biomedical Sciences

Dr Ifeoma Christy
Associate Editor
Sarcouncil Journal of Entrepreneurship And Business Management
High Performance Read Operations in Merge-On-Read with Deletion Vectors in Apache Iceberg
Keywords: Apache Iceberg, Merge-on-Read, deletion vectors, data lakes, query optimization
Abstract: Data lakes have become essential infrastructure for modern analytics, but traditional Copy-on-Write (COW) table maintenance degrades severely at petabyte scale and beyond because of the write amplification incurred when handling updates and deletions: every change rewrites the entire data file that contains the affected rows. Apache Iceberg addresses these challenges with a Merge-on-Read (MOR) architecture built on deletion vectors, which record deleted row positions in compact bitmap structures instead of rewriting data files. The architecture separates metadata from data files, enabling efficient query planning while keeping data files immutable, and applies deletion masks during query execution so that deleted rows are filtered out without physically moving data. Performance evaluation using the TPC-DS and TPC-H benchmarks demonstrates that MOR with deletion vectors significantly reduces write overhead while maintaining acceptable read performance through intelligent caching, predicate pushdown optimizations, and adaptive compaction strategies. The architecture is particularly effective for workloads with frequent small updates or scattered deletions, where COW would require expensive full-file rewrites, making it well suited to cloud-native data lake deployments that demand both high write throughput and consistent analytical query performance.
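The read path summarized in the abstract can be illustrated with a minimal Python sketch. It assumes a deletion vector is a bitmap keyed by row position within one immutable data file; a plain Python integer stands in here for the Roaring bitmaps that Iceberg's format-version 3 deletion vectors actually use. The class, the functions, and the 20% compaction threshold are hypothetical illustrations, not Iceberg's API or the author's implementation.

```python
# Minimal sketch of Merge-on-Read with a deletion vector.
# Assumptions (hypothetical, not Iceberg's API): rows are addressed by their
# 0-based position in a single immutable data file, and a Python int is used
# as the bit set (Iceberg itself serializes Roaring bitmaps).


class DeletionVector:
    """Tracks deleted row positions for one data file."""

    def __init__(self) -> None:
        self._bits = 0  # bit i set => row position i is deleted

    def delete(self, position: int) -> None:
        # A DELETE only flips a bit; the data file is not rewritten.
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self._bits >> position) & 1 == 1

    def cardinality(self) -> int:
        # Number of deleted rows (count of set bits).
        return bin(self._bits).count("1")


def merge_on_read(rows, dv):
    """Apply the deletion mask at scan time and return only live rows."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


def needs_compaction(row_count, dv, threshold=0.2):
    """Hypothetical policy: rewrite the file once enough rows are deleted."""
    return row_count > 0 and dv.cardinality() / row_count >= threshold


data_file = ["order-1", "order-2", "order-3", "order-4", "order-5"]
dv = DeletionVector()
dv.delete(1)  # a DELETE hits row position 1
dv.delete(3)  # a DELETE hits row position 3
print(merge_on_read(data_file, dv))          # ['order-1', 'order-3', 'order-5']
print(needs_compaction(len(data_file), dv))  # True (2 of 5 rows deleted)
```

The design choice this illustrates: writes touch only the small bitmap, while readers pay a per-row mask check. A compaction policy (here a simple deleted-fraction threshold, chosen only for illustration) decides when to fold the accumulated deletes back into a rewritten file.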
Author
- Piyush Dubey
- University of Iowa USA