A Lambda Lakehouse Architecture Bridging Streaming and Batch Intelligence in Volatile and Scalable Financial Data Processing

Keywords: Big Data, Lambda Architecture, Lakehouse, Delta Lake, Real-Time Analytics, Bitcoin Forecasting, GRU, ARNN, LSTM, XGBoost

Abstract

The rapid growth of digital financial market data calls for analytical infrastructure that can process large volumes of data continuously while remaining reliable for long-term historical processing. Batch-based platforms struggle to meet both needs, whereas pure streaming platforms often sacrifice analytical consistency. To address this limitation, this paper proposes a unified Lambda-Lakehouse architecture that allows real-time and batch processing to be performed together in a single, ACID-compliant platform. Apache Kafka captures live Bitcoin market data, which is processed in real time with Spark Structured Streaming, while Amazon S3 stores historical records for periodic reprocessing. Both the real-time and batch processing paths converge at a Delta Lakehouse, enabling schema enforcement, versioning, and time-travel queries. The proposed architecture emphasizes combining the Speed Layer, Batch Layer, and Serving Layer into a single operational workflow atop a transactional Lakehouse foundation. Advanced predictive models, including LSTM, GRU, ARNN, and XGBoost, are used to forecast Bitcoin prices at daily, hourly, and minute granularities. Experimental results indicate that the LSTM model consistently produced the best results, with RMSE = 2383.9, 539.3, and 144.9 at the daily, hourly, and minute levels, respectively.
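
The abstract reports model quality as RMSE (root mean squared error). As a reading aid, the metric can be sketched as follows. This is a minimal illustration of the standard definition, not the authors' evaluation code; the function name `rmse` and the sample prices are hypothetical.

```python
import math
from typing import Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error between two equal-length series."""
    if len(actual) != len(predicted):
        raise ValueError("series must have the same length")
    if not actual:
        raise ValueError("series must not be empty")
    # Square each forecast error, average the squares, then take the root.
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical prices for illustration only (not data from the paper).
actual_prices = [100.0, 102.0, 101.0]
predicted_prices = [101.0, 100.0, 101.0]
print(rmse(actual_prices, predicted_prices))
```

RMSE is expressed in the same units as the forecast quantity, so the values quoted above would be read as price errors at each granularity.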
Published
2025-12-21
How to Cite
Maatallah, M., Fariss, M., Asaidi, H., & Bellouki, M. (2025). A Lambda Lakehouse Architecture Bridging Streaming and Batch Intelligence in Volatile and Scalable Financial Data Processing. Statistics, Optimization & Information Computing. https://doi.org/10.19139/soic-2310-5070-3222
Section
Research Articles