Toward a Modular, Low-Latency Architecture with BERT-based Big Media Data Analysis

Widyawan Widyawan, Handoko Wisnu Murti, Guntur Dharma Putra, Eddy Nurmanto, Achmad Affandi

Abstract


The rapid growth of digital and social media platforms has produced massive streams of unstructured media data. However, current big data approaches are not tailored to the high volume and velocity of media data, which consists of unstructured, lengthy full-text messages. This study proposes a modular, stream-oriented big data architecture for media data. The architecture consists of data crawlers, a message broker, machine learning modules, persistent storage, and analytical dashboards, connected through a publish-subscribe communication pattern that enables asynchronous, decoupled data processing. The system integrates IndoBERT, a transformer-based model fine-tuned for the Indonesian language, to perform real-time semantic tagging within the streaming pipeline. The solution has been implemented as a prototype using open-source technologies on an on-premise cluster. The primary novelty is the integration and operationalization of a large transformer-based language model (IndoBERT) within a low-latency streaming pipeline. The experimental results underscore the feasibility of deploying scalable, vendor-neutral media analytics platforms for institutions that are highly sensitive to privacy and cost. Architectural quality is evaluated quantitatively with Martin's Instability Metric and Coupling Between Objects (CBO), which confirm high modularity across components. The system achieves an end-to-end latency of 3.121 seconds and a deep learning latency of 2.333 seconds, and it processes 32,102 messages per day. The trade-off is explicit: the 2.333-second deep learning inference adds latency in exchange for greater semantic depth. This study presents a reference architecture for scalable, intelligent, real-time media analytics systems suited to public sector and academic deployments that require data privacy and control over infrastructure.
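To make the modularity evaluation concrete, the following minimal sketch computes Martin's Instability Metric, I = Ce / (Ca + Ce), where Ce counts a component's outgoing dependencies and Ca its incoming ones. The component names and the dependency edges in `DEPENDENCIES` are illustrative assumptions loosely based on the components listed in the abstract; they are not the dependency graph or the values reported by the authors.

```python
from collections import defaultdict

# Hypothetical dependency edges: component -> components it depends on.
# These edges are assumptions for illustration only, not measured data.
DEPENDENCIES = {
    "crawler": {"broker"},
    "ml_module": {"broker"},
    "storage_writer": {"broker"},
    "dashboard": {"storage_writer"},
    "broker": set(),
}


def instability(dependencies):
    """Return Martin's instability I = Ce / (Ca + Ce) for each component."""
    # Afferent coupling (Ca): how many components depend on this one.
    afferent = defaultdict(int)
    for targets in dependencies.values():
        for target in targets:
            afferent[target] += 1

    scores = {}
    for component, targets in dependencies.items():
        ce = len(targets)          # efferent coupling: outgoing dependencies
        ca = afferent[component]   # afferent coupling: incoming dependencies
        total = ca + ce
        # An isolated component has no coupling; report 0.0 by convention here.
        scores[component] = ce / total if total else 0.0
    return scores


if __name__ == "__main__":
    for name, value in sorted(instability(DEPENDENCIES).items()):
        print(f"{name}: I = {value:.2f}")
```

In this assumed graph, the broker is depended upon but depends on nothing, so it scores 0 (maximally stable), while components that only publish to or consume from the broker score 1. Such a split would be the expected shape for a publish-subscribe design, although the actual figures depend on the real system.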

Keywords


big media data; modular architecture; latency; BERT


DOI: http://dx.doi.org/10.35671/telematika.v18i2.3151


Telematika
ISSN: 2442-4528 (online) | ISSN: 1979-925X (print)
Published by : Universitas Amikom Purwokerto
Jl. Let. Jend. POL SUMARTO Watumas, Purwonegoro - Purwokerto, Indonesia


This work is licensed under a Creative Commons Attribution 4.0 International License.