Predicting YouTube Video Viewership Using Multi-Feature Random Forest Modeling: A Case Study on the Warganet Life Official Channel
DOI:
https://doi.org/10.64472/jciet.v1i2.23
Keywords:
youtube analytics, viewer prediction, random forest, machine learning, CRISP-DM
Abstract
This study presents a viewer prediction model for the YouTube channel “Warganet Life Official” using the Random Forest algorithm and multi-feature engagement metrics obtained from YouTube Studio. The dataset includes impressions, likes, dislikes, shares, watch time, and subscriber changes, which were processed using the CRISP-DM framework. The model achieved its best performance under a 70:30 train–test split, with a MAPE of 12.20% and an RMSE of 204,890.42. Random Forest outperformed the Linear Regression and XGBoost baselines, supporting its suitability for modeling nonlinear engagement behavior in dynamic digital-media environments. The novelty of this work lies in its multi-feature, engagement-driven modeling applied to a large Southeast Asian entertainment channel, offering localized evidence for viewer-performance forecasting. Theoretically, this study strengthens recent findings that multi-feature engagement metrics yield more accurate digital-media performance predictions. Practically, the deployment of a Streamlit-based prediction tool enables creators to perform real-time content evaluation and early performance diagnostics, providing actionable insights for improving content strategies and long-term channel optimization.
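The abstract reports MAPE and RMSE under a 70:30 train–test split. The sketch below shows how those two error metrics and such a split are commonly computed; it is not the authors' code. The function names and the view counts are hypothetical, chosen only for illustration, and the MAPE helper assumes that no actual value is zero.

```python
import math
import random


def train_test_split_70_30(rows, seed=0):
    """Shuffle rows and split them 70:30 (train:test). Hypothetical helper."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def mape(actual, predicted):
    """Mean absolute percentage error, in percent (assumes no actual value is 0)."""
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same unit as the target (views)."""
    total = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(total / len(actual))


# Made-up view counts for illustration only; these are not data from the study.
actual = [100_000, 250_000, 400_000, 1_000_000]
predicted = [110_000, 225_000, 420_000, 900_000]

print(f"MAPE: {mape(actual, predicted):.2f}%")
print(f"RMSE: {rmse(actual, predicted):,.2f}")
```

MAPE is scale-free, while RMSE is expressed in views and penalizes large misses more heavily, which is one reason an RMSE in the hundreds of thousands can coexist with a MAPE near 12% on a channel with very high view counts.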
License
Copyright (c) 2025 Meiza Alliansa, Nur Hafifah Matondang, Rifka Dwi Amalia (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.
PROPOSED POLICY FOR JOURNALS OFFERING OPEN ACCESS
The following conditions must be fulfilled by the Authors:
- Copyright and Licensing
  Authors retain copyright and grant the journal the right of first publication with the work simultaneously licensed under a Creative Commons Attribution License that allows others to share the work with an acknowledgment of the work’s authorship and initial publication in this journal.
- Non-Exclusive Distribution
  Authors may enter into separate, additional contractual arrangements for the non-exclusive distribution of the journal’s published version of the work (e.g., posting it to an institutional repository or publishing it in a book), with an acknowledgment of its initial publication in this journal.
- Online Posting and Early Sharing
  Authors are permitted and encouraged to post their work online (e.g., in institutional repositories or on their personal websites) prior to and during the submission process, as this can lead to productive exchanges, as well as earlier and increased citation of the published work. (See The Effect of Open Access).


