Classification and Prediction of Video Game Sales Levels Using the Naive Bayes Algorithm Based on Platform, Genre, and Regional Market Data

Rafi Pratama Putra, Nevita Cahaya Ramadani, Agi Nanjar

Abstract


The rapid growth of the video game industry has produced a large body of market data that can be used to analyze and predict sales performance. This study builds a classification model for video game sales levels using the Naïve Bayes algorithm, chosen for its simplicity, efficiency, and solid baseline performance in supervised learning tasks. The research uses a public dataset of more than 13,000 video game entries with attributes including genre, platform, publisher, release year, user and critic ratings, and global sales figures. The target variable, global sales, was discretized into three categories representing distinct tiers of commercial success: Low (<1 million units), Medium (1–5 million units), and High (>5 million units). Before modeling, the dataset was preprocessed by removing duplicates, handling missing data, normalizing numerical attributes, and selecting features. A Multinomial Naïve Bayes classifier was then implemented and evaluated using accuracy, precision, recall, and F1-score. The model achieved an accuracy of 71.82% and an F1-score of 70.03%, a strong result for a probabilistic model of this simplicity. The classifier identified the low and medium sales categories effectively but underperformed slightly on the high sales group because of class imbalance in the dataset. Analysis of the conditional probabilities indicated that game genre, platform popularity (especially PS2 and Wii), and critic scores were the most influential determinants of higher sales outcomes. These findings indicate that Naïve Bayes provides a reliable and interpretable foundation for video game sales prediction and can serve as a benchmark model in market analytics. Future studies are encouraged to address class imbalance through oversampling or synthetic data generation, incorporate contextual variables such as marketing strategies and release schedules, and explore ensemble or deep learning approaches to improve predictive accuracy and robustness.
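
The discretization and classification steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it uses a small categorical Naïve Bayes with Laplace smoothing written in plain Python rather than the Multinomial variant the study reports, and the names `sales_tier`, `CategoricalNB`, and the seven toy records are hypothetical, invented only for illustration. The treatment of the boundary values (exactly 1 and exactly 5 million units counted as Medium) is an assumption based on the stated tier ranges.

```python
import math
from collections import Counter, defaultdict


def sales_tier(global_sales_millions):
    """Map global sales (in million units) to the three tiers in the abstract.

    Assumption: exactly 1.0 and exactly 5.0 million both count as Medium.
    """
    if global_sales_millions < 1.0:
        return "Low"
    if global_sales_millions <= 5.0:
        return "Medium"
    return "High"


class CategoricalNB:
    """Naive Bayes over categorical features with Laplace (add-one) smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # counts[label][feature_index][value] -> number of occurrences
        self.counts = defaultdict(lambda: defaultdict(Counter))
        # distinct values seen per feature, used in the smoothing denominator
        self.values = [set() for _ in rows[0]]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.counts[label][i][value] += 1
                self.values[i].add(value)
        return self

    def log_posteriors(self, row):
        """Return unnormalized log P(class) + sum of log P(feature | class)."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_c in self.class_counts.items():
            score = math.log(n_c / total)
            for i, value in enumerate(row):
                numerator = self.counts[label][i][value] + 1
                denominator = n_c + len(self.values[i])
                score += math.log(numerator / denominator)
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Hypothetical toy records: (genre, platform, global sales in million units).
toy_records = [
    ("Sports", "Wii", 8.2),
    ("Racing", "Wii", 6.1),
    ("Action", "PS2", 2.4),
    ("Action", "PS2", 1.3),
    ("Puzzle", "PC", 0.2),
    ("Puzzle", "PC", 0.4),
    ("Action", "PC", 0.6),
]

features = [(genre, platform) for genre, platform, _ in toy_records]
tiers = [sales_tier(sales) for _, _, sales in toy_records]

model = CategoricalNB().fit(features, tiers)
print(model.predict(("Sports", "Wii")))
```

The per-class, per-feature counts stored in `model.counts` are the quantities behind the conditional probabilities the abstract refers to when it ranks genre, platform, and critic scores by influence. On a real dataset the class imbalance noted above would also show up directly in `model.class_counts`.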



Keywords


Naïve Bayes; Video Game Sales; Machine Learning; Classification; Data Imbalance; Feature Engineering; Predictive Modeling


IJIIS: International Journal of Informatics and Information Systems

ISSN: 2579-7069 (Online)
Organized by: Department of Information System, Universitas Amikom Purwokerto, Indonesia; Faculty of Computing and Information Science, Ain Shams University, Cairo, Egypt
Website: www.ijiis.org
Email: husniteja@uinjkt.ac.id (publication issues)
  taqwa@amikompurwokerto.ac.id (managing editor)
  contact@ijiis.org (technical & paper handling issues)

 This work is licensed under a Creative Commons Attribution-ShareAlike 4.0