Predicting Airline Passenger Satisfaction with Classification Algorithms

The Covid-19 pandemic has devastated airlines around the world, as most international air travel has been banned. Almost all airlines have suffered losses because they were prohibited from operating flights, which are their largest source of income. Several airlines, such as Thai Airways, have filed for bankruptcy. Nonetheless, once the crisis ends, demand for air travel is expected to spike as people return to holidays abroad. This research aims to analyze competition in the aviation industry and to identify the factors that are key to an airline's success. The study compares several classification models: KNN, Logistic Regression, Gaussian Naive Bayes, Decision Trees and Random Forest. The Random Forest algorithm with a probability threshold of 0.7 performed best, reaching an accuracy of 99%, and the most important factor in customer satisfaction is the Inflight Wi-Fi Service.


Introduction
The service industry has now replaced manufacturing as the most important sector of the global economy [1]. In the United States, the service sector accounts for about 60% of annual GDP and almost 70% of new employment, making the country the global leader in service industry growth [2]. To enhance service efficiency, several studies have analyzed service quality and attempted to define the factors that affect customer satisfaction and loyalty in different industries [2]. The airline industry has historically been highly competitive, which has forced airlines to work hard to improve their services in order to gain a competitive edge; as a result, there is a large body of research on service quality in the aviation industry.
Because passengers tend to judge airlines by their satisfaction with in-flight services, the service provided by flight attendants and the facilities on board are the aspects that consumers experience most directly (Park et al. 2004). Improving the quality of in-flight service is therefore one of the driving factors of an airline's success; in-flight food service, in particular, is a significant determinant of in-flight service. Several significant previous studies in the aviation industry have attempted to classify the factors that influence service quality [2]. However, there are few observational studies on the value of in-flight service efficiency. The aim of this analysis was to establish the value of in-flight service efficiency, with an emphasis on food and beverage services, in increasing passenger satisfaction and loyalty in the airline industry [3].
In this paper, we first present a brief literature review on general service quality and airline services. We then describe a research methodology that uses several classification algorithms to identify the most important features of the dataset and to compare the accuracy of each algorithm.

Review of Relevant Literature
Service Quality
Service quality is characterized as the customer's overall impression of the service provided [4], whereas satisfaction is a direct reaction to consumption. Both expected and perceived service affect how service quality is judged. If the service matches their standards, customers will be satisfied and consider the quality of service to be good; if the service exceeds their expectations, customers will be very happy and consider the quality of service to be very good [5]. As a result, improving service efficiency relies strongly on the airline's ability to reliably satisfy passengers' needs and desires [6]. Airlines can prosper and maintain a competitive edge by doing their best to build and sustain reliable service, which can contribute to customer loyalty [7]. This, in turn, will support airlines by (1) improving the bond between the airline and its customers, (2) creating a strong basis for repurchase, (3) promoting customer satisfaction, (4) creating word-of-mouth reviews that will help the airline, (5) developing a good corporate image in the eyes of passengers, and, ultimately, (6) lowering prices. Airlines must therefore understand the competitive value of efficiency: long-term quality enhancement is not an expense; rather, it is an opportunity that produces greater returns [9].

Airline Companies Services
To reliably estimate the characteristics of an airline's operation, it is necessary to conceptualize them. Airline operation is a term that includes all of the facilities operated by airlines. To explain and extend the definition of airline service, Kandampully & Suhartanto [9] refer to airline employment as service storage, based on Park's [10] model, which describes four types of service firms along two task dimensions, as seen in Table 1. The features of airline services are both fixed and versatile [11]. Seat height, cargo storage, aircraft type, and maintenance are examples of the fixed characteristics. Versatile characteristics include in-flight meal services and flight attendant services, which provide both tangible and intangible service from departure to arrival [12][13][14][15]. Passengers appear to be loyal to airlines because of the quality of their offerings [16]. Customers who are unhappy with the level of service will remain with a single airline rather than moving to another [17][18][19]. Aside from service efficiency expectations, transaction and switching costs have a huge impact on service loyalty [20]. It is much more difficult for a single company to account for the efficiency of airline services than it is for other service providers, such as financial institutions, whose work processes consist of different but interrelated matters [21]. Airline services, on the other hand, are supported simultaneously by a number of procedures carried out by various organizations, such as the TSA, airport authorities, catering firms, and so on [22]. As a result, enhancing the efficiency of airline services requires the seamless coordination of different operations by several organizations [23].

Fig. 1. Methodology Process
Before selecting and building a model, it is important to understand the dataset being used. Our dataset includes approximately 130,000 survey entries, together with passenger and flight information, from US airlines. There are 21 feature columns and one target column in all. Fourteen of the features are survey entries in which passengers score their flight experience on a scale of one to five. Survey entries with a score of 0, however, are treated as unanswered survey questions. After deleting these entries and rows with NaN (Not a Number) values, the resulting dataset for modeling has around 70,000 entries. Some columns and entries have also been renamed for clarity. The cleaned dataset is shown below:

Fig. 2. Dataset after Cleaning
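The cleaning step described above can be sketched roughly as follows. This is a minimal illustration rather than the procedure actually used in the study: the column names, the tiny sample table and the helper clean_survey are all assumptions made for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical column names; the real dataset has 14 survey columns.
SURVEY_COLS = ["Inflight Wifi Service", "Food and Drink", "Gate Location"]


def clean_survey(df, survey_cols=SURVEY_COLS):
    """Treat 0 ratings as unanswered, then drop rows with any missing value."""
    out = df.copy()
    # A rating of 0 means the question was skipped, so mark it as missing.
    out[survey_cols] = out[survey_cols].replace(0, np.nan)
    # Remove every row that has at least one missing value.
    out = out.dropna().reset_index(drop=True)
    # Ratings are whole numbers again once the NaN rows are gone.
    out[survey_cols] = out[survey_cols].astype(int)
    return out


# Made-up sample: row 1 has an unanswered question, row 2 has a NaN delay.
raw = pd.DataFrame({
    "Inflight Wifi Service": [3, 0, 5, 4],
    "Food and Drink": [4, 2, 1, 5],
    "Gate Location": [1, 3, 3, 5],
    "Arrival Delay": [10.0, 0.0, np.nan, 5.0],
})
cleaned = clean_survey(raw)
print(cleaned)
```

Dropping whole rows is the simplest reading of the text; it also explains why the dataset shrinks so sharply, since a single unanswered question is enough to discard an entry.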
As seen in Figure 2, the data is now ready for modeling. First, however, it is worth computing descriptive statistics for each feature: the count, mean, maximum, minimum, quartiles, and standard deviation. We do not necessarily use these statistics for modeling, but they help the reader understand the dataset in more detail and interpret the findings presented later. The results for each feature are shown in Figure 3 below.

Fig. 3. Dataset Analysis
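Summary statistics of this kind can be produced in one call with pandas. The sketch below uses a small made-up table with two hypothetical columns; it only illustrates the type of output, not the study's actual figures.

```python
import pandas as pd

# Hypothetical sample with two of the kinds of columns described in the text.
sample = pd.DataFrame({
    "Age": [25, 40, 33, 58],
    "Inflight Wifi Service": [3, 5, 2, 4],
})

# describe() reports count, mean, std, min, quartiles (25%, 50%, 75%) and max
# for every numeric column.
summary = sample.describe()
print(summary)
```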
After this analysis, we have a clear statistical picture of the data. There are some drawbacks, however: some features, and some values within features, are not in a form suitable for building models. The next step is therefore to adjust feature names and contents to make model building easier, which involved changing the structure of some features in the dataset. For example, converting the Satisfaction column to binary allows a clear comparison of the class sizes. The class proportions in the Satisfaction column are shown in the figure below: 0 = 0.564257 and 1 = 0.435743. This split is fairly balanced and plausible, so the model selection process can proceed. With 56.4% of passengers reporting 'Neutral / Unhappy' (negative class: 0) and 43.6% reporting 'Satisfied' (positive class: 1), the target classes are reasonably balanced. The large number of negative entries is unsurprising, since 'Neutral / Unhappy' does not always indicate disappointment: the class includes neutral travelers as well as those who are unhappy with their flight experience. When we further broke down the satisfaction classes by customer type, we observed that first-time customers had a lower satisfaction ratio. When we segmented the satisfaction classes by trip type, customers on personal trips (holidays) had a much lower satisfaction ratio.
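The binary conversion and the class comparison can be sketched as below. The labels, column names and five sample rows are assumptions chosen for illustration, so the proportions differ from the 0.564257 / 0.435743 split reported above.

```python
import pandas as pd

# Made-up sample; label strings and column names are hypothetical.
df = pd.DataFrame({
    "Satisfaction": ["Satisfied", "Neutral or Unhappy", "Neutral or Unhappy",
                     "Satisfied", "Neutral or Unhappy"],
    "Customer Type": ["Returning", "First-time", "First-time",
                      "Returning", "Returning"],
})

# Binary target: 1 = satisfied (positive class), 0 = neutral or unhappy.
df["Satisfied"] = (df["Satisfaction"] == "Satisfied").astype(int)

# Share of each class, comparable to the proportions quoted in the text.
class_share = df["Satisfied"].value_counts(normalize=True).sort_index()

# Satisfaction ratio per customer type (the mean of a 0/1 column is a ratio).
rate_by_type = df.groupby("Customer Type")["Satisfied"].mean()

print(class_share)
print(rate_by_type)
```

The same groupby pattern applies to trip type or travel class by swapping the grouping column.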
We also aim to remove features that do not contribute to our predictive modeling. These include features that do not help distinguish the target classes as well as highly correlated features, which can cause multicollinearity issues. As far as possible, we want to keep the features that are survey entries, so that we can identify which areas of the flight experience drive satisfaction. For this purpose, we apply Kernel Density Estimation (KDE) plots, a correlation heatmap and LASSO regression for feature selection. We found that:
a. KDE plot: The 'Gate Location' feature appears to be missing the scores '2' and '4', which suggests an inconsistency in the data, since it is unlikely that no passenger would give these scores. The distribution of satisfaction in the 'Gender' feature is virtually equal for both genders, suggesting that the feature has little relationship with the target; it has therefore been eliminated.
b. Correlation heatmap: The features 'Age,' 'Departure / Arrival Time Convenience,' 'Gate Location,' and 'Total Delay' are poorly correlated with the target, with correlations of 0.15 or lower.
c. LASSO regression plot: As the alpha hyperparameter increases, the coefficients of the least significant features shrink to zero first. Based on the plot, we identified 'Food and Drink,' 'Ease of Online Booking,' 'Age,' 'Flight Distance,' 'Total Delay,' and 'Gate Location' as the least significant features.
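The correlation and LASSO checks in items b and c can be illustrated on synthetic data. Everything in this sketch is an assumption made for the example: the three feature names, the way the target is generated, and the grid of alpha values. It is meant to show the mechanism (weak features lose their coefficient first), not to reproduce the study's plots.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Lasso
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 2000

# Synthetic 1-5 ratings: wifi drives the target strongly, booking weakly,
# and gate not at all (all three names are hypothetical).
wifi = rng.integers(1, 6, size=n)
booking = rng.integers(1, 6, size=n)
gate = rng.integers(1, 6, size=n)
score = 1.0 * wifi + 0.4 * booking + rng.normal(0, 1, size=n)
y = (score > np.median(score)).astype(int)
X = pd.DataFrame({"wifi": wifi, "booking": booking, "gate": gate})

# (b) Absolute correlation of each feature with the binary target.
corr = X.corrwith(pd.Series(y)).abs()
print(corr)

# (c) Fit LASSO for increasing alpha and record the first alpha at which
# each standardized coefficient is exactly zero.
X_std = StandardScaler().fit_transform(X)
alphas = [0.01, 0.05, 0.2, 0.4]
zero_at = {}
for alpha in alphas:
    coef = Lasso(alpha=alpha).fit(X_std, y).coef_
    for name, c in zip(X.columns, coef):
        if c == 0.0 and name not in zero_at:
            zero_at[name] = alpha
print(zero_at)
```

Features are standardized before fitting so that the L1 penalty treats them on the same scale; without this step, the order in which coefficients vanish would partly reflect units rather than importance.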
After weighing these findings, we decided to delete the features "Gender," "Age," "Gate Location," "Total Delay," "Flight Distance," and "Departure / Arrival Time Convenience." This leaves only 15 features, most of which are survey categories and customer types / classes. Before deciding which classification model was the most predictive for this dataset, we split the data into 80% for 5-fold cross-validation and 20% as a test set for the final assessment of the chosen model. We then used 5-fold cross-validation to find the optimal hyperparameters of the different classification models. The cross-validation process also covered Gaussian Naive Bayes and the ensemble method (voting on both models). After comparing all models on AUC, Precision, and Recall, the Random Forest model was found to be the best performer, with AUC 0.99, Precision 0.97, and Recall 0.94. For our business questions, high precision is the more relevant property of this model: predictions of the positive class, 'Satisfied,' must be very accurate in order to reliably identify the critical factors that contribute to customer satisfaction. Because adjusting the probability threshold trades precision against recall, we ran a simple validation with the Random Forest model to find the optimal probability threshold. As seen in Figure 15, the model's precision is 99.1%, meaning that when the model predicts that a passenger is satisfied, the prediction is correct 99.1% of the time. We now look at how this high precision can be used to address the business problem.
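The evaluation protocol described above (an 80/20 split, 5-fold cross-validation for hyperparameter search, then a probability threshold on the chosen model) might look like the following sketch. The synthetic data from make_classification, the small parameter grid and the forest size are assumptions for illustration; only the split ratio, the number of folds and the 0.7 threshold quoted in the abstract come from the paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in for the cleaned survey data.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           random_state=0)

# 80% for cross-validation, 20% held out for the final assessment.
X_cv, X_test, y_cv, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# 5-fold cross-validated search over a tiny, hypothetical grid, scored by AUC.
search = GridSearchCV(
    RandomForestClassifier(n_estimators=50, random_state=0),
    param_grid={"max_depth": [4, None]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_cv, y_cv)

# Predicted probability of the positive class on the held-out test set.
proba = search.predict_proba(X_test)[:, 1]


def precision_recall_at(threshold):
    """Precision and recall when 'satisfied' requires proba >= threshold."""
    pred = (proba >= threshold).astype(int)
    return (precision_score(y_test, pred, zero_division=0),
            recall_score(y_test, pred))


for t in (0.5, 0.7):
    p, r = precision_recall_at(t)
    print(f"threshold={t}: precision={p:.3f} recall={r:.3f}")
```

Raising the threshold can only remove positive predictions, so recall never increases; the hope is that the predictions that remain are more often correct, which is the precision-recall trade-off the text refers to.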

Business Problem
As reported in the exploratory data analysis (EDA), first-time customers have higher demands and are therefore less likely to be satisfied. Winning a customer's loyalty on the first flight is crucial, however, because it increases the chances that the customer will return to the airline for the next flight. We use the model to analyze the important factors that contribute to first-time customer loyalty.

A. Private Travel in Economy Class - First-Time Customer
When we set all categories to the average rating (rating: 3) for an economy class customer on a personal trip, the model was unsure whether the customer would be satisfied. If we raise the In-Flight Wi-Fi Service rating to excellent (rating: 5), with the other categories left at average, the model is confident that the customer will be satisfied. Conversely, if the In-Flight Wi-Fi Service rating is downgraded while the other categories are set to very good, the model again doubts that the customer will be satisfied. The model expects business class customers on business trips to be satisfied more readily: it is confident that they will be satisfied even when the In-Flight Wi-Fi Service rating is lower than the other categories, which are set to very good. However, as we continued to downgrade the other categories, the model indicated that these customers would be satisfied if the Ease of Online Booking rating was kept at very good at the very least (rating: 5).
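A what-if simulation of this kind can be sketched by fixing every rating at 3 and then changing one rating at a time. The example below is purely illustrative: the model is trained on synthetic data in which the Wi-Fi rating is deliberately made the dominant driver, and the feature names and the helper satisfaction_probability are assumptions, so the numbers say nothing about the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Hypothetical feature names standing in for the survey categories.
FEATURES = ["Inflight Wifi Service", "Ease of Online Booking", "Food and Drink"]

rng = np.random.default_rng(0)
n = 3000
X = pd.DataFrame(rng.integers(1, 6, size=(n, 3)), columns=FEATURES)

# Synthetic target: wifi matters most, booking a little, food not at all.
score = (1.5 * (X["Inflight Wifi Service"] - 3)
         + 0.5 * (X["Ease of Online Booking"] - 3)
         + rng.normal(0, 0.5, size=n))
y = (score > 0).astype(int)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)


def satisfaction_probability(overrides):
    """Predicted P(satisfied) for a profile rated 3 everywhere except overrides."""
    profile = {name: 3 for name in FEATURES}
    profile.update(overrides)
    row = pd.DataFrame([profile], columns=FEATURES)
    return float(model.predict_proba(row)[0, 1])


p_average = satisfaction_probability({})
p_wifi_5 = satisfaction_probability({"Inflight Wifi Service": 5})
print(f"all ratings 3: {p_average:.2f}")
print(f"wifi raised to 5: {p_wifi_5:.2f}")
```

Comparing the two probabilities against the chosen threshold shows whether a single improved category is enough to move a profile into the 'Satisfied' class.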

Conclusion
We have built a highly precise classification model that helps airlines recognize critical bottlenecks and improve passenger satisfaction. Based on several simulations, we recommend that airlines concentrate on improving the In-Flight Wi-Fi Service experience. Airlines could, for example, provide better tools that make in-flight Wi-Fi easier to access, or lower its cost so that more economy class travelers take advantage of the service. Furthermore, airlines should concentrate on the Ease of Online Booking, as business travelers value flexibility and comfort when flying.