A Data Mining Practical Approach to Inventory Management and Logistics Optimization

The current economic climate, characterized by intense competition and supply chain disruption, has sharpened the need to optimize costs and customer service, making inventories an area with significant potential for improvement in firms. Well-executed inventory management has a favorable impact on logistics performance indexes. Warehousing operations account for around 15% of logistics costs. This article employs a method based on the Partitioning Around Medoids (PAM) algorithm that incorporates, in a novel way, a strategy for locating the optimal picking point based on cluster classification, taking into account the qualitative and quantitative factors that have the greatest impact or priority on inventory management in the company. The results obtained with this model improve the routes of distributed materials based on characteristics such as the frequency of picking and handling of materials, allowing for the reorganization and expansion of storage capacity for the various SKUs by moving from a classification by families to a classification by clusters. The article proposes a warehouse layout design based on data mining techniques, using indicators and key attributes for operational success in a case study of a company, together with an approach to improve inventory management decision-making.


Introduction
Customers' increasing demand for products and services has prompted organizations to look for ways to improve the efficiency of their operations. Many businesses regard internal logistics and warehousing as areas with significant potential for improvement, with a positive impact on operational efficiency [1]. Warehousing operations are critical to an organization's operational performance because they support the fulfillment of customer requirements and expectations along the supply chain. Inventory management has a large impact on logistics performance indices, especially for companies that want to cut costs and improve efficiency in their product preparation and delivery processes [1]. In developed countries, warehouses account for approximately 15% of total logistics costs [2].
A warehouse is a facility that links suppliers and customers, buffering demand by taking time and cost variables into account in order to close the gap between the production and consumption of goods [3]. Warehouse operations are divided into receiving, storage, order picking, sorting, and shipping [4], with order picking taking up the most time and effort. Order picking is the most expensive function, accounting for roughly 55% of a warehouse's operating costs [5], which is why order preparation is viewed as a research opportunity for increasing productivity. In logistics, order picking is the process of selecting a set of Stock Keeping Units (SKUs), retrieving them from multiple storage locations, and transporting them for checking, packaging, and shipping in order to fulfill orders from internal or external customers. In manual operations, order picking is also the most time-consuming activity because of the heavy, repetitive work carried out in the warehouse. In manual processes, picking costs are mostly tied to the time spent moving products within the warehouse. According to [6], travel accounts for around half of total order picking time.
Travel time is determined by the distance that must be covered to collect the items requested in a customer's order. Reducing travel distance and picking time is therefore a critical goal for warehouse efficiency and competitiveness [7]. Planning order picking requires several tactical and operational decisions [8]. At the tactical level, there are two decisions: (1) storage assignment, which sets the criteria for assigning SKUs to storage locations, and (2) zoning of the picking area, which sets the policies for splitting the order picking area into zones and locating the picking areas [9]. At the operational level, the decisions concern order batching, which is based on rules that define how customer orders are combined into a single picking round, and routing policies, which define the sequence of storage locations that must be visited to collect all the SKUs in an order. Current supply chain management trends encourage the use of various technologies to optimize inventories in support of storage operations [10]. Warehouse Management Systems (WMS), for example, are becoming increasingly sophisticated tools based on data analysis. A WMS is intended to provide timely information for decisions on storage, inventory, and SKU movement. In the Big Data era, analytical tools such as data mining and business intelligence are used in inventory management to provide accurate and up-to-date information so that better decisions can be made [11]. Data mining techniques, in particular, are regarded as the primary foundation for Big Data analysis.
Data mining is the process of extracting information from a data set. Its key advantage is the ability to combine statistical models and machine learning, which allows it to handle a wide range of data types. According to Choi et al. [11], Arora and Chana [12], and Tsai et al. [13], the areas of data mining that focus on Big Data analysis include (1) clustering approaches, (2) distributed and parallel processing, and (3) multimedia processing. Clustering techniques divide a data set into groups: a large number of data points is allocated to a smaller number of groups so that points in the same group share similar characteristics and differ from points in other groups. Clustering thus categorizes the incoming data based on particular values or features [14]. Clustering is applied in fields such as artificial intelligence, marketing, scientific analysis, and engineering [15]. With cluster analysis, inventory managers can arrange SKUs according to particular characteristics. This research focuses on an approach to inventory management that employs clustering based on variables linked to picking, storage, and warehouse returns. A case study in the food industry is examined using data mining techniques based on the Partitioning Around Medoids (PAM) algorithm. The goal is to provide an inventory management strategy that uses clustering algorithms and the optimal picking point to categorize goods based on variables including picking frequency, consumption rates, and qualitative attributes related to warehouse operations.
The premise is that, by combining qualitative and quantitative characteristics, clustering techniques make it possible to manage a company's inventory, distribution, and storage logistics. The first section of this article reviews current developments in inventory management and order picking. The proposed approach, which is based on data mining techniques, is then presented. The approach is assessed in a case study using data from a food company. The findings are then analyzed, with a focus on the variables that have the biggest impact on storage logistics operations. Finally, conclusions and future work are discussed.

Literature Review
Inventory management is responsible for planning and controlling the resources and goods that support manufacturing, maintenance, and customer service. Because stocks can represent a high cost for an organization, the need to optimize costs and customer service has made inventories a key area for improvement in firms. According to Gu et al. [12], who conducted a thorough literature review, inventory management problems are categorized by warehouse function (receiving, storage, order picking, and shipping). ABC analysis is one of the most common and classic inventory management methods used by businesses. Following the ABC demand curve, class-based storage classifies stored products according to rules such as inventory turnover or cost [14]. Grosse and Glock [16] develop an analytical model that helps predict the performance of specific elements of an order picking system. Jemelka et al. [17] use a quantitative technique to propose a variant of ABC analysis for classifying inventories with a recursive model that incorporates material return rates and the redistribution of sections for positioning SKUs inside a warehouse.
To help companies differentiate themselves from competitors by minimizing order preparation time, van Gils et al. [17] propose workload forecasting in a warehouse setting, with a focus on picking zones. After examining the performance of various order picking tools, De Vries et al. [18] argue that the human factor plays a key part in order picking within the warehouse. Van Gils et al. [19] used a full factorial Analysis of Variance (ANOVA) to statistically test the relationships between storage, order processing, batching, zoning, and routing. These authors conclude that significant inventory management gains can be realized by addressing storage, order processing, zoning, and routing decisions simultaneously. Zhang et al. [20] introduce the notion of Demand Correlation Pattern (DCP) to characterize the relationship between SKUs, and then provide a methodology based on it to solve the Storage Location Assignment Problem (SLAP). In connection with the DCP proposal, Zhang et al. [21] note that class-based storage, which splits SKUs into several classes and assigns each class to a storage zone, is one of the most widely used inventory management approaches.
Lolli et al. [22] provide a hybrid model that combines the K-means algorithm and the Analytic Hierarchy Process (AHP) in order to design a multi-criteria inventory classification approach. Yuan et al. [23] investigate stowage decisions in multi-zone storage systems within the context of a WMS to determine the optimal distribution of incoming products across the different storage zones. Anđelković and Radosavljević [24] used cluster analysis to determine which inventory management processes would benefit most from the implementation of a WMS, and conclude that order processing operations are the best candidates for WMS-based information technologies. Using graph theory and a heuristic model, Çelik and Süral [25] investigate the order picking problem (OPP) to find the route that minimizes the time required for storage operations. Matthews and Visagie [26] propose a way to reduce picking travel times in order to obtain a suitable arrangement for SKU picking activities in a warehouse. Faia Pinto and Nagano [27] offer GA-OPS, a computational tool based on two genetic algorithms that reduces the number of picking trips while meeting the constraints specified in different production orders.
Djatna and Hadi [28] address the order preparation problem at a beverage company's warehouse with a drive-in rack system using a multiobjective mathematical model. Djatna and Hadi [17] conclude that order picking research currently faces the challenge of integrating order picking decisions (storage assignment, routing, batching, zoning, and warehouse layout) with other factors (queuing, operational, and material handling features). Liu et al. [11] present a methodology that builds hierarchies of similar inventory groups and then applies a simulated annealing algorithm to optimize inventory classifications at different hierarchy levels, thus combining cluster analysis and simulated annealing to search for the optimal classification in a warehouse. Kusrini [14] presents a study that supports the determination of minimum stock and profit margin with a model that divides SKUs into "fast-moving" and "slow-moving" categories using k-means clustering.
Aqlan [12] classifies inventories by clustering with data mining techniques, based on variables such as picking frequency, time in storage, price, and product sensitivity to transportation. [16] He provides the variables of flow time, work in process, and throughput in terms of pick probability after studying the aspects involved in inventory management. Given that inventory classification requires multiple criteria to control various inventory management functions, Aktepe et al. [17] examined a functional-normal-and-small (FNS) algorithm, which combines ABC analysis with variables such as handling frequency, lead time, contract manufacturing process, and specialty. In general, most of the inventory management literature focuses on time optimization, travel distances, and resource utilization, which serve as benchmarks for quantifying improvements in a company's warehouse operations. Category-based inventory analysis, advocated by numerous authors, is a key component in improving storage operations. Order picking and preparation, in particular, are acknowledged as key elements that must be considered in order to achieve effective inventory management.
The k-means method is the most commonly used data mining technique; in this study, however, the proposed clustering method uses the following quantitative variables: (1) picking frequency, (2) average quantity per order, (3) daily rate of consumption, (4) daily rate of returns, (5) average amount of returns per order, and (6) frequency of return.

Theoretical Basis
The inventory management strategy proposed in this study is based on clustering, which identifies similarities among the distinct SKUs in a warehouse. Each cluster is built from variables with distinct features, which distinguishes it from the other clusters. The case study is a multinational corporation that manufactures food and beverages; the analysis focuses on the challenges of one of its business units in Mexico, which specializes in beverage preparation and bottling. In a make-to-order environment, the company manages its inventory through internal warehouses (located within the facilities) and external warehouses (located outside the facilities). Figure 1 depicts warehouse logistics with the production department acting as an internal customer.

Fig. 1. Logistics in warehouse
A distinctive feature of the company is that, because of various formulation and mixing procedures, the materials in a manufacturing order are not consumed in full, and the remainder is returned from the production area to the warehouse. High inventory levels, storage capacity constraints, a shortage of storage space, and frequent demand variations are among the problems that affect the company's inventory management. With this in mind, the proposed methodology begins by identifying the SKUs present in the warehouse. This stage entails deciding on the methods and tools used to establish the identity, location, and position of the materials and products that the company uses in its supply chain, as well as their presence throughout the production processes. Using the barcodes on the materials and reports from the company's resource planning system (SAP), 203 SKUs were identified in the warehouse over a one-year production period. The material families are chemicals, powders, liquid concentrates, lids, pallets, packaging, labels, PET lids, and cardboard. Once the SKUs in the warehouse had been described, the factors with the most impact or importance for inventory management were chosen. For materials leaving the warehouse, these were the daily rate of consumption (DRC) (1), average quantity per order (AQO) (2), and pick frequency (PF) (3). For returns, the company selected the daily rate of return (DRR) (4), average number of returns per order (ARO) (5), and frequency of return (RF) (6). The PF represents how often the items are required by the production department [12], while the RF indicates how often they are returned to the warehouse from the production department.
Each of these factors was calculated using equations (1) to (6). Because certain materials have particular requirements that call for "rack" or "floor" positions, the company also included storage location as a qualitative factor for inventory management, along with the unit of measure in which the materials are accounted for (kilograms, pieces, or gallons). A floor location indicates that the material does not require special storage conditions, whereas a rack location indicates that the material, such as chemicals, powdered ingredients, and concentrates, requires a racking system and, in some cases, controlled temperature conditions in order to keep it in the best possible condition. After these parameters were calculated, clustering was used to classify the data. In a cluster analysis, a set of data, in this case SKUs, is classified by similarity in the input variables, in this case six quantitative (continuous) parameters and two categorical ones, to identify groups that are internally as homogeneous as possible yet differ as much as possible from one another. One of the key advantages of cluster analysis is that it produces a sensible grouping and organizes data into more homogeneous sets [15]. Clustering algorithms are also appropriate for applications in which the data change over time.
The clustering approach employed in this study was Partitioning Around Medoids (PAM). The PAM algorithm minimizes the sum of the dissimilarities between each observation and its closest medoid. PAM, a k-medoids method, was chosen because some SKUs in the company's operations show anomalous usage or are stored under consignment (customer property). A medoid is the element of a cluster with the smallest average distance (dissimilarity) to all other elements in the same cluster. Compared with algorithms such as k-means [20], using medoids instead of centroids makes PAM more robust, as it is less affected by outliers or noise. The PAM algorithm proceeds as follows: (1) choose k random observations as initial medoids, (2) calculate the distance matrix between all observations, (3) assign each observation to its closest medoid, (4) check whether swapping a medoid with another observation reduces the total distance within the clusters, and (5) if at least one medoid has changed, return to step 3; otherwise the process ends. The dataset was analyzed with the Gower distance, which is not possible with methods such as k-means, which only allows for Euclidean or Manhattan distances. The Gower distance is a robust measure established by Gower [17] and extended by Kaufman and Rousseeuw [20], which can be used to analyze data sets that contain continuous, ordinal, and categorical variables at the same time. The Gower distance between two cases i and j is calculated from Gower's General Similarity Coefficient $S_{ij}$, which in its standard form is defined as $S_{ij} = \sum_{k} w_{ijk} s_{ijk} / \sum_{k} w_{ijk}$, where $s_{ijk}$ is the similarity between cases i and j on variable k and $w_{ijk}$ is the weight of that comparison (zero when the comparison is not possible). The classification of the SKUs is determined by clustering on the specified factors. Finally, using the classification obtained from PAM, an inventory analysis is conducted with the goal of recommending a redistribution of materials in the warehouse. In this case, the optimal picking point was determined as an additional inventory management tool.
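As an illustration of the procedure described above, the following Python sketch computes a Gower dissimilarity matrix for mixed data and runs a simplified PAM swap search on it. It is not the study's R implementation: the function names `gower_matrix` and `pam`, the toy SKU values, the equal variable weights, the absence of missing values, and the greedy swap rule are all assumptions made for this example.

```python
import numpy as np

def gower_matrix(num, cat):
    """Pairwise Gower dissimilarity for numeric columns `num` and categorical columns `cat`."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    # Numeric variables: absolute difference scaled by the variable's range.
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0  # constant columns then contribute zero
    d_num = np.abs(num[:, None, :] - num[None, :, :]) / ranges
    # Categorical variables: 0 if the categories match, 1 otherwise.
    d_cat = (cat[:, None, :] != cat[None, :, :]).astype(float)
    # Assumption: every variable has weight 1 and no value is missing.
    return np.concatenate([d_num, d_cat], axis=2).mean(axis=2)

def pam(dist, k, seed=0, max_iter=100):
    """Simplified PAM: random initial medoids, then greedy medoid swaps."""
    rng = np.random.default_rng(seed)
    n = dist.shape[0]
    medoids = rng.choice(n, size=k, replace=False)
    cost = dist[:, medoids].min(axis=1).sum()
    for _ in range(max_iter):
        improved = False
        for pos in range(k):
            for cand in range(n):
                if cand in medoids:
                    continue
                trial = medoids.copy()
                trial[pos] = cand
                trial_cost = dist[:, trial].min(axis=1).sum()
                # Accept the swap only if it lowers the total dissimilarity.
                if trial_cost < cost - 1e-12:
                    medoids, cost, improved = trial, trial_cost, True
        if not improved:
            break  # no medoid changed in a full pass
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels, cost

# Hypothetical SKUs: two numeric factors and two qualitative factors.
num = np.array([[1.0, 10.0], [1.2, 11.0], [0.9, 9.5],
                [8.0, 50.0], [8.5, 52.0], [7.8, 49.0]])
cat = np.array([["rack", "kg"], ["rack", "kg"], ["rack", "kg"],
                ["floor", "pcs"], ["floor", "pcs"], ["floor", "pcs"]])
dist = gower_matrix(num, cat)
medoids, labels, cost = pam(dist, k=2)
print(medoids, labels)
```

With real data, the six quantitative factors would fill `num` and the two qualitative factors would fill `cat`; a production analysis would more likely rely on an established implementation than on this sketch.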
The company proposed using an optimal picking point location technique, based on the redistribution of materials, to establish the picking point that minimizes travel distances. Given the points $a^1, \ldots, a^m \in \mathbb{R}^2$, where each $a^i$ denotes a warehouse location, the picking point was obtained by solving the facility location problem over these points.
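A minimal sketch of how such a weighted picking point could be located is given below. It assumes Euclidean travel distances and uses Weiszfeld's iteration for the weighted geometric median, which may differ from the formulation in the paper's equations and from the MATLAB module used in the study; the coordinates, the weights, and the names `weighted_picking_point` and `total_cost` are hypothetical.

```python
import numpy as np

def total_cost(x, points, weights):
    """Weighted sum of Euclidean distances from x to every location."""
    return float((weights * np.linalg.norm(points - x, axis=1)).sum())

def weighted_picking_point(points, weights, tol=1e-9, max_iter=1000):
    """Weiszfeld iteration for the weighted geometric median (Euclidean assumption)."""
    points = np.asarray(points, dtype=float)
    weights = np.asarray(weights, dtype=float)
    x = np.average(points, axis=0, weights=weights)  # start at the weighted centroid
    for _ in range(max_iter):
        d = np.linalg.norm(points - x, axis=1)
        if np.any(d < tol):
            break  # simplification: stop if the iterate lands on a location
        w = weights / d
        x_new = (w[:, None] * points).sum(axis=0) / w.sum()
        if np.linalg.norm(x_new - x) < tol:
            x = x_new
            break
        x = x_new
    return x

# Hypothetical cluster locations (metres) and weights; heavier weights
# stand for clusters with higher picking and return frequencies.
points = np.array([[0.0, 0.0], [10.0, 0.0], [10.0, 8.0], [0.0, 8.0], [5.0, 4.0]])
weights = np.array([1.0, 5.0, 5.0, 1.0, 2.0])
centroid = np.average(points, axis=0, weights=weights)
p = weighted_picking_point(points, weights)
print(p, total_cost(p, points, weights), total_cost(centroid, points, weights))
```

In this toy layout the picking point is pulled toward the two heavily weighted locations, which mirrors the idea of giving precedence to the clusters that are picked and returned most often.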

Result and Discussion
The SKU data were extracted from the SAP system and analyzed in spreadsheets to identify the inbound and outbound movements over a one-year production cycle. The SAP data were used to compute the frequency and quantity of materials required by the production area, in addition to the volume occupied and the location of each SKU. The following table details the SKUs present in the warehouse, including their identification, location, quantity, and volume occupied.

Fig. 2. Locations of the SKUs
Internal and external warehouse locations are those where an SKU can be found, whether in an internal or an external warehouse, the choice being driven mainly by capacity and space constraints. Figure 2 presents the current locations of the SKUs in the internal warehouse. Using the PAM clustering method, the SKUs were grouped on the basis of DRC, AQO, PF, DRR, ARO, and RF, together with two qualitative factors: location (rack or floor) and unit of measure (kilograms, pieces, or gallons). The PAM model was fitted in RStudio, which was also used to determine the number of clusters. A similarity matrix was created by analyzing the similarity in the dataset with the Gower distance. Establishing a similarity criterion allows the elements to be related to one another; the proximity between elements is therefore determined with a similarity measure.

Fig. 3. Silhouette analysis
Subsequently, the number of clusters was determined by silhouette analysis on the similarity matrix. Six clusters gave an average silhouette width of 0.73, the highest value among the numbers of clusters tested (Figure 3). Considering the context of each cluster, the six-cluster solution was also the one that best fitted the diversity of the data and the requirements of warehouse management. With the PAM clustering approach, the SKUs in the warehouse were grouped into six clusters, as shown in Table 2. The SKUs were allocated as follows: 58% in Cluster 1, 10% in Cluster 2, 7% in Cluster 3, 12% in Cluster 4, and 8% in Cluster 6. Clusters 4 and 5, which correspond to the families of labels, covers, and packaging materials, had the highest PF and RF values. Cluster 6 contains the label and packaging material families with the lowest PF. Clusters 4, 5, and 6 have significantly higher DRC, AQO, and DRR values because they correspond to SKUs such as labels, can lids, and packaging that are used in large quantities during production. Cluster 5 includes the pallet family in addition to labels and packaging, resulting in a larger occupied volume than in the other clusters. Leaving aside the pallet family in Cluster 5, Cluster 3 occupies the most space in the warehouse. Cluster 1 has the most SKUs, 117, consisting of chemicals, powders, and liquid concentrates. The results for the qualitative characteristics obtained with the PAM clustering approach are given in Table 3.
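The average silhouette width used to choose the number of clusters can be computed directly from a dissimilarity matrix. The Python sketch below illustrates the calculation on toy data; the function name `average_silhouette_width`, the six one-dimensional points, and the two candidate labelings are assumptions for this example, not the study's data.

```python
import numpy as np

def average_silhouette_width(dist, labels):
    """Mean silhouette width from a precomputed dissimilarity matrix (needs >= 2 clusters)."""
    dist = np.asarray(dist, dtype=float)
    labels = np.asarray(labels)
    clusters = np.unique(labels)
    widths = np.zeros(len(labels))
    for i in range(len(labels)):
        own = labels == labels[i]
        n_own = own.sum()
        if n_own == 1:
            continue  # singleton clusters get a width of 0 by convention
        # a: mean dissimilarity to the other members of the same cluster
        # (dist[i, i] is 0, so dividing by n_own - 1 excludes the point itself).
        a = dist[i, own].sum() / (n_own - 1)
        # b: smallest mean dissimilarity to the members of any other cluster.
        b = min(dist[i, labels == c].mean() for c in clusters if c != labels[i])
        widths[i] = (b - a) / max(a, b)
    return widths.mean()

# Toy one-dimensional data with two well-separated groups.
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
dist = np.abs(x[:, None] - x[None, :])
good = average_silhouette_width(dist, [0, 0, 0, 1, 1, 1])
bad = average_silhouette_width(dist, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

In practice the same function would be evaluated for each candidate number of clusters, and the clustering with the highest average width would be retained.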
For instance, the SKUs in Cluster 2 belong to the liquid concentrate and chemical families, are stored in racks, and are handled in gallons. These qualitative factors, as represented by the various clusters, also enable the company to match the characteristics of the materials and their movement requirements with the systems and capacities of the material handling equipment. A weight is assigned to each element based on the characteristics that define each cluster, the location of the optimal picking point, and the Pareto principle. The weights give importance to those aspects that contribute most to achieving the various performance indicators; they are applied through equation (9), in which the variable d is minimized using the weight values v_i. In this case, the company identified picking frequency and return frequency as the primary factors that favor order processing efficiency. After outliers were removed, as illustrated in Figure 4, clusters 4 and 5 were found to be the most important ones for the PF and RF variables. The company assigned the highest weights to clusters 4 and 5, followed by clusters 2 and 3, while cluster 1 received the lowest weight. For cluster 6, the company considered moving these materials to an external warehouse because of their low PF, which frees up capacity in the internal warehouse. The position of the optimal picking point was computed using equations (8) and (9) and the MATLAB facility location optimizer module. Figure 5 illustrates the solution graphically. With this layout concept, the picking point was estimated using the cluster weights that were significant in terms of picking frequency and warehouse returns, giving these factors precedence.
This location has the shortest travel distance for preparing the various SKUs, which improves the warehouse's logistics performance. With this layout, material movement favors clusters 4 and 5. By combining a cluster-based picking policy with a picking point policy, it is also possible to calculate the volume required for operation. With an average volume capacity of 60 m³, the proposed picking point minimizes the routes taken by materials from clusters 4 and 5. The layout in Figure 5 also shows how the clusters are distributed throughout the warehouse, which supports space planning decisions based on the volume occupied by the SKUs in each cluster. Compared with the company's current layout, the cluster-based proposal enables a reorganization of the SKUs that increases storage capacity by 8% and avoids the scattered distribution of materials by moving from a family-based to a cluster-based classification.

Conclusion
The warehouse is a critical component of the supply chain for a variety of reasons, including demand variations and value-added service to the customer. Storage efficiency is determined by three factors: space, time, and cost. Streamlining inventory management reduces costs and time. The proposed approach made it possible to implement a methodology for the optimal identification and positioning of items without costly information systems, with an emphasis on the characteristics and factors that affect order preparation. Data mining techniques such as PAM clustering enable the inventory to be classified into distinct groups based on both qualitative and quantitative parameters.
This article demonstrated how to incorporate variables such as daily rate of consumption, average quantity per order, picking frequency, daily rate of return, average number of returns per order, and frequency of return into a layout design, together with attributes related to material handling within the warehouse. Clustering with PAM provides a more realistic approach to inventory management, in which time and capacity, as well as the types and handling of materials inside the warehouse, must be considered. The typical approach to storage design problems in inventory management ignores the dynamic nature of customer demand. With this proposal, the company's decision-makers can examine the dynamic environment of orders through factors such as picking frequency and return frequency. In addition, the characteristics and attributes of stocks can be assessed periodically, and the locations of SKUs can be updated to support the optimization of supply logistics.
With PAM-based inventory supply and management, it is possible to modify the picking environment, increase picking speed, and reduce travel distance. The results also support storage capacity decisions by identifying the amount of space required by the materials in each cluster. By integrating the optimization of the picking point, the company obtained significant benefits in terms of faster order preparation and lower inventory management costs. By reducing travel distances and using material identification, it is possible to fulfill orders for a variety of customers more quickly and with a high level of satisfaction. Warehouse design is another aspect of inventory management that should be examined, as it affects a variety of performance metrics, including material handling, space costs, and capacity. This study could also be extended to optimize the warehouse layout by taking into account additional parameters such as picking routes, order delivery dates, the definition of picking zones, and material storage policies.