Implementation of the Convolutional Neural Network Method to Detect the Use of Masks

Coronavirus disease has gripped the planet since the end of 2019. Wearing a mask in public is one of the key protective measures available to people. Furthermore, certain public service providers admit clients only if they wear masks correctly. However, there is relatively little research on face mask detection based on image processing. Almost everybody wears a mask to shield themselves from the COVID-19 pandemic. Monitoring whether people in a crowd wear face masks in the most public places, such as malls, museums, and parks, has become increasingly important. The development of an AI approach to determine whether a person wears a face mask, and whether they may enter, would significantly assist society. In this article, we use a deep learning model, a Convolutional Neural Network (CNN), built with Keras/TensorFlow and OpenCV. The accuracy obtained from this model is more than 96%.


Introduction
Due to the COVID-19 coronavirus epidemic across the globe, the practice of wearing masks in public areas is growing. Before COVID-19, people wore masks to protect themselves against air pollution, while others, self-conscious about their looks, covered their faces to conceal their emotions from the public. Scientists have demonstrated that wearing a face mask can prevent the propagation of COVID-19 [1]. COVID-19 (known as coronavirus) is the latest epidemic virus to threaten human health in the last century [2]. The accelerated transmission of COVID-19 pushed the World Health Organization to declare COVID-19 a global pandemic in 2020. According to [3], more than five million people in 188 countries were infected with COVID-19 in less than six months. The virus is spread by close contact and in cramped, overcrowded areas.
The outbreak of coronavirus has significantly intensified scientific collaboration around the world. Artificial Intelligence (AI) based on machine learning and deep learning can help combat COVID-19 in several respects. Machine learning enables researchers and doctors to analyze vast volumes of data in order to forecast the spread of COVID-19, to act as an early warning system for potential pandemics, and to identify susceptible populations. Healthcare provision needs the support of emerging technologies such as artificial intelligence, IoT, big data, and machine learning to combat and predict new diseases. The power of AI is harnessed to tackle the COVID-19 pandemic [4], for example in COVID-19 identification on medical chest X-rays [5], to help recognize infection patterns and to rapidly monitor and diagnose infections.
Policy makers face multiple problems and threats in dealing with the spread and propagation of COVID-19 [6]. In certain nations, citizens are required by law to wear face masks in public. These rules and laws were established in response to the rapid rise of cases and deaths in many regions. However, monitoring large numbers of people is becoming more difficult. The screening process requires identifying who is not wearing a face mask. In France, new AI technology has been built into the Paris Metro system's security cameras [7] to ensure passengers wear face masks. DatakaLab [8], the French start-up that developed the software, states that the aim is not to identify or apprehend individuals who do not wear masks, but to produce anonymous statistics that help officials anticipate a possible resurgence of COVID-19.
Object detection is a very important problem in computer vision and data science, combining segmentation and recognition; both accuracy and real-time performance are essential capabilities. In recent years, object detection has been used in applications across many fields, such as autonomous driving, artificial intelligence (AI), and facial recognition. However, various factors interfere with the detection process, such as viewing angle, occlusion, and uneven illumination. Krizhevsky [9] established convolutional neural networks as the state of the art in ImageNet classification, and many types of convolutional neural networks have been used for object recognition since then. CNNs achieve good performance here; in particular, Mask R-CNN obtained state-of-the-art results in object recognition. However, obtaining object masks for training is very time consuming and tiring, so we use reasonable data available open source.
In this article, we present a face mask detection model based on a branch of machine learning, namely Convolutional Neural Networks (CNNs). By identifying people who are not wearing face masks, the proposed model may be combined with a camera to help prevent transmission of COVID-19. The design is an interplay between machine learning and the CNN algorithm. For feature extraction, we use deep learning coupled with a 5-layer model that receives the input filters.

Related Works
In previous research on face masks, the focus was more on facial reconstruction and identity recognition while a mask is worn. In this research, our emphasis is on detecting individuals who do not wear face masks, to help minimize COVID-19 transmission and spread. Researchers and scientists have demonstrated that wearing face masks helps minimize the spread of COVID-19. In [10], the authors created a system for determining whether a face mask is worn as required. They were able to identify three types of circumstances: correct use of face masks, incorrect use of face masks, and no face mask. The proposed system achieved an accuracy of 98.70 percent in face detection. Sabbir et al. [11] applied Principal Component Analysis (PCA) to masked and unmasked face recognition to identify a person. They discovered that wearing a mask significantly affects the accuracy of face recognition using PCA: when the recognized face is masked, identification accuracy declines to less than 70 percent. PCA is also used in [12], where the authors propose a procedure to remove glasses from a frontal image of a human face. The removed sections are reconstructed using recursive error compensation with PCA reconstruction.
In [13], the authors used the YOLOv3 algorithm for face detection. YOLOv3 uses Darknet-53 as its backbone. The proposed system achieves an accuracy of 93.9 percent. It was trained on the CelebA and WIDER FACE datasets, comprising more than 600,000 images, and tested on the FDDB dataset. Nizam et al. [14] proposed a novel GAN-based network that can automatically remove masks covering facial areas and regenerate the image by filling in the missing regions. The output of the proposed model is a complete facial image that looks natural and realistic.
In [15], the authors presented a method for identifying the presence or absence of mandatory medical masks in the operating room. The main objective is to trigger alarms only for medical professionals who do not wear surgical masks, eliminating as many false-positive face detections as possible without missing any mask detections. The proposed system achieves 95 percent accuracy. Muhammad et al. [16] proposed an interactive tool called MRGAN. This approach relies on the user supplying the mask region and uses a Generative Adversarial Network to reconstruct that region. Shaik et al. [17] performed real-time facial emotion classification and recognition. They used VGG-16 to classify seven facial expressions. The proposed model was trained on the KDEF dataset and achieved an accuracy of 88 percent.
Due to its superior spatial feature extraction capability and lower computational cost, the CNN plays an important role in the pattern recognition tasks of computer vision relevant to this study [18]. CNNs use convolutional kernels to convolve the initial images or feature maps in order to extract higher-level features. How to build a better convolutional neural network architecture, however, remains an open question. The Inception network proposed in [19] allowed the system to learn the right combinations of kernels. To enable deeper neural networks to be trained, K. He et al. [20] proposed the Residual Network (ResNet), which carries identity mappings from the previous layer. MobileNet [21] was proposed because object detectors are typically deployed on mobile or embedded devices, where processing resources are very limited. It uses depthwise convolution to extract features and pointwise convolution to balance channel numbers, so the computational cost of MobileNet is much smaller than that of networks using standard convolution.

The first step is to visualize the data that will be used. The dataset we use consists of 1376 images, divided into 690 images with masks and 686 images without masks. To move on to the next stage, we augment our dataset by selecting images for training, which ultimately yields 560 selected images. We then divide the data into training data, containing the images for training the CNN model, and testing data, containing the images used to validate the developed model. Our augmented data is split into 80% (training data) and 20% (testing data).
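The 80/20 split described above can be sketched with the standard library alone; the file names and the reproducibility seed below are illustrative assumptions, not the paper's actual paths.

```python
# Sketch of a shuffled 80/20 train/test split (standard library only).
import random

def split_dataset(items, train_ratio=0.8, seed=42):
    """Shuffle the items reproducibly and split into train/test lists."""
    items = list(items)
    random.Random(seed).shuffle(items)      # reproducible shuffle
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]

# The paper's dataset: 690 "with mask" + 686 "without mask" images.
images = [f"with_mask_{i}.jpg" for i in range(690)] + \
         [f"without_mask_{i}.jpg" for i in range(686)]
train, test = split_dataset(images)
print(len(train), len(test))  # → 1100 276
```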

Fig. 3. CNN Model
In the next stage, after augmenting the data and dividing it into two sets, namely training and testing, we start building a sequential CNN model with several layers: Conv2D, MaxPooling2D, Flatten, Dropout, and Dense. To make this study easier to understand, we will discuss each layer of the model one by one.
Conv2D, better known as the 2D convolution layer, is a convolution layer that "scans" the image with a filter, for example one of 5 x 5 pixels. For each 5 x 5 pixel region of the image, the convolution operation computes the dot product between the image pixel values and the weights defined in the filter. The input to the 2D convolution layer is actually three-dimensional, which can be a little confusing, since the input is usually thought of as two-dimensional. But the "2D" in "2D convolution" refers to the filter's movement: it traverses the image in two spatial dimensions. For instance, a color picture has three values for each pixel: red, green, and blue. The filter then also has a depth of three, covering all three channels as it slides over the image.
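The spatial output size of this scanning follows directly from the filter geometry; a small helper makes the arithmetic explicit (valid padding and stride 1 are assumed defaults, since the paper does not state them).

```python
# Output size of a convolution along one axis:
# out = (in - kernel + 2*padding) / stride + 1
def conv_output_size(in_size, kernel, padding=0, stride=1):
    """Number of positions a `kernel`-wide filter takes along one axis."""
    return (in_size - kernel + 2 * padding) // stride + 1

# A 5x5 filter sliding over a 28x28 image with no padding:
print(conv_output_size(28, 5))        # → 24
# With padding of 2 the spatial size is preserved:
print(conv_output_size(28, 5, padding=2))  # → 28
```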

Fig. 4. Conv2D Concept
After downsampling or "pooling" the previous convolution results, the same convolutional form is applied sequentially: first to identify features in the original image, then to identify sub-features in smaller portions of the image. Ultimately, the aim of this method is to extract the essential features that can assist in classifying images. MaxPooling2D is an operation carried out after a convolutional layer. A typical pattern for organizing layers in a convolutional neural network, which may be repeated one or more times in a given model, is to add a pooling layer after a convolutional layer. The pooling layer operates on each feature map separately to construct a new set of pooled feature maps.
There are different forms of pooling, for example max pooling and min pooling. Max pooling works by placing a 2x2 window on the feature map and choosing the largest value inside it. The 2x2 window is passed over the whole feature map from left to right, taking the highest value at each position. These values then form a new matrix called the pooled feature map. Max pooling preserves the key features while also reducing the size of the image. This helps prevent overfitting, which can occur if too much information is provided to the CNN, especially information that is not relevant to classifying the image.
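The 2x2/stride-2 max pooling just described can be sketched in a few lines of plain Python, without assuming any framework:

```python
# Minimal 2x2, stride-2 max pooling on a 2D list `fmap`.
def max_pool_2x2(fmap):
    """Halve each spatial dimension, keeping the max of each 2x2 window."""
    pooled = []
    for r in range(0, len(fmap) - 1, 2):
        row = []
        for c in range(0, len(fmap[0]) - 1, 2):
            window = (fmap[r][c], fmap[r][c + 1],
                      fmap[r + 1][c], fmap[r + 1][c + 1])
            row.append(max(window))   # keep only the strongest activation
        pooled.append(row)
    return pooled

feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]
print(max_pool_2x2(feature_map))  # → [[6, 8], [14, 16]]
```

Each 2x2 block of the 4x4 input collapses to its maximum, so the 4x4 map becomes 2x2 while the largest activations survive.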
The result of applying the pooling layer is a summarized version of the features detected in the input: the downsampled, or pooled, feature maps. This is beneficial because a slight shift in the position of a feature in the input, as detected by the convolutional layer, still results in a pooled feature map with the feature in the same location. This capability added by pooling is called the model's invariance to local translation. The next step, after the pooled feature map has been obtained, is flattening. Flattening involves converting the entire matrix of the pooled feature map into a single column that is then fed to a neural network for processing.
Dropout is a regularization method that approximates training a large number of neural networks with different architectures in parallel. During training, a number of layer outputs are randomly skipped or "dropped out." This makes each layer look like, and be treated as, a layer with a different number of nodes and a different connectivity to the previous layer. In effect, every update to a layer during training is performed on a different "view" of the configured layer.
Dropout has the effect of making the training process noisy, requiring the nodes within a layer to probabilistically take on more or less responsibility for the inputs. This conceptual view suggests that dropout breaks up situations where network layers co-adapt to correct mistakes from prior layers, in turn making the model more robust.
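A minimal sketch of (inverted) dropout illustrates the mechanism: each activation is kept with probability 1-p and rescaled by 1/(1-p) so the expected output is unchanged. The rate p=0.5 below is an illustrative assumption, not a value given in the paper.

```python
# Inverted dropout applied to a list of activations (training-time only).
import random

def dropout(activations, p=0.5, rng=random):
    """Zero each activation with probability p; rescale survivors by 1/(1-p)."""
    if p == 0.0:
        return list(activations)          # no-op at p = 0
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0
            for a in activations]

acts = [0.3, 1.2, 0.7, 0.9]
print(dropout(acts, p=0.0))  # → [0.3, 1.2, 0.7, 0.9]
```

At inference time dropout is disabled (p = 0), which the rescaling makes consistent with the training-time statistics.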

Fig. 5. Dense Layer Model
A dense layer is a deeply connected neural network layer, which means that each neuron in the dense layer receives input from all neurons in the previous layer. The dense layer is the most widely used layer in models. In the background, the dense layer performs a matrix-vector multiplication. The values in the matrix are parameters that can be trained and updated with the help of backpropagation. The output generated by the dense layer is a vector of dimension 'm'; thus, the dense layer is essentially used to change the dimensionality of the vector.
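The matrix-vector multiplication a dense layer performs is y = Wx + b, where W and b are the trainable parameters. The weights below are illustrative, not trained values:

```python
# A dense (fully connected) layer as a plain matrix-vector product.
def dense(W, b, x):
    """Every output neuron is a weighted sum of all inputs plus a bias."""
    return [sum(w_i * x_i for w_i, x_i in zip(row, x)) + b_j
            for row, b_j in zip(W, b)]

W = [[1.0, 0.0, 0.0],   # 2x3 weight matrix: maps a 3-vector to a 2-vector
     [0.0, 1.0, 0.0]]
b = [1.0, 1.0]
print(dense(W, b, [2.0, 3.0, 4.0]))  # → [3.0, 4.0]
```

Because W has shape 2x3, the layer maps a 3-dimensional input to a 2-dimensional output, which is exactly how the final dense layer reduces the network to the two class scores.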
Furthermore, in the pre-training phase we can see that the data has been separated in two: a total of 556 images in the training data and 203 in the testing data. With this we can start the most important part of the process, which is training the CNN model. We fit the Sequential model we have built with the Keras library on the training set and the test set. We planned to train the model for 30 epochs (iterations). We could train for a larger number of epochs to gain greater precision, but at the risk of over-fitting. We see that after the 30th epoch our model reaches 98.86 percent accuracy on the training set and 96.19 percent accuracy on the test set. This means that it is well trained without over-fitting.
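Putting the layers together, the model and its training loop can be sketched with Keras as below. The filter counts, kernel sizes, 150x150 input size, and optimizer are illustrative assumptions; the paper only names the layer types, the two classes, and the 30 epochs. (Synthetic random data stands in for the real images here.)

```python
# Sketch of the sequential CNN described above (assumed hyperparameters).
import numpy as np
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 3)),            # RGB input image
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dropout(0.5),                          # regularization
    layers.Dense(64, activation="relu"),
    layers.Dense(2, activation="softmax"),        # "With Mask" / "Without Mask"
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Training sketch on synthetic data; the paper trains on 556 real images
# for 30 epochs and validates on 203.
x_dummy = np.random.rand(8, 150, 150, 3).astype("float32")
y_dummy = np.eye(2)[np.random.randint(0, 2, size=8)]
model.fit(x_dummy, y_dummy, epochs=1, verbose=0)
```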

Step 3
The next step begins with labeling the obtained results. For the results, we have prepared two probability labels, namely "With Mask" and "Without Mask". We then use the model to identify, through our PC webcam, whether a face mask is being worn. For this we first need to implement face detection. Here we use the Haar feature-based Cascade Classifier to detect facial features. OpenCV developed this cascade classifier to detect frontal faces by training on thousands of photographs. An .xml file for this purpose can be downloaded and used for face detection. In the final step, we run an infinite loop using the OpenCV library, detecting faces from our camera with the Cascade Classifier. The model estimates the probability of each of the two classes (with or without mask). The label with the greater probability is picked and displayed around our face. We can see from the resulting picture below that the model accurately detects whether or not a person is wearing a mask and shows this on the label.
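The label-selection step described above reduces to picking the class with the higher predicted probability; a minimal sketch follows. The function name and the threshold-free argmax rule are assumptions; the comments indicate where it would sit inside the OpenCV loop.

```python
# Pick the display label from the model's two class probabilities.
def pick_label(probs):
    """probs = (p_with_mask, p_without_mask) from the classifier."""
    with_mask, without_mask = probs
    return "With Mask" if with_mask > without_mask else "Without Mask"

# Inside the OpenCV loop this runs once per face found by the Haar
# cascade, e.g. (illustrative, `model` and `preprocess` are assumed):
#   faces = cascade.detectMultiScale(gray_frame)
#   for (x, y, w, h) in faces:
#       probs = model.predict(preprocess(frame[y:y+h, x:x+w]))[0]
#       label = pick_label(probs)   # drawn around the face with cv2
print(pick_label((0.92, 0.08)))  # → With Mask
```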

Results and Conclusion
In this article we have succeeded in developing a CNN model to detect whether or not someone is wearing a mask. It can be used in various applications. Given the COVID-19 crisis, wearing a mask may remain necessary in the future, and this method of detecting whether an individual is wearing a face mask may be very helpful. The COVID-19 coronavirus pandemic is causing a global health crisis. Governments around the world are battling this virus. According to the World Health Organisation (WHO), protection against COVID-19 infection is a required precautionary step. This paper presents a machine learning model for face mask detection. Our model achieved an accuracy of 98.86 percent on the training set and 96.19 percent on the specified test set. Comparable findings have been presented in the literature review, so in terms of testing accuracy the model used can be considered to pass. In future studies, the primary objective is a machine learning approach that attains the lowest inference time and the maximum precision. Possible future work includes using a deeper machine learning model for feature extraction and applying the neutrosophic domain, which shows promise in classification and detection problems.