Product Review Sentiment Analysis by Artificial Neural Network Algorithm

Buying, selling, and marketing of goods and services are now done online. Online stores provide facilities that enable customers to review the products offered. The number of reviews a store receives is often too large for the store to analyze them one by one, so machine assistance is needed to analyze the sentiment of these reviews. Sentiment analysis of product reviews helps the store obtain a general overview of the level of consumer satisfaction. In this study, an artificial neural network (ANN) algorithm is used to analyze the sentiment of product reviews. The ANN algorithm is used because it can provide high accuracy. This research achieved a reasonably high accuracy of 88.2%.


Introduction
The rapid development of technology brings much convenience to modern society. One example is the ease of buying and selling over the Internet, more commonly known as e-commerce. According to research conducted by Ismail in 2017, at least three online stores dominate the e-commerce market, namely Alibaba, Amazon, and eBay [1]. Based on a survey conducted by Statista in 2017, Alibaba had 488 million active visitors [2] [3], while Amazon had 183 million and eBay only 87 million. These numbers show the magnitude of public interest in online stores.
According to Fobes [4], one of the reasons online stores are in demand is the ease of obtaining product information and comparing prices without having to meet the seller. Online stores also provide more convenience than conventional stores, such as free shuttle service. Online stores often also provide facilities that allow consumers to give feedback on or review the products offered. This feedback can later be used by online store management to analyze consumer responses to the products offered [5] [6].
In general, consumer responses can be grouped into two broad groups: satisfied and disappointed. A response is considered satisfied when it contains positive sentiment and disappointed when it contains negative sentiment [7]. However, the number of responses received by online stores is often too large for people to analyze one by one, so machine assistance is needed in the sentiment analysis process. This requires a sentiment analysis algorithm that can be implemented on a machine so that the machine can recognize the sentiment of the responses given by consumers [8]. The analysis of these sentiments helps the online store get an overview of its consumer satisfaction level [9] [10].
In this study, the ANN algorithm is used to perform sentiment analysis on product reviews [11] [12]. Implementing the ANN algorithm for product review sentiment analysis will result in a more stable classification rule architecture [13]. By applying a neural network algorithm to the sentiment analysis of product reviews, the machine is expected to classify the sentiment of consumer responses to products offered by online stores with high accuracy [14] [15]. This helps management get a general idea of their consumer satisfaction level.

Research Concept
This study was carried out in the following stages:

a. Literature study
This stage is used to obtain an overview of existing research related to sentiment analysis and to describe how far research on sentiment analysis has progressed.

b. Problem identification
This stage is used to identify any issues related to the E-Commerce field.

c. Data Collection
This stage is done to collect the necessary data in the research. Data should be collected according to the plans that have been compiled in the previous step.

d. Pre-processing
This stage prepares the collected data so that it can be used in the research.

e. Building an ANN model
This stage aims to design the ANN architecture used in the research. The architecture should achieve reasonably high accuracy without requiring excessive computing resources.

f. Training the ANN model
At this stage, the designed ANN is trained using the prepared training set. The ANN is trained so that it has a relatively small error value.

g. Testing ANN performance
The trained ANN is tested for accuracy using a pre-prepared testing set. The test results are documented and serve as the evaluation material of this research.

a. Data Collection
The data used in this study is product review data from Amazon.com. The dataset is grouped into two classes: positive sentiment and negative sentiment. It contains 448 reviews, consisting of 230 positive sentiment sentences and 218 negative sentiment sentences. Each item is labeled with its class: the sign (+) marks a positive sentiment sentence, and the sign (-) marks a negative sentiment sentence. A semicolon (;) serves as the delimiter between the label and the sentence.
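The labeled format described above can be read with a few lines of Python. The following is a minimal sketch, assuming each line has the form "+;sentence" or "-;sentence"; the function name parse_labeled_line is hypothetical, and the second sample sentence is made up for illustration rather than taken from the dataset.

```python
def parse_labeled_line(line):
    """Split one dataset line into (label, sentence).

    Assumes the format '<sign>;<sentence>', where the sign is '+' for a
    positive review and '-' for a negative one.
    """
    # Split only on the first semicolon so that semicolons inside the
    # review text are preserved.
    label, sentence = line.strip().split(";", 1)
    if label not in ("+", "-"):
        raise ValueError(f"unexpected label: {label!r}")
    return label, sentence.strip()


# Illustrative lines; the second sentence is invented for this example.
sample_lines = [
    "-;The customer support is very rude.",
    "+;Great sound quality, works perfectly.",
]
parsed = [parse_labeled_line(line) for line in sample_lines]
```

Splitting on the first semicolon only is a design choice that keeps any later semicolons as part of the review text.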
b. Pre-processing

1) Punctuation Removal
Punctuation removal removes the punctuation marks in the dataset, such as question marks (?), exclamation marks (!), periods (.), commas (,), and other punctuation marks. Table 1 is an example of data before and after punctuation removal. The punctuation in the review sentence is removed so that only the words remain.
Before: The customer support is very rude.
After: The customer support is very rude
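Punctuation removal can be sketched in Python as follows. This is a minimal illustration, assuming that "punctuation" means the ASCII punctuation characters in the standard string module; the function name remove_punctuation is hypothetical and the study's own implementation is not given.

```python
import string


def remove_punctuation(text):
    """Delete every ASCII punctuation character from the text."""
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", string.punctuation))


cleaned = remove_punctuation("The customer support is very rude.")
```

Under this assumption apostrophes are removed as well, so a word such as "don't" becomes "dont".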

2) Tokenization
Tokenization splits a text document into the individual words that compose it. Each piece is called a token or term. At this stage, the dataset is scanned from the first character to the last, as shown in Table 2.

Table 2 Tokenization
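The character-by-character scan described for tokenization can be sketched as below. This is an illustrative version only, assuming that tokens are separated by whitespace and that case is left unchanged; the function name tokenize is hypothetical.

```python
def tokenize(text):
    """Scan the text from the first to the last character and cut it
    into tokens at whitespace."""
    tokens, current = [], []
    for ch in text:
        if ch.isspace():
            if current:  # a token just ended
                tokens.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:  # flush the final token
        tokens.append("".join(current))
    return tokens


tokens = tokenize("The customer support is very rude")
```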

3) Single Character Removal
At this stage, tokens that consist of only one character, such as a, I, N, and so forth, are removed.
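As a minimal sketch, single character removal amounts to a length filter over the token list. The function name remove_single_characters and the sample tokens are hypothetical.

```python
def remove_single_characters(tokens):
    """Keep only tokens that are longer than one character."""
    return [token for token in tokens if len(token) > 1]


filtered = remove_single_characters(["I", "bought", "a", "phone"])
```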

4) Stopword Removal
Stopword removal eliminates words that appear often but have no influence on extracting the sentiment of a review. These include time words, question words, and so on. Examples are "and", "to", and "the".
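Stopword removal can be sketched as a set lookup. The stopword set below contains only the three examples named in the text; the full list used in the study is not given, so the set, the function name remove_stopwords, and the case-insensitive comparison are assumptions for illustration.

```python
# Only the examples named in the text; a real stopword list is much longer.
STOPWORDS = {"and", "to", "the"}


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token whose lowercase form is in the stopword set."""
    return [token for token in tokens if token.lower() not in stopwords]


kept = remove_stopwords(["The", "customer", "support", "is", "very", "rude"])
```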

5) Stemming
The stemming process finds the root of each word by removing its suffix. Stemming aims to reduce variations of words that share the same base word. For example, "greating" and "greats" have the base word "great". Table 3 shows the changes before and after the stemming process. The words of each review are then added to a 2D structure, with each column containing the words of one review. The words that remain after pre-processing are shown in Table 4.
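The idea of suffix removal can be illustrated with a deliberately tiny stemmer. The text does not name the stemming algorithm used, so the suffix list, the minimum stem length, and the function name simple_stem below are all assumptions; an established stemmer such as the Porter algorithm handles many more cases.

```python
# Hypothetical, deliberately small suffix list for illustration only.
SUFFIXES = ("ing", "s")


def simple_stem(word):
    """Strip the first matching suffix, keeping at least three letters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


stems = [simple_stem(word) for word in ["greating", "greats", "great"]]
```

All three sample words reduce to the same base word, which is the effect stemming is meant to have on the vocabulary size.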

6) Vocabulary List
In this process, each word is added to the vocabulary list, which is then sorted in alphabetical order. The 2D list of reviews in Table 4 results in a list of 1182 vocabulary words, shown in Table 5. The training labels (positive or negative) are stored in a 2 x (number of reviews) matrix. A column with 1 in the first row and 0 in the second row is a negative sentiment review, and a column with 1 in the second row and 0 in the first row is a positive sentiment review. Table 6 shows an example of this structure. A matrix of size (number of vocab) x (number of reviews) is then built to be used for training the ANN. The number of rows equals the number of words collected during pre-processing: if a word occurs in a review, the corresponding entry is set to 1; otherwise it is set to 0. A map variable is used to determine in which row the 1 is placed. The reviews in Table 4 produce the input vectors shown in Table 7.
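The vocabulary list, the 2 x (number of reviews) label matrix, and the (number of vocab) x (number of reviews) input matrix can be sketched as follows. This is a minimal illustration on two made-up reviews; the function names and the dictionary row_of (standing in for the map variable mentioned above) are hypothetical.

```python
def build_vocabulary(reviews):
    """Return the alphabetically sorted list of distinct words."""
    return sorted({word for review in reviews for word in review})


def build_input_matrix(reviews, vocabulary):
    """Build a (number of vocab) x (number of reviews) matrix of 0/1."""
    # Map each word to its row index in the matrix.
    row_of = {word: row for row, word in enumerate(vocabulary)}
    matrix = [[0] * len(reviews) for _ in vocabulary]
    for col, review in enumerate(reviews):
        for word in review:
            matrix[row_of[word]][col] = 1
    return matrix


def build_label_matrix(labels):
    """Build a 2 x (number of reviews) matrix.

    Row 0 holds 1 for negative reviews, row 1 holds 1 for positive ones.
    """
    negative_row = [1 if label == "-" else 0 for label in labels]
    positive_row = [1 if label == "+" else 0 for label in labels]
    return [negative_row, positive_row]


# Two made-up, already pre-processed reviews.
reviews = [["customer", "support", "rude"], ["great", "support"]]
labels = ["-", "+"]

vocabulary = build_vocabulary(reviews)
inputs = build_input_matrix(reviews, vocabulary)
targets = build_label_matrix(labels)
```

Each column of inputs is the binary vector for one review, and the matching column of targets is its one-hot label.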

Table 7 Numerical Vector Equivalent
The neural network is trained using the numerical input vectors shown in Table 7 and the training labels shown in Table 6. The results of the above processes are saved into 3 different CSV files named Vocabularylist.csv, _vector-training.csv label, and numerical_vector-training.csv. The neural network used is a pattern recognition network provided by MATLAB, which is a feedforward network that uses the scaled conjugate gradient method to adjust weights and biases.

9) Results
The ANN was developed with the help of a tool in MATLAB, nntool (the Neural Network tool). The input is the dataset file produced by the numerical vector training step, and the target is the numeric result saved from the vector training label step. The neural network has 1182 inputs, matching the number of vocabulary words stored in Vocabularylist.csv. The output layer consists of two neurons: one represents the probability that a review is positive, and the other the probability that it is negative.
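The shape of such a network can be illustrated with a single forward pass in plain Python. This is only a sketch under stated assumptions: the tanh hidden activation and softmax output are assumed here, the 10 hidden neurons are taken from the preliminary experiments, and the weights are random and untrained, so the output probabilities carry no meaning. It does not reproduce MATLAB's training procedure.

```python
import math
import random


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """One forward pass: tanh hidden layer, then softmax output layer."""
    hidden = [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(w_hidden, b_hidden)
    ]
    scores = [
        sum(w * h for w, h in zip(row, hidden)) + b
        for row, b in zip(w_out, b_out)
    ]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


N_INPUTS, N_HIDDEN, N_OUTPUTS = 1182, 10, 2

# Untrained random weights, used only to show the layer dimensions.
rng = random.Random(0)
w_hidden = [[rng.uniform(-0.1, 0.1) for _ in range(N_INPUTS)]
            for _ in range(N_HIDDEN)]
b_hidden = [0.0] * N_HIDDEN
w_out = [[rng.uniform(-0.1, 0.1) for _ in range(N_HIDDEN)]
         for _ in range(N_OUTPUTS)]
b_out = [0.0] * N_OUTPUTS

# A binary input vector for a review containing two arbitrary vocabulary words.
x = [0] * N_INPUTS
x[3] = x[42] = 1

probabilities = forward(x, w_hidden, b_hidden, w_out, b_out)
```

The two outputs sum to 1, so each can be read as the probability of one sentiment class once the network has been trained.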
Preliminary testing was done with 1 hidden layer of 10 neurons for all experiments. Table 8 shows the accuracy produced by the neural network with different data divisions. Table 8 shows that the highest accuracy, 88.2%, is obtained with a division of 70% training data and 30% testing data. Based on this result, the 70%-30% division is used for the subsequent tests. These tests vary the number of neurons in the hidden layer from 5 to 20, as shown in Table 9. The number of neurons in the hidden layer affects the accuracy produced by the neural network. The accuracy dropped drastically with 13 neurons in the hidden layer, reaching only 46.9%.
From Table 8 and Table 9, the highest accuracy, 88.2%, was obtained with 70% training data and 30% testing data and 10 neurons in the hidden layer.
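The data division and the accuracy measure behind these results can be sketched as follows. This is an illustrative version only, assuming a random shuffle before the split and accuracy defined as the fraction of correctly classified reviews; the function names and the seed are hypothetical, and the exact division procedure used by the MATLAB tool is not described in the text.

```python
import random


def split_indices(n_items, train_fraction, seed=0):
    """Shuffle the item indices and split them into training and testing."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    cut = round(n_items * train_fraction)
    return indices[:cut], indices[cut:]


def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


# 448 reviews divided 70% / 30%.
train_idx, test_idx = split_indices(448, 0.70)
```

With 448 reviews, this rounding gives 314 training reviews and 134 testing reviews.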

Conclusions and Suggestions
Based on the results of the research, it can be concluded that:
1. A neural network algorithm has been applied to analyze the sentiment of product reviews on Amazon.com, especially for electronic products.
2. Applying a neural network algorithm to analyze the sentiment of product reviews gives a satisfactory accuracy of 88.2% with a large number of inputs (1182). This suggests that the artificial neural network algorithm is worth using for sentiment analysis.