Advancement of artificial intelligence techniques based lexicon emotion analysis for vaccine of COVID-19

Emotions are a vital and fundamental part of life. Everything we do, say, or leave unsaid reflects our feelings in some way, though perhaps not immediately. To analyze this most fundamental aspect of human behavior, we must examine feelings through emotional data, also known as affect data, which can take the form of text, voice, and other types of data. Affective computing is the scientific field that uses this emotional data to analyze emotions. Emotion computation is a difficult task; significant progress has been made, but there is still scope for improvement. Social networking sites have made it possible to connect with people from all over the world, and many researchers are drawn to examining the text available on these sites. Analyzing this data means exploring communities and cultures across the world. This paper analyzes the emotions that Iraqi people express in text about COVID-19, using data collected from Twitter. Using a lexicon, people's opinions are classified into eight emotion categories (anticipation, anger, trust, fear, sadness, surprise, disgust, and joy) and two sentiment polarities (positive and negative), which are then visualized in charts to find the most prevalent emotion.


Introduction
A series of security incidents around the world has recently demonstrated the wide range of crises in which ordinary people now make effective use of their mobile communication devices [1]. The majority of significant news events now include real-time social media comments [2]. Social networks have emerged as a research topic from which experts of all backgrounds seek inspiration. Indeed, social networks, and particularly social network analysis (SNA), which is backed by computer science, offer the opportunity to widen other fields of knowledge. Many fields have adopted the notions of social network and social network analysis, with applications including cooperation networks among scientists or other professionals, family networks, student friendship networks, company director networks, consumer networks, the labor market, public health, psychology, and so on. SNA has recently become part of a new branch of science known as computational social science [3]. Social scientists characterize a social network as a group of people who share a common interest, such as familial relationships, political action, information, views, or geographic location [4]. Microblogging is the activity of posting small amounts of digital content, such as text, images, links, short videos, or other media, to the internet. Microblogging services, like other social networking sites, attempt to build a sense of online community. These platforms let users exchange information about their lives, activities, opinions, and status in a lightweight, simple manner. Twitter is one of the most widely used microblogging sites [5]. Recent events have brought Twitter to the fore as a new medium to study. Users can follow or be followed on Twitter. Unlike on most online social networking sites, such as Facebook or MySpace, the relationship of following and being followed does not require reciprocation: a user can follow any other user, and the user being followed need not follow back [6].

Artificial intelligence (AI)
Over the last few decades, several computer systems have been constructed to perform human mental functions such as arithmetic, designing computer programs, and interpreting languages, all of which are thought to require "intelligence." Some computer systems are able to analyze electronic circuits, solve formulas, diagnose diseases, and comprehend a limited amount of human speech and natural language text. The majority of the work in developing these systems has been done in the field of AI [7]. AI is the branch of science and engineering concerned with the theory and practice of creating systems that exhibit the characteristics we associate with intelligence in human behavior, such as natural language processing (NLP), perception, planning and problem solving, learning and adaptation, and acting on the environment [8]. NLP is a subfield of AI and computer science. It is concerned with how computers interact with human natural languages, and in particular with programming computers to analyze large amounts of natural language data [9].

Social media
Social media (SM) refers to any medium of communication that allows two-way engagement. It allows users to share and consume content in a variety of formats, including text, images, and video. People use social media in their daily lives in a variety of ways, from text messaging to online dating. Through interactive services, social media users can contact friends, family, and organizations all around the world [10]. Users of these services form a type of virtual society known as an online social network (OSN), also called a virtual community [3]. These social networks have grown in popularity in recent years, offering a more efficient and user-friendly way to maintain social connections and communicate information in a variety of formats and media, including microblogging, status updates, mobile text alerts, blogs, instant messaging, and forums. Microblogging has become extremely popular among groups of friends and professional colleagues who often update their content and follow one another's updates. Because microblog posts are brief and easy to analyze, marketers regard this style of blogging as more informative and accurate. Twitter, Jaiku, and Pownce are just a few of the services that provide microblogging. Twitter keeps track of the most popular phrases, words, and hashtags and regularly posts them under the heading "trending topics." A hashtag is a Twitter convention for starting and following a discussion thread by prefixing a word with the '#' character.
Twitter displays a list of the top ten trending topics on every user's homepage in a right sidebar [6]. This data is open to the public. As a result, it can be used as raw data primarily for opinion extraction, customer satisfaction analysis, and grading alternative government schemes, as well as sentiment analysis [11].

Twitter application programming interfaces (APIs)
APIs are the electrical sockets of modern software systems: an API defines how software components should communicate with one another. At a high level, this interface contains a list of commands that a first component can use to access functionality in a second component, as well as the particular format in which the first component should pass those commands to the second. The user can see some program components, such as the user interface of a web browser, but many more components are hidden while serving important duties. In a browser, for example, separate software components are in charge of sending and receiving web page data over the Internet, interpreting that data and rendering it graphically, and handling persistent data (e.g., browser cookies) stored by websites. The relationships between these components are defined via APIs [12]. Twitter offers two kinds of API. The REST APIs let developers read and write Twitter data. They are useful for researchers because they allow searching for recently posted messages that meet certain criteria, such as the inclusion of specific keywords, hashtags, or user names. The streaming APIs give developers access to Twitter's global data stream. They are useful for academics because they enable real-time capture of data matching certain criteria. To gain access to these APIs and obtain the credentials required to retrieve data, researchers must first create a Twitter application that handles requests to Twitter's database. The steps are as follows. The researcher must first have an active Twitter account, then go to Twitter's app registration page and follow the instructions there. Once the application has been created, four credentials must be generated that allow scripts to collect data from Twitter: the API key, the API secret, the access token, and the access token secret. These keys are extremely important, and the user should treat them as if they were email passwords. Any use of these credentials to access Twitter's databases can be traced [13].
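Because the four credentials act like passwords, one common practice is to keep them out of the analysis scripts themselves. The following Python sketch illustrates that idea only; it is not the authors' R setup. The environment variable names and the helpers `load_twitter_credentials` and `mask` are hypothetical names chosen for this example.

```python
import os

# Hypothetical environment variable names; any naming scheme would work.
REQUIRED_VARS = (
    "TWITTER_API_KEY",
    "TWITTER_API_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
)


def load_twitter_credentials(env=None):
    """Return the four credentials as a dict, or raise if any is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}


def mask(secret, visible=4):
    """Replace all but the last `visible` characters with asterisks for logging."""
    return "*" * max(len(secret) - visible, 0) + secret[-visible:]
```

Reading the values from the environment means a script can be shared or published without exposing the keys, and `mask` allows a log line to confirm which key is in use without printing it in full.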

Gathering data from Twitter
To retrieve data from Twitter, a user must first have access to the Twitter API. Twitter may alter the procedure for granting API access to users [14]. The OAuth package in R is used to perform Twitter API authentication. The steps for using OAuth to access the Twitter API are shown in Figure 1. Using the Twitter API requires a Twitter application, and the application's keys are used to construct a Twitter link that starts the authentication procedure. Twitter verifies the user's identity and issues a PIN (also known as a verifier), which the user must provide to the application. In the next step, the application uses this PIN to obtain from the Twitter API an access token and secret that are unique to the user. The token and secret are cached for future use; GetUserAccessKeySecret can be used to accomplish this.

Gathered data preprocessing
Preprocessing can improve sentiment analysis by reducing errors in the data. It is a method of removing undesirable parts from data. Sentiment analysis methods that skip preprocessing may miss crucial terms, reducing the accuracy of the results. Preprocessing, on the other hand, can itself result in the loss of critical information; eliminating punctuation that would have been valuable to the analysis is one example of over-preprocessing. The general preprocessing procedures are as follows:
1) Filtering: Non-Arabic tweets are removed using the filtering method. The filter() function of the dplyr package, run in the RStudio environment, subsets a data frame, keeping the rows that satisfy the given conditions. It works with both grouped and ungrouped data; most data operations are performed on groups defined by variables. group_by() turns an existing table into a grouped table in which operations are carried out "by group," and ungroup() removes the grouping. Filtering is often faster on ungrouped data.
The filtering step also corrects simple typing errors, such as repeated letters and misspellings, using dictionaries; acronyms and abbreviations are replaced with predefined dictionary words. Examples of simple errors include:
a) Errors caused by the sound of the language, such as "ظالم" being written as "ضالم", and errors caused by switched letters, such as "بيت" being written as "يبت".
b) Repeated characters, such as "كثيييييييييير", which is replaced by "كثير" by deleting the repeated letter.
c) Spelling errors such as:
• Space-bar problems: either a missing space (as in "كيفحالك") or a misplaced space (as in "مر حبا").
• Letter proximity: for example, the word "كنت" might be written as "منت" because the letters ك and م are close to each other on the keyboard.
• Confusion between similar characters, as in the words "كتير" and "كثير" [15].
2) Removing URLs: URLs link to other webpages and websites and supply no information for the analysis, so they were removed from all tweets using R's tm_map() function. Every string beginning with http was replaced with a blank space [16].
3) Removing emoticons, numbers, and punctuation: The emoticons, numbers, and punctuation marks in the tweets are removed in this phase [17]. Emoticons were removed because, when the tweets were retrieved, they appeared as square boxes rather than as the actual emoticons. The gsub() function in R was used to delete the unwanted emoticon values [16].

4) Removing stopwords: This step removes commonly used terms that carry little meaning and are unhelpful for text classification. It decreases the size of the corpus without sacrificing crucial information [18].

5) Stemming: Stemming is a prerequisite for many natural language processing operations and is critical in most information retrieval systems. Its goal is to reduce a word's inflectional forms (and, in some cases, derivationally related forms such as its noun, adjective, verb, and adverb forms) to a common base or root form. R's "stem" function is used to complete this task [19].
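The five preprocessing steps can be sketched as a small pipeline. The paper performs them in R (filter(), tm_map(), gsub(), and a stem function); the Python version below is only an illustrative stand-in under stated assumptions. The stopword list is a toy sample, `light_stem` strips nothing but a leading definite article, and all function names are hypothetical. Dictionary-based spelling correction is not covered.

```python
import re

URL = re.compile(r"https?://\S+|www\.\S+")
DIACRITICS = re.compile(r"[\u064B-\u0652\u0640]")   # short vowels and tatweel
ARABIC_LETTER = re.compile(r"[\u0621-\u064A]")
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")      # digits, punctuation, emoji, Latin
REPEATS = re.compile(r"(.)\1{2,}")                  # one character repeated 3+ times

# Toy stopword list (assumption); a real run would use a full Arabic list.
STOPWORDS = {"في", "من", "على", "عن", "هذا", "و"}


def is_mostly_arabic(text, threshold=0.5):
    """Step 1 (filtering): keep a tweet only if most of its letters are Arabic."""
    letters = [ch for ch in URL.sub(" ", text) if ch.isalpha()]
    if not letters:
        return False
    arabic = sum(1 for ch in letters if ARABIC_LETTER.match(ch))
    return arabic / len(letters) >= threshold


def light_stem(token):
    """Step 5 (toy stemmer): strip a leading definite article only."""
    for prefix in ("وال", "ال"):
        if token.startswith(prefix) and len(token) - len(prefix) >= 3:
            return token[len(prefix):]
    return token


def preprocess(tweet):
    """Apply steps 2 to 5 to one tweet and return its cleaned tokens."""
    text = URL.sub(" ", tweet)            # step 2: remove URLs
    text = DIACRITICS.sub("", text)       # drop vowel marks without splitting words
    text = NON_ARABIC.sub(" ", text)      # step 3: emoticons, numbers, punctuation
    text = REPEATS.sub(r"\1", text)       # collapse runs such as "كثيييير"
    tokens = [t for t in text.split() if t not in STOPWORDS]   # step 4
    return [light_stem(t) for t in tokens]
```

Collapsing only runs of three or more identical characters leaves legitimately doubled letters, such as the two ل in "اللقاح", untouched.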

Sentiment and emotion analysis
Sentiment analysis is a procedure that applies NLP to automate the extraction of opinions, attitudes, perspectives, and emotions from text, audio, tweets, and database sources. Subjectivity analysis, opinion mining, and appraisal extraction are other terms for it [20]. A sentiment classifier can detect whether a sentence has a positive or negative connotation by determining its polarity. Given a sample of texts that discuss the same issue, a general opinion on the topic can be derived by averaging the polarity of the individual texts. For example, a consensus view on a product, i.e., whether the product is popular with consumers or not, can be determined by gathering a collection of reviews. State-of-the-art sentiment classification is generally split into two approaches: lexicon-based classifiers and learning-based classifiers [21]. To discover the emotions contained in a text, emotion analysis employs NLP, text analysis, and a variety of computational techniques. This analysis can be done at a number of levels, including the document, sentence, word, and aspect levels. The two main approaches to classifying emotions in sentiment analysis are emotional dimensions and emotional categories [22]. Figure 2 shows the steps involved in analyzing the emotion of given input data.
Figure 2. Procedure for analyzing emotions
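The idea of deriving a general opinion by averaging the polarity of individual texts can be illustrated with a minimal lexicon-based sketch in Python. The tiny English lexicon, the whitespace tokenization, and the function names are assumptions made for this example; they are not the lexicon or code used in the study.

```python
# Toy polarity lexicon (assumption): +1 for positive words, -1 for negative ones.
POLARITY = {
    "safe": 1, "effective": 1, "hope": 1,
    "afraid": -1, "risk": -1, "pain": -1,
}


def text_polarity(tokens):
    """Sum the word polarities and map the total to +1, 0, or -1."""
    score = sum(POLARITY.get(tok, 0) for tok in tokens)
    return (score > 0) - (score < 0)


def average_polarity(texts):
    """Average the polarity of individual texts that discuss the same topic."""
    if not texts:
        return 0.0
    return sum(text_polarity(t.lower().split()) for t in texts) / len(texts)
```

An average above zero suggests a predominantly positive view of the topic and an average below zero a predominantly negative one. A learning-based classifier would replace `text_polarity` with a trained model while keeping the same averaging step.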

Experiment
This research uses lexicon-based analysis as an analytical tool for natural language processing. After the data was collected from Twitter, the text was preprocessed and filtered. A series of procedures was then applied to reveal individual sentiments as well as the most prevalent emotion among Iraqis regarding the coronavirus vaccine. Figure 3 shows the algorithm that was used to analyze the sentiment of the Iraqi populace.
The analysis procedure was as follows:

1- After access to the Twitter API and the Google Maps API is obtained, the keys are used to generate tokens that authenticate the browser. The Iraqi trend "كورونا" in November 2020 is used in this research; this hashtag is used to perform a Twitter search and collect data.
2- Non-Arabic tweets are removed from the fetched tweets.
3- URLs, numbers, punctuation, emoticons, and stopwords are removed from the filtered data.
4- A stemming process is applied to reduce the words to their roots.
5- A list of the most frequently used words is made.
6- The TDM (term-document matrix), which describes the frequency of the words found in the cleaned tweets, is created.
7- The words are arranged in decreasing order of frequency.
9- The most frequent emotion for the chosen hashtag is calculated by summarizing the sentiments and is displayed in a plot.
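The later steps of this procedure (building the term-document matrix, sorting terms by frequency, and summarizing emotions) can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' R implementation. The word-to-emotion lexicon is a toy with English entries, and the function names are hypothetical; only the eight emotion categories are taken from the paper.

```python
from collections import Counter

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Toy word-to-emotion lexicon (assumption); entries are illustrative only.
LEXICON = {
    "vaccine": {"trust", "anticipation"},
    "safe": {"trust", "joy"},
    "death": {"fear", "sadness"},
    "delay": {"anger"},
}


def term_document_matrix(docs):
    """Return (terms, matrix), where matrix[i][j] counts term i in document j."""
    counts = [Counter(doc) for doc in docs]
    terms = sorted(set().union(*docs))
    return terms, [[c[t] for c in counts] for t in terms]


def term_frequencies(terms, matrix):
    """Sum each TDM row and sort the terms in decreasing order of frequency."""
    totals = {t: sum(row) for t, row in zip(terms, matrix)}
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


def emotion_totals(frequencies):
    """Add each term's frequency to every emotion the lexicon links it to."""
    totals = dict.fromkeys(EMOTIONS, 0)
    for term, freq in frequencies:
        for emotion in LEXICON.get(term, ()):
            totals[emotion] += freq
    return totals
```

Given tokenized tweets as lists of words, the most prevalent emotion is the key with the largest total, for example `max(totals, key=totals.get)`, and a bar chart of `totals` corresponds to the plotting step. Any output from the toy lexicon above is unrelated to the paper's data.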
The most common positive and negative sentiments are shown in Figure 7: positive sentiments appear in the right-hand corner, and negative sentiments in the bottom-left corner.
Figure 7. Most populated positive and negative emotions
Finally, using the hashtag "كورونا", we were able to determine the most prevalent emotion in Iraqi public opinion about the corona vaccine: trust, as Figure 8 shows.

Conclusions
In this research, a sentiment analysis method was developed to examine Iraqis' perceptions of the coronavirus vaccine. In this work, we were able to create a Twitter developer account and a Twitter application that gives access to the Twitter API, allowing Twitter data to be retrieved for a selected hashtag, and to examine the public's positive and negative sentiments, as well as their most prevalent emotions, on the chosen issue.