IAES Nawala: Natural language processing technology in Indonesian case studies

Greetings, friends of Nawala! May you always be in good health.

This is the IAES Nawala of the Institute of Advanced Engineering and Science. Today we share some news related to natural language processing (NLP). NLP is a subfield of artificial intelligence that deals with the interaction between computers and human language. It focuses on enabling computers to process human language in a way that takes context into account, much as humans do. Sentence segmentation, which breaks textual data into sentences, is an important NLP technique. Petrus et al. (2023) proposed a sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI), or Indonesian sentence segmentation, which applies a set of rules designed for Indonesian text that can also be adapted for English. The rules deal with punctuation marks that can separate sentences, such as periods, question marks, and exclamation marks. More detailed results are described in the following article.

An adaptable sentence segmentation based on Indonesian rules

Johannes Petrus, Ermatita Ermatita, Sukemi Sukemi, Erwin Erwin

Sentence segmentation, which breaks textual data strings into individual sentences, is an important phase in natural language processing (NLP). Each word in the string that ends with a punctuation mark such as a period, question mark, or exclamation point is a candidate location for splitting the string. Humans can easily see the punctuation and split the string into sentences, but machines cannot do this as easily. These three punctuation marks also serve other functions (a period, for example, also marks abbreviations), so a sentence segmentation process must be able to determine whether a word marked with punctuation is actually a sentence boundary. This research proposes a sentence segmentation system called segmentasi kalimat bahasa Indonesia (SKBI), or Indonesian language sentence segmentation, which applies a set of rules; it works on Indonesian texts and can be adapted for English. The system uses 34 rules built from combinations of 27 fairly complete features. Experimental results show that SKBI achieves an F1-score of 96.89% on Indonesian text and 97.07% on English text. Both results leave room for improvement, but they outperform previous research.
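The paper's 34 rules and 27 features are not listed here, so the following is only a minimal sketch of the general rule-based idea, not SKBI itself: periods, question marks, and exclamation marks followed by whitespace are treated as candidate boundaries, and two illustrative rules reject false candidates. The abbreviation list (ABBREVIATIONS), the two rules, and the function names segment_sentences and boundary_f1 are hypothetical. The last function shows how a boundary-level F1-score, the metric the paper reports, combines precision and recall over predicted and reference boundary positions.

```python
import re

# Hypothetical, deliberately tiny abbreviation list (Indonesian and English).
# SKBI's real rules and features are NOT reproduced here.
ABBREVIATIONS = {"dr.", "prof.", "dll.", "dsb.", "mr.", "mrs.", "e.g.", "i.e."}

# Candidate boundary: one or more of '.', '?', '!' followed by whitespace.
# Periods inside numbers such as "10.30" are therefore never candidates.
CANDIDATE = re.compile(r"[.?!]+(?=\s)")


def is_boundary(text: str, end: int) -> bool:
    """Apply the illustrative rules to the candidate ending at index `end`."""
    # Rule 1: the token carrying the punctuation must not be an abbreviation.
    token = text[:end].split()[-1].lower()
    if token in ABBREVIATIONS:
        return False
    # Rule 2: the next word must start with an uppercase letter or a digit.
    rest = text[end:].lstrip()
    return not rest or rest[0].isupper() or rest[0].isdigit()


def segment_sentences(text: str) -> list[str]:
    """Split `text` at every candidate that passes the rules."""
    sentences, start = [], 0
    for match in CANDIDATE.finditer(text):
        if is_boundary(text, match.end()):
            sentences.append(text[start:match.end()].strip())
            start = match.end()
    tail = text[start:].strip()  # final sentence (text-final punctuation)
    if tail:
        sentences.append(tail)
    return sentences


def boundary_f1(predicted: set, gold: set) -> float:
    """F1-score over sets of boundary positions (e.g. character offsets)."""
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


print(segment_sentences("Dr. Budi tiba pukul 10.30 di Jakarta. Apakah dia lelah? Ya!"))
# → ['Dr. Budi tiba pukul 10.30 di Jakarta.', 'Apakah dia lelah?', 'Ya!']
```

A real system needs many more rules (quotations, ellipses, initials, lowercase sentence starts), which is why the paper combines dozens of rules and features.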

Hayaty et al. (2023) applied NLP to posts on X (formerly Twitter). They observed the impact of pre-trained global vector (GloVe) word embeddings on accuracy when classifying Indonesian text as hate speech or non-hate speech. The study found that combining pre-trained GloVe embeddings (Indonesian text) with single- and multi-layer long short-term memory (LSTM) classifiers was more resistant to overfitting than using trainable embeddings for hate speech detection. The accuracy was 81.5% with a single-layer LSTM and 80.9% with a double-layer LSTM.

Hate speech detection on Indonesian text using word embedding method-global vector

Mardhiya Hayaty, Arif Dwi Laksito, Sumarni Adi

Hate speech is defined as communication directed toward a specific individual or group that involves hatred or anger; language with strong arguments that sway someone's opinion can cause social conflict. Because the number of Internet users worldwide, including in Indonesia, continues to rise, individuals have many opportunities to communicate their thoughts on online platforms. This study aims to observe the impact of pre-trained global vector (GloVe) word embeddings on accuracy in classifying hate speech and non-hate speech. For hate speech detection, combining pre-trained GloVe embeddings (Indonesian text) with single- and multi-layer long short-term memory (LSTM) classifiers yields performance that is more resistant to overfitting than trainable embeddings. The accuracy is 81.5% with a single-layer LSTM and 80.9% with a double-layer LSTM. Future work is to provide pre-trained embeddings built from a corpus of both formal and informal language, because pre-processing informal words is very challenging.
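The abstract does not describe the implementation, so the sketch below only illustrates the usual first step of such a pipeline, under stated assumptions: reading pre-trained GloVe vectors in their plain-text format (one word followed by its vector components per line), building an embedding matrix for the classifier's vocabulary, and turning a tokenized post into a fixed-length index sequence for an LSTM. The function names, the convention that vocabulary indices start at 1 with row 0 reserved for padding, the zero-vector fallback for missing words, and the toy vectors are all hypothetical.

```python
def load_glove(lines):
    """Parse GloVe's text format: each line is `word v1 v2 ... vd`."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def build_embedding_matrix(vocab, vectors, dim):
    """Row i holds the vector for the word whose index is i; row 0 is padding.

    Assumes vocab maps words to indices 1..len(vocab). Words missing from
    the pre-trained vectors keep an all-zero row in this sketch; other
    fallbacks (e.g. small random vectors) are equally common.
    """
    matrix = [[0.0] * dim for _ in range(len(vocab) + 1)]
    missing = []
    for word, index in vocab.items():
        vec = vectors.get(word)
        if vec is None:
            missing.append(word)
        elif len(vec) != dim:
            raise ValueError(f"{word!r} has {len(vec)} dims, expected {dim}")
        else:
            matrix[index] = vec
    return matrix, missing


def to_padded_ids(tokens, vocab, max_len):
    """Map tokens to indices, drop unknown tokens, then truncate/pad with 0."""
    ids = [vocab[t] for t in tokens if t in vocab][:max_len]
    return ids + [0] * (max_len - len(ids))


# Toy, hypothetical vectors; real GloVe files have 50-300 dimensions.
vectors = load_glove(["benci 0.1 0.2 0.3", "cinta -0.4 0.5 0.6"])
vocab = {"benci": 1, "cinta": 2, "gue": 3}  # "gue": informal Indonesian "I"
matrix, missing = build_embedding_matrix(vocab, vectors, dim=3)
print(missing)  # → ['gue']
print(to_padded_ids(["benci", "xyz", "cinta"], vocab, 4))  # → [1, 2, 0, 0]
```

The matrix would then initialize the embedding layer in front of the LSTM with training disabled, for example through PyTorch's torch.nn.Embedding.from_pretrained with freeze=True. Keeping the vectors frozen is presumably the setting the study contrasts with trainable embeddings. The informal word "gue" ending up in the missing list also illustrates why the authors call for a corpus that covers non-formal language.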

Meanwhile, Garini et al. (2023) measured the service quality of a mobile application in Indonesia through online customer reviews using NLP. The methods used were sentiment analysis and topic modeling. They analyzed 20,452 reviews from the Google Play Store and the Apple App Store for an application named "myIndiHome". The results show that various aspects shape customer reviews: app features, products/services, and the app interface appear in positive reviews, while availability, feature reliability, processing speed, bugs, and reliability appear in negative reviews. This approach can help companies understand customer feedback and improve the quality of self-service mobile apps in Indonesia's telecommunications sector.

Using machine learning to improve a telco self-service mobile application in Indonesia

Jwalita Galuh Garini, Achmad Nizar Hidayanto, Agri Fina

The use of mobile applications extends to the telecommunication sector, mainly due to COVID-19. Failing to provide a good application can cause customer dissatisfaction and lead users to remove it. This in turn leads to lost service opportunities, so paying attention to the mobile application's quality is essential. No study has yet measured the service quality of a self-service mobile application in the telecommunication sector using online customer reviews. This study uses sentiment analysis and topic modeling to determine the service quality of a self-service mobile application in the telecommunication sector from reviews on the Google Play Store and the Apple App Store, with myIndiHome as a case study. In total, 20,452 reviews were collected from both platforms. Sentiment analysis was performed using Naïve Bayes, support vector machine, and logistic regression, while topic modeling was performed using latent Dirichlet allocation. The results show that logistic regression outperforms support vector machine and Naïve Bayes. Topic modeling shows that the positive reviews contain three topics: application features, products/services, and application interfaces. The negative reviews contain five topics: application availability, application feature reliability, application processing speed, bugs, and application reliability.
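The paper does not publish its preprocessing or hyperparameters, so the following is only a minimal, dependency-free sketch of one of the three compared classifiers: binary logistic regression over bag-of-words counts, trained with batch gradient descent on the log-loss. The whitespace tokenizer, the learning rate, the epoch count, the function names, and the four toy reviews are assumptions for illustration; in practice a library such as scikit-learn would normally be used, and topic modeling with latent Dirichlet allocation is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and split on whitespace only.
    return text.lower().split()


def featurize(text, vocab):
    """Sparse bag-of-words counts: {feature index: count}, unknown words dropped."""
    counts = Counter(tokenize(text))
    return {vocab[w]: c for w, c in counts.items() if w in vocab}


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logreg(texts, labels, epochs=200, lr=0.5):
    """Fit weights and bias by batch gradient descent; labels are 1 (positive) or 0."""
    vocab = {}
    for t in texts:
        for w in tokenize(t):
            vocab.setdefault(w, len(vocab))
    feats = [featurize(t, vocab) for t in texts]
    weights, bias, n = [0.0] * len(vocab), 0.0, len(texts)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(vocab), 0.0
        for x, y in zip(feats, labels):
            p = sigmoid(bias + sum(weights[j] * v for j, v in x.items()))
            err = p - y  # derivative of the log-loss w.r.t. the logit
            for j, v in x.items():
                grad_w[j] += err * v
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return vocab, weights, bias


def predict_proba(text, model):
    """Probability that `text` is a positive review."""
    vocab, weights, bias = model
    x = featurize(text, vocab)
    return sigmoid(bias + sum(weights[j] * v for j, v in x.items()))


# Hypothetical toy reviews (1 = positive, 0 = negative).
reviews = ["aplikasi bagus dan cepat", "fiturnya bagus sekali",
           "aplikasi sering error", "lambat dan sering error"]
model = train_logreg(reviews, [1, 1, 0, 0])
print(round(predict_proba("bagus", model), 3))
print(round(predict_proba("sering error", model), 3))
```

Naïve Bayes and a support vector machine would consume the same bag-of-words features, which is what makes the three classifiers directly comparable on one review dataset.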

The articles above represent only a small part of the research on natural language processing. For more information, readers can visit the IAES International Journal of Artificial Intelligence (IJ-AI) page and read articles for FREE via the following link: https://ijai.iaescore.com/.

By: I. Busthomi