Jasmir Jasmir, Nurhadi Nurhadi, Eni Rohaini, M Riza Pahlevi B, Daniel Sintong Pardamean Simanjuntak
Abstract
In this research, we apply several machine learning methods with word embedding features to social media data, namely user comments on the Disney Plus Hotstar application. The aim is to assess how three word embedding features, word2vec, GloVe and FastText, affect the classification performance of Naive Bayes (NB), K-Nearest Neighbor (KNN) and Random Forest (RF). NB is simple and efficient but very sensitive to feature selection. KNN has known weaknesses, such as bias in the choice of k, high computational complexity, memory limitations, and poor handling of irrelevant attributes. RF has the weakness that its evaluation values can change significantly with only a slight change in the data. In text classification, feature selection can improve scalability, efficiency and accuracy. Tests across these methods and word embedding features show that KNN achieves the highest accuracy both without and with the word embedding features. The feature that gives KNN its highest score is FastText. FastText also yields nearly equal accuracy, precision, recall and F1-score values.
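To make the pipeline described above concrete, the following is a minimal sketch of how word embedding features can feed a KNN classifier. It is not the authors' implementation: the three-dimensional vectors in TOY_VECTORS, the example comments in TRAIN, the averaging step in embed, and the use of cosine similarity are all illustrative assumptions. In practice, the vectors would come from a trained word2vec, GloVe or FastText model.

```python
import math
from collections import Counter

# Hypothetical 3-dimensional word vectors (assumption for illustration only).
# Real word2vec, GloVe, or FastText vectors come from a trained model and
# typically have hundreds of dimensions.
TOY_VECTORS = {
    "great":     [0.9, 0.1, 0.0],
    "love":      [0.8, 0.2, 0.1],
    "smooth":    [0.7, 0.0, 0.2],
    "bad":       [0.1, 0.9, 0.0],
    "crash":     [0.0, 0.8, 0.3],
    "buffering": [0.1, 0.7, 0.2],
}
DIM = 3

def embed(comment):
    """Turn a comment into one feature vector by averaging its word vectors.

    Words without a vector are skipped; a comment with no known words
    maps to the zero vector.
    """
    vectors = [TOY_VECTORS[w] for w in comment.lower().split() if w in TOY_VECTORS]
    if not vectors:
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn_predict(train, comment, k=3):
    """Label a comment by majority vote among its k most similar training comments."""
    query = embed(comment)
    ranked = sorted(train, key=lambda item: cosine(embed(item[0]), query), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

# Made-up labelled comments (assumption; not taken from the study's dataset).
TRAIN = [
    ("great app love it", "positive"),
    ("smooth streaming great", "positive"),
    ("love the smooth player", "positive"),
    ("bad app crash", "negative"),
    ("buffering and crash", "negative"),
    ("bad buffering", "negative"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, "love this great app"))
    print(knn_predict(TRAIN, "crash and buffering again"))
```

Swapping the embedding source while keeping the classifier fixed, or the reverse, is what allows the effect of each word embedding feature on each method to be compared.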