A Comparison of Similarity Metrics for Sentiment Analysis on Turkish Twitter Feeds


ÇOBAN Ö., ÖZYER B., TÜMÜKLÜ ÖZYER G.

2015 IEEE International Conference on Smart City/SocialCom/SustainCom (SmartCity), Chengdu, China, 19 - 21 December 2015, pp. 333-338

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI Number: 10.1109/smartcity.2015.93
  • City of Publication: Chengdu
  • Country of Publication: China
  • Pages: pp. 333-338
  • Atatürk University Affiliated: Yes

Abstract

Sentiment analysis is one of the most useful tools in social media monitoring. Applying sentiment analysis to social media data (blogs, Twitter, Facebook, etc.) helps measure customer satisfaction and, as a result, can reduce production costs for a company. Moreover, sentiment analysis can be used in various other domains, such as economics, commerce, and opinion mining, to collect data and extract meaningful information. In this study, our major goal is to investigate the positive/negative polarity of Turkish Twitter feeds by using text classification methods for sentiment analysis. Bag of Words and N-Gram models are used to represent the content of the text in the feature extraction phase. Different similarity metrics are analyzed to improve the performance of the kNN classifier on both the Reuters-8 and Turkish Twitter Feeds data. The Reuters-8 data are used to analyze the effect of text language and length on classification results. The experiments are conducted on six different combinations of feature extraction models and weighting methods. Experimental results show that IT-Sim performs better than the other similarity metrics and that Tf-Idf is the most effective weighting method. The accuracy of the kNN classifier depends on the combination of feature extraction model and weighting method and on the value of the k parameter.
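The pipeline the abstract describes (word N-gram features, Tf-Idf weighting, and a kNN vote driven by a similarity metric) can be illustrated with a minimal Python sketch. This is not the authors' implementation: cosine similarity stands in for IT-Sim and the other metrics the paper compares, the smoothed Tf-Idf variant is one common choice among several, and the function names and the four toy tweets are hypothetical, invented for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy data, not taken from the paper's Twitter corpus.
TRAIN_TEXTS = ["harika bir film", "çok güzel bir gün",
               "berbat bir hizmet", "çok kötü bir deneyim"]
TRAIN_LABELS = ["pos", "pos", "neg", "neg"]

def word_ngrams(text, n=1):
    """Split on whitespace and return word n-grams (n=1 is Bag of Words).
    Note: str.lower() does not apply Turkish casing rules for I/ı and İ/i."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def fit_idf(docs_tokens):
    """Smoothed inverse document frequency (an assumed variant)."""
    n_docs = len(docs_tokens)
    df = Counter()
    for tokens in docs_tokens:
        df.update(set(tokens))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """Sparse Tf-Idf vector; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(tokens).items() if t in idf}

def cosine(u, v):
    """Cosine similarity between two sparse vectors (stand-in metric)."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def knn_predict(text, train_texts, train_labels, k=3, n=1, similarity=cosine):
    """Label `text` by majority vote among its k most similar training texts."""
    train_tokens = [word_ngrams(t, n) for t in train_texts]
    idf = fit_idf(train_tokens)
    train_vecs = [tfidf_vector(toks, idf) for toks in train_tokens]
    query = tfidf_vector(word_ngrams(text, n), idf)
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: similarity(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

if __name__ == "__main__":
    print(knn_predict("harika bir gün", TRAIN_TEXTS, TRAIN_LABELS, k=3))
```

Because the metric is passed in through the `similarity` parameter, a different similarity function can be swapped in without touching the rest of the sketch, and changing `n` or `k` exercises the feature model and neighborhood size that the abstract says the accuracy depends on.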