I have decided to center my project on sentiment analysis of Amazon reviews. The question I am trying to answer is: “Can I build a model, train it on actual customer reviews, and use it to predict the star rating of any given written customer review?” The dataset I am using is hosted on Kaggle. It contains 400,000 real customer reviews of various products on Amazon’s website. The reviews are primarily written in English, but a small percentage are written in other languages and will need to be filtered out.
To explore the dataset, I’ll first run a term frequency counter on all the reviews to see which words are most prevalent. I’ll then compare the most common words within each review classification (1-5 stars). Next, I will try to build a model that analyzes all of the words in a review and predicts the star rating the customer gave. Sentiment analysis becomes more complex when negation words and multiword phrases are considered. For example, “great” and “not great” would be associated with different review ratings. Hopefully, the large number of training reviews will make it easier to assign sentiment to these complex phrases. After testing the model on Amazon reviews, I want to test it on reviews from other platforms to see how well it generalizes. I chose this topic because I’m interested in how computers analyze human communication and assign human emotion to it.
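
As a rough sketch of how the exploration step could look, the Python below counts term frequencies separately for each star rating and joins a negation word to the word that follows it, so that “not great” is counted apart from “great”. Everything named here is an assumption for illustration: the sample reviews are made up and stand in for the Kaggle data, and the negation list and function names are hypothetical, not a final design.

```python
import re
from collections import Counter, defaultdict

# Hypothetical sample data: (star rating, review text) pairs standing in
# for the real Kaggle reviews, whose exact layout is not assumed here.
SAMPLE_REVIEWS = [
    (5, "Great phone, great battery life."),
    (4, "Good phone overall, works fine."),
    (1, "Not great. The battery died in a week."),
    (2, "Not good, the screen is not great either."),
]

# Assumed, deliberately short list of negation words.
NEGATIONS = {"not", "no", "never"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def merge_negations(tokens):
    """Join a negation word with the next token, e.g. 'not great' -> 'not_great'."""
    merged = []
    i = 0
    while i < len(tokens):
        if tokens[i] in NEGATIONS and i + 1 < len(tokens):
            merged.append(tokens[i] + "_" + tokens[i + 1])
            i += 2  # skip the word that was absorbed into the negated term
        else:
            merged.append(tokens[i])
            i += 1
    return merged


def term_counts_by_rating(reviews):
    """Return a mapping from star rating to a Counter of terms in those reviews."""
    counts = defaultdict(Counter)
    for stars, text in reviews:
        counts[stars].update(merge_negations(tokenize(text)))
    return counts


counts = term_counts_by_rating(SAMPLE_REVIEWS)
for stars in sorted(counts):
    # Show the three most frequent terms for each star rating.
    print(stars, counts[stars].most_common(3))
```

Treating “not_great” as its own term is only one simple way to handle negation. It misses cases where the negation and the sentiment word are not adjacent, so the real model may need longer phrases or a different approach.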