OrNot Sentiment Analysis - Part 2
With the necessary customer reviews collected in Part 1, we can now begin our analysis.

Introduction
In Part 1 of this project I scraped customer reviews off OrNot’s website, cleaned the data, and performed some cursory analysis. We learned that people really love their OrNot products, with an average rating of 4.8 stars across all of their reviews. These high ratings held consistent across all of their products as well. Customers were also very satisfied with the fit of the clothing they ordered. Put it all together and you've got some happy customers.
But the data I collected still holds some potentially interesting insights about people’s attitudes towards the company’s products, namely through the reviews themselves, where people express their likes, dislikes, and any additional comments. This is what we’ll be diving into in Part 2.
Preparing the Data
Before we can begin any sentiment analysis we first need to prepare the data by converting all text to lowercase, tokenizing the text into individual words, and then removing all stop words (things like ‘I’, ‘the’, ‘is’, ‘and’). I also want to flatten the data so that instead of a single row for each review they're all clumped together into one big list. Once that is accomplished we can find the most common words in the reviews and the reviews’ headers.
# Make headers and reviews lowercase
df['headers'] = df['headers'].str.lower()
df['body'] = df['body'].str.lower()

# Tokenization - splitting headers and reviews into lists of words. Also removing any punctuation
tokenizer = RegexpTokenizer(r"[\w]+")
df['tokenized_headers'] = df['headers'].map(tokenizer.tokenize)
df['tokenized_body'] = df['body'].map(tokenizer.tokenize)

# Removing stop words
stop_words = set(stopwords.words('english'))

df['tokenized_nostop_headers'] = df['tokenized_headers'].apply(lambda x: [word for word in x if word not in stop_words])
df['tokenized_nostop_body'] = df['tokenized_body'].apply(lambda x: [word for word in x if word not in stop_words])

# Flattening so that we have one big list instead of rows of lists
flattened_headers = []
for sublist in df['tokenized_nostop_headers']:
    for word in sublist:
        flattened_headers.append(word)

flattened_reviews = []
for sublist in df['tokenized_nostop_body']:
    for word in sublist:
        flattened_reviews.append(word)

# Most common words in headers
header_word_count = {}
for word in flattened_headers:
    if word not in header_word_count:
        header_word_count[word] = 1
    else:
        header_word_count[word] += 1

# Most common words in reviews
review_word_count = {}
for word in flattened_reviews:
    if word not in review_word_count:
        review_word_count[word] = 1
    else:
        review_word_count[word] += 1
Unsurprisingly the reviews and headers contain a lot of product-related names like ‘jersey’, ‘shorts’, and ‘bibs’, but we also get some quality adjectives like ‘great’, ‘perfect’, and ‘love’. It’s nice to see that this falls in line with our findings from earlier — that people genuinely enjoy their OrNot gear.
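As an aside, the hand-rolled counting dictionaries above can be condensed with Python's collections.Counter. A quick sketch on made-up words (not drawn from the dataset):

```python
from collections import Counter

# Toy stand-in for the flattened word lists
words = ['great', 'jersey', 'great', 'love', 'jersey', 'great']

counts = Counter(words)
print(counts.most_common(2))  # → [('great', 3), ('jersey', 2)]
```

Counter also sorts by frequency for free via most_common, which replaces the manual sorting step.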
We can also visualize the most common words by making some word cloud art.
# Sorting the word counts from most to least frequent
sorted_header_word_count = sorted(header_word_count.items(), key = lambda x: x[1], reverse = True)
sorted_review_word_count = sorted(review_word_count.items(), key = lambda x: x[1], reverse = True)

# Making wordcloud for headers excluding any product words like jersey, shorts, bibs, house etc so it's just adjectives
top_words_header = dict(sorted_header_word_count[:100])
words_to_exclude_header = ['jersey', 'shorts', 'bibs', 'house', 'bib', 'jacket', 'vest', 'ornot', 'short', 'ever', 'shirt', 'mission', 'kit', 'long', 'weather', 'bag', 'product', 'go', 'wind', 'like', 'review', 'layer', 'piece', 'one', 'climate', 'fit', 'pants', 'pair', 'winter', 'merino', 'bike', 'really', 'high', 'shell', 'sleeve', 'jerseys', 'size', 'another', 'blue', 'cycling', 'well', 'little', 'work', 'thermal', 'buy', 'summer', 'small', 'cargo', 'orange', 'first', 'stone', 'socks', 'wear', 'base', 'bought', 'looking', 'ls', 'made', 'grid', 'gloves', 'pockets', 'tight', 'ride', 'colors', 'purchase', 'riding', 'sweatshirt', 'sizing']
filtered_header_word_count = {word: count for word, count in top_words_header.items() if word not in words_to_exclude_header}
wordcloud_header = WordCloud(width=300, height=300).generate_from_frequencies(filtered_header_word_count)
plt.imshow(wordcloud_header)
plt.axis('off')
# plt.show()

# Making new wordcloud for reviews
top_words_body = dict(sorted_review_word_count[:100])
words_to_exclude_body = ['jersey', 'ornot', 'shorts', 'bibs', 'jacket', 'wear', 'really', 'fit', 'would', 'long', 'ride', 'one', 'small', 'rides', 'medium', 'bit', 'bike', 'pair', 'vest', 'layer', "i've", 'get', 'little', 'also', 'riding', 'enough', 'house', 'tight', 'pocket', 'first', 'back', 'much', 'large', 'look', 'right', 'short', 'made', 'bib', 'jerseys', 'around', '5', 'sleeve', 'weather', 'wind', 'even', 'bought', 'wearing', 'snug', 'sleeves', 'days', 'go', 'base', 'still', 'day', 'way', 'far', 'feels', 'got', 'time', 'definitely', 'looking', 'worn', 'length', 'without', 'use', 'could', 'used', 'buy', 'stretch', 'cold', '6', 'two']
filtered_body_word_count = {word: count for word, count in top_words_body.items() if word not in words_to_exclude_body}
review_wordcloud = WordCloud(width=300, height=300).generate_from_frequencies(filtered_body_word_count)
plt.imshow(review_wordcloud)
plt.axis('off')
# plt.show()


In these word clouds for the most common words in the headers and reviews I’ve manually removed the product-related names (jersey, bibs, socks, etc.) so as to only show adjectives that meaningfully demonstrate people’s attitudes towards their purchases. While quite simple, I think this actually gives a better qualitative sense of customers' experiences with OrNot. We already knew the star ratings were high and consistent across products, but now we can also see that customers thought the clothing was comfortable, versatile, and of high quality. I'd still like to push this further though, and that's where sentiment analysis comes in.
The VADER Model
I’ll be using the VADER model, which takes a lexical approach to mapping words to sentiments via a ‘dictionary of sentiment’. Each word in the dictionary is assigned a valence between -4 and 4, with -4 being extremely negative and 4 extremely positive. The word ‘horrible’ is rated -2.5 while ‘great’ is 3.1. These word valences are then combined to produce positive, neutral, and negative proportions, along with a normalized compound score between -1 and 1.
Take these two sentences for example:
‘OrNot is the best.’
‘I hate spiders.’
When run through the VADER model, these values are returned:
{'neg': 0.0, 'neu': 0.417, 'pos': 0.583, 'compound': 0.6369}
{'neg': 0.787, 'neu': 0.213, 'pos': 0.0, 'compound': -0.5719}
‘OrNot is the best’ is rated mostly positive, and the compound score reflects this. On the other hand, ‘I hate spiders’ is rated much more negatively. It's a simple but effective approach to determining the sentiment of a sentence. We can apply it to each customer review and review header to find the distribution of the compound scores. I also bucketed the compound scores to make the visualization less granular.
# VADER Sentiment Scores for headers and reviews
sia = SentimentIntensityAnalyzer()

df['headers_sentiment_VADER'] = df['tokenized_nostop_headers'].apply(lambda x: sia.polarity_scores(' '.join(x)))

df['headers_VADER_neg'] = df['headers_sentiment_VADER'].apply(lambda x: x['neg'])
df['headers_VADER_neu'] = df['headers_sentiment_VADER'].apply(lambda x: x['neu'])
df['headers_VADER_pos'] = df['headers_sentiment_VADER'].apply(lambda x: x['pos'])
df['headers_VADER_compound'] = df['headers_sentiment_VADER'].apply(lambda x: x['compound'])

df['reviews_sentiment_VADER'] = df['tokenized_nostop_body'].apply(lambda x: sia.polarity_scores(' '.join(x)))

df['reviews_VADER_neg'] = df['reviews_sentiment_VADER'].apply(lambda x: x['neg'])
df['reviews_VADER_neu'] = df['reviews_sentiment_VADER'].apply(lambda x: x['neu'])
df['reviews_VADER_pos'] = df['reviews_sentiment_VADER'].apply(lambda x: x['pos'])
df['reviews_VADER_compound'] = df['reviews_sentiment_VADER'].apply(lambda x: x['compound'])

# Putting VADER compound scores into buckets to see which compound scores are the most common
bin_edges = [-1.0, -0.9, -0.8, -0.7, -0.6, -0.5, -0.4, -0.3, -0.2, -0.1, 0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]
df['headers_VADER_compound_buckets'] = pd.cut(df['headers_VADER_compound'], bins = bin_edges)
df['reviews_VADER_compound_buckets'] = pd.cut(df['reviews_VADER_compound'], bins = bin_edges)

# sort_index() keeps the interval buckets ordered from -1 to 1
header_compound_count = df['headers_VADER_compound_buckets'].value_counts().sort_index()
review_compound_count = df['reviews_VADER_compound_buckets'].value_counts().sort_index()

plt.figure(figsize = (12, 6))

plt.subplot(1, 2, 1)
header_compound_count.plot(kind = 'bar', color = 'royalblue')
plt.title('Headers')
plt.xlabel('Sentiment Score')
plt.ylabel('Count')

plt.subplot(1, 2, 2)
review_compound_count.plot(kind = 'bar', color = 'seagreen')
plt.title('Reviews')
plt.xlabel('Sentiment Score')
plt.ylabel('Count')
plt.suptitle('Distribution of VADER Compound Scores', fontweight = 'bold')
plt.tight_layout()
# plt.show()

The most common compound score for the headers is 0, or neutral, while the reviews are most frequently extremely positive. I think it's also especially striking how few negative compound scores are present.
Length of Text and Its Effect on Sentiment Scores
I am suspicious, however, of why the reviews are generally more positive than the headers, and I suspect it’s because of their length. Remember how each word is assigned a value between -4 and 4 before the scores are reported on a scale of -1 to 1? Well, after summing the values of each word, a normalization is applied to map the sum into that range. The equation used is x / √(x² + α), where x is the sum of the sentiment values of the sentence’s constituent words and alpha is a normalization parameter set to 15. What this means is that as x grows larger (as the number of words being analyzed increases), the score gets pushed closer to -1 or 1.
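The effect of this normalization is easy to see with a few numbers. A minimal sketch (the vader_normalize name is mine; VADER applies this same formula internally with α = 15):

```python
import math

def vader_normalize(x, alpha=15):
    """Map an unbounded sum of word valences into the range (-1, 1)."""
    return x / math.sqrt(x * x + alpha)

# The larger the summed valence, the closer the score creeps to 1,
# so longer (even mildly positive) texts trend toward the extremes.
for total in [2, 5, 10, 20, 40]:
    print(total, round(vader_normalize(total), 4))
```

Note that the output never actually reaches 1; it just approaches it as the sum grows.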
In this dataset the reviews average over 24 words while the headers barely scratch 3, and it’s this discrepancy in text length that I think largely explains why the two distributions differ. For example, I averaged the positive, neutral, negative, and compound scores for the headers and reviews. If length didn’t have an effect we would expect them to be fairly similar, but instead:
| | Avg Negative | Avg Neutral | Avg Positive | Avg Compound |
|---|---|---|---|---|
| Headers | 0.0167 | 0.5003 | 0.4784 | 0.3629 |
| Reviews | 0.0319 | 0.5363 | 0.4309 | 0.7772 |
The average negative, neutral, and positive scores are all fairly similar, but the compound score for the reviews is significantly higher than the compound score for the headers. I'm going to chalk this up to the difference in length between the reviews and headers, but we can also try to confirm this with another visualization.
# Length of reviews and headers
df['length_of_review'] = df['tokenized_nostop_body'].str.len()
df['length_of_header'] = df['tokenized_nostop_headers'].str.len()

plt.figure(figsize = (12, 8))

plt.subplot(2, 3, 1)
sns.scatterplot(data = df,
                x = 'length_of_review',
                y = 'reviews_VADER_pos',
                color = 'limegreen')
plt.title('Positive Sentiment')
plt.xlabel('Length of Review')
plt.ylabel('Positive Score')

plt.subplot(2, 3, 2)
sns.scatterplot(data = df,
                x = 'length_of_review',
                y = 'reviews_VADER_neu',
                color = 'khaki')
plt.title('Neutral Sentiment')
plt.xlabel('Length of Review')
plt.ylabel('Neutral Score')

plt.subplot(2, 3, 3)
sns.scatterplot(data = df,
                x = 'length_of_review',
                y = 'reviews_VADER_neg',
                color = 'firebrick')
plt.title('Negative Sentiment')
plt.xlabel('Length of Review')
plt.ylabel('Negative Score')

plt.subplot(2, 3, 5)
sns.scatterplot(data = df,
                x = 'length_of_review',
                y = 'reviews_VADER_compound',
                color = 'royalblue')
plt.title('Compound Sentiment')
plt.xlabel('Length of Review')
plt.ylabel('Compound Score')
plt.suptitle('VADER Sentiment Scores by Length of Reviews', fontweight = 'bold')
plt.tight_layout()
# plt.show()

I think it’s most useful to focus on the Compound Sentiment graph, which clearly shows that as the length of the review increases, so does the compound score. It’s fun being able to demonstrate this concept in a visualization, but it also illuminates a severe downside of the VADER model: it conflates a long review with a positive review, and we intuitively know that's not always the case.
Sentiment Scores and Star Ratings
Because the length of review has an effect on the sentiment score I'm curious how they compare to the star ratings customers gave.
# Checking to see how the compound, pos, and neg scores compare to the star ratings for the headers
plt.figure(figsize=(15, 8))

# Pos VADER scores
plt.subplot(2, 3, 1)
sns.barplot(data = df,
            x = 'star',
            y = 'headers_VADER_pos',
            color = 'limegreen')
plt.title('Positive Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Positive Score')

# Neutral VADER scores
plt.subplot(2, 3, 2)
sns.barplot(data = df,
            x = 'star',
            y = 'headers_VADER_neu',
            color = 'khaki')
plt.title('Neutral Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Neutral Score')

# Neg VADER scores
plt.subplot(2, 3, 3)
sns.barplot(data = df,
            x = 'star',
            y = 'headers_VADER_neg',
            color = 'firebrick')
plt.title('Negative Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Negative Score')

# Compound VADER scores
plt.subplot(2, 3, 5)
sns.barplot(data = df,
            x = 'star',
            y = 'headers_VADER_compound',
            color = 'royalblue')
plt.title('Compound Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Compound Score')
plt.suptitle('VADER Sentiment Scores by Star Rating for Headers', fontweight = 'bold')
plt.tight_layout()
# plt.show()

# Checking to see how the compound, pos, and neg scores compare to the star ratings for the reviews
plt.figure(figsize=(15, 8))

# Pos VADER scores
plt.subplot(2, 3, 1)
sns.barplot(data = df,
            x = 'star',
            y = 'reviews_VADER_pos',
            color = 'limegreen')
plt.title('Positive Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Positive Score')

# Neutral VADER scores
plt.subplot(2, 3, 2)
sns.barplot(data = df,
            x = 'star',
            y = 'reviews_VADER_neu',
            color = 'khaki')
plt.title('Neutral Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Neutral Score')

# Neg VADER scores
plt.subplot(2, 3, 3)
sns.barplot(data = df,
            x = 'star',
            y = 'reviews_VADER_neg',
            color = 'firebrick')
plt.title('Negative Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Negative Score')

# Compound VADER scores
plt.subplot(2, 3, 5)
sns.barplot(data = df,
            x = 'star',
            y = 'reviews_VADER_compound',
            color = 'royalblue')
plt.title('Compound Sentiment')
plt.xlabel('Star Rating')
plt.ylabel('Compound Score')
plt.suptitle('VADER Sentiment Scores by Star Rating for Reviews', fontweight = 'bold')
plt.tight_layout()
# plt.show()


There's definitely alignment between the sentiment scores and the star ratings: as the star rating increases, the positive and compound scores rise while the negative score falls.
Sentiment Scores Over Time
I think it would also be instructive to see how the compound sentiment scores have changed over time.
# Avg compound score for reviews and headers by year
average_review_comp_score_by_year = df.groupby('year')['reviews_VADER_compound'].mean().sort_index()
review_count_by_year = df['year'].value_counts().sort_index()
average_header_comp_score_by_year = df.groupby('year')['headers_VADER_compound'].mean()
header_count_by_year = df['year'].value_counts().sort_index()

# How has compound score changed over the years for reviews?
fig, ax = plt.subplots(figsize=(10, 6))

average_review_comp_score_by_year.plot(kind='bar', ax=ax, color='royalblue', label='Avg Review Compound Score')

years_range = range(len(review_count_by_year))

ax2 = ax.twinx()
ax2.plot(years_range, review_count_by_year, color='black', marker='o', label='Review Count')

ax2.set_xticks(years_range)
ax2.set_xticklabels(review_count_by_year.index)
ax.set_xlabel('Year')
ax.set_ylabel('Avg Review Compound Score', color='black')
ax2.set_ylabel('Review Count', color='black')
ax.set_title('Average Review Compound Score and Review Count by Year', fontweight = 'bold')
lines, labels = ax.get_legend_handles_labels()
lines2, labels2 = ax2.get_legend_handles_labels()
ax2.legend(lines + lines2, labels + labels2, loc='upper left')
# plt.show()

I think OrNot should be incredibly proud of this. As the number of reviews has increased (used here as a proxy for the number of items sold), the review compound scores have held steady. They’re doing more business but keeping people just as happy.
They’re doing just as great across all their product categories as well.
# Compound scores by product category
# Assumes a product-category column carried over from Part 1; adjust the column name if needed
average_review_comp_score_by_product_category = df.groupby('product_category')['reviews_VADER_compound'].mean()

plt.figure(figsize = (12, 8))

average_review_comp_score_by_product_category.plot(kind = 'bar', color = 'royalblue')
plt.title('Average VADER Compound Score by Product Category', fontweight = 'bold')
plt.xlabel('Product Category')
plt.ylabel('Compound Score')
plt.xticks(fontsize=10, rotation=45)
plt.tight_layout()
# plt.show()

Additional Limitations of the VADER Model
Before we move on with more analysis, I think it’s worthwhile to circle back to the VADER model’s limitations. We already discussed how length can affect sentiment scores, but that’s not its only shortcoming. For lack of a better word, it can be tricked. The English language is complex after all.
Let’s take a look at one of the lowest compound scores for reviews that were given a 5 star rating:
five_star_reviews = df[df['star'] == 5]
lowest_review_comp_score_and_5_star = five_star_reviews.nsmallest(5, 'reviews_VADER_compound')

for index, review in lowest_review_comp_score_and_5_star.iterrows():
    print("User:", review['users'])
    print("Date:", review['date'])
    print("Product:", review['product'])
    print("Headers:", review['headers'])
    print("Body:", review['body'])
    print("Star Rating:", review['star'])
    print("VADER Compound Score (Review):", review['reviews_VADER_compound'])
    print("VADER Compound Score (Header):", review['headers_VADER_compound'])
    print("============================================")
Header: works and looks great, durable!
Body: this little bag was exactly what i wanted for my gravel bike - big enough to hold my mini pump, phone, and snacks. it tucked right under my wahoo computer. on the second “real” ride with it i got hit by a van and dragged underneath it - my bike is dead, but the bag survived! no tears, no failure of the rigid parts, just a bit dirty. definitely get this bag. it kicks ***.
Star Rating: 5.0
VADER Compound Score (Review): -0.8225
VADER Compound Score (Header): 0.6249
What a review, right? As a human it’s obvious that this is an incredibly good review — the user is complimenting the bag’s durability. But the VADER model struggles because of some strongly negative words like ‘dead’.
This effect can also be seen when looking at some of the lowest compound scores for headers that were given a 5 star rating:
lowest_comp_header_score_and_5_star = five_star_reviews.nsmallest(5, 'headers_VADER_compound')
for index, review in lowest_comp_header_score_and_5_star.iterrows():
    print("User:", review['users'])
    print("Date:", review['date'])
    print("Product:", review['product'])
    print("Headers:", review['headers'])
    print("Body:", review['body'])
    print("Star Rating:", review['star'])
    print("VADER Compound Score (Header):", review['headers_VADER_compound'])
    print("VADER Compound Score (Review):", review['reviews_VADER_compound'])
    print("============================================")
Header: quiver killer
Body: i grab this jersey 10 times out of 10 for the crisp morning/evening headlands loops. it's a true quiver killer; breathes really well, keeps you toasty, and cuts the wind. extra kudos for the soft earth tones (matchy-matchy with the stigmata!). more often than not, i find myself pairing with a vest vs jacket. ornot killing it as always.
Star Rating: 5.0
VADER Compound Score (Header): -0.6486
VADER Compound Score (Review): -0.5279
Header: killer jersey
Body: awesome. few rides in the heat and in cooler temps and this jersey was comfortable for both conditions. definitely recommend
Star Rating: 5.0
VADER Compound Score (Header): -0.6486
VADER Compound Score (Review): 0.9118
The VADER model doesn’t understand the nuance of the word ‘killer’ in these instances, while it’s easy for you and me. While not super useful for my analysis, I think keeping these inconsistencies in mind while working with this data was a good learning exercise.
Bi/Tri/Quadgrams
Moving on though, the Natural Language Toolkit (NLTK) has some really useful features for sentiment analysis, especially the ability to find bigrams, trigrams, and quadgrams (the most common two-word, three-word, and four-word phrases) in the dataset. While the sentiment scores were overwhelmingly positive, we’ve seen how they can be artificially inflated or deflated by text length and other language-processing shortcomings. Finding these bi/tri/quadgrams will hopefully give us more specificity about what people enjoy about OrNot’s products.
# Finding common two-word phrases throughout reviews
bigram_finder = BigramCollocationFinder.from_words(flattened_reviews)
bigram_finder.apply_freq_filter(50)
bigrams_with_freq_reviews = bigram_finder.ngram_fd.items()
sorted_bigrams_with_freq_reviews = sorted(bigrams_with_freq_reviews, key = lambda x: x[1], reverse = True)
# for bigram, freq in sorted_bigrams_with_freq_reviews:
#     print(f"{bigram}: {freq}")

# Finding common three-word phrases throughout reviews
trigram_finder = TrigramCollocationFinder.from_words(flattened_reviews)
trigram_finder.apply_freq_filter(10)
trigrams_with_freq_reviews = trigram_finder.ngram_fd.items()
sorted_trigrams_with_freq_reviews = sorted(trigrams_with_freq_reviews, key = lambda x: x[1], reverse = True)
# for trigram, freq in sorted_trigrams_with_freq_reviews:
#     print(f"{trigram}: {freq}")

# Finding common four-word phrases throughout reviews
quadgram_finder = QuadgramCollocationFinder.from_words(flattened_reviews)
quadgram_finder.apply_freq_filter(5)
quadgrams_with_freq_reviews = quadgram_finder.ngram_fd.items()
sorted_quadgrams_with_freq_reviews = sorted(quadgrams_with_freq_reviews, key = lambda x: x[1], reverse = True)
# for quadgram, freq in sorted_quadgrams_with_freq_reviews:
#     print(f"{quadgram}: {freq}")
Because there are so many bi/tri/quadgrams, please click here for the full list.
People really enjoy the fit of the clothing, how comfortable it is, and the quality of the materials. The trigrams doubled down on people’s love of the fit and comfort but also revealed that people frequently mentioned the two-way zipper (more on this in a bit). It’s also great to see that people commonly mention that they’ve purchased other items previously, are planning to do so in the future, and would recommend their products to others.
This is yet another example of the overwhelmingly positive feelings customers have towards OrNot. This is great, but I’m also curious to see if there are any hang-ups in people’s experiences. What can OrNot do better? Do customers have any suggestions?
Concordances
Thankfully, the NLTK library can help us answer these questions through something called concordances. A concordance allows us to see every occurrence of a given word or phrase as well as some of the surrounding context. For example, I can find the concordance for the phrase ‘I wish’ in order to better understand what customers' desires/suggestions might be.
# Finding instances where people say 'wish'
concordance_index = ConcordanceIndex(flattened_reviews)

iwish_concordance = concordance_index.find_concordance(['i', 'wish'])
for concordance_line in iwish_concordance:
    left_context = ' '.join(concordance_line.left)
    matched_word = concordance_line.query
    right_context = ' '.join(concordance_line.right)
    print(f"Left Context: {left_context}")
    print(f"Matched Word: {matched_word}")
    print(f"Right Context: {right_context}")
    print("============================================")
print(len(iwish_concordance))
In our dataset that phrase appeared 40 times. Almost all of these instances were one-off wishes like ‘I wish this was black instead of grey’ or ‘I wish the pockets held a little more’. However, there was one repeated desire, and it was about the zipper — ‘I wish the zippers were nicer quality they feel a little cheap and I personally would have paid a few dollars more to have nice zippers’ and ‘I wish it was a molded tooth zipper instead of a nylon coil zipper’. Remembering that the two-way zipper was a very common trigram, my interest was immediately piqued.
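As an aside, find_concordance can be illustrated in isolation with a toy token list (tokens of my own invention, not from the dataset):

```python
from nltk.text import ConcordanceIndex

# Toy stand-in for the flattened review words
tokens = ('i wish the pockets were bigger but otherwise '
          'i wish every jersey fit this well').split()

ci = ConcordanceIndex(tokens)
# Each match carries the phrase plus its left and right context
for line in ci.find_concordance(['i', 'wish']):
    print(' '.join(line.left), '<<', line.query, '>>', ' '.join(line.right))
```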
Before delving into this more I continued with more concordances with the word ‘improve’.
improve_concordance = concordance_index.find_concordance('improve')
for concordance_line in improve_concordance:
    left_context = ' '.join(concordance_line.left)
    matched_word = concordance_line.query
    right_context = ' '.join(concordance_line.right)
    print(f"Left Context: {left_context}")
    print(f"Matched Word: {matched_word}")
    print(f"Right Context: {right_context}")
    print("============================================")
print(len(improve_concordance))
Lo and behold, there was another mention of the zipper — ‘I would still love if the team could upgrade the zipper and improve comfort at the top of the zipper when fully closed’. While continuing these concordances with ‘only thing’ another comment appeared — ‘The only thing that could use improvement is the quality of the zipper. It feels cheap and gets stuck sometimes’. I think we’re on to something!
The Zipper Concordances
I found all concordances for the word ‘zipper’ and over 300 results were returned. I went through each one, and the overwhelming majority were positive. People were especially enthusiastic about the two-way zip feature for thermo-regulation; however, there was still a sizable number of comments mentioning that the zippers could be difficult to use, either because both hands were required, they were too small to use while wearing gloves, or they regularly snagged and felt cheap. The worst offender was the Magic Shell Jacket, which received the most comments regarding zipper quality; however, many of these came from reviews back in 2019. I’d be curious to know if OrNot has since made changes to address this. Additionally, the Lightweight House Jersey had a pair of comments about the zipper being uncomfortable when fully zipped up around the neck.
Conclusion
After collecting all of OrNot's customer reviews and analyzing them, I feel confident in saying people love their products. The average star ratings are incredibly high, the reviews scored highly under the VADER sentiment analysis model, and the bi/tri/quadgrams revealed the specific qualities that make the products so appealing. The only real critique concerned the zipper and its functionality, and even that was minor; it was difficult to find much else people wanted improved.
I'm so glad to have found OrNot so early in my cycling journey, and based on my experience with the company, and especially after my time analyzing other customers' reviews, I will confidently recommend them to any other cyclist.
Learning Take-Aways
Lexical approaches to sentiment analysis have their issues. The length of the text and the model's propensity to be tricked are real drawbacks, but luckily there are alternatives, like the transformer-based BERT models available through Hugging Face. I'd love to come back to this dataset, perform similar analysis with BERT, and see how the results differ.
Data begets more data. I started this project with only 10 attributes. I finished with 35. I'm honestly astonished at how quickly it blew up, and it goes to show how important good organizational practices are to keep track of everything.
There are so many tools to aid in analysis and I'm just beginning to find and use some of them. Basic statistical and sentiment analysis are just the tip of the iceberg, and I can't wait to learn and implement other libraries/methods/ideas to make my analysis more robust.