Qualitative data—customer reviews, survey open-ends, social posts, interview excerpts—carries meaning that goes
far beyond a simple positive/negative label. This lab gives you three complementary methods for quantifying that
meaning: sentiment (the overall positive–negative charge of a text), emotion profiling
(which of eight discrete emotions are present), and dimensional affect (how pleasant, activated,
or dominant the language feels). Run one method at a time to go deep, or run all three at once and compare what
each lens reveals—and what it hides.
👨‍🏫 Professor Mode: Guided Learning Experience
New to text analysis? Enable Professor Mode for step-by-step guidance through sentiment scoring, emotion profiling, and dimensional affect analysis!
OVERVIEW & CONCEPTS
📊
Sentiment Analysis
How positive or negative is this text, overall?
VADER scores each token against a hand-validated lexicon of 7,500+ entries, then applies six
rule layers—negation, intensifiers, contrastive conjunctions, punctuation, capitalization, and
bigram dampeners—to produce a single compound score in \([-1, 1]\) plus
separate positive, neutral, and negative proportions.
VADER · Hutto & Gilbert, 2014
🎭
Emotion Profiling
Which discrete emotions does this text carry?
The NRC EmoLex maps over 14,000 English words onto Plutchik's eight primary emotions
(anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus binary
positive/negative sentiment. Each token is matched and its emotions tallied;
the result is an emotion profile showing the relative weight of each dimension.
NRC EmoLex · Mohammad & Turney, 2013
🧭
Dimensional Affect
How pleasant, energized, and in-control does the language feel?
The NRC-VAD Lexicon rates 20,000 words on three independent continuous dimensions:
Valence (pleasure–displeasure), Arousal (activation–calm), and
Dominance (in control–controlled). Averaging across matched tokens places a
record in Russell's circumplex—e.g., high-V high-A = excitement; low-V high-A = anxiety.
NRC-VAD · Mohammad, 2018
Deep dive: How VADER computes its scores
VADER starts with a lookup: each word token is checked against a valence lexicon where entries
range from –4 (extremely negative) to +4 (extremely positive). Words not in the lexicon get a
valence of 0. The raw scores are then adjusted by six rule layers applied in order:
Booster words — intensifiers like "very," "extremely," or "incredibly" within
a 3-token window before a sentiment word increase its absolute valence by 0.293 when immediately adjacent, about 0.278
(95% of the full boost) one token further back, or about 0.264 (90%) two tokens further back. Downplayers like "somewhat" or "marginally" apply a
corresponding negative adjustment.
Negation — "not," "never," "without," or any word ending in "n't" within 3
tokens before a sentiment word multiplies its valence by –0.74, reversing and attenuating the signal.
Contrastive "but" — when a sentence contains "but," words before it are
attenuated (×0.5) and words after it are amplified (×1.5), capturing the flip implied by "it was
good, but very expensive."
Capitalization — if a sentiment word (but not all words) is in ALL
CAPS, its absolute valence is boosted by 0.733 to capture emphasis like "LOVE it."
Punctuation: in the reference implementation, each "!" adds 0.292, capped at four; two or three "?" add 0.18 each, and four or
more add a flat 0.96, reflecting how punctuation intensifies emotional expression in social text.
Bigram dampeners — phrases like "kind of" or "sort of" before a sentiment word
reduce its influence, capturing the hedge in "kind of nice."
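As a rough illustration of how two of these layers (negation and contrastive "but") might be applied, here is a minimal Python sketch. The three-word toy lexicon, its valence values, and the function names are hypothetical and are not the lab's actual code; only the -0.74, ×0.5, and ×1.5 constants come from the description above.

```python
NEGATIONS = {"not", "never", "without"}
N_SCALAR = -0.74  # negation multiplier described above

# Hypothetical toy lexicon; real VADER entries and values differ.
TOY_LEXICON = {"good": 1.9, "expensive": -1.1, "terrible": -2.1}


def is_negation(token):
    """True for the listed negators or any token ending in n't."""
    return token in NEGATIONS or token.endswith("n't")


def adjusted_valences(text):
    """Return (token, valence) pairs after negation and 'but' weighting."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    valences = []
    for i, tok in enumerate(tokens):
        v = TOY_LEXICON.get(tok, 0.0)
        window = tokens[max(0, i - 3):i]  # up to 3 tokens before this one
        if v != 0.0 and any(is_negation(t) for t in window):
            v *= N_SCALAR  # reverse and attenuate
        valences.append(v)
    if "but" in tokens:
        pivot = tokens.index("but")
        for i, v in enumerate(valences):
            if i < pivot:
                valences[i] = v * 0.5  # attenuate what precedes "but"
            elif i > pivot:
                valences[i] = v * 1.5  # amplify what follows "but"
    return list(zip(tokens, valences))
```

With this toy lexicon, "not good" turns the 1.9 for "good" into roughly -1.41, and in "it was good, but expensive" the clause after "but" outweighs the clause before it.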
After adjustments, individual valences are summed into \(S\). The compound score normalizes this
sum: \[\text{compound} = \frac{S}{\sqrt{S^2 + 15}}\] Values ≥ 0.05 are labeled positive,
≤ −0.05 are labeled negative, and everything in between is labeled neutral. Separate
positive, neutral, and negative proportions describe how many tokens carry each tone
(each non-neutral token adds a "+1 punch" to its proportion bucket).
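The normalization and labeling steps can be sketched in a few lines of Python. The function names are illustrative assumptions; the constant 15 and the ±0.05 thresholds are taken from the text above.

```python
import math

ALPHA = 15  # normalization constant from the compound formula above


def compound_score(valence_sum):
    """Squash an unbounded valence sum S into the open interval (-1, 1)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + ALPHA)


def label(compound):
    """Apply the +/-0.05 thresholds described above."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"
```

For example, a summed valence of 1 gives 1/√16 = 0.25, which is labeled positive. Larger sums approach but never reach ±1.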
Ambivalence — a compound near zero can arise two ways: many positive words and
many negative words cancelling out (high ambivalence) or very few sentiment words at all (genuinely
neutral). The lab distinguishes these by computing
\(\text{ambivalence} = \frac{2 \cdot \min(pos,\, neg)}{pos + neg}\) (0 = purely one-sided,
1 = perfectly balanced).
pos | [0, 1] | Proportion of tokens contributing positive valence
neu | [0, 1] | Proportion of tokens that are lexically neutral
neg | [0, 1] | Proportion of tokens contributing negative valence
ambivalence | [0, 1] | How mixed (vs. genuinely neutral) a near-zero compound is
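The ambivalence formula above translates directly into code. This is a sketch: the function name is an assumption, and returning 0 when a record has no charged tokens at all is one possible convention, not necessarily the lab's.

```python
def ambivalence(pos, neg):
    """2 * min(pos, neg) / (pos + neg): 0 = one-sided, 1 = balanced."""
    total = pos + neg
    if total == 0:
        # Assumption: with no charged language there is nothing to be
        # ambivalent about, so return 0 instead of dividing by zero.
        return 0.0
    return 2 * min(pos, neg) / total
```

A record with equal positive and negative proportions scores 1, while a record with positive tokens only scores 0, even though both may sit near a compound of zero once other rules apply.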
Deep dive: NRC Emotion Profiling
The NRC Word-Emotion Association Lexicon (EmoLex) was built by crowdsourcing human judgments of word–emotion
associations. Annotators labeled whether each word evoked each of Plutchik's eight emotions and whether it was
positive or negative. The result is a binary lookup: a word either does or does not carry each emotion.
Emotion | Opposite | Example words
😠 Anger | Fear | fury, resentment, hostile
🔮 Anticipation | Surprise | eager, expect, forward
🤢 Disgust | Trust | nauseating, vile, revolting
😨 Fear | Anger | terrified, dread, nightmare
😊 Joy | Sadness | bliss, delight, wonderful
😢 Sadness | Joy | grief, loss, heartbreaking
😲 Surprise | Anticipation | astonishing, stunned, whoa
🤝 Trust | Disgust | reliable, honest, earnest
The analyzer tokenizes each text, strips punctuation, and looks up each word (case-insensitive). For each matched
token, all of its associated emotions are incremented. Scores are then normalized by the number of matched tokens
(lexicon coverage) so that a short review with three matched words and a long essay with thirty matched words can
be compared fairly. A high joy score, for example, means a large share of the emotionally loaded words
in that text are joy-associated.
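The tally-and-normalize procedure described above can be sketched as follows. The three lexicon entries and their emotion tags are hypothetical stand-ins for EmoLex, and `emotion_profile` is an illustrative name.

```python
import string
from collections import Counter

# Hypothetical entries; the real EmoLex associations may differ.
TOY_EMOLEX = {
    "thrilling": {"joy", "surprise"},
    "terrifying": {"fear"},
    "reliable": {"trust"},
}


def emotion_profile(text):
    """Share of lexicon-matched tokens that carry each emotion."""
    tokens = [t.strip(string.punctuation).lower() for t in text.split()]
    matched = [t for t in tokens if t in TOY_EMOLEX]
    if not matched:
        return {}  # no lexicon coverage, so no profile
    counts = Counter(e for t in matched for e in TOY_EMOLEX[t])
    return {emotion: c / len(matched) for emotion, c in counts.items()}
```

With these toy entries, "Thrilling and terrifying!" has two matched tokens and scores 0.5 on joy, surprise, and fear. The scores sum to 1.5 because one word can carry several emotions.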
When NRC adds insight beyond VADER: Two texts can have the same VADER compound score but very
different emotion profiles. A customer review reading "thrilling and terrifying" might score slightly positive on
VADER but show high joy and high fear in NRC—suggesting a mixed-intense reaction rather than calm satisfaction.
Deep dive: VAD Dimensional Affect (Russell's Circumplex)
Rather than labeling emotions as discrete categories, dimensional models position affect in a continuous space.
The NRC-VAD Lexicon provides continuous ratings (0–1) on three axes for each of 20,000 English words:
Dimension | Low end (0) | High end (1) | Example contrast
Valence (V) | Very unpleasant | Very pleasant | "grief" (0.06) vs. "triumph" (0.94)
Arousal (A) | Very calm / deactivated | Very activated | "lullaby" (0.20) vs. "panic" (0.90)
Dominance (D) | Controlled / powerless | In control / commanding | "victim" (0.12) vs. "authority" (0.88)
The Valence–Arousal plane corresponds to Russell's (1980) circumplex model of affect, which organizes emotions into
four quadrants: high-V high-A (excited, happy), low-V high-A (anxious, angry), high-V low-A (calm, relaxed), and
low-V low-A (sad, bored). Placing your text data in this space tells you not just the direction of emotion (positive
or negative) but also its energy level.
Dominance adds a third axis capturing power dynamics. Texts that feel "in control"—commands,
brand authority claims, expert opinions—show high dominance. Texts that communicate vulnerability, pleading, or
powerlessness show low dominance. In consumer research, dominance differences often emerge between complaint texts
(low-D) and recommendation texts (high-D) even when their valence scores are similar.
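A minimal sketch of record-level VAD scoring and quadrant assignment is shown below. The lexicon values are hypothetical, the function names are illustrative, and splitting the quadrants at 0.5 is an assumption based on the 0-1 rating scale.

```python
# Hypothetical (valence, arousal, dominance) ratings in [0, 1].
TOY_VAD = {
    "thrilled": (0.90, 0.85, 0.70),
    "helpless": (0.10, 0.55, 0.08),
    "calm": (0.85, 0.10, 0.60),
}


def vad_scores(tokens):
    """Average V, A, and D over lexicon-matched tokens; None if no match."""
    hits = [TOY_VAD[t] for t in tokens if t in TOY_VAD]
    if not hits:
        return None
    return tuple(sum(dim) / len(hits) for dim in zip(*hits))


def quadrant(valence, arousal, midpoint=0.5):
    """Name the circumplex quadrant; the 0.5 midpoint is an assumption."""
    v = "high-V" if valence >= midpoint else "low-V"
    a = "high-A" if arousal >= midpoint else "low-A"
    return f"{v} {a}"
```

For instance, `vad_scores(["i", "felt", "helpless"])` returns the single matched rating, and passing its valence and arousal to `quadrant` yields "low-V high-A", the anxious quadrant.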
When to use each method — and when to use all three
Question you're asking | Best method | Why
Is this review positive or negative? How strongly? | VADER (Mode 1) | The compound score is a widely used measure of overall valence in short social text; the rule layers handle informal language well.
What emotions are customers feeling? Which brands trigger fear vs. trust? | NRC (Mode 2) | Discrete emotion labels give stakeholders a vocabulary beyond "it was bad." Fear and disgust are very different business problems.
How energized/calm does brand language feel? Do complaint tweets feel "activated"? | VAD (Mode 3) | Arousal is invisible to sentiment analysis. High-arousal negative text (rage) calls for different intervention than low-arousal negative (disappointment).
Full picture: convergence and divergence across all three lenses | All Three (Mode 4) | Methods can agree (clearly negative, fearful, unpleasant) or diverge (neutral VADER but high anger NRC). Divergence is often the most interesting finding.
DATA SOURCE
📚
Use a Case Study
📊 Pre-loaded Text Datasets
Use these case studies to practice reading text analysis outputs before uploading your own data.
Each scenario comes with a suggested analysis method—but you're always free to run any or all three
engines on any dataset.
📤
Upload Your Data
Drag & Drop CSV file (.csv, .tsv, .txt, .xls, .xlsx)
The first row should be column headers. One column should contain your text data.
CONFIGURE YOUR DATA
Choose which column contains the text you want to analyze.
Optional: compare results across groups (e.g., brand, source, product rating).
Selects a column to use as row labels in output tables and inspectors.
SELECT ANALYSIS METHOD
Choose how you want to analyze your text. You can switch methods at any time without re-uploading data.
📊
Sentiment Analysis
VADER · Overall positive / negative tone
Score each record on a compound scale from −1 (strongly negative) to +1 (strongly positive). Reveals overall valence, label distribution, ambivalence, and group differences.
Best for: customer satisfaction, brand perception, comparing positive/negative signal across groups.
🎭
Emotion Profiling
NRC EmoLex · Eight discrete emotions
Map each record onto Plutchik's eight emotions (anger, fear, joy, sadness, disgust, surprise, anticipation, trust). Reveals which emotional registers dominate your text data.
Best for: understanding which emotions drive engagement, distinguishing fear from anger, identifying trust-building language.
🧭
Dimensional Affect
NRC-VAD · Valence, Arousal, Dominance
Place records in continuous emotional space. Valence = pleasant vs. unpleasant. Arousal = activated vs. calm. Dominance = in-control vs. controlled. Visualized on Russell's circumplex.
Best for: segmenting arousal level (rage vs. disappointment), analyzing power framing in text, circumplex mapping.
🔬
Full Comparison
All Three Methods · Side-by-side
Run VADER, NRC, and VAD on the same data and compare what each method reveals. Convergence across methods strengthens conclusions; divergence points to complexity worth investigating.
Best for: publishable analyses, presentations where you need to defend your methodology, and rich qualitative-to-quantitative interpretation.
SENTIMENT RESULTS
Reading VADER output
VADER returns a compound score (−1 to +1) for each record: values ≥ 0.05 are labeled
positive, values ≤ −0.05 are labeled negative,
and everything in between is labeled neutral. The compound score is the primary
output; the separate pos, neu, and neg proportions show how much of the text's
vocabulary contributes each tone.
A compound score near zero deserves scrutiny. The ambivalence score below tells you whether
neutral-looking records are genuinely low in sentiment ("the product arrived") or highly mixed
("outstanding service, but terrible product quality"). Mixed records may warrant qualitative follow-up
even if their average looks benign.
Sentiment summary
Average compound score: –
Records labeled positive: –
Records labeled neutral: –
Records labeled negative: –
Run the analysis to see overall sentiment across your text records.
Labels distribution
Bar chart shows how records are classified by VADER's compound score thresholds. Mixed-neutral records
(high ambivalence near zero) are distinguished from genuinely neutral ones.
DETAIL & WORKED EXAMPLE
Export Per-Record Results
Download all VADER scores (compound, pos, neu, neg, ambivalence, label) as a CSV—one row per record.
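If you want to produce or post-process a file of the same shape yourself, a sketch using Python's standard `csv` module might look like this. The `record_id` column and the exact header spellings are assumptions; the lab's real export may name or order columns differently.

```python
import csv
import io

# Assumed column order; record_id is a hypothetical identifier column.
FIELDS = ["record_id", "compound", "pos", "neu", "neg", "ambivalence", "label"]


def to_csv(rows):
    """Serialize per-record score dictionaries to CSV text, one row each."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

The same field list can be passed to `csv.DictReader` to load an export back into dictionaries for further analysis.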
How VADER scored two examples
The worked example below shows the token-by-token valence coding for one relatively positive record and one
relatively negative record. Color-coded chips show each token's base lexicon value; hover a chip to see the
full modifier chain (negation, booster, capitalization) that produced the final contribution.
Run the analysis to see how each token in a relatively positive record contributes to its final sentiment score.
Run the analysis to see how each token in a relatively negative record contributes to its final sentiment score.
How this example is computed
Each token is looked up in the VADER lexicon. Tokens with a known valence receive a colored chip; words
not in the lexicon appear as plain gray chips. The six rule layers (see Concepts above) then adjust those
base scores before they are summed and normalized to the compound score shown.
Tip: Hover over any colored chip in the Record Inspector to see the full
modifier chain—what the base lexicon value was, which rule modified it, and what the final
contribution was.
🔗 From examples to patterns: These two examples show individual token logic.
To see how sentiment is distributed across all your records, scroll to the
Histogram and Box Plot below.
Record Inspector — Token-Level Valence Coding
Select any record to see how each token's valence contribution builds the final compound score. Colored chips
indicate tokens with non-zero lexicon values; hover for the full modifier chain. Gray chips are lexicon misses.
Token-level breakdown will appear here after running the analysis.
Sentiment Character Analysis
A compound score near zero can mean two very different things: a record with
high mixed sentiment (positive and negative pulling in opposite directions) vs. a record
that is genuinely neutral (very little charged language at all). This card separates the two
using the ambivalence score \(\text{amb} = \frac{2 \cdot \min(pos,\, neg)}{pos + neg}\).
ANALYSIS REPORT
APA-Style Statistical Reporting
Run the analysis to generate an APA-style report of your sentiment results.
Managerial Interpretation
Run the analysis to generate a managerial interpretation of your sentiment results.
Grouped Sentiment Comparison
Sentiment metrics broken down by group.
Summary of Estimates
Measure | Estimate | Std. Dev. | Min | Max
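The columns of this table can be reproduced from a list of per-record scores with the standard library. This sketch assumes the estimate is the mean and the standard deviation is the sample (n − 1) version; the lab may use a different convention.

```python
import statistics


def summarize(values):
    """Mean, sample standard deviation, minimum, and maximum of a measure."""
    return {
        "estimate": statistics.mean(values),
        # Assumption: sample standard deviation; undefined for one value.
        "std_dev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```

Calling `summarize` once per group on each group's compound scores gives a grouped comparison in the same layout.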
Sentiment Distribution: Histogram
How to read bar colors: Green = positive territory (≥ 0.05),
red = negative (≤ −0.05),
gray = neutral. Bar height = record count in that range.
Run the analysis to see how compound sentiment scores are distributed across your text records.
Sentiment Distribution: Box Plot (By Group)
Box Plot Guide: The box = middle 50% of scores.
The line inside = median. Whiskers extend to typical extremes.
Dots beyond whiskers are outliers.
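To make the box plot elements concrete, the sketch below computes quartiles and flags outliers. It assumes the common Tukey convention (points beyond 1.5 × IQR from the box are outliers) and inclusive quartiles, neither of which is confirmed as the lab's plotting rule.

```python
import statistics


def box_stats(values, k=1.5):
    """Quartiles plus outliers under the assumed 1.5 * IQR fence rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "outliers": outliers}
```

For the compound scores `[-0.9, 0.1, 0.2, 0.3, 0.4]`, the box spans 0.1 to 0.3 with a median of 0.2, and -0.9 falls outside the lower fence, so it would be drawn as an outlier dot.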
Run the analysis to see the spread, central tendency, and outliers in your sentiment data.
EMOTION PROFILING RESULTS
Reading NRC output
Each emotion score represents the proportion of lexicon-matched tokens in a record that
carry that emotion. A record scoring 0.40 on "joy" has 40% of its emotionally loaded words tagged as
joy-associated in the NRC lexicon. Scores do not sum to 1 because a single word can carry multiple
emotions.
Lexicon coverage matters: a record with only two matched tokens may have an extreme
emotion profile just by chance. The inspector below shows which tokens actually matched, so you can
judge whether the scores are driven by meaningful signal or incidental vocabulary.
The radar chart makes cross-group comparison intuitive: a brand whose texts form a
large "joy + trust" profile occupies a very different space from one whose texts form a "fear + anger"
shape—even if their overall VADER sentiment scores are similar.
Run the NRC Emotion Lexicon analysis on your current text data.
Dominant Emotion
–
Avg positive (binary): –
Avg negative (binary): –
Avg lexicon coverage: –
Emotion Profile (Radar)
Each axis = proportion of matched tokens carrying that emotion. Overlapping traces = group comparison.
Emotion Scores (Bar Chart)
Bars sorted by average normalized score. When grouping is active, shows grouped bars per emotion.
Emotion Profile by Group
Per-group breakdown of dominant emotion, top emotions, and lexicon coverage. Radar and bar chart above also show full group traces.
Group | n | Dominant Emotion | Top 3 Emotions | Avg Coverage
Emotion Deviation from Overall Average
+Green = group scores above overall average; −Red = below. Values in percentage points (pp).
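A deviation in percentage points is the group's average emotion score minus the overall average, multiplied by 100. A small sketch, with an illustrative function name and rounding to one decimal as an assumption:

```python
def deviation_pp(group_avg, overall_avg):
    """Group minus overall average per emotion, in percentage points."""
    return {
        emotion: round((group_avg[emotion] - overall_avg[emotion]) * 100, 1)
        for emotion in overall_avg
    }
```

A group averaging 0.40 on joy against an overall 0.30 sits 10 pp above average, while 0.10 on fear against an overall 0.15 sits 5 pp below.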
Emotion Summary Table
Emotion | Avg (% of matched tokens) | Std. Dev.
NRC Interpretation
Run the analysis to generate an interpretation.
Record Inspector — Token-Level Emotion Coding
Pick any record to see which tokens matched the NRC lexicon and which emotions each one carries.
Choose a record above to inspect token-level coding.
DIMENSIONAL AFFECT RESULTS
Reading VAD output
Each record receives three scores in [0, 1]: Valence (how
pleasant the language is), Arousal (how activated or calm),
and Dominance (how in-control vs. controlled the tone is).
Scores near 0.5 are emotionally neutral on that dimension; scores far from 0.5 indicate strong
displacement in one direction.
The Valence × Arousal scatter plot places each record in Russell's circumplex. The
four quadrants reveal qualitatively different emotional states invisible to sentiment scoring—
excited (high-V, high-A), anxious (low-V, high-A), relaxed (high-V, low-A),
and sad / bored (low-V, low-A).
Dominance in marketing context: Customer complaints often show low valence and
low dominance ("I felt helpless"). Recommendation texts often combine high valence with high dominance
("I confidently recommend this to everyone"). These V–D combinations tell a story that valence alone cannot.
Run the NRC-VAD analysis to score your text on Valence, Arousal, and Dominance.
Dimensional Scores
Valence × Arousal Scatter (Russell's Circumplex)
Each dot = one record. X = valence, Y = arousal, marker size = dominance. Hover for details.
VAD Profile by Group
VAD Interpretation
Run the analysis to generate an interpretation.
Record Inspector — Token-Level VAD Coding
Pick any record to see which tokens matched the NRC-VAD lexicon and their individual V, A, and D scores.
Choose a record above to inspect token-level coding.
FULL COMPARISON — ALL THREE METHODS
How Mode 4 works
All three engines run on the same dataset. Use the comparison table to see where methods converge
(strong evidence) and where they diverge (interesting complexity worth investigating). The
combined CSV download exports all scores for every record in a single flat file.
CROSS-METHOD COMPARISON
Methods Compared
Method | Key Result 1 | Key Result 2 | What it captures | What it misses
Worked Example: One Record, Three Methods
The record with the most extreme VADER compound score is used as the demonstration case.
Compare how the three methods characterize the same text differently.
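Selecting the demonstration case amounts to finding the record whose compound score is farthest from zero. A sketch, assuming each record is a dictionary with a `compound` key (a hypothetical structure):

```python
def most_extreme(records):
    """Record with the largest absolute compound score (first one on ties)."""
    return max(records, key=lambda record: abs(record["compound"]))
```

Because the comparison uses the absolute value, a strongly negative record such as -0.9 is chosen over a moderately positive one such as 0.4.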
Cross-Method Insights
Where methods agree, conclusions are more robust. Where they diverge, the gap is itself informative.
The analysis below flags the most notable convergences and divergences in your data.
Group Comparison Across Methods
Where all three methods agree on a group’s emotional character, the conclusion is robust. Divergence reveals nuance worth investigating.
Combined Export
A single CSV with VADER compound + pos/neg/neu + ambivalence, all 8 NRC emotion scores, and V/A/D
values for every record—all methods side-by-side.
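One way to picture the flat layout is a function that merges the three methods' outputs into a single row per record. The column prefixes, key names, and `record_id` field below are hypothetical; the lab's export may label columns differently.

```python
EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]
VADER_KEYS = ["compound", "pos", "neg", "neu", "ambivalence"]


def combined_row(record_id, vader, nrc, vad):
    """Flatten VADER, NRC, and VAD results into one dictionary per record."""
    row = {"record_id": record_id}
    row.update({f"vader_{key}": vader[key] for key in VADER_KEYS})
    # Emotions with no matched tokens default to 0.0 (an assumption).
    row.update({f"nrc_{emotion}": nrc.get(emotion, 0.0) for emotion in EMOTIONS})
    row.update(zip(("valence", "arousal", "dominance"), vad))
    return row
```

Each row then has 17 columns: the identifier, 5 VADER values, 8 emotion scores, and 3 VAD values.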