Text Analysis Lab

Text Analysis VADER · NRC · VAD

Qualitative data—customer reviews, survey open-ends, social posts, interview excerpts—carries meaning that goes far beyond a simple positive/negative label. This lab gives you three complementary methods for quantifying that meaning: sentiment (the overall positive–negative charge of a text), emotion profiling (which of eight discrete emotions are present), and dimensional affect (how pleasant, activated, or dominant the language feels). Run one method at a time to go deep, or run all three at once and compare what each lens reveals—and what it hides.

👨‍🏫 Professor Mode: Guided Learning Experience

New to text analysis? Enable Professor Mode for step-by-step guidance through sentiment scoring, emotion profiling, and dimensional affect analysis!

OVERVIEW & CONCEPTS

Sentiment Analysis

How positive or negative is this text, overall?

VADER scores each token against a hand-validated lexicon of 7,500+ entries, then applies six rule layers—negation, intensifiers, contrastive conjunctions, punctuation, capitalization, and bigram dampeners—to produce a single compound score in \([-1, 1]\) plus separate positive, neutral, and negative proportions.

VADER · Hutto & Gilbert, 2014

Emotion Profiling

Which discrete emotions does this text carry?

The NRC EmoLex maps over 14,000 English words onto Plutchik's eight primary emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus binary positive/negative sentiment. Each token is matched and its emotions tallied; the result is an emotion profile showing the relative weight of each dimension.

NRC EmoLex · Mohammad & Turney, 2013

Dimensional Affect

How pleasant, energized, and in-control does the language feel?

The NRC-VAD Lexicon rates 20,000 words on three continuous dimensions: Valence (pleasure–displeasure), Arousal (activation–calm), and Dominance (in control–controlled). Averaging the ratings across a record's matched tokens places it in Russell's circumplex; for example, high-V high-A corresponds to excitement and low-V high-A to anxiety.

NRC-VAD · Mohammad, 2018
Deep dive: How VADER computes its scores

VADER starts with a lookup: each word token is checked against a valence lexicon where entries range from –4 (extremely negative) to +4 (extremely positive). Words not in the lexicon get a valence of 0. The raw scores are then adjusted by six rule layers applied in order:

  1. Booster words: intensifiers like "very," "extremely," or "incredibly" within a 3-token window before a sentiment word increase its absolute valence by 0.293 when they immediately precede it, by about 0.278 (0.293 × 0.95) one token further back, and by about 0.264 (0.293 × 0.9) two tokens further back. Downplayers like "somewhat" or "marginally" apply the same adjustment in the opposite direction, shrinking the absolute valence.
  2. Negation: "not," "never," "without," or any word ending in "n't" within 3 tokens before a sentiment word multiplies its valence by −0.74, reversing and attenuating the signal.
  3. Contrastive "but": when a sentence contains "but," words before it are attenuated (×0.5) and words after it are amplified (×1.5), capturing the shift in emphasis in "it was good, but very expensive."
  4. Capitalization: if a sentiment word is in ALL CAPS while the text as a whole is not, its absolute valence is boosted by 0.733 to capture emphasis like "LOVE it."
  5. Punctuation: each "!" (up to four) adds 0.292; two or three "?" add 0.18 each, and four or more add a flat 0.96. The amplifier pushes the summed score further in whichever direction it already points, reflecting how punctuation intensifies emotional expression in social text.
  6. Bigram dampeners: phrases like "kind of" or "sort of" before a sentiment word reduce its influence, capturing the hedge in "kind of nice."
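The token-level rules above (boosters, negation, capitalization, and the contrastive "but") can be sketched in a few lines of Python. This is a simplified illustration, not the vaderSentiment implementation: `TOY_LEXICON`, `BOOSTERS`, and `NEGATORS` are tiny hypothetical stand-ins with made-up valences, negation is applied once rather than per preceding position, and punctuation and bigram dampeners are left out. The constants 0.293, −0.74, and 0.733 come from the text; the distance decay factors 0.95 and 0.9 mirror the reference vaderSentiment code.

```python
# Simplified sketch of VADER-style token rules. The word lists and valences
# below are hypothetical stand-ins, not the real VADER lexicon.
B_INCR = 0.293     # booster increment
B_DECR = -0.293    # downplayer increment
N_SCALAR = -0.74   # negation multiplier
C_INCR = 0.733     # ALL-CAPS emphasis increment

TOY_LEXICON = {"good": 1.9, "love": 3.2, "bad": -2.5, "expensive": -0.9}
BOOSTERS = {"very": B_INCR, "extremely": B_INCR, "somewhat": B_DECR}
NEGATORS = {"not", "never", "without"}


def is_negator(word: str) -> bool:
    w = word.lower()
    return w in NEGATORS or w.endswith("n't")


def token_valence(tokens: list[str], i: int) -> float:
    """Adjusted valence of tokens[i] after caps, booster and negation rules."""
    word = tokens[i]
    valence = TOY_LEXICON.get(word.lower(), 0.0)
    if valence == 0.0:
        return 0.0
    sign = 1.0 if valence > 0 else -1.0
    # Caps emphasis applies only when some, but not all, tokens are ALL CAPS.
    n_caps = sum(t.isupper() for t in tokens)
    if word.isupper() and 0 < n_caps < len(tokens):
        valence += sign * C_INCR
    # Boosters/downplayers up to 3 tokens back, decayed with distance.
    for dist, decay in ((1, 1.0), (2, 0.95), (3, 0.9)):
        j = i - dist
        if j >= 0:
            valence += sign * BOOSTERS.get(tokens[j].lower(), 0.0) * decay
    # A negator anywhere in the 3 preceding tokens flips and dampens once.
    if any(is_negator(t) for t in tokens[max(0, i - 3):i]):
        valence *= N_SCALAR
    return valence


def apply_but(tokens: list[str], valences: list[float]) -> list[float]:
    """Halve valences before the first 'but' and scale those after it by 1.5."""
    lowered = [t.lower() for t in tokens]
    if "but" not in lowered:
        return list(valences)
    k = lowered.index("but")
    return [v * 0.5 if j < k else v * 1.5 if j > k else v
            for j, v in enumerate(valences)]


tokens = "it was good but very expensive".split()
valences = apply_but(tokens, [token_valence(tokens, i) for i in range(len(tokens))])
```

In this example "good" is halved to 0.95 while the boosted "expensive" grows to about −1.79, so the summed valence turns negative even though the sentence opens with praise.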

After adjustments, the individual valences are summed into \(S\). The compound score normalizes this sum: \[\text{compound} = \frac{S}{\sqrt{S^2 + 15}}\] Values ≥ 0.05 are labeled positive, values ≤ −0.05 negative, and everything in between neutral. The separate positive, neutral, and negative proportions are intensity-weighted shares rather than raw token counts: each positive or negative token contributes its absolute valence plus a "+1 punch" to its bucket, each neutral token contributes 1, and each bucket is divided by the combined total, so the three proportions sum to 1.
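A minimal sketch of the compound normalization, the ±0.05 labels, and the proportion buckets, assuming the token valences have already been adjusted by the rule layers (punctuation amplification is omitted). Each non-neutral token adds its absolute valence plus the +1 punch to its bucket; each neutral token adds 1. The function names are hypothetical, chosen for illustration.

```python
import math

ALPHA = 15  # constant in compound = S / sqrt(S^2 + 15)


def compound(valences: list[float]) -> float:
    """Normalize the summed token valences S into (-1, 1)."""
    s = sum(valences)
    return s / math.sqrt(s * s + ALPHA)


def label(c: float) -> str:
    """Apply the +/-0.05 thresholds."""
    if c >= 0.05:
        return "positive"
    if c <= -0.05:
        return "negative"
    return "neutral"


def proportions(valences: list[float]) -> dict[str, float]:
    """Intensity-weighted pos/neu/neg shares using the '+1 punch'."""
    pos_sum = sum(v + 1 for v in valences if v > 0)
    neg_sum = sum(-v + 1 for v in valences if v < 0)
    neu_sum = sum(1 for v in valences if v == 0)
    total = pos_sum + neg_sum + neu_sum
    if total == 0:  # empty text: nothing to divide
        return {"pos": 0.0, "neu": 0.0, "neg": 0.0}
    return {"pos": pos_sum / total, "neu": neu_sum / total, "neg": neg_sum / total}


vals = [0.0, 0.0, 1.9, 0.0]  # e.g. "it was good ." after the rule layers
c = compound(vals)
shares = proportions(vals)
```

Because the constant 15 sits under the square root, a single strong word cannot push the compound to ±1; the score approaches the bounds only as \(S\) grows large.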

Ambivalence — a compound near zero can arise two ways: many positive words and many negative words cancelling out (high ambivalence) or very few sentiment words at all (genuinely neutral). The lab distinguishes these by computing \(\text{ambivalence} = \frac{2 \cdot \min(pos,\, neg)}{pos + neg}\) (0 = purely one-sided, 1 = perfectly balanced).
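The ambivalence index translates directly into code. The only design decision, an assumption on our part rather than documented lab behavior, is to return `None` instead of dividing by zero when both proportions are 0, which is exactly the "genuinely neutral" case the index is meant to separate out.

```python
from typing import Optional


def ambivalence(pos: float, neg: float) -> Optional[float]:
    """2 * min(pos, neg) / (pos + neg): 0 = one-sided, 1 = perfectly balanced.

    Returns None when pos + neg == 0 (no sentiment-bearing tokens at all),
    where the ratio is undefined.
    """
    total = pos + neg
    if total == 0:
        return None
    return 2 * min(pos, neg) / total


print(ambivalence(0.3, 0.3))   # → 1.0  (positives and negatives cancel)
print(ambivalence(0.25, 0.0))  # → 0.0  (one-sided)
print(ambivalence(0.0, 0.0))   # → None (no sentiment words)
```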

Output | Range | Interpretation
compound | [−1, 1] | Overall tone; sign = direction, magnitude = strength
pos | [0, 1] | Intensity-weighted share of positive tokens
neu | [0, 1] | Share of lexically neutral tokens (each counts as 1)
neg | [0, 1] | Intensity-weighted share of negative tokens
ambivalence | [0, 1] | How mixed (vs. genuinely neutral) a near-zero compound is

Deep dive: NRC Emotion Profiling

The NRC Word-Emotion Association Lexicon (EmoLex) was built by crowdsourcing human judgments of word–emotion associations. Annotators labeled whether each word evoked each of Plutchik's eight emotions and whether it was positive or negative. The result is a binary lookup: a word either does or does not carry each emotion.

Emotion | Opposite | Example words
😠 Anger | Fear | fury, resentment, hostile
🔮 Anticipation | Surprise | eager, expect, forward
🤢 Disgust | Trust | nauseating, vile, revolting
😨 Fear | Anger | terrified, dread, nightmare
😊 Joy | Sadness | bliss, delight, wonderful
😢 Sadness | Joy | grief, loss, heartbreaking
😲 Surprise | Anticipation | astonishing, stunned, whoa
🤝 Trust | Disgust | reliable, honest, earnest

The analyzer tokenizes each text, strips punctuation, and looks up each word (case-insensitive). For each matched token, every emotion associated with it is incremented by one. Scores are then divided by the number of matched tokens, so that a short review with three matched words and a long essay with thirty matched words can be compared fairly. A high joy score, for example, means a large share of the emotionally loaded words in that text are joy-associated. Because a single word can carry several emotions, the normalized scores need not sum to 1.
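The matching-and-normalizing procedure can be sketched as follows. `TOY_EMOLEX` is a hypothetical four-word lexicon with made-up associations, not real EmoLex rows, and the regex tokenizer is an assumption standing in for whatever tokenizer the lab uses.

```python
import re
from collections import Counter

EMOTIONS = ("anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon with made-up associations, NOT real EmoLex rows.
TOY_EMOLEX = {
    "thrilling": {"joy", "surprise"},
    "terrifying": {"fear"},
    "reliable": {"trust"},
    "heartbreaking": {"sadness"},
}


def emotion_profile(text: str, lexicon: dict = TOY_EMOLEX) -> dict[str, float]:
    """Per-emotion counts divided by the number of matched tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase, drop punctuation
    counts: Counter = Counter()
    matched = 0
    for tok in tokens:
        emotions = lexicon.get(tok)
        if emotions is None:
            continue  # unmatched tokens do not affect the denominator
        matched += 1
        counts.update(emotions)  # one increment per associated emotion
    if matched == 0:
        return {e: 0.0 for e in EMOTIONS}
    return {e: counts[e] / matched for e in EMOTIONS}


profile = emotion_profile("Thrilling and terrifying!")
```

For that input two tokens match, so joy, surprise, and fear each score 0.5 and the profile sums to 1.5, because one word carries two emotions.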

When NRC adds insight beyond VADER: Two texts can have the same VADER compound score but very different emotion profiles. A customer review reading "thrilling and terrifying" might score slightly positive on VADER but show high joy and high fear in NRC, suggesting an intense, mixed reaction rather than calm satisfaction.

Deep dive: VAD Dimensional Affect (Russell's Circumplex)

Rather than labeling emotions as discrete categories, dimensional models position affect in a continuous space. The NRC-VAD Lexicon provides continuous ratings (0–1) on three axes for each of 20,000 English words:

Dimension | Low end (0) | High end (1) | Example contrast
Valence (V) | Very unpleasant | Very pleasant | "grief" (0.06) vs. "triumph" (0.94)
Arousal (A) | Very calm / deactivated | Very activated | "lullaby" (0.20) vs. "panic" (0.90)
Dominance (D) | Controlled / powerless | In control / commanding | "victim" (0.12) vs. "authority" (0.88)

The Valence–Arousal plane corresponds to Russell's (1980) circumplex model of affect, which organizes emotions into four quadrants: high-V high-A (excited, happy), low-V high-A (anxious, angry), high-V low-A (calm, relaxed), and low-V low-A (sad, bored). Placing your text data in this space tells you not just the direction of emotion (positive or negative) but also its energy level.

Dominance adds a third axis capturing power dynamics. Texts that feel "in control" (commands, brand authority claims, expert opinions) show high dominance. Texts that communicate vulnerability, pleading, or powerlessness show low dominance. In consumer research, dominance differences often emerge between complaint texts (low-D) and recommendation texts (high-D) even when their valence is comparable.
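A sketch of placing a text in VAD space: average each dimension over matched tokens, then read the Valence–Arousal quadrant. `TOY_VAD` holds invented triples for illustration only, not actual NRC-VAD ratings, and the 0.5 quadrant boundary is an assumed midpoint of the 0–1 scale; the lab's own cutoff is not stated here.

```python
import re
from statistics import mean
from typing import Optional

# Invented (valence, arousal, dominance) triples for illustration only,
# NOT actual NRC-VAD ratings.
TOY_VAD = {
    "delighted": (0.90, 0.70, 0.65),
    "serene": (0.85, 0.15, 0.60),
    "panicked": (0.10, 0.90, 0.20),
    "gloomy": (0.15, 0.25, 0.30),
}

MIDPOINT = 0.5  # assumed quadrant boundary on the 0-1 scale


def vad_profile(text: str, lexicon: dict = TOY_VAD) -> Optional[tuple[float, float, float]]:
    """Mean V, A, D over matched tokens, or None if nothing matches."""
    hits = [lexicon[t] for t in re.findall(r"[a-z']+", text.lower()) if t in lexicon]
    if not hits:
        return None
    v, a, d = (mean(dim) for dim in zip(*hits))
    return v, a, d


def quadrant(valence: float, arousal: float, midpoint: float = MIDPOINT) -> str:
    """Map a Valence-Arousal point to one of the four circumplex quadrants."""
    if valence >= midpoint:
        return "excited/happy" if arousal >= midpoint else "calm/relaxed"
    return "anxious/angry" if arousal >= midpoint else "sad/bored"


v, a, d = vad_profile("Gloomy at first, then panicked.")
```

Here the low-arousal "gloomy" and high-arousal "panicked" average to V = 0.125 and A = 0.575, which lands in the anxious/angry quadrant, with a low dominance of 0.25 adding the sense of powerlessness that valence alone would miss.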

When to use each method — and when to use all three
Question you're asking | Best method | Why
Is this review positive or negative? How strongly? | VADER (Mode 1) | The compound score is a widely used, validated measure of overall valence in short social text; its rule layers handle informal language well.
What emotions are customers feeling? Which brands trigger fear vs. trust? | NRC (Mode 2) | Discrete emotion labels give stakeholders a vocabulary beyond "it was bad." Fear and disgust are very different business problems.
How energized or calm does brand language feel? Do complaint tweets feel "activated"? | VAD (Mode 3) | Sentiment scores do not capture arousal. High-arousal negative text (rage) calls for a different intervention than low-arousal negative text (disappointment).
Full picture: convergence and divergence across all three lenses | All Three (Mode 4) | Methods can agree (clearly negative, fearful, unpleasant) or diverge (neutral VADER but high NRC anger). Divergence is often the most interesting finding.

DATA SOURCE

Use a Case Study

📊 Pre-loaded Text Datasets

Use these case studies to practice reading text analysis outputs before uploading your own data. Each scenario comes with a suggested analysis method—but you're always free to run any or all three engines on any dataset.

Upload Your Data

Drag and drop a data file (.csv, .tsv, .txt, .xls, .xlsx)

The first row should be column headers. One column should contain your text data.

CONFIGURE YOUR DATA

Choose which column contains the text you want to analyze.

Optional: compare results across groups (e.g., brand, source, product rating).

Selects a column to use as row labels in output tables and inspectors.