Text Analysis Lab

Text Analysis VADER · NRC · VAD

Qualitative data—customer reviews, survey open-ends, social posts, interview excerpts—carries meaning that goes far beyond a simple positive/negative label. This lab gives you three complementary methods for quantifying that meaning: sentiment (the overall positive–negative charge of a text), emotion profiling (which of eight discrete emotions are present), and dimensional affect (how pleasant, activated, or dominant the language feels). Run one method at a time to go deep, or run all three at once and compare what each lens reveals—and what it hides.

OVERVIEW & CONCEPTS

📊

Sentiment Analysis

How positive or negative is this text, overall?

VADER scores each token against a hand-validated lexicon of 7,500+ entries, then applies six rule layers—negation, intensifiers, contrastive conjunctions, punctuation, capitalization, and bigram dampeners—to produce a single compound score in \([-1, 1]\) plus separate positive, neutral, and negative proportions.

VADER · Hutto & Gilbert, 2014

🎭

Emotion Profiling

Which discrete emotions does this text carry?

The NRC EmoLex maps over 14,000 English words onto Plutchik's eight primary emotions (anger, anticipation, disgust, fear, joy, sadness, surprise, trust) plus binary positive/negative sentiment. Each token is matched and its emotions tallied; the result is an emotion profile showing the relative weight of each dimension.

NRC EmoLex · Mohammad & Turney, 2013

🧭

Dimensional Affect

How pleasant, energized, and in-control does the language feel?

The NRC-VAD Lexicon rates 20,000 words on three independent continuous dimensions: Valence (pleasure–displeasure), Arousal (activation–calm), and Dominance (in control–controlled). Averaging across matched tokens places a record in Russell's circumplex—e.g., high-V high-A = excitement; low-V high-A = anxiety.

NRC-VAD · Mohammad, 2018

Deep dive: How VADER computes its scores

VADER starts with a lookup: each word token is checked against a valence lexicon where entries range from –4 (extremely negative) to +4 (extremely positive). Words not in the lexicon get a valence of 0. The raw scores are then adjusted by six rule layers applied in order:

Booster words: intensifiers like "very," "extremely," or "incredibly" within a 3-token window before a sentiment word increase its absolute valence by 0.293 when directly adjacent, about 0.278 (×0.95) one token further back, or about 0.264 (×0.90) two tokens further back. Downplayers like "somewhat" or "marginally" apply a corresponding negative adjustment.
Negation: "not," "never," "without," or any word ending in "n't" within 3 tokens before a sentiment word multiplies its valence by –0.74, reversing and attenuating the signal.
Contrastive "but": when a sentence contains "but," words before it are attenuated (×0.5) and words after it are amplified (×1.5), capturing the flip implied by "it was good, but very expensive."
Capitalization: if a sentiment word (but not all words) is in ALL CAPS, its absolute valence is boosted by 0.733 to capture emphasis like "LOVE it."
Punctuation: in the reference vaderSentiment implementation, each "!" (up to four) adds 0.292, and two or three "?" add 0.18 each (four or more add a flat 0.96), reflecting how punctuation intensifies emotional expression in social text.
Bigram dampeners: phrases like "kind of" or "sort of" before a sentiment word reduce its influence, capturing the hedge in "kind of nice."
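As a rough illustration of how the booster, capitalization, negation, and contrastive-"but" layers combine, the Python sketch below applies the constants quoted above to made-up base valences. The helper names (`adjust_valence`, `apply_but`) and the example valences are hypothetical; booster distance decay, punctuation, and bigram dampeners are left out; and the lab's own implementation may order or combine the rules differently.

```python
BOOST = 0.293       # booster increment quoted in the text
CAPS_BOOST = 0.733  # ALL-CAPS emphasis increment quoted in the text
NEGATION = -0.74    # negation multiplier quoted in the text


def adjust_valence(base, boosted=False, all_caps=False, negated=False):
    """Adjust one token's base valence (hypothetical helper, not the lab's API)."""
    if base == 0:
        return 0.0  # words outside the lexicon stay neutral
    sign = 1 if base > 0 else -1
    valence = base
    if boosted:
        valence += sign * BOOST       # push further from zero
    if all_caps:
        valence += sign * CAPS_BOOST  # emphasis also pushes away from zero
    if negated:
        valence *= NEGATION           # flip the sign and shrink the magnitude
    return valence


def apply_but(valences, but_index):
    """Halve valences before 'but' and multiply those after it by 1.5."""
    return [
        v * 0.5 if i < but_index else v * 1.5 if i > but_index else v
        for i, v in enumerate(valences)
    ]


# "good" is given a made-up base valence of 2.0 for illustration.
print(adjust_valence(2.0, boosted=True))   # "very good"
print(adjust_valence(2.0, negated=True))   # "not good"
print(apply_but([2.0, 0.0, -1.0], but_index=1))
```

Negation is applied last here so that a boosted word such as "not very good" is flipped after it has been intensified; that ordering is a design choice of this sketch.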

After adjustments, individual valences are summed into \(S\). The compound score normalizes this sum:

\[\text{compound} = \frac{S}{\sqrt{S^2 + 15}}\]

Values ≥ 0.05 are labeled positive, values ≤ −0.05 are labeled negative, and everything in between is labeled neutral.

Separate positive, neutral, and negative proportions describe how many tokens carry each tone (each non-neutral token adds a "+1 punch" to its proportion bucket).
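The normalization and labeling step can be sketched in a few lines of Python. The function names are illustrative only; the constant 15 and the ±0.05 thresholds are the ones stated above.

```python
import math


def compound_score(valence_sum, alpha=15.0):
    """Squash the summed valence S into (-1, 1) via S / sqrt(S^2 + alpha)."""
    return valence_sum / math.sqrt(valence_sum ** 2 + alpha)


def sentiment_label(compound):
    """Map a compound score to a label using the +/-0.05 thresholds."""
    if compound >= 0.05:
        return "positive"
    if compound <= -0.05:
        return "negative"
    return "neutral"


for s in (0.0, 1.0, -3.0):
    c = compound_score(s)
    print(f"S={s:+.1f} -> compound={c:+.3f} ({sentiment_label(c)})")
```

With \(S = 1\) the score is \(1/\sqrt{16} = 0.25\); larger sums move toward ±1 without reaching it.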

Ambivalence — a compound near zero can arise two ways: many positive words and many negative words cancelling out (high ambivalence) or very few sentiment words at all (genuinely neutral). The lab distinguishes these by computing \(\text{ambivalence} = \frac{2 \cdot \min(pos,\, neg)}{pos + neg}\) (0 = purely one-sided, 1 = perfectly balanced).
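A minimal Python sketch of the ambivalence formula follows. The function name is illustrative, and returning 0 when a record has no positive or negative signal at all is an assumption of this sketch, since the formula is undefined in that case.

```python
def ambivalence(pos, neg):
    """Compute 2 * min(pos, neg) / (pos + neg) for one record."""
    total = pos + neg
    if total == 0:
        # Assumption: a record with no sentiment signal is treated as
        # genuinely neutral rather than mixed.
        return 0.0
    return 2 * min(pos, neg) / total


print(ambivalence(3, 3))  # equal positive and negative signal
print(ambivalence(4, 0))  # purely one-sided
print(ambivalence(1, 3))  # partly mixed
```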

Output	Range	Interpretation
compound	[−1, 1]	Overall tone; sign = direction, magnitude = strength
pos	[0, 1]	Proportion of tokens contributing positive valence
neu	[0, 1]	Proportion of tokens that are lexically neutral
neg	[0, 1]	Proportion of tokens contributing negative valence
ambivalence	[0, 1]	How mixed (vs. genuinely neutral) a near-zero compound is

Deep dive: NRC Emotion Profiling

The NRC Word-Emotion Association Lexicon (EmoLex) was built by crowdsourcing human judgments of word–emotion associations. Annotators labeled whether each word evoked each of Plutchik's eight emotions and whether it was positive or negative. The result is a binary lookup: a word either does or does not carry each emotion.

Emotion	Opposite	Example words
😠 Anger	Fear	fury, resentment, hostile
🔮 Anticipation	Surprise	eager, expect, forward
🤢 Disgust	Trust	nauseating, vile, revolting
😨 Fear	Anger	terrified, dread, nightmare
😊 Joy	Sadness	bliss, delight, wonderful
😢 Sadness	Joy	grief, loss, heartbreaking
😲 Surprise	Anticipation	astonishing, stunned, whoa
🤝 Trust	Disgust	reliable, honest, earnest

The analyzer tokenizes each text, strips punctuation, and looks up each word (case-insensitive). For each matched token, the count for every emotion associated with it is incremented. Scores are then normalized by the number of matched tokens, so that a short review with three matched words and a long essay with thirty matched words can be compared fairly. A high joy score, for example, means a large share of the emotionally loaded words in that text are joy-associated.
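The tally-and-normalize steps above can be sketched in Python as follows. The three lexicon entries are hypothetical stand-ins rather than actual EmoLex rows, the tokenizer is a simple regular expression, and reporting coverage as the share of tokens found in the lexicon is an assumption of this sketch.

```python
import re
from collections import Counter

# Hypothetical entries for illustration; not copied from the actual EmoLex file.
TOY_LEXICON = {
    "thrilling": {"joy", "surprise"},
    "terrifying": {"fear"},
    "reliable": {"trust"},
}


def emotion_profile(text, lexicon=TOY_LEXICON):
    """Return (emotion -> share of matched tokens, share of tokens matched)."""
    tokens = re.findall(r"[a-z']+", text.lower())  # lowercase, drop punctuation
    matched = [t for t in tokens if t in lexicon]
    counts = Counter(emotion for t in matched for emotion in lexicon[t])
    profile = {e: n / len(matched) for e, n in counts.items()} if matched else {}
    coverage = len(matched) / len(tokens) if tokens else 0.0
    return profile, coverage


profile, coverage = emotion_profile("Thrilling and terrifying!")
print(profile, round(coverage, 2))
```

In this toy case two of three tokens match, and the emotion shares add up to more than 1 because "thrilling" carries two emotions in the toy lexicon.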

When NRC adds insight beyond VADER: Two texts can have the same VADER compound score but very different emotion profiles. A customer review reading "thrilling and terrifying" might score slightly positive on VADER but show high joy and high fear in NRC—suggesting a mixed-intense reaction rather than calm satisfaction.

Deep dive: VAD Dimensional Affect (Russell's Circumplex)

Rather than labeling emotions as discrete categories, dimensional models position affect in a continuous space. The NRC-VAD Lexicon provides continuous ratings (0–1) on three axes for each of 20,000 English words:

Dimension	Low end (0)	High end (1)	Example contrast
Valence (V)	Very unpleasant	Very pleasant	"grief" (0.06) vs. "triumph" (0.94)
Arousal (A)	Very calm / deactivated	Very activated	"lullaby" (0.20) vs. "panic" (0.90)
Dominance (D)	Controlled / powerless	In control / commanding	"victim" (0.12) vs. "authority" (0.88)

The Valence–Arousal plane corresponds to Russell's (1980) circumplex model of affect, which organizes emotions into four quadrants: high-V high-A (excited, happy), low-V high-A (anxious, angry), high-V low-A (calm, relaxed), and low-V low-A (sad, bored). Placing your text data in this space tells you not just the direction of emotion (positive or negative) but also its energy level.

Dominance adds a third axis capturing power dynamics. Texts that feel "in control" (commands, brand authority claims, expert opinions) show high dominance. Texts that communicate vulnerability, pleading, or powerlessness show low dominance. In consumer research, dominance differences often emerge between complaint texts (low-D) and recommendation texts (high-D) even when their valence scores are similar.
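To make the averaging and quadrant logic concrete, here is a small Python sketch. The (V, A, D) triplets are invented for illustration and are not actual NRC-VAD ratings; splitting each axis at 0.5 and sending ties to the high side are assumptions of this sketch.

```python
import re

# Hypothetical (valence, arousal, dominance) triplets; not actual NRC-VAD ratings.
TOY_VAD = {
    "thrilled": (0.9, 0.8, 0.7),
    "calm": (0.8, 0.2, 0.6),
    "helpless": (0.1, 0.6, 0.1),
}


def vad_scores(text, lexicon=TOY_VAD):
    """Average V, A, and D over matched tokens; None if nothing matches."""
    tokens = re.findall(r"[a-z']+", text.lower())
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None
    return tuple(sum(axis) / len(hits) for axis in zip(*hits))


def quadrant(valence, arousal, midpoint=0.5):
    """Name the Valence-Arousal quadrant, splitting each axis at the midpoint."""
    if valence >= midpoint:
        return "excited/happy" if arousal >= midpoint else "calm/relaxed"
    return "anxious/angry" if arousal >= midpoint else "sad/bored"


v, a, d = vad_scores("I felt helpless")
print(quadrant(v, a), d)
```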

When to use each method — and when to use all three

Question you're asking	Best method	Why
Is this review positive or negative? How strongly?	VADER (Mode 1)	Compound score is a widely used, validated measure of overall valence in short social text; rule layers handle informal language well.
What emotions are customers feeling? Which brands trigger fear vs. trust?	NRC (Mode 2)	Discrete emotion labels give stakeholders a vocabulary beyond "it was bad." Fear and disgust are very different business problems.
How energized/calm does brand language feel? Do complaint tweets feel "activated"?	VAD (Mode 3)	Arousal is invisible to sentiment analysis. High-arousal negative text (rage) calls for different intervention than low-arousal negative (disappointment).
Full picture: convergence and divergence across all three lenses	All Three (Mode 4)	Methods can agree (clearly negative, fearful, unpleasant) or diverge (neutral VADER but high anger NRC). Divergence is often the most interesting finding.

DATA SOURCE

Load a marketing use case:

📊 Pre-loaded Text Datasets

Use these case studies to practice reading text analysis outputs before uploading your own data. Each scenario comes with a suggested analysis method—but you're always free to run any or all three engines on any dataset.

Upload CSV/TSV file

Drag & Drop CSV file (.csv, .tsv, .txt, .xls, .xlsx)

The first row should be column headers. One column should contain your text data.

CONFIGURE YOUR DATA

Text column

Choose which column contains the text you want to analyze.

Enable grouping by column

Optional: compare results across groups (e.g., brand, source, product rating).

Row identifier column (optional)

Selects a column to use as row labels in output tables and inspectors.
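For readers preparing data outside the lab, the three column roles can be pictured with a short Python sketch using only the standard library. The sample rows, the column names (`review_id`, `brand`, `review_text`), and the `load_records` helper are all hypothetical; the lab's own loader is not shown in this document.

```python
import csv
import io
from collections import defaultdict

# Hypothetical file contents and column names, standing in for an uploaded CSV.
SAMPLE_CSV = """review_id,brand,review_text
r1,Acme,Fast shipping and great quality
r2,Acme,Arrived broken
r3,Zenith,Exactly as described
"""


def load_records(handle, text_col, group_col=None, id_col=None):
    """Read rows and keep only the text, optional group, and optional row label."""
    records = []
    for row_number, row in enumerate(csv.DictReader(handle), start=1):
        records.append({
            "id": row[id_col] if id_col else str(row_number),
            "group": row[group_col] if group_col else None,
            "text": row[text_col],
        })
    return records


records = load_records(io.StringIO(SAMPLE_CSV), "review_text", "brand", "review_id")
by_group = defaultdict(list)
for record in records:
    by_group[record["group"]].append(record["text"])
print({group: len(texts) for group, texts in by_group.items()})
```

When no identifier column is chosen, this sketch falls back to the 1-based row number as the label.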

SELECT ANALYSIS METHOD

Choose how you want to analyze your text. You can switch methods at any time without re-uploading data.

📊

Sentiment Analysis

VADER · Overall positive / negative tone

Score each record on a compound scale from −1 (strongly negative) to +1 (strongly positive). Reveals overall valence, label distribution, ambivalence, and group differences.

Best for: customer satisfaction, brand perception, comparing positive/negative signal across groups.

🎭

Emotion Profiling

NRC EmoLex · Eight discrete emotions

Map each record onto Plutchik's eight emotions (anger, fear, joy, sadness, disgust, surprise, anticipation, trust). Reveals which emotional registers dominate your text data.

Best for: understanding which emotions drive engagement, distinguishing fear from anger, identifying trust-building language.

🧭

Dimensional Affect

NRC-VAD · Valence, Arousal, Dominance

Place records in continuous emotional space. Valence = pleasant vs. unpleasant. Arousal = activated vs. calm. Dominance = in-control vs. controlled. Visualized on Russell's circumplex.

Best for: segmenting arousal level (rage vs. disappointment), analyzing power framing in text, circumplex mapping.

🔬

Full Comparison

All Three Methods · Side-by-side

Run VADER, NRC, and VAD on the same data and compare what each method reveals. Convergence across methods strengthens conclusions; divergence points to complexity worth investigating.

Best for: publishable analyses, presentations where you need to defend your methodology, and rich qualitative-to-quantitative interpretation.

SENTIMENT RESULTS

Reading VADER output

VADER returns a compound score (−1 to +1) for each record: values ≥ 0.05 are labeled positive, values ≤ −0.05 are labeled negative, and everything in between is labeled neutral. The compound score is the primary output; the separate pos, neu, and neg proportions show how much of the text's vocabulary contributes each tone.

A compound score near zero deserves scrutiny. The ambivalence score below tells you whether neutral-looking records are genuinely low in sentiment ("the product arrived") or highly mixed ("outstanding service, but terrible product quality"). Mixed records may warrant qualitative follow-up even if their average looks benign.

Sentiment summary

Average compound score: –

Records labeled positive: –

Records labeled neutral: –

Records labeled negative: –

Run the analysis to see overall sentiment across your text records.

Labels distribution

Bar chart shows how records are classified by VADER's compound score thresholds. Mixed-neutral records (high ambivalence near zero) are distinguished from genuinely neutral ones.

DETAIL & WORKED EXAMPLE

Export Per-Record Results

Download all VADER scores (compound, pos, neu, neg, ambivalence, label) as a CSV—one row per record.

How VADER scored two examples

The worked example below shows the token-by-token valence coding for one relatively positive record and one relatively negative record. Color-coded chips show each token's base lexicon value; hover a chip to see the full modifier chain (negation, booster, capitalization) that produced the final contribution.

Run the analysis to see how each token in a relatively positive record contributes to its final sentiment score.

Run the analysis to see how each token in a relatively negative record contributes to its final sentiment score.

How this example is computed

Each token is looked up in the VADER lexicon. Tokens with a known valence receive a colored chip; words not in the lexicon appear as plain gray chips. The six rule layers (see Concepts above) then adjust those base scores before they are summed and normalized to the compound score shown.

Tip: Hover over any colored chip in the Record Inspector to see the full modifier chain—what the base lexicon value was, which rule modified it, and what the final contribution was.

🔗 From examples to patterns: These two examples show individual token logic. To see how sentiment is distributed across all your records, scroll to the Histogram and Box Plot below.

Record Inspector — Token-Level Valence Coding

Select any record to see how each token's valence contribution builds the final compound score. Colored chips indicate tokens with non-zero lexicon values; hover for the full modifier chain. Gray chips are lexicon misses.

Select a record:

Token-level breakdown will appear here after running the analysis.

ANALYSIS REPORT

APA-Style Statistical Reporting

Run the analysis to generate an APA-style report of your sentiment results.

Managerial Interpretation

Run the analysis to generate a managerial interpretation of your sentiment results.

Summary of Estimates

Measure	Estimate	Std. Dev.	Min	Max

Sentiment Distribution: Histogram

How to read bar colors: Green = positive territory (≥ 0.05), red = negative (≤ −0.05), gray = neutral. Bar height = record count in that range.

Run the analysis to see how compound sentiment scores are distributed across your text records.

EMOTION PROFILING RESULTS

Reading NRC output

Each emotion score represents the proportion of lexicon-matched tokens in a record that carry that emotion. A record scoring 0.40 on "joy" has 40% of its emotionally loaded words tagged as joy-associated in the NRC lexicon. Scores do not sum to 1 because a single word can carry multiple emotions.

Lexicon coverage matters: a record with only two matched tokens may have an extreme emotion profile just by chance. The inspector below shows which tokens actually matched, so you can judge whether the scores are driven by meaningful signal or incidental vocabulary.

The radar chart makes cross-group comparison intuitive: a brand whose texts form a large "joy + trust" profile occupies a very different space from one whose texts form a "fear + anger" shape—even if their overall VADER sentiment scores are similar.

Run the NRC Emotion Lexicon analysis on your current text data.

Dominant Emotion

–

Avg positive (binary): –

Avg negative (binary): –

Avg lexicon coverage: –

Emotion Profile (Radar)

Each axis = proportion of matched tokens carrying that emotion. Overlapping traces = group comparison.

Emotion Scores (Bar Chart)

Bars sorted by average normalized score. When grouping is active, shows grouped bars per emotion.

Emotion Profile by Group

Per-group breakdown of dominant emotion, top emotions, and lexicon coverage. Radar and bar chart above also show full group traces.

Group	n	Dominant Emotion	Top 3 Emotions	Avg Coverage

Emotion Deviation from Overall Average

+Green = group scores above overall average; −Red = below. Values in percentage points (pp).
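A deviation in percentage points can be sketched as the group mean minus the overall mean, times 100. In the Python sketch below the scores and group names are made up, and taking the overall average across all records (rather than across group means) is an assumption.

```python
from statistics import mean

# Hypothetical per-record joy scores (share of matched tokens) for two groups.
joy_by_group = {"Brand A": [0.4, 0.6], "Brand B": [0.2, 0.0]}


def deviation_pp(scores_by_group):
    """Group mean minus the mean over all records, in percentage points."""
    overall = mean(s for scores in scores_by_group.values() for s in scores)
    return {
        group: round((mean(scores) - overall) * 100, 1)
        for group, scores in scores_by_group.items()
    }


print(deviation_pp(joy_by_group))
```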

Emotion Summary Table

Emotion	Avg (% of matched tokens)	Std. Dev.

NRC Interpretation

Run the analysis to generate an interpretation.

Record Inspector — Token-Level Emotion Coding

Pick any record to see which tokens matched the NRC lexicon and which emotions each one carries.

Select a record:

Choose a record above to inspect token-level coding.

DIMENSIONAL AFFECT RESULTS

Reading VAD output

Each record receives three scores in [0, 1]: Valence (how pleasant the language is), Arousal (how activated or calm), and Dominance (how in-control vs. controlled the tone is). Scores near 0.5 are emotionally neutral on that dimension; scores far from 0.5 indicate strong displacement in one direction.

The Valence × Arousal scatter plot places each record in Russell's circumplex. The four quadrants reveal qualitatively different emotional states that sentiment scoring cannot distinguish: excited (high-V, high-A), anxious (low-V, high-A), relaxed (high-V, low-A), and sad / bored (low-V, low-A).

Dominance in marketing context: Customer complaints often show low valence and low dominance ("I felt helpless"). Recommendation texts often combine high valence with high dominance ("I confidently recommend this to everyone"). These V–D combinations tell a story that valence alone cannot.

Run the NRC-VAD analysis to score your text on Valence, Arousal, and Dominance.

FULL COMPARISON — ALL THREE METHODS

How Mode 4 works

All three engines run on the same dataset. Use the comparison table to see where methods converge (strong evidence) and where they diverge (interesting complexity worth investigating). The combined CSV download exports all scores for every record in a single flat file.

📊 VADER Pending

🎭 NRC Pending

🧭 VAD Pending

CROSS-METHOD COMPARISON

Methods Compared

Method	Key Result 1	Key Result 2	What it captures	What it misses

Worked Example: One Record, Three Methods

The record with the most extreme VADER compound score is used as the demonstration case. Compare how the three methods characterize the same text differently.

Cross-Method Insights

Where methods agree, conclusions are more robust. Where they diverge, the gap is itself informative. The analysis below flags the most notable convergences and divergences in your data.

Combined Export

A single CSV with VADER compound + pos/neg/neu + ambivalence, all 8 NRC emotion scores, and V/A/D values for every record—all methods side-by-side.
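The shape of such a flat file can be sketched with Python's standard `csv` module. The column names and their order below are assumptions for illustration, and the example record is made up; the lab's actual export may label or arrange the columns differently.

```python
import csv
import io

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]
# Column names are assumptions for illustration; the lab's export may differ.
FIELDS = (["id", "compound", "pos", "neu", "neg", "ambivalence"]
          + EMOTIONS + ["valence", "arousal", "dominance"])


def to_combined_csv(rows):
    """Serialize one flat row per record; missing scores are left blank."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# A made-up record with only a few scores filled in.
example = [{"id": "r1", "compound": 0.25, "pos": 0.4, "neu": 0.6, "neg": 0.0,
            "ambivalence": 0.0, "joy": 0.5, "valence": 0.85}]
print(to_combined_csv(example))
```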
