Univariate Statistics Analyzer

Descriptive Statistics | Data Quality

Your first step in any data analysis: understand each variable individually before exploring relationships. Upload marketing data and instantly assess distributions, detect outliers, test for normality, and generate publication-ready summaries.

👨‍🏫 Professor Mode: Guided Learning Experience

New to descriptive statistics? Enable Professor Mode for step-by-step guidance through understanding distributions, central tendency, and data quality!

OVERVIEW & OBJECTIVE

What is univariate analysis? Univariate analysis examines one variable at a time to understand its distribution, central tendency (typical values), spread (variability), and shape. It usually comes first in a data analysis project: before looking at relationships between variables, you need to understand each variable individually.

Why start here? This tool helps you spot data quality issues (missing values, outliers), understand your customer base (what's typical vs. unusual), and make informed decisions about which variables matter for deeper analysis.

💡 When to use univariate analysis:

  • Data cleaning: Identify missing values, outliers, and data entry errors before any analysis
  • Assumption checking: Test whether continuous variables are approximately normally distributed (an assumption of many statistical tests)
  • Variable selection: Understand which variables have enough variation to be useful predictors
  • Baseline profiling: Describe your customer base, transactions, or survey responses

Understanding Variable Types

Continuous Variables (Quantitative)

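What they are: Numbers measured on a scale, where arithmetic such as averaging makes sense. Examples: age, revenue, website visit duration. Counts (number of clicks) and rating scales (satisfaction score, 1-10) are strictly discrete, but are commonly analyzed as continuous when they take many distinct values.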

What we measure:

  • Mean: The arithmetic average. Describes the "typical" value but is pulled toward outliers and long tails.
  • Median: The middle value when sorted. More resistant to outliers than the mean.
  • Mode: The most frequently occurring value.
  • Standard Deviation (SD): How far values typically fall from the mean. Smaller = more consistent, larger = more variable.
  • Range & IQR: Range is the distance from min to max. IQR (interquartile range) is the width of the middle 50% of the data (Q3 minus Q1), so extreme values do not affect it.
  • Skewness: Measures asymmetry. Positive skew = long tail on the right (most values are low). Negative skew = long tail on the left (most values are high).
  • Kurtosis: Measures how heavy the tails are compared with a normal distribution. High kurtosis = more extreme values than a normal distribution would produce.
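
To make the statistics above concrete, here is a minimal sketch that computes them with Python's standard library. It is an illustration, not this tool's implementation: the function name `describe`, the sample `order_values`, the "inclusive" quartile method, the moment-based skewness and kurtosis formulas, and the 1.5 × IQR outlier fences (the usual box plot convention) are all assumptions chosen for the example.

```python
import statistics

def describe(values):
    """Summarize one numeric variable (illustrative helper, not the tool's code)."""
    n = len(values)
    mean = statistics.fmean(values)
    # Quartiles; other quartile methods give slightly different cut points.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    # Central moments used for moment-based skewness and excess kurtosis.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "sd": statistics.stdev(values),          # sample SD (divides by n - 1)
        "range": (min(values), max(values)),
        "iqr": iqr,
        "skewness": m3 / m2 ** 1.5,
        "excess_kurtosis": m4 / m2 ** 2 - 3,     # 0 for a normal distribution
        # Assumed convention: flag points beyond 1.5 * IQR from the quartiles.
        "outliers": [x for x in values
                     if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr],
    }

# Hypothetical order values with one unusually large order.
order_values = [10, 12, 12, 13, 14, 15, 100]
summary = describe(order_values)
for name, value in summary.items():
    print(f"{name}: {value}")
```

In this sample the single large order pulls the mean well above the median and produces positive skewness, which is the pattern the Mean and Skewness bullets describe.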

Visualizations: Box plots show quartiles and outliers. Histograms show distribution shape. Violin plots combine a box plot with a density curve. Density plots smooth out histograms.

Categorical Variables (Qualitative)

What they are: Labels or categories. Examples: customer segment (A/B/C), product type, region, yes/no responses, gender, campaign name.

What we measure:

  • Frequency — How many times each category appears.
  • Percentage — What proportion of the total each category represents.
  • Mode — The most common category.
  • Unique values — How many distinct categories exist.
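
The four categorical measures above amount to a frequency table. A minimal sketch using the standard library follows; the function name `summarize_categories` and the sample `channels` data are made up for illustration and do not come from the tool.

```python
from collections import Counter

def summarize_categories(labels):
    """Frequency, percentage, mode, and unique count for one categorical variable."""
    counts = Counter(labels)
    total = sum(counts.values())
    # most_common() sorts by count, largest first; ties keep first-seen order.
    table = [(category, n, round(100 * n / total, 1))
             for category, n in counts.most_common()]
    return {
        "table": table,          # (category, frequency, percentage) rows
        "mode": table[0][0],     # most common category
        "unique": len(counts),   # number of distinct categories
    }

# Hypothetical acquisition channels for six customers.
channels = ["Email", "Social", "Email", "Search", "Email", "Social"]
channel_summary = summarize_categories(channels)
for category, frequency, percentage in channel_summary["table"]:
    print(f"{category}: {frequency} ({percentage}%)")
```

If two categories tie for the highest count, this sketch reports whichever appeared first in the data, so treat the mode with care when counts are close.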

Visualizations: Bar charts show counts per category. Pie charts show proportions. Horizontal bars work better when you have many categories.

What this tool does for you

  • Auto-detection: Automatically identifies if each variable is continuous (numeric) or categorical (labels), so you don't have to decide.
  • Data quality dashboard: See missing data percentages, outlier counts, and distribution health at a glance.
  • Normality testing: Shapiro-Wilk test and Q-Q plots help you check if data is normally distributed (important for subsequent analyses).
  • Multiple visualizations: Choose from box plots, histograms, violin plots, and density plots for continuous variables; bar charts, pie charts, and horizontal bars for categorical variables.
  • Comprehensive statistics: Every relevant statistic calculated automatically: mean, median, SD, range, IQR, skewness, kurtosis for continuous; frequencies, percentages, mode for categorical.
  • Plain language explanations: Every statistic includes both technical definitions and practical interpretations ("what does this mean for my business?").
  • Export capabilities: Download summary tables as CSV for presentations or further analysis in Excel.
  • Manual override: If the tool misclassifies a variable, you can manually switch between continuous and categorical analysis.
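
To see why auto-detection sometimes needs a manual override, consider one plausible heuristic, sketched below. This is a hypothetical rule, not the tool's documented logic: the function `guess_type`, the helper `is_number`, and the `min_unique=10` threshold are all assumptions made for illustration.

```python
def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False

def guess_type(raw_values, min_unique=10):
    """Guess 'continuous' or 'categorical' for one column of raw strings.

    Assumed rule: a column is continuous only if every non-missing value
    is numeric AND it has at least `min_unique` distinct values.
    """
    present = [v.strip() for v in raw_values if v is not None and v.strip() != ""]
    if not present:
        return "categorical"
    all_numeric = all(is_number(v) for v in present)
    if all_numeric and len(set(present)) >= min_unique:
        return "continuous"
    return "categorical"

revenue = [str(x) for x in range(100, 112)]      # 12 distinct numeric values
region = ["North", "South", "North", "West"]     # text labels
store_code = ["1", "2", "3", "2", "1"]           # numeric codes, few distinct values

print(guess_type(revenue))      # continuous
print(guess_type(region))       # categorical
print(guess_type(store_code))   # categorical
```

Any rule of this kind can guess wrong at the edges, for example a 1-5 rating you want to average, or a numeric ID column with many distinct values. Those are the cases where switching the type by hand is useful.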

Understanding Normality (Why It Matters)

Many statistical tests (t-tests, ANOVA, regression) assume that the data, or more precisely the model's residuals, follow an approximately normal (bell-shaped) distribution. If your data is heavily skewed or has extreme outliers, these tests may give misleading results.

How to check for normality:

  • Visual inspection: Does the histogram look roughly bell-shaped?
  • Q-Q plot: Do points fall close to the diagonal line?
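  • Shapiro-Wilk test: p > 0.05 means no significant evidence against normality; p < 0.05 suggests non-normality. With very large samples, even trivial departures can give p < 0.05.
  • Skewness & Kurtosis: Skewness and excess kurtosis near 0 suggest normality; |skew| > 1 or |excess kurtosis| > 2 suggest departures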
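
The numeric checks in this list can be combined in a few lines. The sketch below assumes SciPy is installed and uses its `scipy.stats.shapiro`, `skew`, and `kurtosis` functions. The function name `normality_report`, the way the three checks are combined into one `looks_normal` flag, and the two synthetic samples are assumptions for illustration, not the tool's own procedure.

```python
import math
from statistics import NormalDist

from scipy import stats

def normality_report(values, alpha=0.05):
    """Apply the Shapiro-Wilk test plus the skewness/kurtosis rules of thumb."""
    _, p_value = stats.shapiro(values)
    skew = float(stats.skew(values))
    excess_kurt = float(stats.kurtosis(values))  # Fisher definition: normal = 0
    return {
        "shapiro_p": float(p_value),
        "skewness": skew,
        "excess_kurtosis": excess_kurt,
        # Assumed combination: all three checks must pass.
        "looks_normal": bool(p_value > alpha
                             and abs(skew) <= 1
                             and abs(excess_kurt) <= 2),
    }

# Synthetic data: evenly spaced normal quantiles (an idealized bell shape)
# and their exponentials (a right-skewed, lognormal-like shape).
n = 100
bell = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
skewed = [math.exp(z) for z in bell]

print(normality_report(bell))
print(normality_report(skewed))
```

Treat the flag as a screening aid. The histogram and Q-Q plot remain worth a look, because a single p-value cannot tell you how the data departs from normality.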

⚠️ What if my data isn't normal?

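Don't panic! You have options: (1) Use non-parametric tests that don't assume normality, (2) Transform your data (log, square root), or (3) Rely on sample size: with large samples (n > 30 is a common rule of thumb), many tests are robust to moderate non-normality, although heavy skew or extreme outliers may need more data.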
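
As an illustration of option (2), the sketch below log-transforms a right-skewed variable and compares skewness before and after. The `revenue` figures are invented for the example, and the `skewness` helper uses the simple moment-based formula, which may differ slightly from the bias-corrected version some statistics packages report.

```python
import math

def skewness(values):
    """Moment-based skewness: third central moment over SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5

# Hypothetical customer revenue: most values are small, a few are very large.
revenue = [20, 25, 30, 35, 40, 55, 80, 150, 400, 1200]

# The log transform requires strictly positive values.
log_revenue = [math.log(x) for x in revenue]

raw_skew = skewness(revenue)
log_skew = skewness(log_revenue)
print(f"skewness before: {raw_skew:.2f}, after log: {log_skew:.2f}")
```

The transform compresses the long right tail, so skewness drops. Remember that any later results are then on the log scale and need to be interpreted, or back-transformed, accordingly.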

MARKETING SCENARIOS

📊 Real Marketing Data Scenarios

Select a preset scenario to explore real-world datasets with mixed variable types. Each scenario includes continuous metrics (revenue, engagement scores, visit duration) and categorical dimensions (segments, channels, regions) so you can practice interpreting both types.

INPUTS & SETTINGS

Upload Data or Select Variables

Drag & Drop data file (.csv, .tsv, .txt)

Provide a CSV or TSV with column headers. Up to 5,000 rows per file.
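
If you want to pre-check a file before uploading, the sketch below reads delimited text with a header row, enforces the 5,000-row limit, and reports the share of missing values in a column. The names `load_table` and `missing_percent`, the choice to treat empty cells as missing, and the sample data are assumptions for illustration. The tool's own parsing rules may differ.

```python
import csv
import io

MAX_ROWS = 5000  # row limit stated in the upload instructions

def load_table(text, delimiter=","):
    """Parse delimited text with a header row into a list of dict rows."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    rows = list(reader)
    if reader.fieldnames is None:
        raise ValueError("file is empty or has no header row")
    if len(rows) > MAX_ROWS:
        raise ValueError(f"{len(rows)} rows exceeds the {MAX_ROWS}-row limit")
    return list(reader.fieldnames), rows

def missing_percent(rows, column):
    """Percentage of rows where the column is empty or absent."""
    if not rows:
        return 0.0
    missing = sum(1 for row in rows if (row.get(column) or "").strip() == "")
    return 100 * missing / len(rows)

# Hypothetical tab-separated sample with one missing revenue value.
sample_tsv = "segment\trevenue\nA\t120\nB\t\nA\t95\nC\t60\n"
columns, table_rows = load_table(sample_tsv, delimiter="\t")
print(columns, len(table_rows), missing_percent(table_rows, "revenue"))
```

For a real file, read its contents first (for example with `open(path, newline="").read()`) and pass `delimiter="\t"` for TSV files.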