Feature Selection
Select the columns to use as predictors. All selected features must be numeric (continuous).
Compare five binary classification algorithms side-by-side on the same data. See how algorithm choice alone changes accuracy, precision, recall, and ROC curves — then examine the holdout set to spot which models overfit.
Binary classification assigns each observation to one of two classes (e.g., churn vs. retain, click vs. ignore). Different algorithms make different assumptions about the data, so no single classifier always wins. This tool lets you train multiple models on the same dataset and compare their performance on both training and holdout test data.
All five algorithms in this tool handle numeric (continuous) features only. In practice, most of these algorithms can also work with categorical features via encoding techniques such as one-hot or ordinal encoding; the restriction keeps this tool simple and is not an inherent constraint of the methods.
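As a concrete illustration of such encoding, here is a minimal pure-Python sketch (the `one_hot` helper and the toy rows are hypothetical examples, not part of this tool):

```python
# Minimal sketch of one-hot encoding a categorical column in a
# small list-of-dicts dataset (illustrative only).
def one_hot(rows, column):
    """Replace `column` in each row dict with 0/1 indicator columns."""
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for cat in categories:
            new_row[f"{column}={cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded

rows = [{"plan": "basic", "usage": 3.2}, {"plan": "pro", "usage": 7.9}]
print(one_hot(rows, "plan"))
# Each row now carries numeric indicator columns plan=basic / plan=pro.
```

After encoding, every column is numeric, so the result could be fed to any of the five classifiers.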
| Algorithm | Decision Boundary | Best When… | Watch Out For… |
|---|---|---|---|
| Logistic Regression | Linear (hyperplane) | Classes are roughly linearly separable | Non-linear patterns, multicollinearity |
| Decision Tree | Axis-aligned rectangular splits | Non-linear patterns, mixed feature types | Overfitting (high variance without pruning) |
| k-Nearest Neighbors | Complex / local | Non-linear boundaries, small-to-medium datasets | Curse of dimensionality, slow prediction on large data |
| Naive Bayes | Quadratic (Gaussian assumption) | Class-conditional features are roughly Gaussian | Correlated features violate independence assumption |
| Linear SVM | Linear (maximum-margin hyperplane) | High-dimensional data, clear margin between classes | Non-linear patterns (without kernel trick) |
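A side-by-side comparison like the one this tool performs can be sketched with scikit-learn (assumed here; the synthetic dataset and default hyperparameters are illustrative choices, not this tool's internals):

```python
# Train the five classifiers on the same data and compare
# holdout accuracy (a sketch, assuming scikit-learn).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "k-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
    "Linear SVM": LinearSVC(),
}
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)  # identical split for every model
    results[name] = accuracy_score(y_test, model.predict(X_test))
    print(f"{name:22s} holdout accuracy = {results[name]:.3f}")
```

Because every model sees the same train/test split, any accuracy difference comes from the algorithm alone.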
Each scenario has distinct distributional characteristics, chosen to show that no single algorithm excels on every problem. Try all three!
Provide a header row with numeric feature columns and a binary target column (0/1). Up to 2,000 rows recommended for responsive performance.
Drag & Drop raw data file (.csv, .tsv, .txt, .xls, .xlsx)
Include headers. All feature columns must be numeric. Target column must be binary (0/1).
These metrics reflect how well each model fits the data it was trained on.
| Metric |
|---|
Compare true/false positive and negative counts across all models side-by-side on holdout data.
The ROC curve plots the trade-off between true positive rate and false positive rate across all possible classification thresholds. A curve closer to the top-left corner indicates better discrimination.
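The threshold sweep behind an ROC curve can be sketched in pure Python (assuming distinct scores and both classes present; `roc_curve_points` and `auc` are illustrative helpers, not this tool's internals):

```python
# Sort by descending score; each score in turn acts as the
# classification threshold, tracing out (FPR, TPR) points.
# Assumes distinct scores (ties would need grouping).
def roc_curve_points(y_true, scores):
    pairs = sorted(zip(scores, y_true), key=lambda p: -p[0])
    pos = sum(y_true)
    neg = len(y_true) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]  # (FPR, TPR), starting at the origin
    for _, label in pairs:
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    # Trapezoidal integration over the (FPR, TPR) step curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

y = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(roc_curve_points(y, scores)))  # 0.75 for this toy example
```

A perfect ranker pushes every point to the top-left, giving an AUC of 1.0; random scoring hugs the diagonal at 0.5.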
When exactly 2 features are selected, we can visualize each algorithm’s decision regions on a 2D grid. Actual data points are overlaid (circles = class 0, triangles = class 1).
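The grid evaluation behind such a plot can be sketched as follows (assuming NumPy and scikit-learn; the logistic model and 100×100 resolution are illustrative choices):

```python
# Rasterize a 2-feature model's decision regions by predicting
# a label for every cell of a grid spanning the data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(int)  # toy linearly separable labels

model = LogisticRegression().fit(X, y)

xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 100)
ys = np.linspace(X[:, 1].min(), X[:, 1].max(), 100)
xx, yy = np.meshgrid(xs, ys)
grid = np.column_stack([xx.ravel(), yy.ravel()])
regions = model.predict(grid).reshape(xx.shape)  # 0/1 label per cell
print(regions.shape)  # (100, 100)
```

Coloring `regions` (e.g. with matplotlib's `contourf`) and scattering the actual points on top reproduces the overlay described above.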
Auto-generated observations based on your comparison results. These highlight the most important patterns to notice.
| Metric | Meaning | Range |
|---|---|---|
| Accuracy | Proportion of all predictions that are correct | 0 – 1 (higher is better) |
| Precision | Of predicted positives, how many are truly positive | 0 – 1 (higher = fewer false alarms) |
| Recall | Of actual positives, how many were detected | 0 – 1 (higher = fewer missed cases) |
| F1 Score | Harmonic mean of precision and recall | 0 – 1 (balanced measure) |
| Specificity | Of actual negatives, how many were correctly identified | 0 – 1 |
| AUC | Area under the ROC curve; overall discrimination ability | 0.5 (random) – 1.0 (perfect) |
| Log Loss | Penalizes confident wrong predictions heavily | 0+ (lower is better) |
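These definitions follow directly from the confusion counts and predicted probabilities; a pure-Python sketch (the `metrics` and `log_loss` helpers are illustrative, not this tool's internals):

```python
import math

# Compute the table's metrics from raw confusion counts
# (tp, fp, tn, fn) and from predicted probabilities.
def metrics(tp, fp, tn, fn):
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return accuracy, precision, recall, f1, specificity

def log_loss(y_true, probs, eps=1e-15):
    # Clip probabilities so log(0) never occurs; confident wrong
    # predictions (p near the wrong extreme) are penalized heavily.
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)

print(metrics(tp=40, fp=10, tn=45, fn=5))
print(log_loss([1, 0], [0.9, 0.1]))
```

With 40 true positives, 10 false positives, 45 true negatives, and 5 false negatives, accuracy is 0.85 and precision is 0.80, matching the definitions in the table.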
If a model scores much higher on the training set than the holdout set, it has memorized training noise rather than learned generalizable patterns. The gap column in the holdout table quantifies this.
Decision Trees are particularly prone to overfitting (try lowering max depth). Logistic Regression and Linear SVM typically show smaller gaps because their linear boundary can’t memorize complex noise.
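The effect can be sketched with scikit-learn (assumed here; `flip_y` injects label noise, so an unpruned tree can memorize the training set while a depth-limited one cannot; dataset parameters are illustrative):

```python
# Compare the train/holdout gap of an unpruned decision tree
# against a depth-limited one on noisy data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=6,
                           flip_y=0.15, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

gaps = {}
for depth in (None, 3):  # None = grow until pure leaves (no pruning)
    tree = DecisionTreeClassifier(max_depth=depth, random_state=1)
    tree.fit(X_tr, y_tr)
    gaps[depth] = tree.score(X_tr, y_tr) - tree.score(X_te, y_te)
    print(f"max_depth={depth}: train-holdout gap = {gaps[depth]:.3f}")
```

The unpruned tree fits the flipped labels perfectly on the training set but pays for it on holdout; capping `max_depth` shrinks the gap.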