Preprocessing & clustering
Advanced settings
Distance weight parameter (γ)
Gamma (γ) sets how much a categorical mismatch counts relative to differences in continuous variables when distances are calculated. Auto mode sets γ to the average standard deviation of the continuous features, which typically works well. Increase γ to give categorical variables more influence; decrease it to prioritize continuous variables.
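To make the role of γ concrete, here is a minimal sketch of the usual k-prototypes dissimilarity: squared Euclidean distance on the continuous part plus γ times the number of categorical mismatches. The function names `auto_gamma` and `mixed_distance` are hypothetical, and using the population standard deviation for the auto value is an assumption; the tool itself may compute it differently.

```python
from statistics import mean, pstdev


def auto_gamma(numeric_rows):
    """Average standard deviation across the continuous columns.

    Assumption: population standard deviation (pstdev); the tool may
    use the sample form or an extra scaling factor.
    """
    columns = zip(*numeric_rows)
    return mean(pstdev(column) for column in columns)


def mixed_distance(num_a, num_b, cat_a, cat_b, gamma):
    """Squared Euclidean distance plus gamma times the mismatch count."""
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_a, num_b))
    mismatches = sum(x != y for x, y in zip(cat_a, cat_b))
    return numeric_part + gamma * mismatches


# Two continuous features (rows are observations).
rows = [(0.0, 10.0), (2.0, 30.0)]
gamma = auto_gamma(rows)

# One categorical mismatch ("b" vs. "c") adds exactly gamma to the distance.
d = mixed_distance((0.0, 0.0), (3.0, 4.0), ("a", "b"), ("a", "c"), gamma)
print(gamma, d)
```

With a larger γ, the single mismatch above would dominate the numeric part; with γ near zero, the distance reduces to the squared Euclidean term.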
Auto: γ = (will be calculated after data load)
Additional info & guidance
Start with k=3–4 and run diagnostics for k=2–8. Look for an elbow in the cost plot and high silhouette values (>0.3) to identify well-separated clusters. Because k-prototypes uses multiple random initializations, results are generally stable but may vary slightly between runs.
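As an illustration of the silhouette check, the sketch below computes the mean silhouette for a given labeling with any distance function, so it can be paired with a mixed numeric/categorical distance. The helper name `mean_silhouette`, the toy data, and the fixed γ of 1.0 are assumptions made for this example, not the tool's internals.

```python
def mean_silhouette(points, labels, dist):
    """Mean silhouette over all points for a given cluster labeling."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            # Singleton clusters get a silhouette of 0 by convention.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the point's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


def mixed(p, q, gamma=1.0):
    """Toy mixed distance: squared numeric gap plus gamma per mismatch."""
    (num_p, cat_p), (num_q, cat_q) = p, q
    numeric_part = sum((x - y) ** 2 for x, y in zip(num_p, num_q))
    return numeric_part + gamma * sum(x != y for x, y in zip(cat_p, cat_q))


points = [((0.0,), ("x",)), ((1.0,), ("x",)),
          ((10.0,), ("y",)), ((11.0,), ("y",))]
labels = [0, 0, 1, 1]
score = mean_silhouette(points, labels, mixed)
print(score > 0.3)
```

Running this for each candidate k and comparing the scores alongside the cost plot is one way to apply the guidance above.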
Standardization is recommended when continuous variables have very different scales (e.g., age 0–100 vs. spend $0–$10,000). Note: standardization rescales the continuous variables, so it also changes the auto-calculated gamma, which is derived from their standard deviations. If clusters seem overly driven by one variable type, adjust gamma manually.
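The interaction between standardization and auto gamma can be seen in a small sketch. It assumes z-score standardization and an auto value equal to the average population standard deviation; `standardize` and `average_std` are hypothetical helpers, and the sample rows are made up for illustration.

```python
from statistics import mean, pstdev


def standardize(numeric_rows):
    """Z-score each continuous column (constant columns become 0.0)."""
    scaled_columns = []
    for column in zip(*numeric_rows):
        mu, sigma = mean(column), pstdev(column)
        scaled_columns.append(
            [(v - mu) / sigma if sigma > 0 else 0.0 for v in column]
        )
    return [tuple(row) for row in zip(*scaled_columns)]


def average_std(numeric_rows):
    """Average standard deviation across columns (assumed auto gamma)."""
    return mean(pstdev(column) for column in zip(*numeric_rows))


# Columns: age in years, spend in dollars (illustrative values).
raw = [(25, 120.0), (40, 4800.0), (70, 9500.0)]

print(average_std(raw))               # dominated by the spend column
print(average_std(standardize(raw)))  # close to 1 after z-scoring
```

Under these assumptions, every standardized column has unit standard deviation, so the auto value lands near 1 regardless of the original units. That is why a gamma tuned on raw data should be revisited after turning standardization on.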