Spotfire Ideas Portal
Status Future Consideration
Product Spotfire
Categories Analytics
Created by Guest
Created on Jul 4, 2025

Additional algorithm for the Data Relationship tool that allows mixing categorical and numerical variables

The Data Relationship tool is a quick and easy way to search for pairwise correlations. However, the implemented algorithms a) mostly make strong assumptions and b) only allow searching for relationships between specific combinations of data types, e.g.:

  • numerical vs. numerical (linear regression)

  • numerical vs. categorical (ANOVA)

  • etc.

We often need to search for correlations across a mix of numerical and categorical variables, though, ideally with an algorithm that does not make strong assumptions (such as a linear relationship or normally distributed data). An overview of such methods for numerical data is given here: https://doi.org/10.1093/bib/bbt051
We found that a normalized mutual information (the "symmetric uncertainty"), computed after an appropriate binning of the numerical variables, works well. It would be useful to have such a generic algorithm for mixed data types as part of the Data Relationship tool.
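To illustrate the proposed measure: symmetric uncertainty is commonly defined as SU(X, Y) = 2 · I(X; Y) / (H(X) + H(Y)), where I is the mutual information and H the Shannon entropy. It ranges from 0 (independent variables) to 1 (each variable fully determines the other), so scores are comparable across variable pairs of any type. The sketch below is a minimal illustration, not Spotfire code. It assumes equal-width binning as the "appropriate binning", because the post does not say which binning scheme was used. The function names `entropy`, `equal_width_bins` and `symmetric_uncertainty` are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def equal_width_bins(values: Sequence[float], n_bins: int = 10) -> list[int]:
    """Discretize a numerical column into n_bins equal-width bins.

    Assumption: equal-width binning is only one possible choice; the
    original post does not specify the binning scheme.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant column: everything in one bin
    width = (hi - lo) / n_bins
    # min(..., n_bins - 1) keeps the maximum value inside the last bin
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def symmetric_uncertainty(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), a value in [0, 1].

    Both inputs must already be discrete: categorical columns are used
    as-is, numerical columns are binned first.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    h_x, h_y = entropy(x), entropy(y)
    if h_x + h_y == 0:
        return 0.0  # convention chosen here: two constant columns score 0
    # I(X; Y) = H(X) + H(Y) - H(X, Y), with H(X, Y) from the paired labels
    mutual_info = h_x + h_y - entropy(list(zip(x, y)))
    return 2.0 * mutual_info / (h_x + h_y)
```

Usage with one numerical and one categorical column:

```python
temperature = [12.1, 14.3, 15.0, 21.7, 22.4, 23.9, 30.2, 31.5]
season = ["winter", "winter", "winter", "spring", "spring", "spring", "summer", "summer"]
su = symmetric_uncertainty(equal_width_bins(temperature, n_bins=3), season)
```

Here the three temperature bins line up exactly with the seasons, so `su` is 1.0. The choice of `n_bins` affects the score, which is why a sensible default (or a data-driven rule) would matter in a built-in implementation.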
