Skip to Main Content
Spotfire Ideas Portal
Status Future Consideration
Product Spotfire
Categories Visualization
Created by Guest
Created on Feb 16, 2017

Enable Box Plot with In-Database Connections

The box plot cannot be currently used for an in-database data connection. The reason as given on the Spotfire documentation is as follows: "The box plot requires both aggregated and non-aggregated data and has not yet been adjusted to handle in-db data."

For our non-technical users we have set up Spotfire as a data exploration tool with an in-db SAP HANA data connection. Our users deal with large number of columns as Patient Age, Weight, Lab Values etc. and a box plot would give out-of-the-box a nice summary of data in these columns.

We are much in need of this feature as currently getting these details is only possible by a cross table which requires not only a lot of manual steps but is also very clumsy to arrive at such a particular analysis.

  • Attach files
  • Guest
    Reply
    |
    Oct 8, 2021

    For Oracle, Microsoft SQL, and SAP HANA (these are the ones I checked) the window function percentile_cont and median exist.

    SELECT DISTINCT [Category],
    
    PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [Measure])
    OVER (PARTITION BY [Category]) AS q1,
    MEDIAN([Measure])
    OVER (PARTITION BY [Category]) AS median,
    PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY [Measure])
    OVER (PARTITION BY [Category]) AS q3
    FROM [Table]
    WHERE [Measure] IS NOT NULL


    For Spark this exist as well with very little difference in syntax

    SELECT DISTINCT [Category],
    
    PERCENTILE([Measure], 0.25)
    OVER (PARTITION BY [Category]) AS q1,
    MEDIAN([Measure])
    OVER (PARTITION BY [Category]) AS median,
    PERCENTILE([Measure], 0.75)
    OVER (PARTITION BY [Category]) AS q3
    FROM [Table]
    WHERE [Measure] IS NOT NULL


    Since there is a nice, structured way to calculate values for Boxplot via SQL, a boxplot for in-database connections should be possible.

    This SQL approach enables box plot for big data, pushing the calculation to the external database and no need for import of data in-memory.

  • Guest
    Reply
    |
    Feb 6, 2021

    In most of modern databases the SQL function PERCENTILE_CONT is supported and with this the calculation of the median and quartiles should be possible without loading all the data into in-memory. Then only the rows for the outliers needs to be loaded.


  • Admin
    Niklas Amberntsson
    Reply
    |
    Apr 12, 2017

    I generalized the title of this Idea so that it covers all external data sources so that users also with other databases will be interested in voting for the Idea.