Skip to Main Content
Spotfire Ideas Portal

Enable Box Plot with In-Database Connections

The box plot cannot be currently used for an in-database data connection. The reason as given on the Spotfire documentation is as follows: "The box plot requires both aggregated and non-aggregated data and has not yet been adjusted to handle in-db data."

For our non-technical users we have set up Spotfire as a data exploration tool with an in-db SAP HANA data connection. Our users deal with large number of columns as Patient Age, Weight, Lab Values etc. and a box plot would give out-of-the-box a nice summary of data in these columns.

We are much in need of this feature as currently getting these details is only possible by a cross table which requires not only a lot of manual steps but is also very clumsy to arrive at such a particular analysis.

  • Attach files
      Drop here to upload
    • Guest
      Reply
      |
      Oct 8, 2021

      For Oracle, Microsoft SQL, and SAP HANA (these are the ones I checked) the window function percentile_cont and median exist.

      SELECT DISTINCT [Category],
      
      PERCENTILE_CONT(0.25) WITHIN GROUP (ORDER BY [Measure])
      OVER (PARTITION BY [Category]) AS q1,
      MEDIAN([Measure])
      OVER (PARTITION BY [Category]) AS median,
      PERCENTILE_CONT(0.75) WITHIN GROUP (ORDER BY [Measure])
      OVER (PARTITION BY [Category]) AS q3
      FROM [Table]
      WHERE [Measure] IS NOT NULL


      For Spark this exist as well with very little difference in syntax

      SELECT DISTINCT [Category],
      
      PERCENTILE([Measure], 0.25)
      OVER (PARTITION BY [Category]) AS q1,
      MEDIAN([Measure])
      OVER (PARTITION BY [Category]) AS median,
      PERCENTILE([Measure], 0.75)
      OVER (PARTITION BY [Category]) AS q3
      FROM [Table]
      WHERE [Measure] IS NOT NULL


      Since there is a nice, structured way to calculate values for Boxplot via SQL, a boxplot for in-database connections should be possible.

      This SQL approach enables box plot for big data, pushing the calculation to the external database and no need for import of data in-memory.

    • Guest
      Reply
      |
      Feb 6, 2021

      In most of modern databases the SQL function PERCENTILE_CONT is supported and with this the calculation of the median and quartiles should be possible without loading all the data into in-memory. Then only the rows for the outliers needs to be loaded.


    • Admin
      Niklas Amberntsson
      Reply
      |
      Apr 12, 2017

      I generalized the title of this Idea so that it covers all external data sources so that users also with other databases will be interested in voting for the Idea.