Skip to Main Content
Spotfire Ideas Portal
Status Future Consideration
Product Spotfire
Categories Data Access
Created by Guest
Created on May 14, 2019

Control whether rows can be added when adding columns

Add a feature to the Add Columns transformation to control whether new rows can be added and what to do when multiple rows would be added (e.g. provide an aggregation function to use).

We sometimes experience issues where our users have built DXPs where the join conditions on their Add Columns transformations are incorrect, meaning that data tables can rapidly grow to many times their original size. This can have knock-on effects such as causing the DXP to take a long time to open, or to use excessive memory on our Web Players. As an example, I saw one DXP where a user took a table containing a few hundred rows, incorrectly added columns from another table, then did a similar operation to add more columns and ended up with a table of >50M rows - their charts still worked properly because they were showing aggregations such as min/max/average, but the report took 10 minutes to open when it should have taken a few seconds. I’ve seen other examples where the data is joined on a column that sometimes unexpectedly contained null values - a report that was working fine one day suddenly caused problems using up >100GB of RAM on our web player, just because a user had entered a null value into their data.

It is obviously useful to allow for Cartesian style joins when adding columns, but most users don’t expect the number of rows to increase and it would be preferable to see an error when the join operation didn’t result in a single row operation.

  • Attach files
  • Guest
    Reply
    |
    May 19, 2019

    Thanks Pete! I hope we can address this as part of making Spotfire's joins "smarter" in general.

  • Guest
    Reply
    |
    May 15, 2019

    Thanks Thomas. I'm not sure whether users are relying on the fact that add-columns can sometimes add rows or not, so I don't think a global preference would be the answer. I *think* what I'd suggest is a checkbox setting associated with the specific add-columns operation that says whether it's OK for it to create new rows or not, perhaps with a default mode of not adding rows. We've seen situations where users created what they thought were good joins and then had problems later because the data changed and their reports blew up (literally - we saw one where someone entered a couple of null values in a data source and the report used up all 250GB of RAM+disk on our web player before giving up!).

  • Guest
    Reply
    |
    May 14, 2019

    Thanks for a great idea Pete. I hope you are able to go in and edit the Add columns operation to the correct join type when this is discovered. Do you see this enhancement as a global preference? Or would a warning in the Add Columns dialog be ok?