Spotfire Ideas Portal
Status Future Consideration
Product Spotfire
Categories Analytics
Created by Guest
Created on Feb 6, 2016

Improve data type matching between R and Spotfire Data Engine

Data types are a bit of an issue in Spotfire, and the way it interacts with R is no exception. It is understandable that Spotfire tries to assign each column of a new dataset a data type to facilitate visualization and analysis: one wants geographical coordinates to work with a map, dates to be properly interpreted and ordered, and so on. This is normally a good thing! However, this approach leads to a problem: Spotfire also enforces these data types on R data functions (open-source R with TSSS).

Why is this a problem? Because it makes R scripts and functions meant to run within Spotfire effectively untestable. The recommended approach is to use a proper R IDE (e.g. RStudio) to develop the desired code and test it iteratively against a local copy of the data it will later receive in Spotfire. But because it is impossible to export a data table WITH ITS CORRESPONDING DATA TYPES to a CSV, XLS, or tab-separated file, the code developed this way is never guaranteed to be fully compatible with the data Spotfire will feed it once it is deployed as a data function.
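A minimal sketch of the round-trip problem described above (hypothetical data; runs in any local R session): writing to CSV keeps the values but silently discards the column types, so the re-imported table is not what the data function will later receive.

```r
# Hypothetical example: a Date column does not survive a CSV round trip.
df <- data.frame(
  date  = as.Date(c("2016-02-06", "2016-02-07")),
  value = c(1.5, 2.5)
)
tmp <- tempfile(fileext = ".csv")
write.csv(df, tmp, row.names = FALSE)
df2 <- read.csv(tmp)  # type metadata is gone; dates come back as text
class(df$date)   # "Date"
class(df2$date)  # "character" (or "factor" in R < 4.0)
```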

Based on my own experience of about a year developing R functions for Spotfire applications, if the code is more than a couple of lines long, it is bound not to work on the first attempt: one then has to tinker with the code, explicitly declare data types, or simply keep trying different approaches until it finally works. This is often a deeply frustrating process.

In one example from a Spotfire application I am developing right now, I managed to extract the data types from the instance of R running inside Spotfire and compare them with those in the same open-source R that Spotfire uses on its server. Only ~36% of the columns of the same dataset had matching data types, i.e. the majority of the fields in the data table have mismatching data types between the development environment (pure R) and the execution environment (a Spotfire data function with open-source R).
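The comparison above can be reproduced with a few lines of base R (column names hypothetical): capture each column's class in both environments, then diff the two vectors.

```r
# Sketch: record each column's class so the two environments can be compared.
# `df` stands for the same data table as seen locally and inside Spotfire.
df <- data.frame(x = 1:3, y = c("a", "b", "c"), z = Sys.Date() + 0:2,
                 stringsAsFactors = FALSE)
types_here <- vapply(df, function(col) class(col)[1], character(1))
# Save `types_here` from each environment, then measure the overlap, e.g.:
# mean(types_local == types_spotfire)  # fraction of columns that match
```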

A possible work-around is to always explicitly declare ALL data types used in the R script, but even that is not guaranteed to work: the data may be so severely disfigured by Spotfire that R can no longer interpret it (dates are a usual suspect in this case...). Not to mention that for large and dynamic datasets this approach is hardly practical.
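A sketch of that work-around (hypothetical column names), and of why it breaks down for dates: explicit coercion at the top of the script only helps if Spotfire delivered the values in the format the script expects.

```r
# Coerce every column explicitly so the script does not depend on the
# types Spotfire happens to pass in (columns are hypothetical).
df <- data.frame(date   = c("2016-02-06", "06/02/2016"),
                 amount = c("1.5", "2.5"),
                 stringsAsFactors = FALSE)
df$amount <- as.numeric(df$amount)
# Dates are the usual suspect: a single format string only works if the
# incoming values are consistent, which is exactly what cannot be assumed.
df$date <- as.Date(df$date, format = "%Y-%m-%d")
df$date  # the second, differently formatted value becomes NA
```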

A more permanent solution would be to remove the data-type declaration from the Spotfire-R interface altogether, so that the data fed into R is always as "clean" as the data used to test the script in the first place. This could be offered as an option somewhere in the data function properties. It would greatly increase the chance of code being compatible and make using Spotfire a far less frustrating process.

  • Guest, Sep 19, 2016

    If you export the data from Spotfire in SBDF format and read it into TERR with SpotfireUtils::ImportDataFromSBDF(), the type information should be there. Why are you using CSV format?