Spotfire Ideas Portal
Status Future Consideration
Product Spotfire
Created by Guest
Created on Jan 25, 2017

Improve automatic WP dump capture feature on 7.8

In Spotfire 7.8, Tibco added a cool new feature that automatically captures dumps from non-responsive Web Players:

https://docs.tibco.com/pub/spotfire_server/7.8.0/doc/html/TIB_sfire_server_tsas_admin_help/GUID-8D5082B5-3178-4ABE-A438-F5E108B0CDFA.html

We are really interested in this feature, as we have long experienced Web Player lock-ups. Tibco Support has not been able to solve some of them, either because they don't know what is causing them or because the lock-ups are over too quickly for us to create a dump manually when they happen. The feature also gives us certainty that service will be restored automatically should a Web Player stop responding. The problem we have with this feature is how its configuration is implemented. As always, the devil is in the detail, and never more so than in this case.

To enable this feature you need to set the nodemanager.memorydump-after-failures parameter, which is basically the number of retries (on top of a default 10 retries) after which the Server decides to create the WP dump and restart the Web Player. So if you set this parameter to 2, the Server will try to communicate 12 times and then dump and kill the WP worker process.
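To make the arithmetic concrete, here is a minimal Python sketch of the rule as I understand it. The function name and the constant are made up for illustration, and the fixed default of 10 retries is taken from the description above rather than from any Spotfire API.

```python
# Hypothetical helper: only models the counting rule described in this post.
DEFAULT_RETRIES = 10  # default retries, as described above (assumed fixed)


def attempts_before_dump(memorydump_after_failures: int) -> int:
    """Total failed communication attempts before the dump and restart."""
    if memorydump_after_failures < 0:
        raise ValueError("memorydump-after-failures must not be negative")
    return DEFAULT_RETRIES + memorydump_after_failures


print(attempts_before_dump(2))  # 12
```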

I did some testing and found the frequency of the retries to be all over the place. Here are the gaps between retries in one test I ran: 34s, 26s, 8s, 26s, 8s, 18s, 8s, 18s, 8s, 8s. We spoke with Tibco Support about this and were told it is expected, as some errors surface immediately while others only time out after a while. This is far from ideal. If the length of each retry fluctuates this much, then controlling when the dump is created by the number of retries seems like a poor choice: in effect there is NO guarantee of how long the server will wait before creating an automatic dump and killing the Web Player worker. It could be 1 min or it could be 10; it all depends on how quickly each communication attempt fails or times out. A better approach would be to let us set the number of seconds before the dump is created. Granted, this threshold may be slightly exceeded on some occasions if the last communication attempt takes a long time to fail or time out, but it would be immensely more accurate than using memorydump-after-failures.
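The figures above can be put into a rough Python sketch to show how wide the window is. It treats each observed gap as the cost of one attempt, which is an assumption, and both function names are hypothetical. The second function only illustrates the time-based threshold I am asking for; it is not something Spotfire offers today.

```python
# Gaps (in seconds) between retries from the test described above.
observed_gaps = [34, 26, 8, 26, 8, 18, 8, 18, 8, 8]


def wait_bounds(gaps, attempts):
    """Best and worst case wait if every attempt cost the shortest
    or the longest observed gap (an assumption, not measured data)."""
    return attempts * min(gaps), attempts * max(gaps)


def attempts_until_deadline(gaps, deadline_seconds):
    """Time-based rule: stop at the first attempt that reaches the deadline.
    Returns (attempts made, elapsed seconds), or None if never reached."""
    elapsed = 0
    for count, gap in enumerate(gaps, start=1):
        elapsed += gap
        if elapsed >= deadline_seconds:
            return count, elapsed
    return None


print(sum(observed_gaps))                           # 162
print(wait_bounds(observed_gaps, 12))               # (96, 408)
print(attempts_until_deadline(observed_gaps, 100))  # (5, 102)
```

With 12 attempts the wait could be anywhere from about a minute and a half to nearly seven minutes under these assumptions, whereas a 100-second threshold on the same data would have fired after 102 seconds, overshooting by only 2 seconds.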

Why is it important to set this parameter correctly? Because you don't want Web Player dumps/restarts to be triggered just because your server is very busy and running garbage collection, which can make the Web Player unresponsive and trigger a "false positive" with this new feature.
