Any-Any transformations appear to double-parse the XML data, which causes failures for data that contains special characters. This does not occur with XML -> Tabular transformations.
Steps to reproduce:
1. Create this XML file:
--
<?xml version="1.0" encoding="utf-8" ?>
<sampleDocs>
<doc>
<id>ID_001</id>
<notes1>This is Notes1</notes1>
<notes2>This is Notes2</notes2>
</doc>
<doc>
<id>ID_002</id>
<notes1>cghchjcrffgxvcjhtgx#$#%%$^&amp;#^%</notes1>
<notes2>cghchjcrffgxvcjhtgx#$#%%$^&amp;#^%</notes2>
</doc>
</sampleDocs>
--
2. Point a File-XML data source at it
3. Create an XML to Tabular Mapping transformation
4. Observe it works as expected
5. Create an Any-Any transformation
6. Observe that, when executed, it fails with an error similar to "String index out of range: 30"
I investigated this, and it appears to be caused by TDV double-parsing the XML input (i.e., the first pass turns &amp;# back into &#, which then fails the second pass because &#^% is not a valid character reference).
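To make the double-parsing hypothesis concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It does not reproduce TDV's internals; it assumes the failure comes from parsing once, splicing the decoded text back into markup without re-escaping it, and parsing again. The helper names parse_text and survives_double_parse are hypothetical, invented for this illustration.

```python
import xml.etree.ElementTree as ET

# Sample value from the ticket, with its ampersand escaped as &amp; so the
# file is well-formed XML.
SOURCE = "<notes1>cghchjcrffgxvcjhtgx#$#%%$^&amp;#^%</notes1>"


def parse_text(xml_fragment: str) -> str:
    """Parse an XML fragment and return the root element's text content."""
    return ET.fromstring(xml_fragment).text or ""


def survives_double_parse(xml_fragment: str) -> bool:
    """Hypothetical model of the suspected behavior: parse once, splice the
    decoded text back into markup WITHOUT re-escaping it, then parse again."""
    first = parse_text(xml_fragment)
    try:
        parse_text(f"<x>{first}</x>")
        return True
    except ET.ParseError:
        return False


# First pass: &amp; is decoded, leaving the literal text "...&#^%".
decoded = parse_text(SOURCE)
print(decoded)       # cghchjcrffgxvcjhtgx#$#%%$^&#^%
print(len(decoded))  # 30

# Second pass: "&#^%" is now read as a malformed character reference.
print(survives_double_parse(SOURCE))  # False
```

Python raises a ParseError rather than Java's "String index out of range", but the decoded value is exactly 30 characters long, which is at least consistent with the index in the reported error.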
The following workarounds resolve the issue:
1. Wrap the value in a CDATA section: <notes1><![CDATA[cghchjcrffgxvcjhtgx#$#%%$^&#^%]]></notes1>
2. Double-escape the ampersands: <notes2>cghchjcrffgxvcjhtgx#$#%%$^&amp;amp;#^%</notes2>
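The double-escape workaround can be checked against the same hypothetical model. The sketch below, again plain Python standard library and not TDV code, applies xml.sax.saxutils.escape twice to produce the double-escaped value and confirms that two parses return the original text. It does not model the CDATA workaround, whose success presumably depends on how TDV re-serializes CDATA sections between passes. The helper names are hypothetical.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# The value as it should finally appear after the transformation.
RAW = "cghchjcrffgxvcjhtgx#$#%%$^&#^%"


def parse_text(xml_fragment: str) -> str:
    """Parse an XML fragment and return the root element's text content."""
    return ET.fromstring(xml_fragment).text or ""


def double_parse(xml_fragment: str) -> str:
    """Hypothetical model of the suspected behavior: parse, splice the decoded
    text back into markup without re-escaping, and parse a second time."""
    first = parse_text(xml_fragment)
    return parse_text(f"<x>{first}</x>")


# Workaround 2: escape the ampersand twice when writing the file.
double_escaped = escape(escape(RAW))
print(double_escaped)  # cghchjcrffgxvcjhtgx#$#%%$^&amp;amp;#^%

fragment = f"<notes2>{double_escaped}</notes2>"
print(double_parse(fragment) == RAW)  # True
```

One trade-off follows from the same model: a double-escaped file read by a path that parses only once, such as XML to Tabular Mapping, would yield the literal text &amp;#^% rather than &#^%.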
Requested Solution:
Remove the double-parsing behavior entirely, so that Any-Any transformations behave identically to XML to Tabular Mapping by default, or make it a configurable setting of the TDV server.
Hi Jason:
Great - thanks for confirming.
Regards,
-Will
Hi Will,
This has been fixed in 7.0.8 HF2, which is available by contacting the support team.
Regards,
Jason