XML Generator, Entity referenced, but not declared

alex_panganiban
Contributor

I’m having trouble using the XML Generator with data that contains special symbols and characters. The usual suspects (<, >, &, ", ') have been taken care of by using the HTML notations &lt;, &gt;, &amp;, etc., but I’m having trouble with others that I feel should work. For instance, the registered sign, ®, should be notated as &reg;, but the XML Generator gives me an error stating that “the entity, reg, is referenced but not declared.” How do I go about declaring these? I have quite a number of symbols and accented/umlauted letters that I need to use. Whether I use the rendered symbol or the HTML notation, the XML Generator fails.

How can I successfully include these symbols in my XML messages so that the XML Generator doesn’t see them as errors? See pics below for examples of my issue.

image
image
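
For context, this error is standard XML behavior rather than something unique to one tool: XML predefines only five named entities (lt, gt, amp, quot, apos), so an HTML name such as reg is unknown unless a DTD declares it. The sketch below uses Python's standard library, not SnapLogic code, to show the same failure and two ways around it. The element name `brand` and the value `Acme` are made up for illustration, and whether the XML Generator snap accepts an inline DOCTYPE is an assumption that would need testing.

```python
import xml.etree.ElementTree as ET

def parses(xml_text: str) -> bool:
    """Return True if a non-validating XML parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False

# HTML entity name: not one of XML's five predefined entities.
named = "<brand>Acme&reg;</brand>"
# Numeric character references need no declaration.
decimal = "<brand>Acme&#174;</brand>"
hexa = "<brand>Acme&#xAE;</brand>"
# Declaring the entity in an internal DTD subset makes the name legal.
declared = '<!DOCTYPE brand [<!ENTITY reg "&#174;">]><brand>Acme&reg;</brand>'

results = {
    "named": parses(named),
    "decimal": parses(decimal),
    "hex": parses(hexa),
    "declared": parses(declared),
}
print(results)
# {'named': False, 'decimal': True, 'hex': True, 'declared': True}
```

In all three accepted forms the parsed text holds the actual U+00AE character. The entity reference itself does not survive parsing.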

25 REPLIES

alex_panganiban
Contributor

Thanks, Matt. I’m looking forward to your response and to what I can do to work around this issue. Many of our vendors/brands have branding policies associated with the symbols and special characters that are being flagged as undeclared by the XML Generator and the XML Formatter, resulting in pipeline failure. Thanks again!

Hi Kuya,

Good day, here’s a PoC pipeline that escapes the Unicode string value:

image

When parsed, the XML-encoded value will become the actual character…

image

So when rewriting it as XML (XML Formatter), the actual character will be set.

image

XML Generator - special char_2021_06_07.slp (9.3 KB)

~EmEm

alex_panganiban
Contributor

Well, well! What a nice surprise. Great to hear from you, Michael. I’ve missed you. Thanks for your response, although I’m not sure it’s going to help me in the long run. If I were having problems with just this one special character, it would be easy to convert the escape code. However, considering the data that our vendors are supplying us, I need the XML Generator and Formatter to be able to accept any UTF-8 compliant escape code.

The following is a list of what I have found in our data so far; none of these will pass through the XML Generator and XML Formatter snaps. No doubt vendors can, will, and should be able to add whatever they want to this list, as long as it’s HTML compliant. The snaps also sometimes do not recognize the character codes at all, and I have had to convert them to either decimal or hexadecimal for the snaps to recognize them (for instance, I had to convert &reg; to &#174; and &#xAE;; I see in your example you converted it to #00xAE;).

® &reg;
  &nbsp;
“ &ldquo;
” &rdquo;
’ &rsquo;
– &ndash;
… &hellip;
— &mdash;
é &eacute;
É &Eacute;
™ &trade;
ó &oacute;
í &iacute;
ü &uuml;
Ü &Uuml;
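
Since the numeric forms are the ones that get through, one possible preprocessing step is to rewrite every HTML named entity as a decimal character reference before the text reaches the XML snaps. The following is a minimal Python sketch of that idea using the standard `html.entities.name2codepoint` table. The function name `html_entities_to_numeric` is hypothetical, and the same logic would have to be ported to a mapper expression or Script snap to run inside a pipeline.

```python
import re
from html.entities import name2codepoint

# The five names XML already understands are left untouched.
XML_PREDEFINED = {"lt", "gt", "amp", "quot", "apos"}
_ENTITY = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")

def html_entities_to_numeric(text: str) -> str:
    """Rewrite HTML named entities (e.g. &reg;) as decimal references (&#174;)."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML entities and unknown names as-is
        return f"&#{name2codepoint[name]};"
    return _ENTITY.sub(repl, text)

print(html_entities_to_numeric("Caf&eacute; &reg; &lt;b&gt; &amp; &bogus;"))
# Caf&#233; &#174; &lt;b&gt; &amp; &bogus;
```

Because the lookup is table-driven, new entities from vendors are handled without maintaining a hand-written list, as long as they are standard HTML 4 names.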

alex_panganiban
Contributor

xml_SpecialCharactersSample.zip (9.8 KB)
@mbowen and @alchemiz I’m sharing a little sample that illustrates the problem that I’m having.

Take note of the mapper after the XML Generator. In the mapper I use the string.replace() method. If I don’t do this, the pipeline fails once it gets to the XML Parser downstream. Doing this also allows the XML Formatter to complete successfully. However, I don’t want the rendered symbol; I want the escape character code to remain intact, just as &lt; and &gt; do.

How can I get these 3 snaps (XML Generator, XML Parser, and XML Formatter) to treat all escape codes the same way they treat < and >, or how can I create customized snaps that are based on these 3 snaps? I am working against a deadline of June 16th, 2021, and this predicament is a big blocker for me.

Thanks, Alex

Thank you @alchemiz for the wonderful little pipeline.

@alex.panganiban.guild, you are correct that the Escape Special Characters option in the XML Generator snap only escapes these five entities: gt, lt, quot, amp, apos.
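
To make "only these five" concrete, here is a small Python sketch of a minimal escape handler built on the standard library's `xml.sax.saxutils.escape`. It is an analogy, not the snap's actual handler, and the helper name `escape_five` and the sample string are invented for illustration.

```python
from xml.sax.saxutils import escape

# saxutils.escape covers &, < and > by default; quot and apos are added here.
EXTRA = {'"': "&quot;", "'": "&apos;"}

def escape_five(text: str) -> str:
    """Escape only the five characters that XML predefines entities for."""
    return escape(text, EXTRA)

sample = 'Tom & "Jerry" <b> it\'s Acme\u00ae'  # \u00ae is the registered sign
escaped = escape_five(sample)
# escaped == 'Tom &amp; &quot;Jerry&quot; &lt;b&gt; it&apos;s Acme\u00ae'
```

The registered sign passes through unchanged, which matches a handler that knows nothing beyond gt, lt, quot, amp, and apos.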

The snap currently uses a minimal XML escape handler, whereas other snaps use a custom handler. So this snap could also be updated to recognize/escape more XML entities.

Under the hood, the XML is parsed and transformed. It’s during the transformation that the registered mark (&#174;) gets converted to a glyph (symbol). We may be able to control this behavior too.
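
The parse-then-serialize effect described here can be reproduced outside SnapLogic. In the Python sketch below (standard library only, with a made-up `brand` element), the parser turns the character reference into the real character, and the output encoding alone decides whether it is written back as a glyph or as a numeric reference. Whether the snaps expose a comparable output-encoding setting is an assumption, not something confirmed in this thread.

```python
import xml.etree.ElementTree as ET

# Parsing resolves the reference: the element text now holds U+00AE itself.
elem = ET.fromstring("<brand>Acme&#174;</brand>")

# Serializing as a Unicode string writes the glyph.
as_unicode = ET.tostring(elem, encoding="unicode")

# Serializing as ASCII forces non-ASCII characters back into
# decimal character references.
as_ascii = ET.tostring(elem, encoding="us-ascii").decode("ascii")

print(as_ascii)
# <brand>Acme&#174;</brand>
```

Here `as_unicode` contains the rendered symbol while `as_ascii` keeps the escape code, so a serializer restricted to ASCII output is one way to keep references intact end to end.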

Creating a support ticket would allow for this work to get started; however, it likely would not be completed and available by June 16.

Let me evaluate your sample pipeline and see if we can get a workaround in place that can help you sooner!