Forum Discussion
Does that other ETL process work with the same database (what Oracle users might call a schema)?
https://dev.mysql.com/doc/refman/5.7/en/charset-applications.html
The core problem is that Unicode support is uneven: you can’t rely on it 100%, and every tool and vendor supports it a bit differently. The default is often ASCII, and if you want to support Unicode:
- EVERYTHING must support it end to end… the input, processing, and output, as well as the source and target.
- In some cases, like older Oracle installations, the operating system environment containing such instances must be set up correctly.
- In some cases, like some Microsoft installations, you may have to use the exact code page.
- With GRAPHIC characters, especially the early Microsoft/IBM ones, all bets might be off, and you may have to settle.
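As a quick illustration of why the default matters: a character like Ł has no representation in ASCII at all, so any hop in the pipeline that falls back to ASCII (or the wrong code page) will mangle it, while a UTF-8 hop passes it through losslessly. A minimal Python sketch:

```python
# "Ł" (U+0141) survives a UTF-8 round trip but cannot be encoded as ASCII.
text = "Łódź"

utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text  # lossless round trip

try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    # Any ASCII-only hop in the pipeline hits exactly this wall.
    print(f"ASCII cannot represent this text: {exc.reason}")
```

The same check works for any suspect code page: swap `"ascii"` for the charset a given hop uses and see whether the encode succeeds.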
One good thing, though: these characters are in Unicode, so it SHOULD be doable:
From Wikipedia…
“The Unicode codepoints for the letter are U+0142 for the lower case, and U+0141 for the capital.[11] In the LaTeX typesetting system Ł and ł may be typeset with the commands \L{} and \l{}, respectively. The HTML codes are &#321; and &#322; for Ł and ł, respectively.”
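You can verify those codepoints directly; in Python, for instance:

```python
# U+0141 is the capital and U+0142 the lower case, as the Wikipedia quote says.
assert chr(0x0141) == "Ł" and ord("Ł") == 0x0141
assert chr(0x0142) == "ł" and ord("ł") == 0x0142
```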
Your first task should be to run the process with a few of the offending records in preview mode. Check the target (just before the converter in the preview window). If it is right, the problem is with the target driver or some target-centric area. Check the source. If it is bad, the problem is with the source driver or some source-centric area. THOSE are usually the culprits. Then try key points in between to find one that fails, and work back from there.
OH, and if you are using an Oracle version prior to 12 (I forget the EXACT version in 12 where they fixed it), and you find the problem is THERE, it might be an ENVIRONMENT problem. You will have to fix the environment (it is just a couple of environment variable settings) and restart the drivers and/or Snaplex.
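For reference, the Oracle client reads its character-set configuration from the NLS_LANG environment variable, which must be set before the client libraries load. A hedged sketch (the AL32UTF8 value shown is an assumption; use whatever character set your database actually runs):

```python
import os

# NLS_LANG format is <language>_<territory>.<charset>.
# AL32UTF8 is Oracle's UTF-8 charset name; substitute your DB's actual charset.
# For a Snaplex, this would go in the service's environment, followed by a
# restart, as described above; setting it in-process only helps child processes.
os.environ["NLS_LANG"] = "AMERICAN_AMERICA.AL32UTF8"
```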
We’re using Aurora, so we’re communicating via the MySQL libraries against UTF-8 InnoDB tables.
We’ve drilled down into how to test this:
- The JSON Generator does not output the character correctly if you copy-paste the value
- Inserting the value manually into the UTF-8 table and running a select against it successfully loads the character into Snaplogic (it is properly previewable)
- Selecting that value and then inserting it back into the table fails: the inserted value is just a ? instead of the proper character
So, under the hood, it seems like there’s a problem with either the insert or merge steps, or both.
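The ? is the classic signature of a lossy transcode somewhere between the select and the insert: some hop (typically the negotiated connection charset) cannot represent the character and substitutes it rather than failing. A pure-Python illustration of the mechanism, where latin-1 stands in for whatever charset that hop actually uses:

```python
value = "Ł"

# If a hop encodes to a charset that lacks the character and substitutes
# instead of raising, you get exactly the "?" showing up in the table.
mangled = value.encode("latin-1", errors="replace").decode("latin-1")
print(mangled)  # -> "?"
```

If that matches what you see, the usual fix on the MySQL/Aurora side is to force the connection charset (for example `characterEncoding=UTF-8` on a JDBC URL, or `SET NAMES utf8mb4` on the session) rather than touching the tables, which are already UTF-8.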
We’ve considered bypassing this by creating an S3 file that we upload, but this feels like overkill for what should be an otherwise simple insert.