How to convert a UTF-8 file to DOS (Disk Operating System) format (ANSI)

Question

Hi All,

I need to convert the following input file data, which is in UTF-8, to DOS format. I tried the Transcoder Snap, but it throws an error. Any suggestions, please?

endor_force · Answer

"DOS format" is a bit vague about the limitations of the target system. I get the same error when sending in a file with Chinese characters in it and setting the output to basically any format other than UTF-*. The error message states that there is something wrong with the input file and that it may not be UTF-8, but some common tests confirm that the input file is UTF-8:

file -i sample.csv
sample.csv: text/plain; charset=utf-8

Using "chardet" in Python shows 99% confidence of UTF-8 for the input file:

from chardet.universaldetector import UniversalDetector

files = ['sample.csv']

detector = UniversalDetector()
for filename in files:
    print(filename.ljust(20), end='')
    detector.reset()
    with open(filename, 'rb') as f:  # make sure the file handle is closed
        for line in f:
            detector.feed(line)
            if detector.done:
                break
    detector.close()
    print(detector.result)

Outputs:

sample.csv {'encoding': 'utf-8', 'confidence': 0.99, 'language': ''}

However, when the output format is set to UTF-16 or UTF-32, the transcode works and creates a file in a different UTF format, confirmed by chardet:

test_output.csv          {'encoding': 'utf-32be', 'confidence': 0.85, 'language': ''}

I have tried several non-UTF output formats, but every one I tried has failed so far. It seems to be related to the Chinese characters: when the lines containing them are removed, the transcoding works, but as soon as Chinese characters are present in the file, the transcoding fails. It could be that some of the other character sets do not support all of the characters available in the UTF formats. The root cause in the stack trace, java.nio.charset.UnmappableCharacterException, points the same way: the target character set has no mapping for a character in the input.

I would recommend contacting the SnapLogic support team for further analysis or clarification.

Full error, UTF-8 to US-ASCII as a sample:

Transcoder[5c46bfd317f60c09d026aecf_6a02bd2b-c54d-4d54-bd03-66d0bd7a5e20 -- c927bdd5-dafb-4217-88d4-aba17d2291de]
com.snaplogic.snap.api.SnapDataException: Failed to transcode from UTF-8 to US-ASCII
	at com.snaplogic.snaps.transform.Transcoder.process(Transcoder.java:125)
	at com.snaplogic.snap.api.write.SimpleBinaryWriteSnap.doWork(SimpleBinaryWriteSnap.java:62)
	at com.snaplogic.snap.api.SimpleBinarySnap.execute(SimpleBinarySnap.java:57)
	at com.snaplogic.cc.snap.common.SnapRunnableImpl.executeSnap(SnapRunnableImpl.java:804)
	at com.snaplogic.cc.snap.common.SnapRunnableImpl.execute(SnapRunnableImpl.java:577)
	at com.snaplogic.cc.snap.common.SnapRunnableImpl.doRun(SnapRunnableImpl.java:869)
	at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:427)
	at com.snaplogic.cc.snap.common.SnapRunnableImpl.call(SnapRunnableImpl.java:116)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.base/java.util.concurrent.FutureTask.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)
Caused by: com.snaplogic.cc.snap.common.SnapStreamException: Exception while reading binary data from the stream
	at com.snaplogic.cc.snap.view.binary.BinaryOutputViewImpl.write(BinaryOutputViewImpl.java:279)
	at com.snaplogic.snap.api.OutBoundViewsImpl.write(OutBoundViewsImpl.java:287)
	at com.snaplogic.snaps.transform.Transcoder.process(Transcoder.java:102)
	... 13 more
Caused by: java.nio.charset.UnmappableCharacterException: Input length = 1
	at java.base/java.nio.charset.CoderResult.throwException(Unknown Source)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
	at java.base/sun.nio.cs.StreamEncoder.implWrite(Unknown Source)
	at java.base/sun.nio.cs.StreamEncoder.write(Unknown Source)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1613)
	at org.apache.commons.io.IOUtils.copyLarge(IOUtils.java:1591)
	at com.snaplogic.snaps.transform.Transcoder$1.write(Transcoder.java:115)
	at com.snaplogic.cc.snap.view.binary.BinaryOutputViewImpl.write(BinaryOutputViewImpl.java:217)
	... 15 more
Reason: The character set in the input data may not be UTF-8
Resolution: Please select the correct input character set.

The input file I tried with is attached.
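
To see which characters are responsible before sending a file through the Transcoder, the same check can be approximated outside SnapLogic. The following is a minimal sketch in plain Python, not the Snap's own logic; the helper name `unmappable_chars` and the sample text are made up for illustration, and Python's codec names (`ascii`, `cp850`) are assumed to correspond to the US-ASCII and IBM850 options.

```python
def unmappable_chars(text, target_encoding):
    """Return the sorted characters in text that target_encoding cannot represent."""
    bad = set()
    for ch in text:
        try:
            ch.encode(target_encoding)
        except UnicodeEncodeError:
            # Python's counterpart of Java's UnmappableCharacterException
            bad.add(ch)
    return sorted(bad)


# Hypothetical sample data: one accented Latin name and one Chinese value.
sample = "id,name\n1,Zoë\n2,中文\n"

print(unmappable_chars(sample, "ascii"))   # every non-ASCII character
print(unmappable_chars(sample, "cp850"))   # only the Chinese characters
print(unmappable_chars(sample, "utf-16"))  # empty: UTF formats cover all of Unicode
```

An empty result means a strict transcode to that character set should succeed; any character listed has to be removed or replaced first.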

endor_force · Answer

IBM850 would be the Latin multilingual MS-DOS charset, I assume.

I tested on my PC: when typing out a file transcoded from UTF-8 to IBM850 in a Windows DOS prompt, it looks OK. I have not verified this on older DOS versions or in DOSBox.

The transcoding fails with an error if the file contains any character that the target charset does not support; even the euro sign (€) causes a failure. It seems like the transcoding is relying on the old non-euro version of IBM850?
Other multilingual Latin charsets such as 858 (IBM850 with the euro sign) or 912 are not available to select.

For verification with the tools used previously: chardet identifies an IBM850-transcoded file as Windows-1252 with 73% confidence (which is not correct), and "file -i" reports it as unknown 8-bit.
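
The euro-sign behaviour can be reproduced with Python's codecs, which serves as a rough stand-in for what the Transcoder does. This is only a sketch: `can_encode` is a hypothetical helper, and `cp850` and `cp858` are Python's names for the two code pages, not entries from the Snap's drop-down.

```python
def can_encode(text, encoding):
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


price = "10 €"

print(can_encode(price, "cp850"))  # False: code page 850 has no euro sign
print(can_encode(price, "cp858"))  # True: code page 858 is 850 plus the euro sign

# A lenient conversion substitutes '?' for anything that cannot be mapped.
print(price.encode("cp850", errors="replace"))  # b'10 ?'
```

Whether a lossy replacement like this is acceptable depends on what the target system does with the data; a strict conversion that fails loudly is the safer default.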

rnbabar · Answer

Thanks endor_force for the detailed analysis. I tried another sample file that does not contain Chinese characters; it is not accepted either and throws the same error. Instead I used ISO-8859-1 as the input character set, which accepts the input. Now the problem is what the output character set should be. I cannot see any ANSI or MS-DOS character set there; instead it shows Windows character sets. I am trying to get the output in MS-DOS ANSI character set format. Any thoughts?
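
One point that may help with the choice of output character set: "ANSI" and "MS-DOS" usually name different code pages. On Western European Windows systems "ANSI" commonly means Windows-1252, while the DOS (OEM) code page is typically IBM850 or IBM437, so the same character ends up as a different byte. The sketch below uses plain Python rather than SnapLogic; `convert_file`, the file names, and the sample content are made up for illustration, and writing CRLF line endings is an assumption about what the DOS-side consumer expects.

```python
import os
import tempfile


def convert_file(src_path, dst_path, target_encoding="cp850"):
    """Re-encode a UTF-8 text file and write DOS-style CRLF line endings.

    Raises UnicodeEncodeError if a character has no mapping in target_encoding.
    """
    with open(src_path, "r", encoding="utf-8") as src, \
         open(dst_path, "w", encoding=target_encoding, newline="\r\n") as dst:
        for line in src:
            dst.write(line)


with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "sample.csv")
    with open(src, "w", encoding="utf-8", newline="") as f:
        f.write("id,name\n1,Zoë\n")  # hypothetical sample content

    dos_path = os.path.join(tmp, "sample_dos.csv")
    ansi_path = os.path.join(tmp, "sample_ansi.csv")
    convert_file(src, dos_path, "cp850")    # DOS / OEM code page
    convert_file(src, ansi_path, "cp1252")  # Windows "ANSI" code page

    with open(dos_path, "rb") as f:
        dos_bytes = f.read()
    with open(ansi_path, "rb") as f:
        ansi_bytes = f.read()

print(dos_bytes)   # 'ë' is stored as byte 0x89
print(ansi_bytes)  # 'ë' is stored as byte 0xEB
```

If the consuming program is a real DOS application, the OEM code page is probably the one to aim for; if it is a Windows program that says "ANSI", Windows-1252 is the more likely target.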
