JSON string field is output incorrectly to CSV by the CSV parser/formatter

I’ve got the following text in a description field in my raw JSON file, where the double quotes are obviously escaped. How can I change settings in the JSON formatter and/or the CSV parser/formatter to fix this problem? The snaps are at their default settings.

JSON:
"Charts the development of Black Art in the United States and Great Britain from 1945 to the present through Paul Gilroy’s notion of the \"Black Atlantic\" – the ways that a distinct aesthetic emerged from a fusion of West African, American, and British traditions. Also listed as ART 294.",

But when output to csv, it looks like this:

colA: Charts the development of Black Art in the United States and Great Britain from 1945 to the present through Paul Gilroy’s notion of the \Black Atlantic\" – the ways that a distinct aesthetic emerged from a fusion of West African

colB: American

colC: and British traditions. Also listed as ART 294."

So, it appears that the commas in the text are causing it to split into multiple columns and the escaped double quotes don’t end up as just double quotes. In other words, the default setting of '' in the CSV formatter doesn’t appear to be working to resolve text fields like the one above.
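For comparison, here is a minimal sketch of how a field containing commas and double quotes is conventionally written to CSV: the whole field is wrapped in quotes and each embedded quote is doubled. It uses Python's standard `csv` module purely as an illustration, not the snap's actual implementation; the `course` value and the shortened description are made up for the example.

```python
import csv
import io

# Illustrative row: a hypothetical course code plus a shortened description
# containing both commas and double quotes.
course = "ART 294"
description = (
    'the "Black Atlantic" – a fusion of West African, American, '
    "and British traditions."
)

# csv.writer quotes fields that need it and doubles embedded quotes by default.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL)
writer.writerow([course, description])
csv_text = buffer.getvalue()

# Reading the text back yields one row with two columns, not four.
rows = list(csv.reader(io.StringIO(csv_text)))
print(csv_text)
print(rows)
```

If the formatter's output for a row like this does not wrap the description in quotes, any consumer will split it at the commas, which matches the symptom described above.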

I found an issue report stating that CSVFormat cannot read its own output when the input contains a backslash immediately followed by a double quote (\"): [CSV-200] CSVFormat cannot read its own output if input contain escape character followed by quote character - ASF JIRA. Is this the same function that your CSV Formatter snap uses?

UPDATE

The recommendation on that site was to remove either the " or the \. I removed the \ and it fixed my text problem!

So, the fix from that site solved the problem of the description field being split across three columns, but now roughly 250 of about 7000 rows have a garbled description, with characters like apostrophes displayed incorrectly in the CSV output.

Issues:

  1. Apostrophes are incorrectly output, example: women’s becomes womenâ€™s
  2. “bad” and “wild” becomes â€œbadâ€ and â€œwildâ€
  3. An em-dash in a text like P—GRK becomes Pâ€”GRK

Can someone help me with this?

New Note: I discovered that if I open my CSV in Notepad++, the characters all appear to be correct and not messed up as above. So, I really don’t know what to do now.

Can someone please help me with this?
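A file that looks right in Notepad++ but wrong elsewhere usually points to the viewer guessing the wrong encoding, not to damaged data. The sketch below reproduces that kind of garbling in Python by decoding UTF-8 bytes as Windows-1252, then shows one common workaround: writing a byte order mark so that Excel is more likely to detect UTF-8. That Excel honors the mark is an assumption about the viewer, and whether the CSV Formatter snap can emit one is not confirmed in this thread.

```python
# Text with a curly apostrophe and an em dash, as in the examples above.
original = "women\u2019s P\u2014GRK"

# The file on disk holds UTF-8 bytes.
utf8_bytes = original.encode("utf-8")

# A viewer that assumes Windows-1252 turns each 3-byte character into 3 symbols.
misread = utf8_bytes.decode("cp1252")

# The "utf-8-sig" codec prefixes a byte order mark (EF BB BF) to the same bytes.
with_bom = original.encode("utf-8-sig")

print(misread)
print(with_bom[:3])
```

Another option that leaves the file untouched is to import the CSV into Excel through its data import dialog and choose UTF-8 as the file origin.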

Hi @vincenr

Good day. These are extended (non-ASCII) characters, which Excel or another application may display as special characters.

So, what do you want to do with these characters?

Here are some options: (1) replace the extended characters with the regular ones, or (2) just remove all of those characters.

$the_text_description.replace(/”|“|‘|’|–|—/g, m=> match m { '”'=> '"', '“'=> '"', "‘"=> "'", "’"=> "'", "—"=> "-", "–"=> "-" })

Or you can remove those characters entirely:

$the_text_description.replace(/”|“|‘|’|–|—/g, '')
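For anyone who wants to check the two options outside a pipeline, here is a rough Python equivalent of both expressions. The function names are made up for this sketch; only the six character pairs come from the expressions above.

```python
# The same six typographic characters handled by the expressions above.
TYPOGRAPHIC = "\u201d\u201c\u2018\u2019\u2013\u2014"

# Option 1: map each character to a plain ASCII stand-in.
ASCII_EQUIVALENTS = str.maketrans({
    "\u201d": '"',  # right double quote
    "\u201c": '"',  # left double quote
    "\u2018": "'",  # left single quote
    "\u2019": "'",  # right single quote / apostrophe
    "\u2013": "-",  # en dash
    "\u2014": "-",  # em dash
})

# Option 2: delete the characters outright.
DELETE_TYPOGRAPHIC = str.maketrans("", "", TYPOGRAPHIC)


def to_ascii_punctuation(text: str) -> str:
    """Replace curly quotes and dashes with ASCII equivalents."""
    return text.translate(ASCII_EQUIVALENTS)


def strip_typographic(text: str) -> str:
    """Remove curly quotes and dashes entirely."""
    return text.translate(DELETE_TYPOGRAPHIC)


sample = "women\u2019s \u201cbad\u201d P\u2014GRK"
print(to_ascii_punctuation(sample))
print(strip_typographic(sample))
```

Replacing is usually the gentler choice, since deleting an apostrophe or a dash can change how a word reads.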

Thanks,
MM

OK, I will try that. Thanks, but please leave this ticket open until I verify that it works.

I’d rather see the syntax for how to only include a-zA-Z0-9.()’
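One way to express "only include these characters" is a negated character class: match anything outside the allowed set and replace it with nothing. The Python sketch below illustrates the idea. The space in the allowed set is an added assumption (without it, words would run together), and `keep_allowed` is a made-up name.

```python
import re

# Anything NOT in this set is removed. The trailing space is an assumption
# added so that words stay separated; drop it to match the list exactly.
DISALLOWED = re.compile(r"[^a-zA-Z0-9.()’ ]")


def keep_allowed(text: str) -> str:
    """Strip every character outside a-z, A-Z, 0-9, '.', '(', ')', ’ and space."""
    return DISALLOWED.sub("", text)


sample = "women’s “bad” (P—GRK), 101."
print(keep_allowed(sample))
```

The same negated class should carry over to the `.replace(/.../g, '')` form shown earlier in the thread, although that has not been tested here.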

How would the syntax work for CRLF? That is, if I only want to replace CRLF with a space in a field?

Never mind, I figured it out.
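The thread does not record the solution the poster found. As a rough sketch of the general idea, a CRLF pair can be matched as the two-character sequence `\r\n` and replaced with a single space. The Python below shows that, plus a variant that also catches a lone CR or LF; both function names are made up for illustration.

```python
import re


def crlf_to_space(text: str) -> str:
    """Replace each CRLF pair with one space, leaving lone CR or LF alone."""
    return text.replace("\r\n", " ")


def any_newline_to_space(text: str) -> str:
    """Replace CRLF, lone CR, or lone LF with one space each."""
    # CRLF is listed first so the pair becomes one space, not two.
    return re.sub(r"\r\n|\r|\n", " ", text)


print(crlf_to_space("line one\r\nline two"))
print(any_newline_to_space("a\r\nb\nc\rd"))
```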