Forum Discussion

Henchway
Contributor
5 years ago

Sort all elements in JSON Object

Hi all,

I’m having an issue where I’m using the built-in snaps to get details on a pipeline. I’m then pushing these pipeline details to GitLab to enable version control.
Now my issue is that for some reason (I couldn’t find the culprit yet) the JSON gets slightly modified: either GitLab changes it, or the “read pipeline” snap doesn’t always print the JSON content in the same order.
This results in a different base64 encoding and therefore also different SHAs, which is a real pain when trying to build a version control module.
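To make the effect concrete, here is a minimal Python sketch (Python because the thread's own script is Python). Two hypothetical exports with the same content but a different key order produce different base64 payloads and therefore different hashes, while serializing with `json.dumps(..., sort_keys=True)` makes them identical. Hashing the base64 text with SHA-1 is only an illustrative assumption; it is not necessarily the hash GitLab itself computes.

```python
import base64
import hashlib
import json

# Hypothetical exports: same content, different key order at two levels.
export_a = {"name": "my_pipeline", "snaps": {"mapper": 1, "reader": 2}}
export_b = {"snaps": {"reader": 2, "mapper": 1}, "name": "my_pipeline"}

def digest(doc, canonical):
    # With canonical=True, json.dumps sorts the keys at every nesting level.
    # Fixed separators keep whitespace out of the comparison.
    text = json.dumps(doc, sort_keys=canonical, separators=(",", ":"))
    encoded = base64.b64encode(text.encode("utf-8"))
    # Hash the base64 payload (illustrative choice of SHA-1).
    return hashlib.sha1(encoded).hexdigest()

print(digest(export_a, canonical=False) == digest(export_b, canonical=False))  # False
print(digest(export_a, canonical=True) == digest(export_b, canonical=True))    # True
```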

So now I’m looking into sorting all elements in the JSON object (including nested objects) beforehand, so that the content always comes out the same. Does anyone have an idea how to do that?

I’ve looked into two options:
(i) Expression language: {}.extend($.entries().sort((left, right) => left[0].localeCompare(right[0])))
The issue here is that it only sorts the first level and ignores any sublevels.
(ii) Using a JavaScript / Python script to sort the nodes. However, I’m pretty unfamiliar with the Script snap and it’s a massive pain to use. With Python it would be pretty easy, but I’d need the json library, which I’d have to import (and I have no idea where to even begin investigating that).
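A minimal sketch of a recursive version of option (i), written in plain Python rather than the expression language: it applies the same "sort the entries by key" step to every nested object. `sort_recursively` is a hypothetical helper name, and arrays are deliberately left in their original order on the assumption that element order in a pipeline export is meaningful.

```python
def sort_recursively(value):
    """Return a copy of value with the keys of every nested object sorted."""
    if isinstance(value, dict):
        # Same idea as $.entries().sort(...) above, applied again to each value.
        # Relies on dicts keeping insertion order (Python 3.7+).
        return {key: sort_recursively(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        # Keep array order; only sort the objects inside the elements.
        return [sort_recursively(item) for item in value]
    # Strings, numbers, booleans and None are returned unchanged.
    return value

doc = {"b": {"y": 1, "x": [{"d": 2, "c": 3}]}, "a": 0}
result = sort_recursively(doc)
print(result)
# {'a': 0, 'b': {'x': [{'c': 3, 'd': 2}], 'y': 1}}
```

Note that `sorted()` compares keys by code point, whereas `localeCompare` uses locale-aware collation, so keys that differ in case may end up in a different order than the expression produces.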

Edit: It looks like I don’t need to add third-party libraries, but I still can’t get Python to work:

I can convert inDoc to a dictionary, but json.dumps() just won’t work.

```python
# Import the interface required by the Script snap.
from com.snaplogic.scripting.language import ScriptHook
import json

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next input document and copy it into a Python dictionary.
                # dict() only converts the top level; nested objects keep their original type.
                inDoc = self.input.next()
                dictionary = dict(inDoc)

                # Serialize with the keys sorted at every level (this is the call that fails).
                sorted_json = json.dumps(dictionary, sort_keys=True)

                # Write the original content and the sorted JSON string as the output document.
                outDoc = {
                    'original' : dictionary,
                    'sorted' : sorted_json
                }
                self.output.write(inDoc, outDoc)
            except Exception as e:
                # Send the exception message to the error view and the log.
                errDoc = {
                    'error' : str(e)
                }
                self.log.error("Error in python script: " + str(e))
                self.error.write(errDoc)

        self.log.info("Script executed")

    # The "cleanup()" method is called after the snap has exited the execute() method
    def cleanup(self):
        self.log.info("Cleaning up")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable.  The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)
```
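One plausible reason for the failure (an assumption; the thread does not confirm it): `dict(inDoc)` only converts the top level, so nested objects may still be map types that are not `dict` subclasses, and `json.dumps()` rejects objects of types it doesn't know. `json.dumps()` accepts a `default` callable for this case: it is called for each unsupported object, and the plain `dict` it returns is then serialized with `sort_keys=True` like any other. The sketch below is standalone Python 3; `FakeJavaMap` and `to_builtin` are hypothetical names, and whether the duck-typing checks match the objects the Script snap actually passes in is untested.

```python
import json
from collections.abc import Mapping

class FakeJavaMap(Mapping):
    """Hypothetical stand-in for a map type that is not a dict subclass."""
    def __init__(self, data):
        self._data = data
    def __getitem__(self, key):
        return self._data[key]
    def __iter__(self):
        return iter(self._data)
    def __len__(self):
        return len(self._data)

def to_builtin(obj):
    # json.dumps calls this only for objects it cannot serialize itself.
    if hasattr(obj, "keys"):
        # Map-like: copy into a plain dict; sort_keys then applies to it.
        return dict((key, obj[key]) for key in obj.keys())
    if hasattr(obj, "__iter__"):
        # List-like: copy into a plain list, keeping element order.
        return list(obj)
    raise TypeError("Cannot serialize %r" % (obj,))

doc = {"snaps": FakeJavaMap({"reader": 2, "mapper": FakeJavaMap({"z": 1, "a": 2})}),
       "name": "p"}

try:
    json.dumps(doc, sort_keys=True)
except TypeError as exc:
    print("plain json.dumps fails:", exc)

print(json.dumps(doc, sort_keys=True, default=to_builtin))
# {"name": "p", "snaps": {"mapper": {"a": 2, "z": 1}, "reader": 2}}
```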

Any tips are appreciated, thanks!

Best regards
Thomas

3 Replies

  • terrible_towel
    New Contributor II

    I do know the total count, as it comes back with each request. I've tried setting Has next to 'true' and Total pages to fetch to $entity.total_count, but it complains that 'entity' is undefined for some reason. What am I missing?

    • ddellsperger
      Admin

      Unfortunately, Total pages to fetch can only use input parameters, not data from the response. You won't be able to use $entity there, so you'll have to use Has next to determine whether you've reached the end. You could do something like $entity.data.length >= 100 && (100 * (snap.out.totalCount + 1) <= $entity.total_count) to check both that 1) this response contains the full page of data you expected and 2) you haven't yet reached the total count.
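The arithmetic in that Has next expression can be checked with a small Python simulation. It assumes a page size of 100 and that `snap.out.totalCount` equals the number of pages already written before the current response (an assumption about the counter, not verified here); `has_next` and `simulate` are hypothetical names.

```python
PAGE_SIZE = 100  # page size assumed by the expression

def has_next(page_length, pages_before, total_count, page_size=PAGE_SIZE):
    # $entity.data.length >= 100
    full_page = page_length >= page_size
    # 100 * (snap.out.totalCount + 1) <= $entity.total_count
    more_left = page_size * (pages_before + 1) <= total_count
    return full_page and more_left

def simulate(total_count, page_size=PAGE_SIZE):
    """Count the requests made against an API holding total_count records."""
    requests = 0
    while True:
        remaining = total_count - requests * page_size
        page_length = max(0, min(page_size, remaining))
        keep_going = has_next(page_length, requests, total_count, page_size)
        requests += 1
        if not keep_going:
            return requests

print(simulate(250))  # 3 (100 + 100 + 50 records)
print(simulate(200))  # 3 (the third page comes back empty)
```

When the total is an exact multiple of 100, the `<=` comparison allows one extra request; it returns an empty page, and the length check then stops the loop.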

      • terrible_towel
        New Contributor II

        This approach does let me reach the next page, but instead of fetching the rest of the dataset, it just re-fetches the same data as the first request. So the Has next expression seems to work (it evaluates to true), but it doesn't pull the rest of the data. It seems I'm failing to tell the API that I want the second batch onwards: if I start with offset=0, I'm not able to increment that value.
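Re-fetching the first page usually means the offset sent with each request never changes. Below is a generic sketch of offset-based paging in Python, where `fetch_page` is a hypothetical stand-in for the API call and a page size of 100 is assumed: after each full page, the offset for the next request is advanced by the page size. How that offset is wired into the snap's pagination settings is not covered here.

```python
def fetch_page(records, offset, limit):
    """Hypothetical stand-in for the API call: returns one page of records."""
    return records[offset:offset + limit]

def fetch_all(records, page_size=100):
    collected = []
    offset = 0
    while True:
        page = fetch_page(records, offset, page_size)
        collected.extend(page)
        if len(page) < page_size:
            # A short (or empty) page means there is nothing left to fetch.
            return collected
        # Advance the offset; without this step every request re-reads page one.
        offset += page_size

data = list(range(250))
print(len(fetch_all(data)))  # 250
```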