Forum Discussion

dshen
Employee
9 years ago

SnapLogic Triggered Task using an OnPremise URL through Load Balanced Groundplex Nodes

This article describes how a triggered task is invoked in the SnapLogic Elastic Integration
Platform using an OnPremise URL through load balanced Groundplex nodes.

Assume that your organization has a SnapLogic Groundplex provisioned with 3 nodes. When an OnPremise URL is exposed for a triggered task, it will automatically suggest the hostname of one of the nodes that belongs to the Groundplex.

e.g., https://GP-Node1:<port>/api/1/rest/feed/<RELATIVE_PATH_TO_TASK>/

To provide redundancy across all nodes in the Groundplex when a triggered task is invoked, a load balancer can be placed in front of the Groundplex nodes. When a load balancer is set up and configured, SnapLogic will use the load balancer in the auto-generated OnPremise URL.

e.g., https://GP-LB:<PORT>/api/1/rest/feed/<RELATIVE_PATH_TO_TASK>/
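
As a rough illustration of the URL pattern above, here is a minimal Python sketch that assembles an OnPremise-style URL and prepares (but does not send) a request. The port `8081`, the task path `MyOrg/MyProject/MyTask`, and the `customer_id` parameter are hypothetical placeholders, and authentication is left out because the post does not describe it.

```python
from typing import Dict, Optional
from urllib.parse import quote, urlencode
from urllib.request import Request


def build_onpremise_url(host: str, port: int, task_path: str) -> str:
    """Assemble a URL following the pattern shown in the post."""
    # Trim stray slashes so the path joins cleanly, then percent-encode it.
    cleaned = quote(task_path.strip("/"))
    return f"https://{host}:{port}/api/1/rest/feed/{cleaned}/"


def build_trigger_request(url: str, params: Optional[Dict[str, str]] = None) -> Request:
    """Prepare a GET request; query parameters are appended to the URL."""
    if params:
        url = f"{url}?{urlencode(params)}"
    return Request(url, method="GET")


# Hypothetical values: only the host name GP-LB comes from the post.
lb_url = build_onpremise_url("GP-LB", 8081, "MyOrg/MyProject/MyTask")
request = build_trigger_request(lb_url, {"customer_id": "42"})
print(request.full_url)
```

Swapping `GP-LB` for `GP-Node1` gives the single-node form of the URL, which is the only difference between the two examples in the post.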

The following diagram describes the flow of network requests made when remotely executing a triggered task using a load balancer OnPremise URL.

  1. A remote client invokes the triggered task using the OnPremise URL that points to the load balancer (e.g., GP-LB).
  2. The load balancer forwards the request to an active groundplex node. GP-Node1 is selected for the purpose of this example.
  3. The groundplex node that receives the triggered task request asks the Control Plane on which node the task should be executed.
  4. The Control Plane forwards the request to an active groundplex node. GP-Node2 is selected for the purpose of this example.
  5. The triggered task now prepares to be executed on GP-Node2. An HTTPS connection is created between GP-Node1 and GP-Node2 to enable data to be streamed between the nodes.
  6. Data is read from and written to the endpoints.
  7. The response message is sent through GP-Node1, then GP-LB (load balancer), back to the caller.
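
The seven steps above can be modelled as a small Python sketch that only records the hops of one request. It is a toy simulation, not SnapLogic code: the `Node` class, the "first active node" rule for the load balancer, and the `pick_executor` callback standing in for the Control Plane's choice are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Node:
    name: str
    active: bool = True


def trace_request(nodes: List[Node],
                  pick_executor: Callable[[List[Node]], Node]) -> List[str]:
    """Return a step-by-step trace of one triggered task request."""
    active = [n for n in nodes if n.active]
    if not active:
        raise RuntimeError("no active Groundplex node behind the load balancer")
    entry = active[0]                 # step 2: assumed "first active node" rule
    executor = pick_executor(active)  # steps 3-4: hypothetical selection callback
    return [
        "1: client -> GP-LB",
        f"2: GP-LB -> {entry.name}",
        f"3: {entry.name} -> Control Plane",
        f"4: Control Plane selects {executor.name}",
        f"5: {entry.name} <-> {executor.name} (HTTPS stream)",
        f"6: {executor.name} reads/writes endpoints",
        f"7: {executor.name} -> {entry.name} -> GP-LB -> client",
    ]


plex = [Node("GP-Node1"), Node("GP-Node2"), Node("GP-Node3")]
# Mirror the example in the post: the request lands on GP-Node1, runs on GP-Node2.
for line in trace_request(plex, lambda active: active[1]):
    print(line)
```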

23 Replies

  • Kulashekharan
    New Contributor II

    Sorry if it is a dumb question, but is the load balancer really useful here if the control plane decides where the task gets executed? What purpose does the load balancer serve beyond acting as a proxy?

    • christwr
      Contributor III

      The load balancer is there to select which groundplex node receives the initial request. You don’t want to always point to one specific node, as it may be down at any given point for maintenance, etc. The load balancer sends the request to one of the groundplex nodes that are actually up and accepting requests.

      To provide redundancy across all nodes in the Groundplex when a triggered task is invoked, a load balancer can be placed in front of the Groundplex nodes
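
To make the redundancy point concrete, here is a small Python sketch of a balancer that skips nodes marked as down. Round-robin is only an assumed policy for illustration; the thread does not say which algorithm or health check a real load balancer in front of the Groundplex would use, and the class and method names are hypothetical.

```python
from itertools import cycle
from typing import Iterable


class RoundRobinBalancer:
    """Toy round-robin balancer that skips unhealthy nodes."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes = list(nodes)
        self._healthy = set(self._nodes)
        self._ring = cycle(self._nodes)

    def mark_down(self, node: str) -> None:
        self._healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self._nodes:
            self._healthy.add(node)

    def next_node(self) -> str:
        # Try each node at most once per call before giving up.
        for _ in range(len(self._nodes)):
            candidate = next(self._ring)
            if candidate in self._healthy:
                return candidate
        raise RuntimeError("no healthy Groundplex node available")


lb = RoundRobinBalancer(["GP-Node1", "GP-Node2", "GP-Node3"])
lb.mark_down("GP-Node2")  # e.g. the node is down for maintenance
picks = [lb.next_node() for _ in range(4)]
print(picks)
```

With GP-Node2 marked down, requests alternate between the remaining two nodes, which is the behaviour a fixed single-node URL cannot provide.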

    • ram3
      New Contributor

      @tlikarish @cjhoward18 Thank you for this blog post. My understanding is that if we use the Cloud URL, there is no need to set up the “GP_LB”, right? I say this because the control plane knows which node is available and least loaded (has resources) to run the task, and also knows whether the task is already cached on the node.

      Additionally, do you discourage triggering tasks using the Cloud URL for security reasons? If so, how can the Cloud URL be disabled? Thanks in advance.

  • Thank you for the post. This is exactly what has been done in our environment.

    Please elaborate more on point # 3 as to how the groundplex node communicates to control plane to decide the node on which the task should be executed.

    Also, as per point #4, even though the control plane selects Node2 for execution, the dashboard shows it as Node1. Am I correct with this statement? Please correct me if I’m wrong.

    Many thanks!!!

    • cstewart
      Former Employee

      In point #3, GP-Node1 uses its established connection to the control plane (the WebSocket over SSL connection) to notify the control plane of the request.
      In point #4, the execution will show as GP-Node2, not GP-Node1; GP-Node1 is only acting as the pass-through.

  • Thank you for the explanation. I have a follow-up question. Does the entire request message, including HTTP headers, parameters, and body, get sent to the Control Plane and forwarded to the node selected for execution? Or is data like parameters and body not sent to the Control Plane, but sent directly from the initial landing node (GP-Node1) to the selected node (GP-Node2) over the HTTPS connection established in step 5? Put another way, is the HTTPS connection between GP-Node1 and GP-Node2 only used to send the response back from GP-Node2 to GP-Node1, or is it used for anything else?

    • tlikarish
      Employee

      Most of the headers and parameters will get sent to the control plane, but the request body is not. Basically, anything that can be converted into a pipeline parameter could be passed up as part of the request, but any data that is passed into an input view (binary or document) will be transferred between the CC nodes themselves. Some additional optimizations try to avoid the control plane portion of this flow, so sometimes those steps are skipped entirely and everything is handled locally between plex nodes.

      So in the worst case, the headers and parameters may be passed up to the control plane, but the body is passed between nodes.
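
The split described in this reply can be sketched in a few lines of Python. This is only an illustration of the idea, with hypothetical names (`TriggeredRequest`, `split_for_routing`); the reply says "most" headers go up without listing exceptions, so the sketch simply forwards all of them.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class TriggeredRequest:
    headers: Dict[str, str]
    params: Dict[str, str]
    body: bytes = b""


def split_for_routing(req: TriggeredRequest) -> Tuple[Dict[str, Dict[str, str]], bytes]:
    """Return (payload that may go to the control plane, payload sent node to node)."""
    # Headers and parameters can become pipeline parameters, so they may travel up.
    control_plane_payload = {
        "headers": dict(req.headers),
        "params": dict(req.params),
    }
    # The body feeds an input view, so it stays on the node-to-node connection.
    node_to_node_payload = req.body
    return control_plane_payload, node_to_node_payload


req = TriggeredRequest(
    headers={"Content-Type": "application/json"},
    params={"customer_id": "42"},
    body=b'{"order": 1001}',
)
up, across = split_for_routing(req)
print(sorted(up))   # keys that may be passed to the control plane
print(across)       # bytes transferred between nodes
```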

  • sanjaynayak
    New Contributor III

    Regarding point #1, does anyone have any info?

    1. A remote client invokes the triggered task using the OnPremise URL that points to the load balancer (e.g., GP-LB).

    What authentication mechanism can we use between the remote client and the OnPremise ground URL, since the ground URL has no token the way the cloud URL has a bearer token?

  • sanjaynayak
    New Contributor III

    @dshen, can anyone confirm whether this is the real workflow? It seems we have found a gap.

    1- GP-LB receives the request from the client, and as part of load balancing forwards it to, say, GP-Node1 in this example.
    2- Is the flow from GP-Node1 to the control plane (step 3) there to fetch the pipeline (metadata) to be executed, or something else?
    3- If so, the control plane should choose GP-Node1 to execute the pipeline, since that is the node GP-LB selected.

    Can anybody explain the actual design?

    • cjhoward18
      Employee

      Hi,

      Step 2 only happens if the task is not already cached on the node itself.

      Step 3: the control plane plays no role in choosing the node for execution in the Ground invoked case.
      The communication is outbound from the GP node to the control plane, to fetch the dependent assets for execution if they are not cached, as stated above.
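
A minimal Python sketch of the behaviour described in this reply: the receiving node executes the task itself and only reaches out to the control plane when the task's assets are not cached. The `GroundplexNode` class, its in-memory cache, and the `fetch_assets` callback are hypothetical stand-ins, not SnapLogic internals.

```python
from typing import Callable, Dict


class GroundplexNode:
    """Toy node that fetches task assets from the control plane only on a cache miss."""

    def __init__(self, name: str, fetch_assets: Callable[[str], str]) -> None:
        self.name = name
        self._fetch_assets = fetch_assets   # hypothetical outbound control plane call
        self._cache: Dict[str, str] = {}
        self.control_plane_calls = 0

    def handle(self, task_path: str) -> str:
        if task_path not in self._cache:
            # Cache miss: one outbound call to fetch the assets for this task.
            self.control_plane_calls += 1
            self._cache[task_path] = self._fetch_assets(task_path)
        # Either way, the node that received the request runs the task.
        return f"{task_path} executed on {self.name}"


node = GroundplexNode("GP-Node1", fetch_assets=lambda path: f"assets for {path}")
first = node.handle("MyOrg/MyProject/MyTask")    # cache miss
second = node.handle("MyOrg/MyProject/MyTask")   # cache hit
print(first, node.control_plane_calls)
```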

      • sanjaynayak
        New Contributor III

        Thanks @cjhoward18 for the info. Please correct me if my understanding of this flow is wrong. Because of the cache, we have two cases.

        1. The load balancer forwards the request to an active groundplex node. GP-Node1 is
          selected for the purpose of this example.

        Case 1:

        If the task is already cached on GP-Node1, the node executes the task and sends the response back to GP-LB; the request does not go to the control plane at all.

        Case 2:
        If the task is not cached, the request goes from GP-Node1 to the control plane. As you mentioned, the control plane does not decide which node executes the task; it must be the same node (GP-Node1) that sent the request to the control plane.

        Is there any way to check whether a task is already cached before execution?

  • bojanvelevski
    Valued Contributor

    dshen cjhoward18 tlikarish PSAmmirata 

    Hello gentlemen.

    This is a very interesting subject! Reading this, I understand that we practically have two balancing processes here: one from the load balancer itself, and the other from the Control Plane. What I couldn't understand is what criteria the Control Plane uses when deciding which node should execute the request. The cache is mentioned throughout the comments. Does that mean the Control Plane makes the decision based on which node already has the pipeline cached? And if so, does that mean that if the node which received the request has the pipeline cached, it will execute the request itself and not pass it on to one of the other nodes?

    Thank you,

    Bojan