Recent Discussions
Receiving Kafka Acknowledgement Time out
Hi Community! Greetings! We are trying to read data from a Kafka Consumer snap and are using a child pipeline (via Pipeline Execute) to write data into a zip folder. The pipeline errors out with "SnapDataException: Timed out waiting for acknowledgment of consumed messages". While investigating the pipeline statistics, we observed that the Pipeline Execute snap sends one fewer output document than the number of input documents. Example: 4 documents flow from the Kafka Consumer into a Pipeline Execute snap placed right after it. The statistics show 4 input documents, but the output shows 3 documents. Contrary to this, we can see 4 successful executions in the dashboard, and the JCC log data confirms 4 triggers and 4 completions of the child pipeline. This incident occurs intermittently. Has anyone experienced this issue? If so, please suggest a solution. Thank you!

pradhyumna_r · 2 years ago · New Contributor · 174K Views · 0 likes · 11 Comments
Structuring the Data receiving from the Union snap

Hi all, can anyone help me figure out how to format this data in a more structured manner?

Input data:

[
  {
    "account_name": "Prajwal",
    "Name": "Jhonson Inc.",
    "OwnerId": "0052w00000BUhAGAA1",
    "StageName": "Negotiation/Review",
    "CloseDate": "2024-07-15",
    "Description": "Renewal contract for XYZ Inc. with added services.",
    "campaign_name": "Navya-Snaplogic"
  },
  { "Campaign_Id": "701Ig000000xF3BIAU" },
  { "AccountId": "0012w000027SrONAA0" }
]

Expected output data:

[
  {
    "account_name": "Prajwal",
    "Name": "Jhonson Inc.",
    "OwnerId": "0052w00000BUhAGAA1",
    "StageName": "Negotiation/Review",
    "CloseDate": "2024-07-15",
    "Description": "Renewal contract for XYZ Inc. with added services.",
    "campaign_name": "Navya-Snaplogic",
    "Campaign_Id": "701Ig000000xF3BIAU",
    "AccountId": "0012w000027SrONAA0"
  }
]

The Campaign_Id and AccountId should be incorporated into the first object. Thank you for your help! Prajwal

Solved · Prajwal · 2 years ago · New Contributor II · 17K Views · 1 like · 4 Comments
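For reference, the merge described above is simply a fold of all documents into one object. Here is a minimal Python sketch of that logic (field names taken from the sample input; the SnapLogic expression language has comparable array reduce and object merge functions that can do the same in a Mapper after grouping the documents):

```python
from functools import reduce

# The three documents arriving from the Union snap (sample values from the post).
docs = [
    {"account_name": "Prajwal", "Name": "Jhonson Inc."},
    {"Campaign_Id": "701Ig000000xF3BIAU"},
    {"AccountId": "0012w000027SrONAA0"},
]

# Fold every document into one object; later keys win on conflict.
merged = reduce(lambda acc, d: {**acc, **d}, docs, {})
```

Note that the documents must first be collected into a single array (e.g. with a Group By N or Gate snap in a pipeline) before a fold like this can run over them.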
Does Autosync support Amazon S3 as a destination?

Solved · abhishekp · 4 years ago · Employee · 4.1K Views · 0 likes · 3 Comments
JSON to XML conversion

Hello, I'm trying to convert a JSON payload into XML. I'm using the XML Generator snap and it gets me close, but the content under "data" is still in JSON format. Can someone help with converting the content under "data" into XML as well? Attached are the current and expected output, as well as the JSON input and the XML Generator code. Thanks

Max · 2 years ago · New Contributor II · 3.6K Views · 0 likes · 1 Comment
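For the JSON-to-XML question above: the general technique when nested content stays in JSON form is a recursive walk of the payload. Here is a minimal, standard-library Python sketch of that idea (the element names and payload are illustrative, not taken from the attachment):

```python
import xml.etree.ElementTree as ET

def to_xml(tag, value):
    """Recursively turn nested dicts/lists into an Element tree."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for k, v in value.items():
            elem.append(to_xml(k, v))      # one child element per key
    elif isinstance(value, list):
        for item in value:
            elem.append(to_xml("item", item))  # wrap list entries
    else:
        elem.text = str(value)             # leaf: text content
    return elem

payload = {"data": {"id": 1, "tags": ["a", "b"]}}
xml_str = ET.tostring(to_xml("root", payload), encoding="unicode")
```

The point is that every level of the JSON must be visited; a converter that only handles the top level leaves inner objects serialized as JSON text, which matches the symptom described.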
The Data Lake will get Cloudy in 2017

Written by Chief Enterprise Architect, Ravi Dharnikota. Originally posted in SandHill Magazine.

In 2016, we saw a lot of interesting discussions around the data lake. In the spirit of reflecting on the past and looking at the future of this technology, here are the significant themes I foresee in 2017 for the enterprise data lake.

The data lake is no longer just Hadoop

Until recently, the enterprise data lake has largely been discussed, designed and implemented around Hadoop. Doug Cutting, creator of Hadoop, commented that Hadoop at its core is HDFS, YARN and MapReduce. He went on to say that Spark, for many, is now favored over MapReduce. That leaves a distributed storage system (HDFS) and a resource negotiator (YARN). HDFS has alternatives in S3, Azure Blob/Azure Data Lake Store and Google Cloud Storage, and YARN has alternatives like Mesos. Also, if you want to run Spark, it has its own built-in resource management and job scheduling in stand-alone mode. This brings other alternatives like S3/EMR/Redshift/Spark/Azure, which are outside of traditional Hadoop, into the enterprise data lake discussion.

Separation of storage and compute

Elasticity is among the most attractive promises of the cloud: pay for what you consume when you need it, thus keeping your costs under control. It also leads to simple operational models where you can scale and manage your storage and compute independently. An example is a marketing team that buys terabytes of data to merge with other internal data, but quickly realizes that the quality of this data leaves a lot to be desired. Their internal on-premises enterprise data lake is built for regular, predictable batch loads, and investing in more nodes in their Hadoop cluster is not an option. Such a burst load can be well handled by utilizing elastic compute separate from storage.
Data lake “cloudification”

Hadoop has proven a challenge for enterprises, especially from an admin and skill-set perspective. This has resulted in data lakes with the right intent, but poor execution, stuck in labs. At this point, the cloud seems like a great alternative, with the promise of low to no maintenance and removing the skill set required for management, maintenance and administration of the data lake platform infrastructure. The cloud also allows the freedom to research new technologies (which are rapidly changing in the big data world) and experiment with solutions without a large up-front investment.

Avoiding vendor lock-in

Various cloud vendors today provide solutions for the data lake. Each vendor has its own advantage with respect to technology, ecosystem, price, ease of use, etc. Multiple options for cloud data lake platforms, on the one hand, give the enterprise exposure to the latest innovations, while on the other hand, they make it a challenge to insulate the consumers of these platforms from change. Like most technology on-ramps, vendors and customers will predictably try many solutions until they settle on one. Vendor changes will come frequently and significantly as their functionality and market focus improve. Enterprises will likely have expert infrastructure teams trying to implement and learning to tweak the interfaces of several vendor offerings concurrently. All of this is good and healthy practice. Application teams, however, will be subjected to all this vendor and solution shakeout. Hundreds of users who have better things to do will naturally become inhibitors to the infrastructure team's need to make continuous improvements in the cloud infrastructure selection. There needs to be a strategy to protect the application teams' data pipelines against frequent changes in this multi-cloud data lake environment.

Operationalizing the data lake

The enterprise is a retrofit job.
Nothing can be introduced into the enterprise in isolation. Any new system needs to work well with the technology, skill sets and processes that already exist. As data lakes graduate from the labs and become ready for consumption by the enterprise, one of the biggest challenges to adoption and success is retrofitting and operationalizing them. A well-oiled process in an enterprise is mostly automated. How Continuous Integration and Continuous Deployment (CI/CD), versioning and collaboration are handled will define enterprise data lake success. Self-service usage, enabled by ease of use, API access and a security framework that supports authentication and authorization with well-defined access controls, streamlines consumption of data and services and prevents chaos. As the demand for non-traditional data integration increases, the themes that have developed over the last year should improve the success of enterprise data lake implementation and adoption.

rdharnikota · 9 years ago · Former Employee · 3.6K Views · 0 likes · 3 Comments
REST Get Pagination in various scenarios

Hi all, there are various challenges when using REST GET pagination. In this article, we discuss these challenges and how to overcome them with the help of some built-in expressions in SnapLogic. Let's look at the various scenarios and their solutions.

Scenario 1: API URL response has no total-records indicator, but supports limit and offset

In this case, since the API does not report the total record count in advance, the only way is to walk each page until the last page of the API response. The last page is the one whose response contains no records.

Explanation of how it works and sample data:

has_next condition: $entity.length > 0

has_next explanation: Since it is not known in advance whether a next page exists, $entity.length checks the length of the response array in the URL output, and the next page iteration proceeds only while $entity.length is greater than zero. If the response array length is zero, there are no more records to fetch, so the has_next condition "$entity.length > 0" fails and the iteration loop stops.

next_url condition: $original.URL + "?limit=" + $original.limit + "&offset=" + (parseInt($original.limit) * snap.out.totalCount)

next_url explanation: The limit parameter and API URL are static, but the offset must change on each iteration. The approach is therefore to multiply the limit by snap.out.totalCount to shift the offset per API page iteration. snap.out.totalCount is the snap system variable holding the total number of documents that have passed through the snap's output views.
In this REST Get, each API page iteration's response output is one JSON array document, so snap.out.totalCount equals the number of API page iterations completed.

Sample response for the first API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Mark" },
    { "year": "2022", "month": "08", "Name": "John" },
    ... 1000 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "limit": "1000", "offset": "0", "URL": "https://Url.XYZ.com" }
}

Sample response for the second API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2024", "month": "08", "Name": "Ram" },
    { "year": "2021", "month": "03", "Name": "Joe" },
    ... 1000 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "limit": "1000", "offset": "1000", "URL": "https://Url.XYZ.com" }
}

Scenario 2: API URL response has total records in the response header and pagination uses limit & offset

Since the total record count is available, the total-records field in the API response header can be used to traverse the response pages.

Explanation of how it works and sample data:

has_next condition: parseInt($original.limit) * snap.out.totalCount < $headers['total-records']

has_next explanation: Check whether the number of rows fetched so far (limit multiplied by snap.out.totalCount) is still less than the total record count. For example, with 120 total records and a limit of 100, the loop runs only 2 times.
It loops through as below:

limit = 100, snap.out.totalCount = 0: has_next evaluates 0 < 120, so the next page is fetched
limit = 100, snap.out.totalCount = 1: has_next evaluates 100 < 120, so the next page is fetched
limit = 100, snap.out.totalCount = 2: has_next evaluates 200 < 120, which is false; pagination stops and no further page is processed

next_url condition: $original.URL + "?limit=" + $original.limit + "&offset=" + (parseInt($original.limit) * snap.out.totalCount)

next_url explanation: The limit and URL values are static, but the offset is derived as the limit multiplied by snap.out.totalCount, which indicates the total number of documents that have passed through all of the snap's output views. The next API page is fetched as long as the has_next condition holds.

Sample response for the first API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Mark" },
    { "year": "2022", "month": "08", "Name": "John" },
    ... 100 records
  ],
  "original": { "effective_date": "2023-08-31", "limit": "100", "offset": "0", "URL": "https://Url.XYZ.com" },
  "headers": { "total-records": [ "120" ] }
}

Sample response for the second API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Ram" },
    { "year": "2022", "month": "08", "Name": "Raj" },
    ... 20 records
  ],
  "original": { "effective_date": "2023-08-31", "limit": "100", "offset": "100", "URL": "https://Url.XYZ.com" },
  "headers": { "total-records": [ "120" ] }
}

Scenario 3: API has no total-records indicator and pagination uses page_no

Here there is no total-records indication in the API output, but the API takes a page number parameter. Pagination is done by incrementing the page number by 1 for as long as the response array length is greater than 0; otherwise the pagination loop must stop.
Explanation of how it works and sample data:

has_next condition: $entity.length > 0

has_next explanation: Since no total record count is known from the API output, the next page should be fetched only if the current page returned any elements in the output array.

next_url condition: $original.URL + "&page_no=" + ($headers.page_no + 1)

next_url explanation: Since every response carries the page number in its headers, incrementing it by 1 builds the URL for the next page. (Note the parentheses: without them the string concatenation would append "1" rather than add 1 to the page number.)

Sample response for the first API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Mark" },
    { "year": "2022", "month": "08", "Name": "John" },
    ... 1000 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "URL": "https://Url.XYZ.com" },
  "headers": { "page_no": 1 }
}

Sample response for the second API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Ram" },
    { "year": "2022", "month": "08", "Name": "Raj" },
    ... 1000 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "URL": "https://Url.XYZ.com" },
  "headers": { "page_no": 2 }
}

Scenario 4: API has total records in the response header and pagination uses page_no

Here the API URL response contains both a total-records count and a page number. The next page is fetched by incrementing the page number by 1 while the number of rows fetched so far (snap.out.totalCount multiplied by the page limit) is still less than the total record count.
Explanation of how it works and sample data:

has_next condition: parseInt($original.limit) * snap.out.totalCount < $headers['total-records']

has_next explanation: Check whether the number of rows fetched is still less than the total record count. For example, with 120 total records and a limit of 100 (predefined as part of the design/implementation), the loop runs exactly 2 times (first and second page only):

limit = 100, snap.out.totalCount = 0: has_next evaluates 0 < 120, so the next page is fetched
limit = 100, snap.out.totalCount = 1: has_next evaluates 100 < 120, so the next page is fetched
limit = 100, snap.out.totalCount = 2: has_next evaluates 200 < 120, which is false; pagination stops and no further page is processed

next_url condition: $original.URL + "&page_no=" + ($headers.page_no + 1)

next_url explanation: Since every API response carries the page number, incrementing it by 1 builds the URL for the next page.

Sample response for the first API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Mark" },
    { "year": "2022", "month": "08", "Name": "John" },
    ... 100 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "limit": "100", "URL": "https://Url.XYZ.com" },
  "headers": { "page_no": 1 }
}

Sample response for the second API call:

{
  "statusLine": { "protoVersion": "HTTP/1.1", "statusCode": 200, "reasonPhrase": "OK" },
  "entity": [
    { "year": "2022", "month": "08", "Name": "Ram" },
    { "year": "2022", "month": "08", "Name": "Raja" },
    ... 20 records in this array
  ],
  "original": { "effective_date": "2023-08-31", "limit": "100", "URL": "https://Url.XYZ.com" },
  "headers": { "page_no": 2 }
}

Please give us Kudos if the article helps you 😍

Vishaal_Arun · 2 years ago · New Contributor II · 2.4K Views · 4 likes · 2 Comments
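The four pagination scenarios above reduce to two stopping rules: stop on the first empty page (no total known), or stop once limit times pages-fetched reaches the advertised total. Here is a minimal Python sketch of that control flow (function and parameter names are mine, not SnapLogic's; pages_fetched plays the role of snap.out.totalCount):

```python
def fetch_all(fetch_page, limit, total=None):
    """Collect all pages. Stops when limit * pages_fetched >= total
    (scenarios 2 and 4) or, if no total is known, on the first empty
    page (scenarios 1 and 3)."""
    records, pages_fetched = [], 0
    while True:
        page = fetch_page(offset=limit * pages_fetched)
        records.extend(page)
        pages_fetched += 1
        if total is not None:
            if limit * pages_fetched >= total:   # has_next: fetched < total-records
                break
        elif not page:                            # has_next: $entity.length > 0
            break
    return records

# Worked example from the article: 120 records, limit 100 -> exactly 2 calls.
data = list(range(120))
offsets_requested = []

def fake_api(offset):
    offsets_requested.append(offset)
    return data[offset:offset + 100]

result = fetch_all(fake_api, limit=100, total=120)
```

With a known total, the loop makes exactly two requests (offsets 0 and 100), mirroring the 0 < 120 / 100 < 120 / stop walkthrough above; without a total, it would make one extra request that returns an empty page.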
Google BigQuery Load (Streaming)

I am trying to load a table using the BigQuery Load (Streaming) snap and I am getting this error message even though the table DOES exist within my defined project ID. What is the number '610395521851' in place of the project ID in the error message?

"message": "Table 610395521851:LE_ODS_asuce_canvas.wrk_cd2_le_conversation_messages not found."

Solved · snapation6713 · 2 years ago · New Contributor III · 2.3K Views · 0 likes · 1 Comment
Simple JSON nesting issue

Hi All, I would like some advice on (or a better solution to) my query! I am preparing some data to send in a table via email. The data structure is as below:

[{
  "statusCode": 202,
  "statusText": "Accepted",
  "payload_file_name": "s3:///bucket/api-payload/archive/api_payload_20241023_140821324_j1.json",
  "receivedDate": "2024-10-28T13:25:26.6271537Z",
  "processedDate": "2024-10-28T13:25:26.6271537Z",
  "rowsCount": 2,
  "transactionId": "{a0aa71d8-5c0a-4bdc-a39f-a42bf5143c3b}"
}]

I am preparing the fields in a parent pipeline and feeding the data as a string to a child; a Mapper creates the fields using data.field_name formatting, and "data" is passed as the pipeline parameter. When the child receives this, I am having trouble getting the tweaked code below (from another post by koryknick) to map the fields from horizontal to vertical format (easier to read in a quick email notification):

$email_data.entries().slice(0).map(x => { "Field": x[0], "Value": x[1] })

This code works fine directly inline with the Mapper snap that creates the data.field_name data, but when passed to a child, the data arrives wrapped in a set of square braces [...], and the code no longer works as it did. I have tried many forms of grouping and mapping and have come up with what I think is a hacky solution that something better could replace: simply replacing the [ and ] characters with nothing (.replace('[','')) in the string coming into the child pipeline. I'm convinced there is a better way to do this with the expression language, but I can't figure it out! I've attached a sample pipeline with the mapping and payload as a parameter. Thanks in advance!

Solved · LeeDuffy · 2 years ago · New Contributor II · 2.2K Views · 0 likes · 4 Comments
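On the square-braces problem above: a string parameter that arrives wrapped as a one-element JSON array is usually better parsed than brace-stripped. Here is a Python sketch of that approach (in SnapLogic the analogue would be JSON.parse(...) on the parameter before calling .entries(); the payload below is a trimmed version of the sample):

```python
import json

# The child receives the payload as a string parameter, wrapped in [...].
param = '[{"statusCode": 202, "statusText": "Accepted", "rowsCount": 2}]'

doc = json.loads(param)[0]   # parse the string, then unwrap the single object
rows = [{"Field": k, "Value": v} for k, v in doc.items()]
```

Parsing keeps the values typed (202 stays a number) and works even if a field value legitimately contains a [ or ] character, which the character-replace approach would corrupt.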
Data reconciliation solutions?

One of my company's use cases for SnapLogic today is replication of data from Salesforce into internal Kafka topics for use throughout the enterprise. There have been various instances of internal consumers of the Kafka data reporting missing records. Investigations have found multiple causes for these data drops. Some of the causes are related to behavior that Salesforce describes as "Working As Designed". Salesforce has recommended other replication architectures, but there are various concerns within my company about using them (license cost, platform load), and we might still end up with missing data. So, we're looking into data reconciliation / auditing solutions. Are there any recommendations on a tool that can:

* Identify record(s) where the record in Salesforce does not have a matching record (e.g. same timestamp) existing in Kafka
* Generate a message containing relevant metadata (e.g. record Id, Salesforce object, Kafka topic) to be sent to a REST endpoint / message queue for reprocessing

feenst · 10 months ago · New Contributor III · 2.2K Views · 0 likes · 6 Comments
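For the first bullet in the reconciliation ask above, the core check is a key-set difference between the two sides. A minimal Python sketch of the idea (the field names SystemModstamp, the object name, and the topic name are illustrative assumptions, not from the post):

```python
# Records pulled from each side of the replication (shapes assumed for illustration).
sf_records = [
    {"Id": "001A", "SystemModstamp": "2025-01-01T00:00:00Z"},
    {"Id": "001B", "SystemModstamp": "2025-01-02T00:00:00Z"},
]
kafka_records = [
    {"Id": "001A", "SystemModstamp": "2025-01-01T00:00:00Z"},
]

# Index Kafka by (Id, timestamp); any Salesforce record not found is a drop.
seen = {(r["Id"], r["SystemModstamp"]) for r in kafka_records}
missing = [
    {"recordId": r["Id"], "object": "Account", "topic": "sf.account"}
    for r in sf_records
    if (r["Id"], r["SystemModstamp"]) not in seen
]
```

The resulting missing list is exactly the metadata payload described in the second bullet, ready to post to a REST endpoint or queue for reprocessing.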
Array of Objects manipulation

Hi team, I would like to iterate through an array of objects and check whether any objects have the same num, code and date but different boxNumbers; if so, their boxNumbers should be combined into a single object. If those three fields don't match, the object should be left as-is. Could you please help me with this?

Sample input data:

[
  {
    "product": [
      { "num": "69315013901", "code": "C06024", "date": "2026-03-31",
        "boxNumber": [ "453215578875", "964070610419" ] },
      { "num": "69315013901", "code": "C06024", "date": "2026-03-31",
        "boxNumber": [ "153720699865", "547398527901", "994797055803" ] },
      { "num": "69315030805", "code": "083L022", "date": "2025-11-30",
        "boxNumber": [ "VUANJ6KYSNB", "DPPG4NWK695" ] }
    ]
  }
]

Expected output:

[
  {
    "product": [
      { "num": "69315013901", "code": "C06024", "date": "2026-03-31",
        "boxNumber": [ "453215578875", "964070610419", "153720699865", "547398527901", "994797055803" ] },
      { "num": "69315030805", "code": "083L022", "date": "2025-11-30",
        "boxNumber": [ "VUANJ6KYSNB", "DPPG4NWK695" ] }
    ]
  }
]

Solved · lake · 2 years ago · New Contributor · 2.1K Views · 0 likes · 4 Comments
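The grouping asked for above is a classic group-by-key merge: bucket objects by the (num, code, date) triple and pool their boxNumber arrays. Here is a minimal Python sketch of that logic using the sample data from the post:

```python
products = [
    {"num": "69315013901", "code": "C06024", "date": "2026-03-31",
     "boxNumber": ["453215578875", "964070610419"]},
    {"num": "69315013901", "code": "C06024", "date": "2026-03-31",
     "boxNumber": ["153720699865", "547398527901", "994797055803"]},
    {"num": "69315030805", "code": "083L022", "date": "2025-11-30",
     "boxNumber": ["VUANJ6KYSNB", "DPPG4NWK695"]},
]

merged = {}
for p in products:
    key = (p["num"], p["code"], p["date"])
    if key in merged:
        merged[key]["boxNumber"].extend(p["boxNumber"])   # same triple: pool the boxes
    else:
        merged[key] = {**p, "boxNumber": list(p["boxNumber"])}  # copy, don't alias

result = list(merged.values())
```

Objects whose triple appears only once pass through unchanged, while duplicates collapse into one object with the concatenated boxNumber list, matching the expected output.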