Forum Discussion
marenas - I would not recommend using Gate to combine the data files - it can cause excessive memory consumption for very large files since the data has to be stored completely in memory. I recommend the attached approach.
The trick here is in the Mapper on the bottom path and the second input view added to the CSV Parser. If you review the documentation, you will see that the second input view allows you to specify a header and also datatypes, if you choose. I simply added the header in the Mapper.
The Union then combines the data the way you are looking for.
One thing to note is that Union takes documents as they arrive from each input view. In this case, if both CSV Parsers are sending a large volume of records, you will see them intermixed: Union does not wait for all of the documents on the first path before consuming the documents from the second path. There are easy fixes for this, but I thought I would mention it in case preserving the order of the records between the input files is a requirement.
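To illustrate the two ideas above outside of SnapLogic, here is a rough Python analogy (not the attached pipeline). Supplying a header for a headerless file is similar in spirit to feeding the header into the CSV Parser's second input view, and the round-robin generator is only a crude, deterministic stand-in for Union; real Union interleaving depends on timing. The sample data and function names are hypothetical.

```python
import csv
import io
import itertools

def parse_csv(text, header=None):
    """Parse CSV text into dicts. When `header` is given, the text is treated
    as headerless and the supplied names are used as the field names."""
    return list(csv.DictReader(io.StringIO(text), fieldnames=header))

def round_robin(*sources):
    """Crude stand-in for Union: alternate between the inputs, so records
    from the two files end up intermixed."""
    iterators = [iter(s) for s in sources]
    while iterators:
        for it in list(iterators):
            try:
                yield next(it)
            except StopIteration:
                iterators.remove(it)

# Hypothetical sample data: the first file has a header row, the second does not.
top = parse_csv("id,name\n1,alice\n2,bob\n")
bottom = parse_csv("3,carol\n4,dave\n", header=["id", "name"])

interleaved = [row["id"] for row in round_robin(top, bottom)]
in_order = [row["id"] for row in itertools.chain(top, bottom)]
print(interleaved)  # ['1', '3', '2', '4']
print(in_order)     # ['1', '2', '3', '4']
```

Consuming the first input completely before the second (as `itertools.chain` does) is one way to keep file order; whether a given pipeline fix behaves exactly like this is an assumption.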
Hope this helps!
If you are using Google/Gmail as your mail provider, Gmail’s Push Notification feature can use the Cloud Pub/Sub API: Gmail publishes a notification to a Topic whenever a delivered email matches a certain condition (e.g. the new email has a particular label or is in a particular folder). Outside of that, you would have to go back to polling the Users.messages:list API and managing the state yourself.
If using the Pub/Sub API, subscribers to a topic can then either pull and acknowledge messages, or in turn push the message(s) to another endpoint.
Pulling from that topic subscription can be done with scheduled tasks and the REST Snap Pack, using an OAuth 2.0 Account.
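As a rough sketch of what the REST Snaps would send when pulling, here is a Python version using the Pub/Sub REST `:pull` and `:acknowledge` endpoints. It only builds the requests and decodes a response (the authenticated HTTP call is left out), and the subscription name in the example is hypothetical. Each pulled message's `data` field is base64-encoded; for Gmail it decodes to JSON containing `emailAddress` and `historyId`.

```python
import base64
import json

PUBSUB = "https://pubsub.googleapis.com/v1"

def pull_request(subscription, max_messages=10):
    """POST target and JSON body for a synchronous pull.
    `subscription` is the full name: projects/<project>/subscriptions/<name>."""
    return f"{PUBSUB}/{subscription}:pull", {"maxMessages": max_messages}

def decode_gmail_notifications(pull_response):
    """Split a :pull response into decoded Gmail notifications and ack IDs."""
    notifications, ack_ids = [], []
    for received in pull_response.get("receivedMessages", []):
        ack_ids.append(received["ackId"])
        data = received["message"].get("data", "")
        if data:
            # data is base64-encoded JSON, e.g. {"emailAddress": ..., "historyId": ...}
            notifications.append(json.loads(base64.b64decode(data)))
    return notifications, ack_ids

def ack_request(subscription, ack_ids):
    """POST target and JSON body acknowledging the processed messages."""
    return f"{PUBSUB}/{subscription}:acknowledge", {"ackIds": ack_ids}

# Hypothetical pull response, shaped like the Pub/Sub REST API's output.
payload = json.dumps({"emailAddress": "user@example.com", "historyId": "1234"})
sample = {"receivedMessages": [{
    "ackId": "ack-1",
    "message": {"data": base64.b64encode(payload.encode()).decode(), "messageId": "1"},
}]}
notes, acks = decode_gmail_notifications(sample)
print(notes)
print(ack_request("projects/my-project/subscriptions/gmail-pull", acks))
```

Acknowledging only after the notifications have been processed means a failed run leaves the messages to be redelivered.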
First, the prerequisites defined by Google must be fulfilled:
- Create a new project in the Google Cloud Platform Console
- Enable Billing
- Enable the Cloud Pub/Sub API and the Gmail API
- Create new OAuth 2.0 Web Application API credentials under Identity and Access Management (IAM) → API Credentials
- Create a Topic and grant the Pub/Sub Publisher permission to serviceAccount:gmail-api-push@system.gserviceaccount.com
- Create a Pull Subscription on that Topic and grant Pub/Sub Subscriber to the email account(s) you want subscribed to the email notifications
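If you would rather script the Topic and Subscription steps than click through the console, they map onto a few Pub/Sub REST requests. This is a minimal Python sketch that only builds the method, URL and body for each call; the project, topic, subscription and user names are hypothetical. Because `setIamPolicy` replaces the whole policy, the helper merges the new binding into the policy returned by `getIamPolicy` (keeping its `etag`) instead of sending a bare binding.

```python
import copy

PUBSUB = "https://pubsub.googleapis.com/v1"
GMAIL_PUSH_SA = "serviceAccount:gmail-api-push@system.gserviceaccount.com"

def with_binding(policy, role, member):
    """Return a copy of an IAM policy (as returned by :getIamPolicy) with
    `member` added to `role`, leaving the other bindings and the etag intact."""
    merged = copy.deepcopy(policy)
    bindings = merged.setdefault("bindings", [])
    for binding in bindings:
        if binding["role"] == role:
            members = binding.setdefault("members", [])
            if member not in members:
                members.append(member)
            return merged
    bindings.append({"role": role, "members": [member]})
    return merged

def topic_publisher_request(project, topic, current_policy):
    """Let Gmail's push service account publish to the topic."""
    url = f"{PUBSUB}/projects/{project}/topics/{topic}:setIamPolicy"
    policy = with_binding(current_policy, "roles/pubsub.publisher", GMAIL_PUSH_SA)
    return "POST", url, {"policy": policy}

def create_subscription_request(project, topic, subscription):
    """Create a pull subscription (no pushConfig) on the topic."""
    url = f"{PUBSUB}/projects/{project}/subscriptions/{subscription}"
    return "PUT", url, {"topic": f"projects/{project}/topics/{topic}"}

def subscriber_request(project, subscription, user_email, current_policy):
    """Grant Pub/Sub Subscriber on the subscription to a user account."""
    url = f"{PUBSUB}/projects/{project}/subscriptions/{subscription}:setIamPolicy"
    policy = with_binding(current_policy, "roles/pubsub.subscriber", f"user:{user_email}")
    return "POST", url, {"policy": policy}
```

In practice `current_policy` would come from a `:getIamPolicy` call on the same resource just before the update.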
Setting up the OAuth 2.0 Account is very similar to the instructions in the Connecting SaaS Providers with SnapLogic’s OAuth-enabled Snaps blog post. The relevant scope values are “https://www.googleapis.com/auth/gmail.readonly https://www.googleapis.com/auth/pubsub”.
Then a watch request, pointing the mailbox at the topic, needs to be executed semi-regularly to maintain the subscription (Google requires re-calling watch at least every 7 days):
robin-community-gmail-watch_2017_04_14.slp (4.2 KB)
Another pipeline can then poll the subscription and receive notifications when changes have occurred. The issue here is that Gmail’s API is designed for synchronization, meaning the notifications the topic receives are historyIds (just pointers to the fact that something changed).
To find out what changed, a cache of previous historyIds needs to be maintained and used to query the history.list API. For each history event (e.g. messagesAdded), you then need to query the Users.messages:get API to get the actual email content. The cache should then be updated with the newly received historyId.
Since effective use of Gmail requires some level of state tracking, this can result in a pipeline that is a little busy, but the following crude example shows that it is possible to use a pipeline to read new email messages:
robin-community-gmail-pull_2017_04_14.slp (26.7 KB)
As for using a Push Subscription model for your preferred non-polling solution, this would obviously suit an Ultra pipeline very well, but it is complicated by Google’s restrictions: it only publishes to a domain that is owned and controlled by you, and only to an endpoint secured by a non-self-signed SSL/TLS certificate.
I haven’t investigated this yet, but I imagine it would be possible by setting up a custom API Gateway (e.g. using App Engine) or a reverse proxy, registered and available at a domain you control, that handles the secure redirection to the Ultra pipeline URL.
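Returning to the polling approach, the watch-renewal and history-tracking logic described above can be sketched in Python. This is not the attached pipeline, just a minimal illustration built on the Gmail REST endpoints `users.watch`, `users.history.list` and `users.messages.get`. The `cache` dict, the `fetch_json` callable (an authenticated GET returning parsed JSON) and the one-day renewal margin are all hypothetical choices.

```python
import time
from urllib.parse import urlencode

GMAIL = "https://gmail.googleapis.com/gmail/v1/users/me"
# Assumed policy: renew the watch once it is within a day of expiring.
RENEW_MARGIN_MS = 24 * 60 * 60 * 1000

def watch_request(project, topic, label_ids=("INBOX",)):
    """Method, URL and body for users.watch on the authenticated mailbox."""
    body = {"topicName": f"projects/{project}/topics/{topic}", "labelIds": list(label_ids)}
    return "POST", f"{GMAIL}/watch", body

def watch_needs_renewal(expiration_ms, now_ms=None):
    """`expiration` in the watch response is epoch milliseconds."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return int(expiration_ms) - now_ms < RENEW_MARGIN_MS

def history_list_url(start_history_id, page_token=None):
    params = {"startHistoryId": start_history_id, "historyTypes": "messageAdded"}
    if page_token:
        params["pageToken"] = page_token
    return f"{GMAIL}/history?{urlencode(params)}"

def message_get_url(message_id):
    return f"{GMAIL}/messages/{message_id}?format=full"

def sync_mailbox(cache, email, notified_history_id, fetch_json):
    """Handle one Pub/Sub notification for `email` and return new messages.

    cache: hypothetical store mapping email -> last processed historyId.
    fetch_json: hypothetical callable doing an authenticated GET.
    A 404 from history.list means the stored ID is too old and a full
    sync is needed; that case is not handled here.
    """
    start = cache.get(email)
    if start is None:
        # No baseline yet: remember this ID and diff from it next time.
        cache[email] = notified_history_id
        return []
    if int(notified_history_id) <= int(start):
        return []  # already processed, e.g. a redelivered notification
    messages, page_token, latest = [], None, start
    while True:
        page = fetch_json(history_list_url(start, page_token))
        for record in page.get("history", []):
            for added in record.get("messagesAdded", []):
                messages.append(fetch_json(message_get_url(added["message"]["id"])))
        latest = page.get("historyId", latest)
        page_token = page.get("nextPageToken")
        if not page_token:
            break
    cache[email] = latest  # only advance the cache once every page succeeded
    return messages
```

Updating the cache last means a failure partway through simply replays the same history range on the next run, at the cost of possibly reading some messages twice.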