SnapLogic - Integration Nation

jaybodra · ‎06-23-2017

Can someone please shed some light on using a third-party library (boto3, requests) in a Python script?

Thank you in advance!

Bhavin · ‎06-27-2017

AFAIK you can call third-party Java libraries inside the Script snap when Python is the scripting language. Here is a sample script that uses the AWS Java SDK to get a list of S3 objects from a bucket:

# Script begin
# Import the interface required by the Script snap.
import java.util
import json
import sys 
sys.path.append('/opt/snaplogic/userlibs/aws-java-sdk-1.9.6.jar')
from com.amazonaws import AmazonClientException
from com.amazonaws import AmazonServiceException
from com.amazonaws.regions import Region
from com.amazonaws.regions import Regions
from com.amazonaws.services.s3 import AmazonS3
from com.amazonaws.services.s3 import AmazonS3Client
from com.amazonaws.services.s3.model import Bucket
from com.amazonaws.services.s3.model import GetObjectRequest
from com.amazonaws.services.s3.model import ListObjectsRequest
from com.amazonaws.services.s3.model import ObjectListing
from com.amazonaws.services.s3.model import PutObjectRequest
from com.amazonaws.services.s3.model import S3Object
from com.amazonaws.services.s3.model import S3ObjectSummary
from com.snaplogic.scripting.language import ScriptHook

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document and collect the bucket listing in a wrapper map
                in_doc = self.input.next()
                wrapper = java.util.HashMap()
                # "bucket" is a property set in a Mapper that precedes the Script snap
                # and holds the bucket name
                bucket = in_doc.get("bucket")
                # $_bucketPipelineParam is a pipeline param and holds the bucket name
                bucketParam = $_bucketPipelineParam
                s3 = AmazonS3Client()
                usEast1 = Region.getRegion(Regions.US_EAST_1)
                s3.setRegion(usEast1)
                objectListing = s3.listObjects(ListObjectsRequest().withBucketName(bucket))
                # Map each object key to its size
                for objectSummary in objectListing.getObjectSummaries():
                    wrapper.put(objectSummary.getKey(), objectSummary.getSize())

                self.output.write(in_doc, wrapper)
            except Exception as e:
                errWrapper = {
                    'errMsg' : str(e.args)
                }
                self.log.error("Error in python script")
                self.error.write(errWrapper)

        self.log.info("Finished executing the Transform script")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable.  The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)

# Script end

I am on Windows, so I copied the aws-java-sdk-1.9.6.jar file to c:/opt/snaplogic/userlibs on all of the plex nodes. You need to copy third-party JARs to every plex node and save them at the same path on each node.
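One way to spot a node where a JAR was not copied is to check for the file before appending it to sys.path. This is only a minimal sketch: `add_jars` is a hypothetical helper (not part of SnapLogic or Jython), and the directory and JAR name are the ones from the script above.

```python
import os
import sys


def add_jars(lib_dir, jar_names):
    """Append each JAR found in lib_dir to sys.path and return the missing ones."""
    missing = []
    for name in jar_names:
        jar_path = os.path.join(lib_dir, name)
        if not os.path.isfile(jar_path):
            missing.append(jar_path)
        elif jar_path not in sys.path:
            sys.path.append(jar_path)
    return missing


# Directory and JAR name taken from the script above.
missing = add_jars('/opt/snaplogic/userlibs', ['aws-java-sdk-1.9.6.jar'])
if missing:
    print('Missing JARs on this node: %s' % ', '.join(missing))
```

Inside a Script snap the message could go to self.log instead of print, so that the node with the missing file shows up in the pipeline logs.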

Also, on my nodes I created a credentials file at C:\Users\Bkukadia\.aws\credentials, which contains these key=value pairs:

aws_access_key_id=AWSKEYAKIAIGFUBXI
aws_secret_access_key=AWSSECRET8++Kg6QNMX6

I think you can also use IAM roles, but I am not very familiar with them.

For more details on the AWS Java SDK, see the AWS SDK for Java documentation.

Invoke aws java sdk via py_2017_06_27.slp (6.0 KB)

mohamadelmardin · ‎12-12-2017

This is an excellent post, because we are running into a similar problem integrating with different AWS services, such as publishing to an SNS topic. We were trying to install Boto3 on the JCC node in the hope that the Python script would pick it up. Apparently, from this post, installing Python Boto3 won't work, and we need to download the AWS Java SDK instead and invoke it from the Python script. This is a good solution for us, but we need to know the Java imports for SNS.

Can someone help us by posting a similar code snippet, but for publishing to SNS instead of S3, using the Java SDK from Python? Which imports do we need?

mohamadelmardin · ‎12-12-2017

Following up on my previous question, I think I found the answer at this link:

http://docs.aws.amazon.com/sns/latest/dg/using-awssdkjava.html

But feel free to add to it or provide a Python code snippet for SNS if I missed anything…
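For reference, a minimal sketch of what publishing to SNS from a Jython script might look like. It assumes the AWS Java SDK JAR (and its dependencies) is already on sys.path as in the S3 script above, and that credentials are resolved by the SDK's default provider chain. AmazonSNSClient and PublishRequest are classes from the AWS Java SDK; `publish_to_sns` and its `client` / `request_cls` parameters are hypothetical names introduced here so the function can also be exercised without the SDK.

```python
def publish_to_sns(topic_arn, message, client=None, request_cls=None):
    """Publish message to the SNS topic and return the message id."""
    if request_cls is None:
        # Resolved only under Jython with the AWS Java SDK on sys.path.
        from com.amazonaws.services.sns.model import PublishRequest as request_cls
    if client is None:
        from com.amazonaws.services.sns import AmazonSNSClient
        # No-argument constructor: credentials come from the default provider chain.
        client = AmazonSNSClient()
    result = client.publish(request_cls(topic_arn, message))
    return result.getMessageId()


# Hypothetical usage inside execute(); the topic ARN is a placeholder:
# message_id = publish_to_sns('arn:aws:sns:us-east-1:123456789012:my-topic', 'hello')
```

The region is not set here; as in the S3 script, it could be set on the client with setRegion before publishing.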

jaybodra · ‎12-12-2017

# Import the interface required by the Script snap.
from com.snaplogic.scripting.language import ScriptHook
import java.util
import urlparse
import urllib2
import sys
import socket
import json

# referencing the AWS Java libs
sys.path.append('/opt/snaplogic/userlibs/aws-java-sdk-1.11.156.jar')
sys.path.append('/opt/snaplogic/userlibs/commons-logging-1.1.3.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-databind-2.6.6.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-core-2.6.6.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-annotations-2.6.0.jar')
sys.path.append('/opt/snaplogic/userlibs/httpcore-4.4.4.jar')
sys.path.append('/opt/snaplogic/userlibs/httpclient-4.5.2.jar')
sys.path.append('/opt/snaplogic/userlibs/joda-time-2.8.1.jar')

#importing core/aws java lib
from java.io import ByteArrayInputStream
from java.lang import Exception
from com.amazonaws.auth import BasicAWSCredentials
from com.amazonaws.services.s3 import AmazonS3Client
from com.amazonaws.services.s3.model import PutObjectRequest
from com.amazonaws.services.s3.model import ObjectMetadata

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document
                in_doc = self.input.next()
                # Defaults so the error wrappers below never hit an unbound name
                photo_id = listing_id = photo_url = photo_name = None

                try:
                    # set the default socket timeout to 5 sec globally
                    socket.setdefaulttimeout(5)
                    # get object values from the incoming json document
                    bucket_name = $_bucket
                    photo_id = in_doc.get("photo_id")
                    listing_id = in_doc.get("listing_id")
                    photo_url = in_doc.get("photo_url")

                    # parse the url for photo_name and set the location for the file on s3
                    parsed_url = urlparse.urlsplit(photo_url)
                    path = parsed_url[2].split("/")
                    photo_name = path[len(path) - 1]
                    s3_location = $_subdir + $_pathdelimiter + listing_id + $_pathdelimiter + photo_name

                    # access the photo content using "photo_url"
                    url_response = urllib2.urlopen(photo_url, timeout=5)
                    content_length = long(url_response.headers['Content-Length'])
                    content_type = url_response.headers['Content-Type']
                    obj_metadata = ObjectMetadata()
                    obj_metadata.setContentLength(content_length)
                    obj_metadata.setContentType(content_type)
                    data = url_response.read()
                    bais = ByteArrayInputStream(data)
                    bais.reset()
                    try:
                        # create the s3 client and upload the photo
                        basic_cred = BasicAWSCredentials("access_token","secret_token")
                        s3 = AmazonS3Client(basic_cred)
                        s3.putObject(PutObjectRequest(bucket_name, s3_location, bais, obj_metadata))
                        bais.close()
                        wrapper = {
                            "photo_id" : photo_id,
                            "listing_id" : listing_id,
                            "photo_url" : photo_url,
                            "photo_name" : photo_name,
                            "photo_downloaded" : True,
                            "error" : None
                        }
                    except Exception as s3_error:
                        wrapper = {
                            "photo_id" : photo_id,
                            "listing_id" : listing_id,
                            "photo_url" : photo_url,
                            "photo_name" : photo_name,
                            "photo_downloaded" : False,
                            "error" : "s3_error, " + str(s3_error)
                        }
                except BaseException as photo_url_error:
                    wrapper = {
                        "photo_id" : photo_id,
                        "listing_id" : listing_id,
                        "photo_url" : photo_url,
                        "photo_name" : photo_name,
                        "photo_downloaded" : False,
                        "error" : "photo_url_error, " + str(photo_url_error)
                    }
                # output
                self.output.write(in_doc, wrapper)
            # error view output when error view is enabled
            except Exception as e:
                errWrapper = {
                    'errMsg' : str(e.args)
                }
                self.log.error("Error in python script")
                self.error.write(errWrapper)

        self.log.info("Finished executing the Transform script")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable.  The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)
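The step in the script that derives the file name from the URL and builds the S3 location can also be written as a small function. A minimal sketch, assuming the key is the subdirectory, listing id and file name joined by one delimiter; `build_s3_key` is a hypothetical helper name, and the import fallback is there only so the same code runs under Jython/Python 2 and Python 3.

```python
import posixpath

try:
    from urlparse import urlsplit        # Python 2 / Jython
except ImportError:
    from urllib.parse import urlsplit    # Python 3


def build_s3_key(photo_url, subdir, listing_id, delimiter='/'):
    """Join subdir, listing id and the URL's file name into an S3 key."""
    # urlsplit drops the query string, so only the last path segment remains.
    photo_name = posixpath.basename(urlsplit(photo_url).path)
    return delimiter.join([subdir, str(listing_id), photo_name])


print(build_s3_key('http://example.com/img/123/front.jpg?size=large', 'photos', 42))
# prints: photos/42/front.jpg
```

Converting listing_id with str() means the join also works when the incoming document carries the id as a number rather than a string.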

This is my very first SnapLogic Jython script, and it works well.
A couple of things to keep in mind:

Java data types are used by the AWS SDK, while Python data types are used by the SnapLogic Jython (basically Python) script.
Most Python data types are converted to Java automatically, but there are exceptions, for example byte arrays, which are represented differently in Java and Python.
Use the error view to debug script errors.
You can run native Java or Python code using Jython's subprocess module.
Most Python tricks work.
SnapLogic also uses the AWS SDK for its own connections to various AWS services, so the authentication mentioned in the initial post might give you an error.
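The subprocess point above could look roughly like this. It is a sketch only: `run_external` is a hypothetical helper, and the interpreter and script paths in the usage comment are placeholders for whatever is installed on the node.

```python
import subprocess


def run_external(command, stdin_text=''):
    """Run a command, feed it stdin_text, and return (exit_code, stdout, stderr)."""
    proc = subprocess.Popen(command,
                            stdin=subprocess.PIPE,
                            stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE,
                            universal_newlines=True)
    out, err = proc.communicate(stdin_text)
    return proc.returncode, out, err


# Hypothetical usage: hand one document to an external Python script as JSON.
# code, out, err = run_external(['/usr/bin/python', '/opt/scripts/upload.py'],
#                               '{"bucket": "demo"}')
```

Checking the exit code and writing stderr to the error view keeps failures of the external script visible in the pipeline.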

If it gets too complicated, you can write the code in Java first and then convert it to Jython.

Link to the Jython docs: http://www.jython.org/docs/library/indexprogress.html

At one point I tried to add more complexity to this code and realized that Jython could not handle what I intended to do.

So my new approach was to use a Unix snap to run my actual Python script. Yes, an actual Python script.

Hopefully some day SnapLogic will add native Java and Python support.

How to use 3rd party python libraries in python Script