Forum Discussion
- Bhavin (Former Employee)
AFAIK, third-party Python libraries like boto3 are not currently supported within the Script snap's Python. Nevertheless, you can still invoke your own Python scripts that reference boto3. The flow is:
1 - Create your Python scripts in your favorite editor, outside SnapLogic, and save them on all of your SnapLogic nodes (aka JCC nodes).
2 - Get the UnixExec snap from your SnapLogic account team (this is a field snap that you can request from SnapLogic). It lets you run Python scripts outside of SnapLogic via ssh on the JCC node/system.
3 - Add the UnixExec snap to your pipeline wherever you need to invoke your Python script.
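A minimal sketch of step 1, assuming a CPython 3 interpreter with boto3 installed on each JCC node. The file name (e.g. /opt/scripts/upload_to_s3.py), the argument order and the JSON-on-stdout convention are all hypothetical; check with your account team how UnixExec actually passes arguments and captures output. The boto3 import sits inside main() so the upload helper can be exercised with a stand-in client.

```python
import json
import sys


def upload_text(s3_client, bucket, key, text):
    """Upload a UTF-8 string via a boto3-style S3 client and return a status dict."""
    s3_client.put_object(Bucket=bucket, Key=key, Body=text.encode("utf-8"))
    return {"bucket": bucket, "key": key, "uploaded": True}


def main(argv):
    """Entry point: argv is [script, bucket, key, text] (hypothetical argument order)."""
    if len(argv) != 4:
        print("usage: upload_to_s3.py BUCKET KEY TEXT", file=sys.stderr)
        return 2
    import boto3  # imported here so upload_text() can be tested without boto3
    bucket, key, text = argv[1], argv[2], argv[3]
    result = upload_text(boto3.client("s3"), bucket, key, text)
    # One JSON line on stdout; how UnixExec turns stdout into a document is an assumption.
    print(json.dumps(result))
    return 0
```

To run it as a script, end the file with the usual `if __name__ == "__main__": sys.exit(main(sys.argv))` guard; a call then looks like `python3 /opt/scripts/upload_to_s3.py my-bucket reports/out.txt "hello"`.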
- jaybodra (New Contributor III)
Thank you so much Bhavin!
I am assuming I can pass pipeline variables as parameters to my Python script, the same way we do when running it from a shell. Could you shed some light on this?
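If UnixExec hands its command line to a shell on the node (an assumption worth confirming), each pipeline value should be quoted before it is spliced in, so that a URL containing `&` or a space stays a single argument. A minimal CPython 3 sketch using the standard `shlex.quote`; the script path and the `build_command` helper are hypothetical. On Python 2.7 the equivalent function is `pipes.quote`.

```python
import shlex


def build_command(script_path, args, interpreter="python3"):
    """Join interpreter, script path and arguments into one shell-safe command string."""
    # shlex.quote leaves safe tokens alone and single-quotes anything containing
    # spaces or shell metacharacters, so each value stays one argument.
    return " ".join(shlex.quote(part) for part in [interpreter, script_path] + list(args))


cmd = build_command("/opt/scripts/process_url.py",
                    ["https://example.com/a b?x=1&y=2", "photos", "doc.txt", "/"])
print(cmd)
```

The shell then splits `cmd` back into exactly the original arguments, which is what the script on the node receives in `sys.argv`.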
- Bhavin (Former Employee)
Please help me understand your use case. What are you trying to do?
- jaybodra (New Contributor III)
This is what I am trying to do:
process a URL using a Python library and write the results to S3. I need to pass url, sub_dir_name, doc_name and delimiters as arguments to the script.
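On the script side, those four values could arrive as positional command-line arguments. A minimal sketch with the standard `argparse` module; the argument order and the key layout `<sub_dir_name><delimiter><doc_name>` are assumptions based on the description above, not a fixed SnapLogic convention.

```python
import argparse


def parse_args(argv):
    """Parse url, sub_dir_name, doc_name and delimiter from a list of CLI arguments."""
    parser = argparse.ArgumentParser(description="Process a URL and write the result to S3")
    parser.add_argument("url", help="URL to process")
    parser.add_argument("sub_dir_name", help="S3 'folder' to write under")
    parser.add_argument("doc_name", help="name of the object to create")
    parser.add_argument("delimiter", help="separator between key parts, e.g. '/'")
    return parser.parse_args(argv)


def s3_key(args):
    """Build the S3 object key from the parsed arguments."""
    return args.sub_dir_name + args.delimiter + args.doc_name


args = parse_args(["https://example.com/page", "reports", "page.txt", "/"])
print(s3_key(args))  # reports/page.txt
```

In the real script you would call `parse_args(sys.argv[1:])`.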
- Bhavin (Former Employee)
AFAIK, you can call third-party Java libraries inside the Script snap using Python as the scripting language. Here is a sample script that uses the AWS Java SDK to get a list of S3 objects from a bucket:
# Script begin
# Import the interface required by the Script snap.
import java.util
import json
import sys
# Make the AWS Java SDK jar importable; it must exist at this path on every node
sys.path.append('/opt/snaplogic/userlibs/aws-java-sdk-1.9.6.jar')
from com.amazonaws import AmazonClientException
from com.amazonaws import AmazonServiceException
from com.amazonaws.regions import Region
from com.amazonaws.regions import Regions
from com.amazonaws.services.s3 import AmazonS3
from com.amazonaws.services.s3 import AmazonS3Client
from com.amazonaws.services.s3.model import Bucket
from com.amazonaws.services.s3.model import GetObjectRequest
from com.amazonaws.services.s3.model import ListObjectsRequest
from com.amazonaws.services.s3.model import ObjectListing
from com.amazonaws.services.s3.model import PutObjectRequest
from com.amazonaws.services.s3.model import S3Object
from com.amazonaws.services.s3.model import S3ObjectSummary
from com.snaplogic.scripting.language import ScriptHook

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document, wrap it in a map and write out the wrapper
                in_doc = self.input.next()
                wrapper = java.util.HashMap()
                # bucket is a property set in a Mapper that precedes the Script snap
                # and holds the bucket name
                bucket = in_doc.get("bucket")
                # $_bucketPipelineParam is a pipeline param and holds the bucket name
                bucketParam = $_bucketPipelineParam
                s3 = AmazonS3Client()
                usEast1 = Region.getRegion(Regions.US_EAST_1)
                s3.setRegion(usEast1)
                objectListing = s3.listObjects(ListObjectsRequest().withBucketName(bucket))
                for objectSummary in objectListing.getObjectSummaries():
                    wrapper.put(objectSummary.getKey(), objectSummary.getSize())
                self.output.write(in_doc, wrapper)
            except Exception as e:
                errWrapper = {'errMsg': str(e.args)}
                self.log.error("Error in python script")
                self.error.write(errWrapper)
        self.log.info("Finished executing the Transform script")

# The Script Snap will look for a ScriptHook object in the "hook"
# variable. The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)
# Script end
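One caveat with this sample: a single `listObjects` call returns at most 1,000 keys, and the returned `ObjectListing` reports `isTruncated()` when more remain. A sketch of a loop that follows the pages using the SDK v1 method `listNextBatchOfObjects`; the helper name is hypothetical, and because it relies only on those SDK methods it can be tested with stand-in objects.

```python
def collect_object_sizes(s3, listing):
    """Return {key: size} for all objects, following truncated S3 listings.

    s3      -- an AmazonS3Client (AWS SDK for Java v1), used from Jython
    listing -- the ObjectListing returned by s3.listObjects(...)
    """
    sizes = {}
    while True:
        for summary in listing.getObjectSummaries():
            sizes[summary.getKey()] = summary.getSize()
        if not listing.isTruncated():
            return sizes
        # Fetch the next page; the SDK continues from the marker stored in `listing`.
        listing = s3.listNextBatchOfObjects(listing)
```

Inside `execute()` it could replace the `for` loop, e.g. `for key, size in collect_object_sizes(s3, objectListing).items(): wrapper.put(key, size)`.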
I am on Windows, so I've copied the aws-java-sdk-1.9.6.jar file to c:/opt/snaplogic/userlibs on all of the Plex nodes. You need to copy third-party jars to all of the Plex nodes and make sure to save them at a consistent path, the same on every node.
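Because a jar missing from one node only fails when a pipeline happens to run there, a quick check on each node can help. A minimal sketch using only `os.path`; the directory and jar list are example values to adjust, and `missing_jars` is a hypothetical helper name.

```python
import os

# Example values: the userlibs path and jar used by the script above.
USERLIBS_DIR = "/opt/snaplogic/userlibs"
REQUIRED_JARS = ["aws-java-sdk-1.9.6.jar"]


def missing_jars(userlibs_dir, jar_names):
    """Return the jar names that are not present as files in userlibs_dir."""
    return [name for name in jar_names
            if not os.path.isfile(os.path.join(userlibs_dir, name))]


missing = missing_jars(USERLIBS_DIR, REQUIRED_JARS)
print("missing jars: %s" % (missing or "none"))
```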
Also, on my nodes I have created/edited a credentials file at C:\Users\Bkukadia\.aws\credentials, which contains these key=value pairs:
aws_access_key_id=AWSKEYAKIAIGFUBXI
aws_secret_access_key=AWSSECRET8++Kg6QNMX6
I think you can also use IAM roles, but I am not very familiar with them.
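The AWS credentials file is INI-style: the two keys normally sit under a profile header such as `[default]`, which is the profile the SDK's default provider chain reads unless told otherwise. A small CPython 3 sketch for checking that a node's file parses that way; the function names are hypothetical, and on Jython 2.7 the module is called `ConfigParser` instead.

```python
import configparser

EXAMPLE = """\
[default]
aws_access_key_id = AKIDEXAMPLE
aws_secret_access_key = SECRETEXAMPLE
"""


def parse_aws_credentials(text, profile="default"):
    """Return (access_key_id, secret_access_key) for one profile of credentials-file text."""
    parser = configparser.ConfigParser()
    # Text with no [section] header at all raises MissingSectionHeaderError here.
    parser.read_string(text)
    # A missing profile (e.g. no [default] section) raises KeyError here.
    section = parser[profile]
    return section["aws_access_key_id"], section["aws_secret_access_key"]


def read_aws_credentials(path, profile="default"):
    """Like parse_aws_credentials, but reads the file at `path` (e.g. ~/.aws/credentials)."""
    with open(path) as handle:
        return parse_aws_credentials(handle.read(), profile)
```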
For more details on the AWS Java SDK, check this out: AWS SDK for Java
- mohamadelmardin (New Contributor III)
This is an excellent post, because we are running into a similar problem integrating with different AWS services, such as publishing to an SNS topic. We were trying to install boto3 on the JCC node in the hope that the Python script would pick it up. Apparently, from this post, installing Python boto3 won't work; instead we need to download the AWS Java SDK and invoke it from the Python script. This is a good solution for us, but we need to know the Java import libraries for SNS.
Can someone help us by posting a similar code snippet for publishing to SNS instead of S3, using the Java SDK from Python? Which libraries do we need to import?
- mohamadelmardin (New Contributor III)
Following up on my previous question, I think I found the answer at this link:
http://docs.aws.amazon.com/sns/latest/dg/using-awssdkjava.html
But feel free to add to it, or provide a Python code snippet for SNS if I missed anything.
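Following the same pattern as the S3 examples, a sketch for publishing documents to SNS with the AWS SDK for Java v1 from a Script snap. It assumes `sns` is a `com.amazonaws.services.sns.AmazonSNSClient` (shipped in the aws-java-sdk jar), whose `publish(topicArn, message)` returns a `PublishResult`; the topic ARN is a placeholder and `publish_documents` is a hypothetical helper. Since it calls only that method, it also runs under CPython with a stand-in client for testing.

```python
import json

# In the Script snap (Jython) the client might be created like this (sketch):
#   from com.amazonaws.auth import BasicAWSCredentials
#   from com.amazonaws.services.sns import AmazonSNSClient
#   sns = AmazonSNSClient(BasicAWSCredentials("access_token", "secret_token"))
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:example-topic"  # placeholder


def publish_documents(sns, topic_arn, documents):
    """Publish each document (a plain Python dict) as a JSON string; return the message IDs."""
    message_ids = []
    for doc in documents:
        result = sns.publish(topic_arn, json.dumps(doc))
        message_ids.append(result.getMessageId())
    return message_ids
```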
- jaybodra (New Contributor III)
# Import the interface required by the Script snap.
from com.snaplogic.scripting.language import ScriptHook
import java.util
import urlparse
import urllib2
import sys
import socket
import json

# referencing aws java lib
sys.path.append('/opt/snaplogic/userlibs/aws-java-sdk-1.11.156.jar')
sys.path.append('/opt/snaplogic/userlibs/commons-logging-1.1.3.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-databind-2.6.6.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-core-2.6.6.jar')
sys.path.append('/opt/snaplogic/userlibs/jackson-annotations-2.6.0.jar')
sys.path.append('/opt/snaplogic/userlibs/httpcore-4.4.4.jar')
sys.path.append('/opt/snaplogic/userlibs/httpclient-4.5.2.jar')
sys.path.append('/opt/snaplogic/userlibs/joda-time-2.8.1.jar')

# importing core/aws java lib
from java.io import ByteArrayInputStream
from java.lang import Exception
from com.amazonaws.auth import BasicAWSCredentials
from com.amazonaws.services.s3 import AmazonS3Client
from com.amazonaws.services.s3.model import PutObjectRequest
from com.amazonaws.services.s3.model import ObjectMetadata

class TransformScript(ScriptHook):
    def __init__(self, input, output, error, log):
        self.input = input
        self.output = output
        self.error = error
        self.log = log

    # The "execute()" method is called once when the pipeline is started
    # and allowed to process its inputs or just send data to its outputs.
    def execute(self):
        self.log.info("Executing Transform script")
        while self.input.hasNext():
            try:
                # Read the next document, wrap it in a map and write out the wrapper
                in_doc = self.input.next()
                # Defined up front so the error wrappers below never hit a NameError
                photo_id = listing_id = photo_url = photo_name = None
                try:
                    # set the socket timeout to 5 s globally for the S3 reads
                    socket.setdefaulttimeout(5)
                    # get object values from incoming json document
                    bucket_name = $_bucket
                    photo_id = in_doc.get("photo_id")
                    listing_id = in_doc.get("listing_id")
                    photo_url = in_doc.get("photo_url")
                    # parse url for photo_name and set location for file to be saved on s3
                    parsed_url = urlparse.urlsplit(photo_url)
                    path = parsed_url[2].split("/")
                    photo_name = path[len(path) - 1]
                    s3_location = $_subdir + $_pathdelimiter + listing_id + $_pathdelimiter + photo_name
                    # access photo content using "photo_url"
                    url_response = urllib2.urlopen(photo_url, timeout=5)
                    content_length = long(url_response.headers['Content-Length'])
                    content_type = url_response.headers['Content-Type']
                    obj_metadata = ObjectMetadata()
                    obj_metadata.setContentLength(content_length)
                    obj_metadata.setContentType(content_type)
                    data = url_response.read()
                    bais = ByteArrayInputStream(data)
                    bais.reset()
                    try:
                        # creating s3 client
                        basic_cred = BasicAWSCredentials("access_token", "secret_token")
                        s3 = AmazonS3Client(basic_cred)
                        s3.putObject(PutObjectRequest(bucket_name, s3_location, bais, obj_metadata))
                        bais.close()
                        wrapper = {"photo_id": photo_id, "listing_id": listing_id,
                                   "photo_url": photo_url, "photo_name": photo_name,
                                   "photo_downloaded": True, "error": None}
                    # "Exception" is java.lang.Exception here (imported above), so this catches AWS SDK errors
                    except Exception as s3_error:
                        wrapper = {"photo_id": photo_id, "listing_id": listing_id,
                                   "photo_url": photo_url, "photo_name": photo_name,
                                   "photo_downloaded": False, "error": "s3_error, " + str(s3_error)}
                # BaseException catches Python-side errors (bad URL, timeouts, missing headers)
                except BaseException as photo_url_error:
                    wrapper = {"photo_id": photo_id, "listing_id": listing_id,
                               "photo_url": photo_url, "photo_name": photo_name,
                               "photo_downloaded": False, "error": "photo_url_error, " + str(photo_url_error)}
                # output
                self.output.write(in_doc, wrapper)
            # error view output when error view is enabled
            except Exception as e:
                # str(e) works for Java exceptions, which have no Python "args" attribute
                errWrapper = {'errMsg': str(e)}
                self.log.error("Error in python script")
                self.error.write(errWrapper)
        self.log.info("Finished executing the Transform script")
# The Script Snap will look for a ScriptHook object in the "hook"
# variable. The snap will then call the hook's "execute" method.
hook = TransformScript(input, output, error, log)

This is my very first SnapLogic Jython script, and it works well.
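Pulling the key-building logic out of `execute()` makes it testable outside SnapLogic. A sketch of a hypothetical helper that mirrors what the script does with `photo_url`, `$_subdir`, `$_pathdelimiter` and `listing_id`; the import fallback is there so the same code runs under Jython 2.7 and CPython 3.

```python
try:
    from urlparse import urlsplit  # Jython 2.7 / Python 2, as in the script above
except ImportError:
    from urllib.parse import urlsplit  # Python 3


def photo_s3_location(photo_url, subdir, delimiter, listing_id):
    """Return (photo_name, s3_key) built the same way as in the script."""
    # The last path segment of the URL (query string excluded) is the file name.
    photo_name = urlsplit(photo_url).path.split("/")[-1]
    s3_key = subdir + delimiter + listing_id + delimiter + photo_name
    return photo_name, s3_key


print(photo_s3_location("https://img.example.com/listings/42/front.jpg?w=640", "photos", "/", "42"))
```

Note that `listing_id` must already be a string for the concatenation to work; in the script a numeric ID would raise an error that the inner `except BaseException` reports as a photo_url_error.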
A couple of things to keep in mind:
- The AWS SDK uses Java datatypes, while the SnapLogic Jython (basically Python) script uses Python datatypes.
- Most Python datatypes get converted to Java, but there are exceptions, e.g. byte arrays, which have different representations in Java and Python.
- Please use the error view to debug your script errors.
- You can run native Java or Python code using Jython's subprocess module.
- Most Python tricks work.
- SnapLogic also uses the AWS SDK for its connections to various AWS services, so the credentials-file authentication mentioned in the initial post might give you an error.
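On the byte-array point: Java's `byte` is signed (-128 to 127) while Python byte values run from 0 to 255, so the same bits show up as different numbers on each side. A small sketch of the mapping with hypothetical helper names; it runs under Python 2.7 or 3.

```python
def to_java_byte_values(data):
    """Values a Java byte[] would show for these bytes (0x80..0xFF become negative)."""
    return [b - 256 if b > 127 else b for b in bytearray(data)]


def from_java_byte_values(values):
    """Turn signed Java byte values back into Python bytes."""
    # Masking with 0xFF maps -128..-1 back to 128..255.
    return bytes(bytearray(v & 0xFF for v in values))


print(to_java_byte_values(b"\x00\x7f\x80\xff"))  # [0, 127, -128, -1]
```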
If it gets too complicated, you can write the code in Java and then convert it to Jython.
Link to the Jython docs: http://www.jython.org/docs/library/indexprogress.html
At one point I tried to add more complexity to this code and realized that Jython could not handle what I intended to do.
So my new approach was to use the UnixExec snap and run my actual Python script. Yes, an actual Python script!
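For the subprocess route mentioned in the list above, a sketch that runs an external Python script and returns its output. The command is passed as a list, so no shell is involved and arguments need no quoting. The interpreter path is an assumption: inside Jython, `sys.executable` is Jython itself, so you would pass a CPython path such as `/usr/bin/python3` and a script saved at the same path on every node; `run_python_script` is a hypothetical name.

```python
import subprocess
import sys


def run_python_script(interpreter, script_path, args):
    """Run `interpreter script_path args...` and return its stdout as text.

    subprocess.check_output raises CalledProcessError on a non-zero exit status.
    """
    output = subprocess.check_output([interpreter, script_path] + list(args))
    return output.decode("utf-8")


# Self-contained demo: "-c" stands in for a script path and runs inline code.
print(run_python_script(sys.executable, "-c", ["import sys; print(sys.argv[1:])", "a", "b"]))
```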
Hopefully someday SnapLogic will add native Java and Python support.