Re: Finding Duplicate data existing in Database

A good starting point is finding a SQL implementation - it will give you an idea of what's possible on your database. For instance, in PostgreSQL you would want to use INSERT … ON CONFLICT (product_id) …, however the only options are DO NOTHING or DO UPDATE SET …. It's not possible to raise an error (which would kill the command) or run an arbitrary stored procedure. Instead you would need to use something like this:

    WITH dup_ids AS (
        SELECT id FROM table1
        INTERSECT
        SELECT id FROM table2
    ), inserted AS (
        INSERT INTO table1 (id, value)
        SELECT id, value FROM table2
        WHERE id NOT IN (SELECT id FROM dup_ids)
    )
    INSERT INTO duplicates
    SELECT * FROM table2
    WHERE id IN (SELECT id FROM dup_ids);

(That's off the top of my head - there may be small syntax errors.)

This is why snaps don't always implement "obvious" features like writing duplicates to an error view. We want to keep the database snaps as consistent as possible, and sometimes a feature that is easy to implement on one database would require a lot of work under the covers on other databases.

Faster PostgreSQL Bulk Loads available

We are pleased to announce that 'latest' has an updated PostgreSQL bulk load snap with significantly better performance. We're routinely seeing 3x performance in our tests, but your mileage may vary.

This is mostly drop-and-go, but there is one small difference from the existing behavior. When an error occurred, the existing snap would sometimes write to both the output and error views, depending upon where the error occurred. It's now consistent with other snaps and does not write to the output view if there's an error. This applies to all settings for error handling. Unfortunately this modification is not applicable to other snaps.

Re: SnapLogic SDK - Accessing pipeline properties and parameters

There are quite a few additional values you may find interesting.
Skimming the code, I see:

    pipe.args.get(*), e.g., pipe.args.get('CONTENT_TYPE')
    pipe.flags.immediate_mode
    pipe.flags.is_suggest
    pipe.instance_version
    pipe.property_map.instance_version
    pipe.property_map.info.label.value
    pipe.plexPath
    pipe.projectPath
    pipe.ruuid
    pipe.suspendable
    pipe.target_runtime
    pipe.tmpDir
    pipe.update_time
    pipe.update_user_id

There are already convenience functions for some of these values, e.g., propertyValues.inImmediateMode(), but I don't have a list of them at hand.

Related - if you haven't seen it before, Java can be bad at deleting temporary files even if you set the deleteOnExit() flag and explicitly delete the file in your finalizer, at least on Linux systems - and that's what the Groundplexes and Cloudplexes usually run on. The 'pipe.tmpDir' location is managed by the snap executor, which deletes everything in that directory after all of the child threads (snaps) have exited. It's a really good idea to use it instead of System.getProperty("java.io.tmpdir"), since these undeleted files can cause problems over time.

Re: Insert error with JDBC insert and PGSQL

Could you submit a ticket for that? It should ideally contain the table's schema, a sample input document, and any errors that show the generated SQL.

For background, we perform a database metadata query to learn what the database expects or provides, and we have a few dozen classes that convert between the database types and Document field types. In theory it should be easy to make the correct choice, but even among the standard field types there are subtle differences, sometimes even between different major versions of a single product, that can cause problems with the conversions. Off the top of my head, the only reason we would try to use a varchar instead of a date or int is if the database metadata is reporting an unrecognized field type.
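To illustrate the fallback just described, here is a hedged sketch (not the actual snap code; names are illustrative) of dispatching on the JDBC type reported by the database metadata, degrading to a varchar conversion only when the type is unrecognized:

```java
import java.sql.Types;

public class TypeDispatch {
    // Map the JDBC type constant reported by the database metadata to a
    // converter name. Unknown types fall through to the string conversion,
    // which is the situation described above.
    static String converterFor(int jdbcType) {
        switch (jdbcType) {
            case Types.SMALLINT:
            case Types.INTEGER:
            case Types.BIGINT:
                return "integer";
            case Types.NUMERIC:
            case Types.DECIMAL:
                return "decimal";
            case Types.DATE:
                return "date";
            case Types.TIMESTAMP:
                return "timestamp";
            default:
                return "varchar"; // unrecognized types degrade to strings
        }
    }
}
```

A type defined by an extension such as PostGIS would typically surface here as Types.OTHER and hit the default branch.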
That might be due to a new field type in a recently released version, a locally installed extension (e.g., PostGIS) that defines its own types, or simply an oversight. In some of those cases we can add that value to the internal lookup tables in the next release, but we need to know what to add. Thanks.

Re: Insert error with JDBC insert and PGSQL

It's… complicated. You should generally assume that different snaps use different connections. In rare situations it looks like a connection is passed from one snap to the next, but there should be a reset between snaps; e.g., with PostgreSQL we reset the default schemas.

Within an individual snap it's… complicated. There's usually a single connection used during both validation and execution. However, if the snap is at the end of a pipeline that has a long delay between launch and the first documents hitting the snap, it's possible that they'll use a different connection. It's theoretically possible that a connection will be changed during execution, but I don't think that happens in practice. We shouldn't do anything during validation that's required during execution, but something could have slipped through.

Could you write up two tickets? One with enough details about this pipeline for us to try to duplicate it, and a more general one asking for support for pg_temp. We can check a few things in the current snaps, but full support for pg_temp (e.g., guaranteeing the same connection in all snaps) will require a lot of research into its feasibility.

Re: Failed to initialize pool: Database may be already in use: "C:/opt/snaplogic/run/bstore-Emirates_Training-sidekick-dev-1.mv.db"

If you are using a generic JDBC account, try adding "AUTO_SERVER" = "TRUE" to the Url properties. You'll also need to upload the h2 jar at 'Jdbc jars'. If you've explicitly specified the database URL, try appending ";AUTO_SERVER=TRUE" to the URL everywhere you use it. You'll want to do the same if you start the server with a script.
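For concreteness, a small sketch of that URL manipulation (the helper name is illustrative; the database path comes from the error message above):

```java
public class H2UrlExample {
    // H2 accepts extra settings appended to the JDBC URL as ';NAME=VALUE'.
    // AUTO_SERVER=TRUE lets multiple processes open the same database file.
    static String withAutoServer(String url) {
        return url.contains("AUTO_SERVER") ? url : url + ";AUTO_SERVER=TRUE";
    }

    public static void main(String[] args) {
        String base = "jdbc:h2:C:/opt/snaplogic/run/bstore-Emirates_Training-sidekick-dev-1";
        System.out.println(withAutoServer(base));
    }
}
```

The same suffix must be used by every process that opens the database, which is why it needs to be appended everywhere the URL appears.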
In any case, you should start with a generic JDBC snap and account and verify that you can access the database with an existing snap. That lets you rule out unexpected issues such as bad permissions, incompatible jars, etc., before you start developing your own code. That can save you a tremendous amount of effort down the road.

Re: Load a zip file from a download url link

Note for readers from the future: the snap peeks at the first few bytes of the file to determine if it's a .zip or .7z file. It doesn't look at the file extension. Those are standard values, and it's not an exaggeration to say that if those bytes aren't set then it's not a .zip or .7z file. Since some applications that support additional formats will quietly use the correct decoder, it's possible that you think you have a .zip file but it's actually something else.

When in doubt, it's best to create an empty archive, or one with just a single entry, and attach it to your question. That allows us to verify how the third-party library we use is identifying the file. There are some proprietary extensions to the zip format, e.g., to support strong encryption, and it's possible that the library is returning an unrecognized MIME type. We won't be able to read the encrypted entries, but we could provide a more meaningful error message in this case.

Re: Accounts and Snaps

This is a surprisingly difficult issue, and the best approach is under active research. The short answer is that each snap is an independent process, and each account for each snap is also an independent instance. That's not a problem - as you suggest, we can use the account information to create a key into a static cache in order to retrieve a previously established connection.

The not-so-short answer is that we recently switched to using a single classloader for all snaps and accounts in a pipeline, but then encountered a situation where one pipeline had snaps with accounts that depend upon mutually incompatible driver jars.
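The static cache mentioned above might be sketched like this (a hedged sketch; the class and key names are illustrative, and in a snap pack the cache would live in a static field with the value type declared as a general interface such as javax.sql.DataSource):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class FactoryCache<V> {
    private final ConcurrentHashMap<String, V> cache = new ConcurrentHashMap<>();

    // The key would be derived from the account's properties (URL, user, ...).
    // computeIfAbsent creates the value once and reuses it for later snaps.
    public V get(String accountKey, Function<String, V> factory) {
        return cache.computeIfAbsent(accountKey, factory);
    }
}
```

Caching factories (DataSources, JMS Connections) rather than per-snap objects (JDBC Connections, JMS Sessions) means each snap can pull a fresh, thread-safe object from the shared entry.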
I know we were discussing reintroducing separate classloaders for accounts, but I don't know the details. E.g., are the classes marked with an @Account annotation loaded via a different classloader, or only the objects set via Guice injection? Let me know if you need the details.

In practice? You should be fine using a static cache (e.g., a Guava Cache, or a WeakHashMap) as long as your cache is declared to use the most general type possible. E.g., return a DataSource instead of an OracleDataSource, much less an Oracle12xArticulatingGumboDataSource.

As an aside - unless you have a compelling reason to do otherwise, your cache should contain factories instead of specific instances: JDBC DataSources instead of JDBC Connections, JMS Connections instead of JMS Sessions. It's not any more difficult to implement, and it guarantees each snap gets a fresh, thread-safe copy.

Re: Setting up Kerberos on SnapLogic groundplex for authentication to Cloudera - Hive account and snap pack

Creating a keytab file is straightforward. If you are creating a keytab file for a user with a password, you should use the 'ktutil' program:

    $ ktutil
    ktutil: add_entry -password -p principal -k kvno -e enctype
    (enter password)
    ktutil: write_kt keytabfile
    ktutil: quit

where

    principal is your principal, e.g., bob@example.com, or bob/hdfs@example.com for a more restricted principal.
    kvno is the key version number. 1 should be fine.
    enctype is the encryption type. This is typically something like aes128-cts-hmac-sha1-96, des3-cbc-sha1, or arcfour-hmac. You should check with your system administrator to get the precise encryption types required. You can run add_entry multiple times, once for each encryption type.
    keytabfile is your keytab file. It traditionally ends with the .keytab extension.

You can verify the new file with 'klist -kt keytabfile'.

If you are creating a keytab file for a server, you must use the 'kadmin' program.
If the server principal does not exist yet:

    $ kadmin
    kadmin: add_principal principal
    kadmin: ktadd -k keytabfile principal

If the server principal already exists:

    $ kadmin
    kadmin: ktadd -k keytabfile -norandkey principal

where principal is something like "hdfs/172.3.1.7@MYORG.EXAMPLE.COM".