HDFS Reader - Reading a partition with space
I am using the HDFS Reader Snap to read data from Hadoop for an ETL job. I am unable to read a few partitions whose names contain a space character. I can list those same partitions with hadoop fs -ls by putting an escape character before the space (\ ). I have tried replacing the space with the escape character and with %20, but neither worked. Can anyone suggest a workaround for reading partitions with a space in the name?
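One way to narrow this down, sketched below under assumptions: a literal space is legal in an HDFS path, and the backslash and %20 forms are shell and URL conventions rather than part of the path itself, so listing the partition directly through the Hadoop FileSystem Java API can confirm whether the data is reachable outside the Snap. The namenode address and partition path here are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ListPartitionWithSpace {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // fs.defaultFS is normally read from core-site.xml on the
        // classpath; it is set explicitly here only for illustration.
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);

        // A literal space in the Path is fine at the API level; no
        // backslash escaping or %20 encoding is needed here.
        Path partition = new Path("/warehouse/table/part=NEW YORK");

        for (FileStatus status : fs.listStatus(partition)) {
            System.out.println(status.getPath());
        }
    }
}
```

If this lists the files, the space handling issue is on the Snap property side rather than in HDFS itself; if it fails the same way, the directory name on disk may not be exactly what the shell listing suggests.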
Updating Databricks/Delta Lake tables in Standard Pipelines

Hi community, I was wondering if anyone has tried doing inserts/updates to a Databricks/Delta Lake table using the generic JDBC Snaps (or any other workaround if you have one). Using the latest Databricks JDBC driver, I was able to set up an account successfully, but I only seem to be able to run SELECT statements using the Generic JDBC - Execute Snap. If I try any other statement, such as an INSERT or UPDATE, I get the following error:

Failure: SQL operation failed, Reason: SQL [null]; Error message not found: NOT_IMPLEMENTED. Can't find resource for bundle java.util.PropertyResourceBundle, key NOT_IMPLEMENTED, Resolution: Please check for valid Snap properties and input data.
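For reference, a minimal plain-JDBC sketch of the same INSERT run outside SnapLogic, which can help establish whether NOT_IMPLEMENTED comes from the driver or from the Snap. The URL format, table, and credentials are placeholders; copy the real JDBC URL from your Databricks workspace connection details.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class DeltaInsertProbe {
    public static void main(String[] args) throws Exception {
        // Hypothetical connection details; the exact URL prefix and
        // properties depend on the Databricks driver version in use.
        String url = "jdbc:databricks://<workspace-host>:443/default;httpPath=<http-path>";

        try (Connection conn = DriverManager.getConnection(url, "token", "<personal-access-token>");
             PreparedStatement ps = conn.prepareStatement(
                     "INSERT INTO demo_table (id, name) VALUES (?, ?)")) {
            ps.setInt(1, 1);
            ps.setString(2, "test");
            // executeUpdate() is the standard JDBC call for DML; some
            // drivers implement only a subset of the JDBC surface, and
            // an unimplemented method can surface as NOT_IMPLEMENTED.
            int rows = ps.executeUpdate();
            System.out.println("Inserted rows: " + rows);
        }
    }
}
```

If this probe fails the same way, the limitation is in the driver (or the cluster endpoint) rather than the Snap configuration.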
Duplicate data using SQL offset

We are attempting to load transactional tables in a pipeline we created.

Issue faced: While loading data from a transactional table using the limit and offset settings of the SQL Server Select Snap (partitioned loading), we are getting duplicate records in the destination table.

Suspected cause: While the load is running, the source table is being updated (inserts, deletes, and updates), so the rows that a given limit/offset window maps to can shift between reads, resulting in duplicate reads.

We looked at the documentation to understand how the offset feature works, but there is not much detail there. Can someone share further insight into this feature, how it works, and how it may be contributing to the issue above?
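The suspected cause is plausible: OFFSET paging is only stable when every page uses the same deterministic ORDER BY and the underlying data does not move between page reads. A commonly suggested alternative is keyset (seek) pagination on a unique key, where inserts and deletes behind the cursor no longer shift later pages. A minimal sketch, with hypothetical connection details, table, and column names:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class KeysetPagination {
    public static void main(String[] args) throws Exception {
        // Hypothetical SQL Server connection string.
        String url = "jdbc:sqlserver://dbhost:1433;databaseName=sales;encrypt=true;trustServerCertificate=true";

        long lastId = 0;       // resume point: highest key already loaded
        int pageSize = 10_000;

        try (Connection conn = DriverManager.getConnection(url, "user", "password")) {
            while (true) {
                // Seek past the last key instead of using OFFSET, so
                // each page is defined by data, not by row position.
                try (PreparedStatement ps = conn.prepareStatement(
                        "SELECT TOP (?) id, payload FROM dbo.transactions " +
                        "WHERE id > ? ORDER BY id")) {
                    ps.setInt(1, pageSize);
                    ps.setLong(2, lastId);

                    int rows = 0;
                    try (ResultSet rs = ps.executeQuery()) {
                        while (rs.next()) {
                            lastId = rs.getLong("id");
                            rows++;
                            // ... hand the row to the destination load ...
                        }
                    }
                    if (rows < pageSize) break;  // final page reached
                }
            }
        }
    }
}
```

Note that even keyset pagination only gives a consistent extract if rows already read are not updated mid-load; for a fully consistent read of a hot table, a snapshot or high-water-mark column is usually needed.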
KERBEROS issues - HDFS cluster

Hi, we've implemented Kerberos in our Hadoop cluster, but we are having issues with our pipelines. We are getting this error:

Reason: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]

Caused by: java.util.concurrent.ExecutionException: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:206)
    at com.snaplogic.snap.api.binary.SimpleReader.doRead(SimpleReader.java:332)
    ... 30 more
Caused by: org.apache.hadoop.security.AccessControlException: SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62

Everything was configured correctly: keytabs are OK, I am able to do a kinit from any groundplex node, and the time on the KDC servers and the Kerberos clients is synced. Pipelines worked fine for a while, but after we restarted the jcc process on the groundplex nodes, pipelines started to fail. The JCE extension was installed as per https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/2015960/How+to+Configure+a+Groundplex+for+CDH+with+Kerberos+Authentication

Any ideas? Thank you, Petrica
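For context on the error itself: "SIMPLE authentication is not enabled. Available:[TOKEN, KERBEROS]" means the client side sent a SIMPLE (unauthenticated) request to a Kerberized NameNode, which usually indicates the Hadoop client configuration was loaded without hadoop.security.authentication=kerberos, consistent with the problem appearing only after a JCC restart. As a hedged sketch, this is roughly what any Hadoop client JVM has to do before touching HDFS; the principal and keytab path below are placeholders:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosHdfsLogin {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Normally inherited from core-site.xml; if this property is
        // missing or reads "simple", the client sends a SIMPLE request
        // and the NameNode answers "Available:[TOKEN, KERBEROS]".
        conf.set("hadoop.security.authentication", "kerberos");

        UserGroupInformation.setConfiguration(conf);
        // Placeholder principal/keytab. A ticket obtained with kinit is
        // not enough for a long-running JVM once it expires; logging in
        // from the keytab lets the process re-authenticate itself.
        UserGroupInformation.loginUserFromKeytab(
                "svc_snaplogic@EXAMPLE.COM",
                "/etc/security/keytabs/svc_snaplogic.keytab");

        try (FileSystem fs = FileSystem.get(conf)) {
            System.out.println(fs.exists(new Path("/")));
        }
    }
}
```

A working kinit from the shell only proves the OS-level Kerberos setup; it does not prove that the restarted JCC process is picking up the same core-site.xml and keytab settings.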
Setting up Kerberos on SnapLogic groundplex for authentication to Cloudera - Hive account and Snap Pack

I'm looking for information on how to set up Kerberos on the SnapLogic groundplex for authentication to Cloudera. I want to use the Hive account and Snap Pack. I see this documentation: https://docs-snaplogic.atlassian.net/wiki/spaces/SD/pages/2015960/How+to+Configure+a+Groundplex+for+CDH+with+Kerberos+Authentication

Is that all there is to it, or are there more steps?
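Not SnapLogic-specific, but for orientation: once the groundplex itself can authenticate (per the linked document), the Hive side of a Kerberized setup typically also needs the HiveServer2 service principal in the connection URL. A minimal sketch using the Apache Hive JDBC driver; the host, port, and realm are hypothetical, and the exact property names vary by driver vendor:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class KerberizedHiveProbe {
    public static void main(String[] args) throws Exception {
        // With Kerberos, the hive2 URL carries the *service* principal;
        // the client identity comes from the Kerberos login (ticket
        // cache or keytab), not from a username/password in the URL.
        String url = "jdbc:hive2://hiveserver2.example.com:10000/default;"
                + "principal=hive/_HOST@EXAMPLE.COM";

        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW DATABASES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```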
Configuring SnapLogic Hive Account for Hive on HortonWorks HDP (No Kerberos)

Below are the steps to configure a Hive account in SnapLogic to connect to a Hive database on HortonWorks HDP.

Step 1: Go to Product Downloads | Cloudera and download the HortonWorks JDBC Driver for Apache Hive.

Step 2: If Kerberos is not enabled on HortonWorks HDP, create a Hive account and use the sample below as a reference.
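As a rough illustration of what such a non-Kerberized connection amounts to (not the original sample; the host, port, and user are hypothetical, and the driver class name depends on the driver JAR you downloaded in Step 1):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PlainHiveProbe {
    public static void main(String[] args) throws Exception {
        // Without Kerberos, a plain hive2 URL plus a username is
        // typically all that is needed; all values here are placeholders.
        String url = "jdbc:hive2://hdp-hiveserver2.example.com:10000/default";

        try (Connection conn = DriverManager.getConnection(url, "hive", "");
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SHOW TABLES")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}
```

The SnapLogic Hive account fields map onto the same pieces: the uploaded driver JAR, the JDBC URL, and the database username.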