Writing a forEach

Hello,

I have a use case where my input arrives in tab-delimited form. After a Group By N, I convert it to an array so that I can run some comparison operations on it.

I need to extract the records from this array that meet the following conditions:

  1. All records from a line starting with ‘8’ whose 5th column is 3, through the next line starting with ‘9’.
  2. All records from a line starting with ‘8’ whose 5th column is 4, through the next line starting with ‘9’.
  3. All records from a line starting with ‘8’ whose 5th column is 5, through the next line starting with ‘9’.

I wanted to write a forEach that produces the 3 arrays described above. Once I have the 3 arrays, I need to check which values in the 3rd column of array1 are also present in the 4th column of array2, and pick only the records where they are.

I would appreciate some help with getting the desired output in the most efficient way.

Can you post some sample input records and the expected output for that input?

Input:

   6	1151	10	04222016	0	0	0	4.4	10000978	6	4	UNIX			ABC	Business.4.G.0011
    8	1151	10	04222016	4	0	0	4.4	10000978	6	4	UNIX				
    4	1111111110029069	01152014	14171	898	898	500000	898	0	000	000	840	000	000	000	000	000	000	000									35928	07222014	10062014															
    4	1111111110029150	01152014	14171	000	000	1500000	000	0	000	000	840	000	000	000	000	000	000	000									1476	07172014																
    4	1111111110029440	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									48916	06172014																																																			
    9	1151	10	04222016	16	2380	0	4.4	10000978	6	4	UNIX
    8	1151	10	04222016	3	0	0	4.4	10000978	6	4	UNIX				
    4	1111111110029069	01152014	14171	898	898	500000	898	0	000	000	840	000	000	000	000	000	000	000									35928	07222014	10062014															
    4	1111111110029150	01152014	14171	000	000	1500000	000	0	000	000	840	000	000	000	000	000	000	000									1476	07172014																
    4	1111111110029440	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									48916	06172014																
    4	1111111110029580	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									147909	06102015	10062014															
    4	1111111110029630	01152014	14171	000	000	000		0	000	000	840	000	000	000	000	000	000	000																										
    4	1111111110029770	01152014	14171	107398	122210	500000	122210	0	228	000	840	000	000	000	000	000	000	000									228	06182014	07062014															
    4	1111111110029879	01152014	14171	000	000	2500000	000	0	000	000	840	000	000	000	000	000	000	000									3488	06052014																
    9	1151	10	04222016	16	2380	0	4.4	10000978	6	4	UNIX				
    8	1151	10	04222016	31	0	0	4.4	10000978	6	4	UNIX				
    9	1151	10	04222016	31	0	0	4.4	10000978	6	4	UNIX				
    8	1151	10	04222016	32	0	0	4.4	10000978	6	4	UNIX				
    9	1151	10	04222016	32	0	0	4.4	10000978	6	4	UNIX				
    7	1151	10	04222016	0	141182	122410553	4.4	10000978	6	4	UNIX				Business.4.G.0011

Output: 2 arrays

8	1151	10	04222016	4	0	0	4.4	10000978	6	4	UNIX				
4	1111111110029069	01152014	14171	898	898	500000	898	0	000	000	840	000	000	000	000	000	000	000									35928	07222014	10062014															
4	1111111110029150	01152014	14171	000	000	1500000	000	0	000	000	840	000	000	000	000	000	000	000									1476	07172014																
4	1111111110029440	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									48916	06172014																																																			
9	1151	10	04222016	16	2380	0	4.4	10000978	6	4	UNIX

And

8	1151	10	04222016	3	0	0	4.4	10000978	6	4	UNIX				
4	1111111110029069	01152014	14171	898	898	500000	898	0	000	000	840	000	000	000	000	000	000	000									35928	07222014	10062014															
4	1111111110029150	01152014	14171	000	000	1500000	000	0	000	000	840	000	000	000	000	000	000	000									1476	07172014																
4	1111111110029440	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									48916	06172014																
4	1111111110029580	01152014	14171	000	000	500000	000	0	000	000	840	000	000	000	000	000	000	000									147909	06102015	10062014															
4	1111111110029630	01152014	14171	000	000	000		0	000	000	840	000	000	000	000	000	000	000																										
4	1111111110029770	01152014	14171	107398	122210	500000	122210	0	228	000	840	000	000	000	000	000	000	000									228	06182014	07062014															
4	1111111110029879	01152014	14171	000	000	2500000	000	0	000	000	840	000	000	000	000	000	000	000									3488	06052014																
9	1151	10	04222016	16	2380	0	4.4	10000978	6	4	UNIX

Thanks, krupalibshah, for sharing the sample data. How big is your input file? I ask because I am not sure this is easy to do in SnapLogic: while building each array you need to keep track of the previous record so that you know which array the current record belongs in. I considered a script task, but I don’t think it will work because a script task works on a single record. However, if your input file is not that big, you can easily do this in a programming or scripting language such as Python.

Hi Aditya,

Can you please let us know the approximate size limit that is permissible? We are not clear on this at the moment.

Thanks
Suvigya.

Hi Suvigya,

As I mentioned in my previous post, I am not sure how to do this easily in SnapLogic; maybe someone with more expertise can suggest a good approach. I asked about the size because if the file is small, say a few MB (how much is practical depends on the machine processing it), you can write a simple program in any programming language. I would prefer Python because it lets you do complex things in a few lines of code, and your requirement is easy to implement in it.
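
To give an idea, here is a minimal sketch in plain Python, not tied to any SnapLogic API. It assumes the rows are tab-delimited strings, that the record type is the first column, and that the 5th column of each ‘8’ row identifies the block. The function names `split_blocks` and `keep_matching` are made up for illustration, and the comparison is applied to every row of a block; restrict it to the ‘4’ rows if that is what you need.

```python
def split_blocks(rows):
    """Collect rows from each '8' row through the next '9' row.

    Returns a dict keyed by the 5th column of the '8' row. If two '8' rows
    share the same 5th-column value, the later block replaces the earlier one.
    """
    blocks = {}
    current = None  # the block being filled, or None when outside a block
    for row in rows:
        if not row.strip():
            continue  # skip blank lines
        # Drop leading indentation and the line ending, keep trailing empty columns.
        fields = row.lstrip().rstrip("\r\n").split("\t")
        record_type = fields[0]
        if record_type == "8" and len(fields) > 4:
            current = [fields]
            blocks[fields[4]] = current
        elif current is not None:
            current.append(fields)
            if record_type == "9":
                current = None  # block is closed
    return blocks


def keep_matching(block_a, block_b, col_a=2, col_b=3):
    """Keep rows of block_a whose column col_a value occurs in column col_b of block_b.

    Column indexes are zero-based, so the defaults mean 3rd and 4th column.
    """
    wanted = {f[col_b] for f in block_b if len(f) > col_b}
    return [f for f in block_a if len(f) > col_a and f[col_a] in wanted]
```

With the grouped rows in a list called `rows`, `split_blocks(rows)` would give you the blocks under the keys `"3"`, `"4"` and `"5"` (when present), and `keep_matching(blocks["3"], blocks["4"])` would return only the matching records.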

Thanks
Aditya

Hi Suvigya,

I just read your other post about combining multiple documents into one. You did mention in this post that you are using Group By N, but I didn’t know that Group By N can combine all the documents into one. So if you put a Script snap after the Group By N snap, you can implement your logic in Java, Python, or JavaScript.

Thanks
Aditya

Actually, a script task is not limited to a single record. It is simply that the example script reads in one record at a time.

For example, I had to write a custom CSV script to replace the one that SnapLogic comes with. That script reads the first row, keeps it as a header, and reads the remaining rows using the values from that first row. The header could just as easily have been a thousand rows, and joining against them could have meant processing all of the data inside the snap.

The size limit depends on your system and on things like how soon you need the result. Unfortunately, I think a lot of memory is not freed until the script finishes, though I could be wrong. Either way, you should try to limit how much data you hold.
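
The script itself is not shown here, so the following is only a rough sketch of the idea in plain Python: state read from the first record (the header) is kept and then applied to every later record. The name `read_with_header` is hypothetical, and the standard library's `csv.DictReader` already does much the same thing.

```python
import csv


def read_with_header(lines, delimiter=","):
    """Yield one dict per data row, keyed by the values of the first row."""
    reader = csv.reader(lines, delimiter=delimiter)
    header = next(reader, None)  # read once, kept for the rest of the run
    if header is None:
        return  # empty input, nothing to yield
    for values in reader:
        # Pair each value with the column name remembered from the header.
        yield dict(zip(header, values))
```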

Basically, the snap has five processing points.

  1. Before you read any data.
  2. As you read the data.
  3. At some breakpoint you determine.
  4. Errors/exceptions as they occur.
  5. At the end.

You can also write records out wherever you want, on whatever conditions and timing you choose.
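
As a rough illustration of those five points, here is a sketch in plain Python. It does not use the actual script snap API: the `outputs` and `errors` lists are stand-ins for the snap's output and error views, `process` is a made-up name, and the breakpoint chosen here (a row starting with ‘9’) is just an assumption taken from the sample data in this thread.

```python
def process(rows):
    """Walk the rows once, emitting a block each time a '9' row is seen."""
    outputs, errors = [], []  # stand-ins for the output and error views

    # 1. Before any data is read: set up the state carried between records.
    block = []

    for row in rows:
        try:
            # 2. As each record is read: parse it and add it to the current state.
            fields = row.lstrip().rstrip("\r\n").split("\t")
            if not fields[0]:
                raise ValueError("empty record type")
            block.append(fields)

            # 3. At a breakpoint you determine: here, a row starting with '9'.
            if fields[0] == "9":
                outputs.append(block)
                block = []
        except ValueError as exc:
            # 4. Errors as they occur: send the bad record to the error list.
            errors.append({"row": row, "error": str(exc)})

    # 5. At the end: write out whatever is still buffered.
    if block:
        outputs.append(block)
    return outputs, errors
```

Because `block` lives outside the loop, the script remembers earlier records while reading later ones, which is what lets it decide which array a record belongs to.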

Steve