SnapGPT - Security and Data Handling Protocols
Authors: Aaron Kesler, Jump Thanawut, Scott Monteith

SnapLogic acknowledges and respects the data concerns of our customers. The purpose of this document is to present our data handling and global data protection standards for SnapGPT.

Overview & SnapLogic’s Approach to AI / LLM:
SnapLogic uses high-quality, enterprise-grade Large Language Models (LLMs), selecting the most appropriate one for each specific task. Current support includes Azure OpenAI GPT, Anthropic Claude on Amazon Bedrock, and Google Vertex PaLM.

Product & Data:
Product Features & Scope: SnapGPT offers a range of features, each designed to improve user experience and productivity in pipeline and SQL query generation:
Input Prompts: This feature lets customers interact directly with the LLM by providing input prompts. Prompts are the primary way users specify their requirements or ask the LLM questions.
Describe Pipeline: This skill gives users a comprehensive description of an existing pipeline, which helps them understand and document the pipeline's structure and functionality.
Analyze Pipeline: This feature ingests the entire pipeline configuration and analyzes it to suggest optimizations and improvements. It helps users make their pipelines more efficient and effective.
Mapper Configuration: Simplifies configuring the Mapper Snap by generating the expressions that map input to output.
Pipeline Generation: Users can create prototype pipelines from simple input prompts. This feature streamlines pipeline creation, making it more accessible and less time-consuming.
SQL Generation without Schema: Designed for situations where schema information is not available or cannot be shared, this feature generates SQL queries based solely on the customer's prompt.
SQL Generation with Schema (coming February 2024): This advanced feature generates SQL queries that take the schema of the input database into account. It is particularly useful for producing contextually accurate and efficient SQL queries.

Data Usage & Opt-Out Options:
At SnapLogic, we recognize the importance of data security and user privacy in the rapidly evolving Generative AI space. SnapGPT has been designed with these principles at its core, so that customers can use the power of AI and machine learning while keeping control over their data. Our approach prioritizes transparency, gives users the ability to opt out of data sharing, and aligns with industry best practices for data handling. This commitment reflects our dedication to providing advanced AI solutions that also meet the highest standards of privacy and data protection.

Data Usage in SnapGPT:
SnapGPT is designed to handle customer data with the utmost care, so that data usage is aligned with the functionality of each feature:
Customer Input and Interaction: Customer inputs, such as prompts or pipeline configurations, are key to SnapGPT's functionality. This data is used solely to process specific requests and to generate responses or suggestions relevant to the user's query. No data is retained for model training purposes.
Feature-Specific Data Handling: Each SnapGPT feature/skill, such as pipeline analysis or SQL generation, uses customer data differently. The table below gives details for each skill.

Skill Name | Description of the Skill | Data Transferred to LLM
Input Prompts | Direct input prompts from customers are transferred to the LLM and tracked by SnapLogic analytics. | Prompt details only; these are not stored or used for training by the LLM.
Describe & Analyze Pipeline | Allows customers to describe or analyze a pipeline; the entire pipeline configuration is relayed to the LLM. | Entire pipeline configuration, excluding account credential information.
Mapper Configuration | Sends input schema information to the LLM within the prompt for the Mapper Configuration feature. | Input schema information, excluding account credential information.
Pipeline Generation | Transmits input prompts to the LLM to create pipeline prototypes. | Input prompts only; these are not stored or used for training by the LLM.
SQL Generation without Schema | Generates SQL queries based only on the customer's prompt, for situations where schema information cannot be shared. | Only the customer's prompt; no schema information is used.
SQL Generation with Schema (February 2024) | Generates more accurate SQL queries by considering the schema of the input database. | Schema of the input database, excluding any account credentials; this improves query accuracy.

Future Adaptations:
In the near future, we intend to offer customers opt-out options. Opting out of including environment-specific data in SnapGPT prompts can reduce the quality of SnapGPT's responses, because the LLM will lack that additional context. In the current version, using SnapGPT means that the data listed above for each feature is sent to the LLMs. We recommend that customers who are not comfortable with these data transfers wait until the opt-out option is available.

Impact of Opting Out:
Choosing to opt out of data sharing may affect the functionality and effectiveness of SnapGPT. For example, opting out of schema retrieval in SQL Generation may lead to less precise query output. Users should consider these impacts when setting their data sharing preferences.

Data Processing: Architecture and Data Flow (diagrams not reproduced in this text)

Data Retention & Residency:
SnapLogic is committed to ensuring the secure handling and appropriate residency of customer data.
Our data retention policies are designed to respect customer privacy while still providing the functionality SnapGPT needs:

Data Retention:
No Retention for Model Training: SnapGPT is designed to prioritize user privacy. No customer data processed by SnapGPT is retained for model training, so user data is never used to train or refine the underlying AI models.
Storing Usage Data for Adoption Tracking: Although we do not retain data for model training, SnapLogic stores SnapGPT usage data in Heap Analytics. This is strictly for tracking product adoption and usage patterns. This usage data helps us understand how our customers interact with SnapGPT, so that we can keep improving the product and tailor it to user needs.

Data Residency:
Location-Based Data Storage: Our control planes in the United States and the EMEA region adhere to the data residency policies of those locations. We comply with regional data protection and privacy laws, so customers can be assured that their data is managed in accordance with local regulations.

Controls – Admin, Groups, Users:
SnapLogic provides robust control mechanisms for administrators, while ensuring that group- and user-level controls align with organizational policies:
Admin Controls: Administrators have granular control over the use of SnapGPT within their organization. They can determine what data is shared with the LLM and can opt out of data sharing to meet specific data retention and sharing policies. Admins can also control user access to individual features and skills, in line with organizational needs and security policies.
Group Controls: Currently, groups do not have specific controls over SnapGPT. Group-level policies are managed by administrators to ensure consistency and security across the organization.
User Controls: Users can access and use the SnapGPT features and skills to which they are entitled. User entitlements are managed by administrators. Each user gets the tools their role requires while data security and compliance are maintained.

Guidelines for Secure and Compliant Use of SnapGPT
At SnapLogic, we understand the critical importance of data security and compliance in today’s digital landscape. We are dedicated to giving our customers the tools and knowledge they need to use SnapGPT in a way that aligns with their internal information security (InfoSec) and privacy policies. This section offers guidelines to help keep your use of SnapGPT secure and compliant with your organizational standards.
Customer Data Control: Customers are encouraged to actively manage and control the data they share with SnapGPT. By understanding and using the available admin and user controls, customers can ensure that their use of SnapGPT aligns with their internal InfoSec and privacy policies.
Best Practices for Data Sharing: We recommend that customers review and follow best practices for data sharing, especially when working with sensitive or confidential information. These include using anonymization or pseudonymization where appropriate. They also include putting only the data the task requires into prompts and pipelines.
Integrating with Internal Policies: Customers should integrate their use of SnapGPT with their existing InfoSec and privacy frameworks, so that data handling through SnapGPT stays consistent with the organization’s overall data protection strategy.
Regular Review and Adjustment: Customers should regularly review their SnapGPT data sharing settings and practices. They should adjust them as needed to stay aligned with evolving InfoSec and privacy requirements.
Training and Awareness: We also suggest that customers run regular training and awareness programs for their users on the responsible and secure use of AI tools like SnapGPT, emphasizing the importance of data privacy and protection.

Compliance:
For detailed information on SnapLogic’s compliance with regulatory standards and its data security measures, see SnapLogic Security & Compliance (https://www.snaplogic.com/security-standards). This resource describes how we adhere to global data protection regulations, manage data security, and maintain compliance across all our products, including SnapGPT. For specific compliance inquiries, or for more information on how compliance applies to SnapGPT, contact the SnapLogic Compliance Team at Security@snaplogic.com.

For further details or inquiries about SnapGPT or any other SnapLogic AI services, contact the SnapLogic AI Services Team (ai-services@snaplogic.com).

Project Structures and Team Rights Guide
Overview
This document outlines the recommended organizational structure for Project Spaces, Projects, and team rights within your SnapLogic org.
Authors: SnapLogic Enterprise Architecture team

Integration Storage Hierarchy
In the SnapLogic platform, integrations (pipelines) are managed within the following hierarchy:
Organization - A customer typically has multiple organizations (orgs), one for each development environment, e.g., DEV, QA, STAGING, UAT, and PROD.
Project Space - A workspace where a team in a business unit or business group collaborates and implements its integrations in one central location.
Project - A group of pipelines that together provide the integration, aggregation, and reporting required for a given function.
Pipeline - A specific implementation of an integration.

For example, the path “/MyDevOrg/SLEntArch/SamplePipelines/WorkdayToSnowflake” breaks down into the following components:
Organization - MyDevOrg
Project Space - SLEntArch
Project - SamplePipelines
Pipeline - WorkdayToSnowflake

Recommended Hierarchy Naming Conventions
Clarity is one of the most important factors when naming project spaces, projects, and pipelines. Names at each level of the hierarchy must make sense to all developers and administrators so that integrations can be found quickly and easily. If you use Triggered Tasks, make sure that names contain no characters that violate HTTP URI naming conventions. You may also want to avoid characters that require URL encoding, such as spaces, commas, etc.
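As a rough pre-flight check for the Triggered Task guidance above, the sketch below splits a pipeline path into its four hierarchy levels. It then flags any component that would need URL encoding. It assumes that a pipeline path always has exactly four segments (org/project space/project/pipeline), as in the example. It treats only the RFC 3986 "unreserved" characters (letters, digits, hyphen, period, underscore, tilde) as safe. That rule is stricter than what the platform itself may enforce. The function and class names are hypothetical.

```python
import re
from dataclasses import dataclass

# RFC 3986 "unreserved" characters: these never need percent-encoding in a URL.
_UNRESERVED = re.compile(r"[A-Za-z0-9._~-]+")


@dataclass(frozen=True)
class AssetPath:
    org: str
    project_space: str
    project: str
    pipeline: str


def parse_asset_path(path: str) -> AssetPath:
    """Split '/Org/ProjectSpace/Project/Pipeline' into its four components."""
    parts = path.strip("/").split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected /org/project_space/project/pipeline, got {path!r}")
    return AssetPath(*parts)


def names_needing_encoding(path: str) -> list[str]:
    """Return the path components that contain characters outside the unreserved set."""
    p = parse_asset_path(path)
    components = [p.org, p.project_space, p.project, p.pipeline]
    return [name for name in components if not _UNRESERVED.fullmatch(name)]


if __name__ == "__main__":
    good = "/MyDevOrg/SLEntArch/SamplePipelines/WorkdayToSnowflake"
    print(parse_asset_path(good))
    print(names_needing_encoding(good))  # []
    bad = "/MyDevOrg/Sales and Marketing/Q3,Forecast/Load_Leads"
    print(names_needing_encoding(bad))  # ['Sales and Marketing', 'Q3,Forecast']
```

Underscore-separated names such as Sales_and_Marketing pass this check. That is one reason the naming convention below uses underscores rather than spaces.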
Below is an example naming convention for your project spaces, projects, and pipelines:
Project spaces are named based on business unit or project team:
  Sales_and_Marketing
  EnterpriseBI
  ProfessionalServices
Projects are named based on business target endpoint:
  Sales_and_Marketing/MarketingForecast
  EnterpriseBI/SalesDashboardReporting
  ProfessionalServices/CommunityExamples
Integrations (pipelines) are named based on business function:
  Sales_and_Marketing/MarketingForecast/Fall_Marketing_New_Customers_To_SalesForce
  EnterpriseBI/SalesDashboardReporting/Supplier_Invoices_To_Workday
  ProfessionalServices/CommunityExamples/Community_27986_Example_JSON_MultiArray_Join

Keep the shallow hierarchy (project space/project) in mind when choosing a naming scheme for project spaces and projects. In most orgs, it is acceptable to assign a project space to each business unit. That business unit can then create projects within its project space based on integration function or target. However, if you expect a single business unit to create a very large number of pipelines, consider allowing multiple project spaces for that business unit.

Shared Folders
Root Shared
A special project named “shared” is added to each SnapLogic organization (org). Using the org name in the example above, this would be /MyDevOrg/shared. This is commonly referred to as the “root shared” folder. This folder always exists. It is automatically assigned Full Access (read, write, execute) for all members of the org’s “admins” group and Read-Execute access for all other users. As a best practice, the root shared folder should contain only objects (accounts, files, pipelines, and tasks) that all SnapLogic users in your org should be able to use.
Some examples may include:
  An SMTP account used by the Email Sender Snap
  A read-only database account used to access common, public tables
  Shared expression libraries that contain global static variables or user-defined functions for common string/date manipulation
  Shared pipelines such as error handlers or other globally reusable code

Project Space Shared
Another special project named “shared” is added to each project space in the org. Using the example path above, this would be /MyDevOrg/SLEntArch/shared. This folder always exists under each project space and inherits the permissions assigned to the project space. As a best practice, the project space shared folder should contain only objects (accounts, files, pipelines, and tasks) that all SnapLogic users with access to the project space should be able to use. Some examples may include:
  Database accounts used to access common tables within your business unit
  Shared expression libraries that contain static variables and user-defined functions common to your business unit
  Shared/reusable pipelines common to your business unit

User Groups
We recommend that you create the following groups in all of your SnapLogic orgs:
Operators - users who may need to execute a pipeline manually but do not require Full Access to the projects
Migrators - users who perform object migrations in your orgs but do not need to execute pipelines

You should also create Developer groups specific to each Project Space and/or Project within your org.
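Because Developer group names are derived mechanically from project space and project names, they can be generated from the hierarchy itself. The sketch below assumes the hierarchy is available as a plain Python mapping of project space to project names; how you obtain that mapping is out of scope. The function name is hypothetical. The sketch only computes names and does not create anything in SnapLogic.

```python
# Fixed groups recommended for every org.
BASE_GROUPS = ["Operators", "Migrators"]


def recommended_groups(hierarchy: dict[str, list[str]]) -> list[str]:
    """Derive group names from a {project_space: [project, ...]} mapping.

    Project-level group names omit the project space, so two project spaces
    that contain a project with the same name would yield the same group name.
    That case is rejected rather than silently merged.
    """
    owner_space: dict[str, str] = {}
    for space, projects in hierarchy.items():
        for project in projects:
            if project in owner_space:
                raise ValueError(
                    f"project {project!r} exists in both {owner_space[project]!r} "
                    f"and {space!r}; its group name would be ambiguous"
                )
            owner_space[project] = space

    space_groups = [f"{space}_Developers" for space in hierarchy]
    project_groups = [f"{project}_Developers" for project in owner_space]
    return BASE_GROUPS + space_groups + project_groups


if __name__ == "__main__":
    example = {
        "Sales_and_Marketing": ["MarketingForecast"],
        "EnterpriseBI": ["SalesDashboardReporting"],
        "ProfessionalServices": ["CommunityExamples"],
    }
    for name in recommended_groups(example):
        print(name)
```

The collision check reflects a design choice. Project-level group names carry no project space prefix, so this scheme only stays unambiguous while project names are unique across the org.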
Using the example project spaces and projects listed in the Naming Conventions section of this document, you may want to add the following groups:
Groups specific to a Project Space:
  Sales_and_Marketing_Developers
  EnterpriseBI_Developers
  ProfessionalServices_Developers
Groups specific to a Project:
  MarketingForecast_Developers
  SalesDashboardReporting_Developers
  CommunityExamples_Developers

You may choose to let developers see only the objects within the project they are working in. Alternatively, you could grant them read-only access to all projects within their project space, so they can refer to designs in other projects. Typically, Developer groups have Full Access in your development org to the Projects they are working in. They also have Read-Execute access to the Project Space “shared” folder and to the root “shared” folder. In all non-development orgs, Developer groups have Read-Only access to the same Project Space “shared” folder and Projects that they can access in your development org.

If you have a larger SnapLogic development community in your organization, you may wish to distribute the administration of Projects by creating an Admin group for each Project Space. Each Admin group is assigned ownership of its Project Space, which allows it to create new Projects and maintain all permissions within that Project Space.

Default Users
We recommend adding the following service accounts to your org(s):
snaplogic_admin@<yourdomain> - this user should own the root SnapLogic shared folder and all or most SnapLogic Project Spaces in your org(s); add this user to the “admins” group.
snaplogic_service@<yourdomain> - this user should own all of your SnapLogic tasks and have the permissions needed to execute tasks for all Projects. Read-Execute Access is the minimum required; Full Access is required if any files are written back to the SLDB of the Project during processing.
Add this user to the “Operators” group.

When migrating tasks to your non-development org(s), either use the snaplogic_service@<yourdomain> user to perform the migration, or use the Update Asset Owner API to change the task's owner after migration. A task is owned by the user who creates it. If a member of the Migrators group performs the migration, that user becomes the owner and may not have the permissions needed to execute the task in the target org(s).

Hierarchy Permissions
Recommended access to the root “shared” project:
admin@snaplogic.com - Owner
“admins” group - Full Access
“members” group - Read/Execute Access
“Operators” group - Read/Execute Access
“Migrators” group - Full Access
“Support” group - Read/Execute Access

You may wish to limit Execute Access to certain teams. If so, change the “members” group to Read-Only Access and grant Read/Execute Access to the desired team groups.

If you perform migrations only within specific day/time windows, you can manage the Migrators group with a scheduled task that calls the Groups API to replace all of the group's members. Removing all users from the group closes the migration window; restoring users to the group opens it.
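The migration-window approach above can be sketched as follows. The scheduling and the actual Groups API request are deliberately left out. replace_members is a hypothetical, injected function that you would implement against the Groups API; its endpoint and payload are not shown here. The window definition (a set of weekdays plus a start/end time in the org's local time) is also an assumption for illustration.

```python
from datetime import datetime, time
from typing import Callable, Iterable

# Signature of whatever function actually calls the Groups API to replace a
# group's member list. It is injected so this sketch makes no network calls.
ReplaceMembers = Callable[[str, list[str]], None]


def window_is_open(now: datetime, weekdays: set[int], start: time, end: time) -> bool:
    """True if `now` is on one of `weekdays` (Monday=0) and start <= time < end.

    Assumes start < end on the same day; windows that cross midnight are not handled.
    """
    return now.weekday() in weekdays and start <= now.time() < end


def sync_migrators(
    now: datetime,
    roster: Iterable[str],
    weekdays: set[int],
    start: time,
    end: time,
    replace_members: ReplaceMembers,
    group: str = "Migrators",
) -> list[str]:
    """Open the window by restoring `roster`, or close it by emptying the group."""
    members = sorted(roster) if window_is_open(now, weekdays, start, end) else []
    # Replace *all* members in one call, as described above.
    replace_members(group, members)
    return members


if __name__ == "__main__":
    calls = []
    roster = ["dana@example.com", "alex@example.com"]
    tue_evening = datetime(2024, 1, 2, 19, 30)  # a Tuesday
    sync_migrators(tue_evening, roster, {1, 3}, time(18), time(22),
                   lambda g, m: calls.append((g, m)))
    print(calls)
```

Replacing the whole member list on every run, rather than adding or removing individual users, makes the scheduled task idempotent. Running it twice in the same window leaves the group in the same state.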
Recommended access to the Project Space “shared” project:
admin@snaplogic.com - Owner
“admins” group - Full Access
“members” group - Read-Only Access (optional)
“Operators” group - Read/Execute Access
“Migrators” group - Full Access
“<Project>_Admins” group(s) - Full Access in development
“<Project>_Developers” group(s) - Read/Execute Access in development
You may choose to grant Read-Only access to your <Project>_Admins and <Project>_Developers groups in non-development environments, depending on your support team structure.

Recommended access to the Projects:
admin@snaplogic.com - Owner
“admins” group - Full Access
“members” group - Read-Only Access (optional)
“Operators” group - Read/Execute Access
“Migrators” group - Full Access
“<Project>_Admins” group(s) - Full Access (only in development)
“<Project>_Developers” group(s) - Full Access (only in development)
You may choose to grant Read-Only access to your <Project>_Admins and <Project>_Developers groups in non-development environments, depending on your support team structure.
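To audit an existing Project against the recommendations above, the list can be encoded as data and compared with the access that groups actually have. This is a sketch under several assumptions. First, access levels are modeled as a simple ordered scale (Read-Only < Read/Execute < Full Access), which simplifies SnapLogic's read/write/execute permissions. Second, fetching the actual permissions is left out. Third, the function names are hypothetical. Groups not named in the recommendations are treated as having no recommended access. The optional “members” Read-Only grant is treated as allowed.

```python
from enum import IntEnum


class Access(IntEnum):
    """Access levels ordered from least to most privileged."""
    NONE = 0
    READ_ONLY = 1
    READ_EXECUTE = 2
    FULL = 3


# Recommended access to a Project for org-wide groups (same in every environment).
FIXED_PROJECT_ACCESS = {
    "admins": Access.FULL,
    "members": Access.READ_ONLY,  # optional per the recommendations
    "Operators": Access.READ_EXECUTE,
    "Migrators": Access.FULL,
}


def recommended_access(group: str, project: str, is_dev: bool,
                       read_only_outside_dev: bool = False) -> Access:
    """Recommended access for `group` on `project` in a dev or non-dev org."""
    if group in FIXED_PROJECT_ACCESS:
        return FIXED_PROJECT_ACCESS[group]
    if group in (f"{project}_Admins", f"{project}_Developers"):
        if is_dev:
            return Access.FULL
        # Outside development these groups get nothing unless you opt in to Read-Only.
        return Access.READ_ONLY if read_only_outside_dev else Access.NONE
    return Access.NONE


def over_privileged(actual: dict[str, Access], project: str, is_dev: bool,
                    read_only_outside_dev: bool = False) -> list[str]:
    """Groups whose actual access exceeds the recommendation, sorted by name."""
    return sorted(
        group for group, level in actual.items()
        if level > recommended_access(group, project, is_dev, read_only_outside_dev)
    )


if __name__ == "__main__":
    prod_acl = {
        "MarketingForecast_Developers": Access.FULL,
        "Operators": Access.READ_EXECUTE,
        "Migrators": Access.FULL,
    }
    print(over_privileged(prod_acl, "MarketingForecast", is_dev=False))
```

Flagging only over-privileged groups matches the intent of the recommendations, which mainly restrict developer access outside development. A check for missing grants could be added the same way by comparing with "<" instead of ">".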