Title: Snowflake
Locale: en
URL: https://sensorswave.cn/en/docs/data-center/pipeline/destinations/snowflake/
Description: Configure a Snowflake export pipeline to regularly sync event and user data to your Snowflake data warehouse

The Snowflake connector syncs event and user data from Sensors Wave to your Snowflake data warehouse on a recurring schedule, enabling your data team to run deep SQL analysis and join with other business datasets.

Typical use cases:

- Sync user behavior events to Snowflake to join with orders and CRM data
- Export user profile data for dbt modeling and data mart construction
- Retain historical event data from Sensors Wave to meet compliance or long-term analytics requirements

The Snowflake connector supports two export types:

- **Event export**: Incrementally syncs user behavior events to the `SENSORSWAVE_EVENTS` table (default, customizable at creation)
- **User export**: Incrementally syncs user profile attributes to the `SENSORSWAVE_USERS` table (default, customizable at creation)

Each export type requires a separate pipeline.

## Prerequisites

Before creating a pipeline in Sensors Wave, complete the following setup in your Snowflake account. We recommend creating a dedicated role, user, and database for Sensors Wave to follow the principle of least privilege.

### Step 1: Create a Virtual Warehouse

Skip this step if you already have a warehouse to use.
```sql
CREATE WAREHOUSE SENSORSWAVE_WH WITH
  WAREHOUSE_SIZE = 'XSMALL'
  AUTO_SUSPEND = 600
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;
```

### Step 2: Create the Target Database and Schema

```sql
CREATE DATABASE SENSORSWAVE_DB;
CREATE SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA;
```

### Step 3: Create a Dedicated Role and Grant Permissions

```sql
CREATE ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON WAREHOUSE SENSORSWAVE_WH TO ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON DATABASE SENSORSWAVE_DB TO ROLE SENSORSWAVE_ROLE;
GRANT USAGE ON SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA TO ROLE SENSORSWAVE_ROLE;
GRANT CREATE TABLE ON SCHEMA SENSORSWAVE_DB.SENSORSWAVE_SCHEMA TO ROLE SENSORSWAVE_ROLE;
```

### Step 4: Generate an RSA Key Pair

The Snowflake connector uses Key Pair Authentication and does not support username/password login. Run the following commands in your local terminal to generate a key pair:

```bash
# Generate a 2048-bit RSA private key
openssl genrsa -out sensorswave_rsa_key.p8 2048

# Extract the public key
openssl rsa -in sensorswave_rsa_key.p8 -pubout -out sensorswave_rsa_key.pub
```

Extract the public key content (without PEM header/footer lines and newlines) for the next step:

```bash
grep -v "PUBLIC KEY" sensorswave_rsa_key.pub | tr -d '\n'
```

> **Tip**: To add password protection to the private key, include the `-aes256` flag in the generation command, for example: `openssl genrsa -aes256 -out sensorswave_rsa_key.p8 2048`. If you set a passphrase, you must enter it in the `Private Key Passphrase` field under "Configuration Reference".

### Step 5: Create a User and Assign the Public Key

```sql
CREATE USER SENSORSWAVE_USER
  DEFAULT_ROLE = SENSORSWAVE_ROLE
  DEFAULT_WAREHOUSE = SENSORSWAVE_WH
  RSA_PUBLIC_KEY = 'MIIBIjANBgkqh...'; -- Replace with your public key content (no header/footer)

GRANT ROLE SENSORSWAVE_ROLE TO USER SENSORSWAVE_USER;
```

> **Note**: The private key file (`sensorswave_rsa_key.p8`) is a sensitive credential. Store it securely.
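If `grep` and `tr` are unavailable (for example, on Windows), the extraction in Step 4 can be done with a short Python sketch. The helper name `pem_body` and the truncated key text below are illustrative placeholders, not part of the product:

```python
# Strip the PEM header/footer lines and all newlines from a public key file,
# equivalent to: grep -v "PUBLIC KEY" sensorswave_rsa_key.pub | tr -d '\n'

def pem_body(pem_text: str) -> str:
    lines = [ln.strip() for ln in pem_text.splitlines()]
    # Drop the -----BEGIN/END PUBLIC KEY----- lines and join the base64 body.
    return "".join(ln for ln in lines if ln and "PUBLIC KEY" not in ln)

# Truncated placeholder for illustration; in practice read your real file:
# pem_text = open("sensorswave_rsa_key.pub").read()
pem_text = """-----BEGIN PUBLIC KEY-----
MIIBIjANBgkqh
AAAAB3NzaC1yc2E
-----END PUBLIC KEY-----
"""
print(pem_body(pem_text))  # MIIBIjANBgkqhAAAAB3NzaC1yc2E
```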
Sensors Wave encrypts private keys at rest and will not display them again after saving.

## Create a Pipeline

After completing the Snowflake prerequisites, create a pipeline in Sensors Wave.

1. Go to **Data Center → Pipeline** and click **New Pipeline**.
2. In the dialog, select **Snowflake** as the destination and click **Next**.
3. Select the **data source type**:
   - **Events**: Sync user behavior events
   - **Users**: Sync user profile attributes
4. Fill in the Snowflake connection settings (see "Configuration Reference" below).
5. Click **Test Connection** and wait for verification to pass.
6. Configure the **execution frequency** (see "Execution Frequency" below).
7. To sync only specific events, configure event filtering in **Advanced Options** (events export only).
8. Click **Save**. The pipeline starts immediately and automatically creates the target table in Snowflake.

## Configuration Reference

### Connection Settings

| Field | Required | Description | Example |
|-------|----------|-------------|---------|
| `Account URL` | Yes | Your Snowflake account URL | `https://abc12345.us-east-1.snowflakecomputing.com` |
| `User` | Yes | Snowflake username | `SENSORSWAVE_USER` |
| `Private Key` | Yes | PEM-format private key content (including header/footer lines) | `-----BEGIN RSA PRIVATE KEY-----\n...` |
| `Private Key Passphrase` | No | Passphrase for an encrypted private key; leave blank if unencrypted | — |
| `Database` | Yes | Target database name | `SENSORSWAVE_DB` |
| `Warehouse` | Yes | Compute warehouse name | `SENSORSWAVE_WH` |
| `Schema` | Yes | Target schema | `SENSORSWAVE_SCHEMA` |
| `Table` | Yes | Target table name, customizable; defaults to `SENSORSWAVE_EVENTS` (events) or `SENSORSWAVE_USERS` (users); created automatically on save | `SENSORSWAVE_EVENTS` |
| `Role` | No | Snowflake role to use; defaults to the user's default role | `SENSORSWAVE_ROLE` |

### Execution Frequency

**Interval Execution**: Triggered at fixed intervals (1–24 hours) starting
from when the pipeline is created.

| Option | Recommended Use |
|--------|----------------|
| Every 1 hour (default) | Most analytics use cases |
| Every 2 hours | Reduce frequency for high-volume data |
| Every 6 hours | Regular incremental sync |
| Every 12 hours | Low-frequency updates such as user profiles |
| Every 24 hours | Daily batch export |

**Scheduled Execution**: Triggered at a fixed time in a specified timezone, ideal for off-peak execution windows. Select a timezone and time; the pipeline runs once daily at that moment.

### Timezone Configuration

Scheduled execution requires a timezone. The timezone is resolved using the following priority:

> **User-configured timezone > Project timezone > System default timezone (CN clusters: `Asia/Shanghai`; overseas clusters: `UTC`)**

If no timezone is configured, the project-level timezone setting is used.

## Data Model

### Events Table (default `SENSORSWAVE_EVENTS`)

Event data is written to this table in append-only mode. The examples below use the default table name.
| Column | Type | Description |
|--------|------|-------------|
| `time` | `TIMESTAMP_NTZ` | Event timestamp (local time in Sensors Wave's server timezone) |
| `distinct_id` | `VARCHAR(128)` | Client-side user identifier |
| `trace_id` | `VARCHAR(128)` | Request trace ID |
| `ssid` | `BIGINT` | Server-side user identifier (SSID) |
| `anon_id` | `VARCHAR(128)` | Anonymous ID |
| `login_id` | `VARCHAR(128)` | Login ID |
| `event` | `VARCHAR(128)` | Event name, e.g., `$pageview`, `purchase` |
| `properties` | `VARIANT` | Event properties JSON |
| `user_properties` | `VARIANT` | Snapshot of user properties at event time |
| `received_at` | `TIMESTAMP_NTZ` | Server receive time (local time in Sensors Wave's server timezone) |

Example: querying `VARIANT` properties:

```sql
-- Query order_id and total_amount from event properties
SELECT
  time,
  event,
  properties:order_id::STRING AS order_id,
  properties:total_amount::FLOAT AS total_amount
FROM SENSORSWAVE_EVENTS
WHERE event = 'purchase'
  AND time >= '2026-04-01'
LIMIT 100;
```

### Users Table (default `SENSORSWAVE_USERS`)

User data is written to this table using MERGE (matched by `ssid`: update if exists, insert if not).
| Column | Type | Description |
|--------|------|-------------|
| `ssid` | `BIGINT` | Server-side user identifier, primary key |
| `login_id` | `VARCHAR(128)` | Login ID |
| `anon_id` | `VARCHAR(128)` | Anonymous ID |
| `properties` | `VARIANT` | User profile attributes JSON |
| `created_at` | `TIMESTAMP_NTZ` | User creation time |
| `updated_at` | `TIMESTAMP_NTZ` | Last updated time for user attributes |

Example: querying `VARIANT` user properties:

```sql
-- Query membership level from user properties
SELECT
  ssid,
  login_id,
  properties:membership_level::STRING AS membership_level,
  updated_at
FROM SENSORSWAVE_USERS
WHERE properties:membership_level IS NOT NULL
LIMIT 100;
```

## Event Filtering

Event filtering applies to **event exports only** and supports two modes:

- **Include events (allowlist)**: Export only the events listed; all other events are skipped
- **Exclude events (blocklist)**: Export all events except those listed

Configure the filter mode and event names (e.g., `$pageview`, `purchase`) in **Advanced Options**.

> **Note**: Filtering changes take effect in the next sync window after the current window completes; already-exported data is not reprocessed.

## Test Connection

After clicking **Test Connection**, the system validates your Snowflake configuration in the following steps:

1. **Establish connection**: Verify that the `Account URL`, username, and private key are valid
2. **Execute SQL**: Verify the user has basic query permissions
3. **Check warehouse**: Verify the specified `Warehouse` exists and the user can use it
4. **Check database**: Verify the specified `Database` exists and the user can access it
5. **Check schema**: Verify the specified `Schema` exists, the user can access it, and has `CREATE TABLE` permission

Each step's result is shown in real time to help you quickly pinpoint issues.
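The MERGE semantics described for the users table (update the row if `ssid` already exists, otherwise insert) can be illustrated with a minimal in-memory sketch. This is not the connector's actual implementation; the function and variable names are hypothetical:

```python
# Minimal sketch of users-table upsert semantics: MERGE matched by ssid.

def upsert_users(table: dict, incoming_rows: list) -> dict:
    """Apply a batch of user rows to `table`, a dict keyed by ssid."""
    for row in incoming_rows:
        ssid = row["ssid"]
        if ssid in table:
            table[ssid].update(row)   # matched: update the existing row's columns
        else:
            table[ssid] = dict(row)   # not matched: insert a new user row
    return table

users = {}
upsert_users(users, [{"ssid": 1, "login_id": "u1", "properties": {"level": "gold"}}])
upsert_users(users, [{"ssid": 1, "properties": {"level": "platinum"}},
                     {"ssid": 2, "login_id": "u2", "properties": {}}])
# Re-applying the same batch leaves the table unchanged, which is why
# user sync can offer exactly-once semantics even across retries.
```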
### Common Connection Errors

| Error | Possible Cause | Resolution |
|-------|---------------|------------|
| Connection failed | Incorrect `Account URL` format or mismatched private key | Check the URL format; verify you are using the correct private key file |
| Warehouse not found | Typo in warehouse name or warehouse does not exist | Check the warehouse name (case-sensitive); verify it exists in Snowflake |
| No CREATE TABLE permission | Role lacks `CREATE TABLE` permission | Run `GRANT CREATE TABLE ON SCHEMA ... TO ROLE ...` |
| Invalid private key format | Key content was corrupted during copy | Ensure the private key includes the full header and footer lines (`-----BEGIN RSA PRIVATE KEY-----`) |

## Pipeline Management and Monitoring

### Pipeline Status

A pipeline's lifecycle states are:

```
New → Running → Stopped (irreversible)
```

> **Note**: Stopping a pipeline is irreversible. All scheduling stops and no new sync tasks are created. To resume exporting, you must create a new pipeline. Historical run records remain accessible after stopping.

### Viewing Run History

In the pipeline detail page's **Runs** tab, view execution details for each sync task:

- **Status**: Running / Success / Failed
- **Data range**: The time window covered by this sync
- **Rows exported**: Number of rows successfully written to Snowflake
- **Execution time**: Task start and end timestamps
- **Logs**: Click a row to expand and view detailed execution logs

### Viewing Metrics

The **Metrics** tab shows overall pipeline performance (defaults to the last 30 days):

- Success / failure run count trend charts
- Daily exported rows trend chart
- Supports custom date range filtering

## Historical Data Backfill

### What is Backfill

After a pipeline is created, incremental sync only processes new data from the pipeline creation time forward. To load data for a period before the pipeline was created, use the backfill feature.

### Steps

1.
In the pipeline detail page, switch to the **Backfill** tab.
2. Click **Start Backfill** and select the date range to backfill (start date and end date).
3. Confirm the estimated number of windows and click **Confirm**.

Backfill uses the project timezone to split the range into calendar-day windows and processes them serially.

### Backfill and Incremental Sync Interaction

- While a backfill is running, the pipeline's scheduled incremental sync **automatically pauses**; no data is lost, and sync resumes from where it left off after the backfill completes or is cancelled
- Only one backfill can run at a time per pipeline
- If a backfill is interrupted (e.g., service restart), it automatically resumes from the last completed window; completed windows are not re-processed

### Cancel Backfill

Click **Cancel** while a backfill is in progress to stop it. The cancellation takes effect after the current window finishes; the in-progress window is not interrupted.

## Delivery Semantics and Data Consistency

| Data Type | Task Type | Write Method | Delivery Semantics |
|-----------|-----------|-------------|-------------------|
| Events | Scheduled incremental sync | Append | at-least-once |
| Events | Historical backfill | Idempotent overwrite per window | exactly-once (within a window) |
| Users | Scheduled incremental sync / Historical backfill | Upsert by `ssid` | exactly-once |

## Timezone Handling

### Timezone of Time Columns

All time columns in Snowflake (`time`, `received_at`, `created_at`, `updated_at`) use `TIMESTAMP_NTZ` (no timezone information) and store the **local time in Sensors Wave's server timezone**. CN clusters typically use `Asia/Shanghai` (UTC+8); overseas clusters typically use `UTC`.

**Example (CN cluster)**: If the server timezone is `Asia/Shanghai` (UTC+8) and an event occurred at 2026-04-13 15:30:00 Beijing time, Snowflake stores `2026-04-13 15:30:00`.
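The example above can be reproduced with Python's standard `zoneinfo` module: convert the event's UTC instant to the cluster's server timezone, then drop the offset to get the naive value that lands in the `TIMESTAMP_NTZ` column. The `Asia/Shanghai` cluster timezone is an assumption for illustration:

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# An event that occurred at 2026-04-13 07:30:00 UTC, i.e., 15:30 Beijing time.
event_utc = datetime(2026, 4, 13, 7, 30, 0, tzinfo=timezone.utc)

# Convert to the cluster's server timezone, then drop the offset: this naive
# local value is what a TIMESTAMP_NTZ column stores.
server_tz = ZoneInfo("Asia/Shanghai")  # assumption: CN cluster
stored_value = event_utc.astimezone(server_tz).replace(tzinfo=None)

print(stored_value)  # 2026-04-13 15:30:00
```

Keep this conversion in mind when joining exported events with UTC-based tables; the stored value carries no offset of its own.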
### What Timezone Affects

| Setting | Timezone Impact |
|---------|----------------|
| Scheduled execution trigger time | Calculated in the configured timezone (e.g., `Asia/Shanghai` 08:00 fires at 8 AM Beijing time) |
| Backfill date range | Split into calendar-day windows (00:00–24:00) using the project timezone |
| Time column values in Snowflake | Fixed to Sensors Wave's server timezone, independent of user-configured timezone |

## Notes

- **Stopping a pipeline is irreversible**: To resume exporting after stopping, you must create a new pipeline; historical run records are retained
- **Private key format**: `Private Key` must be the full PEM format including `-----BEGIN RSA PRIVATE KEY-----` and `-----END RSA PRIVATE KEY-----` header/footer lines
- **Target table is auto-created**: The target table is created automatically in Snowflake when you save the pipeline; no manual table creation is needed
- **Incremental sync pauses during backfill**: Scheduled sync is paused while backfill runs and resumes automatically afterwards; no data is lost
- **Time columns are in local time**: Snowflake time columns store local time in Sensors Wave's server timezone, not UTC; account for this when doing cross-timezone analysis
- **Duplicate event rows**: `SENSORSWAVE_EVENTS` may contain a small number of duplicate rows.

  **Why this happens**: Scheduled incremental sync for events uses append writes with at-least-once delivery semantics. Duplicates come primarily from two scenarios.

  - **Scenario 1: Incremental task retries on failure**. If a scheduled incremental sync task hits a node failure, network error, or Snowflake write timeout, it retries the current batch to avoid data loss. If the previous attempt actually succeeded in part or in full, the retry appends duplicate rows to the events table. This scenario is rare and limited in scope.
  - **Scenario 2: Backfill overlaps with incremental sync**.
Historical backfill is idempotent within its own window (re-running the same window produces no duplicates), but it runs separately from scheduled incremental sync, and the two write independently. The `time` field records when the event actually occurred, while incremental sync advances by the event's server-side ingestion time. When both tasks cover the same late-arriving event, duplicates appear. If an event occurs at `T` but is ingested at `T + N`:

  1. You trigger a backfill whose window is split by event occurrence time and covers `T`. The event is already in historical storage (ingested at `T + N`), so the backfill exports it based on its event time `T`.
  2. After the backfill completes, incremental sync resumes from its previous ingestion-time cursor. When it advances to `T + N`, it exports the same late-arriving event a second time.

User data — whether written by scheduled incremental sync or backfill — uses upsert by `ssid` with exactly-once semantics, so it is not affected.

**Solutions**: The two approaches below can be used independently or together.

**Option 1: Create a deduplication view (recommended as the default query entry point)**

Create a view in Snowflake that keeps the latest row per event identity:

```sql
CREATE VIEW SENSORSWAVE_EVENTS_DEDUP AS
SELECT * EXCLUDE (rn)
FROM (
  SELECT *,
    ROW_NUMBER() OVER (
      PARTITION BY time, event, distinct_id, trace_id
      ORDER BY received_at DESC NULLS LAST
    ) AS rn
  FROM SENSORSWAVE_EVENTS
)
WHERE rn = 1;
```

**Option 2: Only backfill data older than 7 days**

The server rejects events that arrive more than 7 days late, so time windows older than 7 days will not receive any further late-arriving events through incremental sync. Limiting the backfill end date to "current date − 7 days" eliminates duplicates at the source. For data within the last 7 days, rely on Option 1 and query through the deduplication view.

## FAQ

**Q: Test Connection fails with "No CREATE TABLE permission".
What should I do?**

Verify that you ran `GRANT CREATE TABLE ON SCHEMA ... TO ROLE ...`, and confirm the role used by the pipeline matches the role that was granted the permission.

**Q: The pipeline is "Running". Can I modify the connection settings or event filters?**

Yes. Go to the pipeline detail page and click **Edit** to update the configuration. Changes take effect in the next sync window after the current one completes; frequency changes take effect immediately.

**Q: There are duplicate rows in the events table. How do I handle this?**

Scheduled incremental sync for events uses at-least-once delivery, so duplicates come from two scenarios: (1) an incremental sync task re-writes the same batch on retry after a failure (rare); (2) a backfill window covers a late-arriving event's occurrence time `T` and exports it by event time, and incremental sync then exports it again when it advances to ingestion time `T + N`. There are two ways to handle it:

- **Deduplicate at query time**: Use the deduplication view `SENSORSWAVE_EVENTS_DEDUP` from the "Notes" section above as the default query entry point. Alternatively, deduplicate using `ROW_NUMBER()` on the combination of `time`, `event`, `distinct_id`, and `trace_id`, keeping the record with the latest `received_at`. This covers both scenarios.
- **Avoid at backfill time**: Only backfill data older than "current date − 7 days". The server rejects events arriving more than 7 days late, so this range will not receive any new late-arriving events via incremental sync. This eliminates duplicates from scenario 2.
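The query-time deduplication can also be sanity-checked off-warehouse. This Python sketch mirrors the `ROW_NUMBER()` logic: partition by (`time`, `event`, `distinct_id`, `trace_id`) and keep the row with the latest `received_at`. It is illustrative only; the authoritative logic is the SQL view above:

```python
# Keep the latest received_at per event identity, mirroring the dedup view.

def dedup_events(rows: list) -> list:
    best = {}
    for row in rows:
        key = (row["time"], row["event"], row["distinct_id"], row["trace_id"])
        kept = best.get(key)
        # ORDER BY received_at DESC NULLS LAST: any timestamp beats None.
        if kept is None or (row["received_at"] or "") > (kept["received_at"] or ""):
            best[key] = row
    return list(best.values())

rows = [
    {"time": "2026-04-13 15:30:00", "event": "purchase", "distinct_id": "d1",
     "trace_id": "t1", "received_at": "2026-04-13 15:30:05"},
    # Duplicate appended by a retried incremental task (scenario 1):
    {"time": "2026-04-13 15:30:00", "event": "purchase", "distinct_id": "d1",
     "trace_id": "t1", "received_at": "2026-04-13 15:31:00"},
    {"time": "2026-04-13 15:31:00", "event": "$pageview", "distinct_id": "d1",
     "trace_id": "t2", "received_at": "2026-04-13 15:31:02"},
]
deduped = dedup_events(rows)
# Two distinct events remain; the purchase keeps the later received_at.
```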
**Q: How do I query data inside `VARIANT` columns (`properties` and `user_properties`)?**

Use Snowflake's colon (`:`) path syntax to access nested fields, and `::TYPE` for type casting:

```sql
-- Access the page_url field (string type) from properties
SELECT properties:page_url::STRING FROM SENSORSWAVE_EVENTS;

-- Access a numeric field
SELECT properties:item_count::INT FROM SENSORSWAVE_EVENTS;
```

**Q: What timezone is the `time` column in Snowflake?**

The `time` column uses `TIMESTAMP_NTZ` and stores the local time in Sensors Wave's server timezone, with no timezone offset. CN clusters typically use `Asia/Shanghai` (UTC+8); overseas clusters typically use `UTC`. To convert to another timezone, adjust the source timezone based on your actual cluster, for example (CN cluster):

```sql
CONVERT_TIMEZONE('Asia/Shanghai', 'America/New_York', time)
```

---

**Last updated**: April 13, 2026