FileLog Migration to Snowflake
Context
The FileLog package currently writes integration logs (Yodlee, Plaid, etc.) to S3 as JSON files. This approach has run into performance limits and architectural issues that warrant a strategic migration to Snowflake.
Trigger Issue
Sentry Issue: RET-API-SGC - S3 503 Slow Down errors
Statistics (as of 2026-03-16):
- 1,735 occurrences since 2025-04-18
- 462 users affected
- Latest burst: 20 errors in 3 seconds
Current State: S3-Based FileLog
Architecture

```text
User Request
  ↓
Yodlee API Call
  ↓
Connector::logRequestResponseToFile()
  ↓
FileLog::write(['integration', 'yodlee', $advisor_id, $household_id], ...)
  ↓
FileLog::handleStorageSaving() → dispatch()->afterResponse()
  ↓
FileLog::savePendingLogs() (Laravel termination callback)
  ↓
S3::writeStream() → rightcapital-prd bucket
```

Current Path Structure:

```text
logs/integration/yodlee/{advisor_id}/{household_id}/{timestamp}_{endpoint}.json
```

Example: `logs/integration/yodlee/16476/320463/20260315-222803_000_providerAccounts_PUT.json`

Identified Problems
1. S3 Rate Limiting
AWS S3 Limits (per prefix):
- PUT/DELETE: 3,500 req/s
- GET/HEAD: 5,500 req/s
Problem: All logs for a household share the same S3 prefix → same partition → shared rate limit
Trigger Pattern:
```text
Micro-burst: 6 concurrent requests in ~100ms
→ Exceeds S3 token bucket capacity
→ 503 Slow Down error
```

2. Poor Queryability
Current State:
- Logs stored as individual JSON files in S3
- Requires downloading files to query
- No indexing or search capabilities
- Time-consuming for debugging
Typical Query Process:
```sh
# Need to download files manually
aws s3 ls s3://rightcapital-prd/logs/integration/yodlee/16476/320463/
aws s3 cp s3://rightcapital-prd/logs/.../file.json - | jq '.'
```

3. Error Handling Gaps
Code Issues (FileLog.php:354-374):
```php
foreach (static::$pending_logs as [$file_path, $temp]) {
    rescue(fn () => use_stream(
        $temp,
        fn ($stream) => static::getStorage()->writeStream($file_path, $stream),
    ));
}
```

Problems:
- Silent failure with `rescue()`
- No retry mechanism
- No rate limiting between writes
- Uses the `sync` queue (blocks the process)
4. Maintenance Overhead
- Manual log cleanup required
- No automatic archival
- Difficult to enforce retention policies
- S3 costs accumulate over time
Proposed Solution: Migrate to Snowflake
Target Architecture
```text
User Request
  ↓
Yodlee API Call
  ↓
Connector::logRequestResponseToFile()
  ↓
SnowflakeFileLogger::write()
  ↓
Snowflake INSERT (async via Snowpipe or direct)
  ↓
INTEGRATION_LOGS table
```

Benefits
✅ Performance
- No rate limiting issues (Snowflake scales horizontally)
- Batch inserts supported
- Async ingestion via Snowpipe
✅ Queryability
```sql
-- Direct SQL queries instead of S3 downloads
SELECT *
FROM integration_logs
WHERE advisor_id = 16476
  AND household_id = 320463
  AND endpoint LIKE '%providerAccounts%'
  AND timestamp >= '2026-03-15'
ORDER BY timestamp DESC;
```

✅ Analytics Integration
- Already in the data warehouse
- Join with other tables (advisors, households, accounts)
- Use existing BI tools (Looker, Tableau)
✅ Cost Efficiency
- Snowflake compression (5-10x better than S3)
- Automatic data retention/archival
- Pay for compute only when querying
✅ Maintainability
- Single source of truth
- Schema enforcement
- Version control for structure changes
Proposed Schema
```sql
CREATE TABLE integration_logs (
    -- Identifiers
    log_id VARCHAR PRIMARY KEY,
    integration_type VARCHAR NOT NULL, -- 'yodlee', 'plaid', etc.
    advisor_id INTEGER,
    household_id INTEGER,

    -- Request/Response
    endpoint VARCHAR NOT NULL,
    http_method VARCHAR,
    request_url VARCHAR,
    request_headers VARIANT,
    request_body VARIANT,
    response_status INTEGER,
    response_headers VARIANT,
    response_body VARIANT,

    -- Metadata
    timestamp TIMESTAMP_NTZ NOT NULL,
    duration_ms INTEGER,
    error_message VARCHAR,
    trace_id VARCHAR,

    -- Partitioning
    log_date DATE NOT NULL,

    -- Audit
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
)
CLUSTER BY (log_date, integration_type, advisor_id);
```

Migration Options
Option 1: Direct Snowflake Insert (Synchronous)
Implementation:
```php
class SnowflakeFileLogger
{
    public function write(array $paths, string $postfix, mixed $data): void
    {
        DB::connection('snowflake')->table('integration_logs')->insert([
            'log_id' => Str::uuid(),
            'integration_type' => $paths[1], // 'yodlee'
            'advisor_id' => $paths[2],
            'household_id' => $paths[3],
            'endpoint' => $postfix,
            'request_url' => $data['url'],
            'http_method' => $data['method'], // matches the schema's http_method column
            'request_body' => json_encode($data['body']),
            'response_status' => $data['response']['status'] ?? null,
            'response_body' => json_encode($data['response']['body'] ?? null),
            'timestamp' => now(),
            'log_date' => now()->toDateString(),
        ]);
    }
}
```

Pros:
- Simple implementation
- Immediate availability
Cons:
- Blocks request termination
- Network latency impact
Option 2: Queue-Based Async Insert (Recommended)
Implementation:
```php
class SaveIntegrationLogJob implements ShouldQueue
{
    public $queue = 'low-priority';
    public $tries = 5;
    public $backoff = [1, 5, 10, 30, 60];

    public function __construct(private array $logData) {}

    public function handle()
    {
        DB::connection('snowflake')
            ->table('integration_logs')
            ->insert($this->logData);
    }
}

class SnowflakeFileLogger
{
    public function write(array $paths, string $postfix, mixed $data): void
    {
        dispatch(new SaveIntegrationLogJob([
            // ... log data
        ]))->onQueue('low-priority');
    }
}
```

Pros:
- Non-blocking
- Built-in retry
- Backpressure handling
Cons:
- Requires queue infrastructure
- Slightly delayed availability
Option 3: Snowpipe + S3 Staging (Hybrid)
Implementation:
```php
// Phase 1: Write to S3 staging (minimal change)
FileLog::write($paths, $postfix, $data);
```

```sql
-- Phase 2: Snowpipe auto-ingests from S3
CREATE PIPE integration_logs_pipe
AS COPY INTO integration_logs
FROM @integration_logs_stage
FILE_FORMAT = (TYPE = JSON);
```

Pros:
- Minimal code changes
- Snowflake handles ingestion
- Can keep S3 as backup
Cons:
- Still requires S3
- Ingestion delay (minutes)
- Snowpipe costs
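The pipe definition above references `@integration_logs_stage`, which would need to be created first. A minimal sketch, assuming the existing `rightcapital-prd` bucket and a pre-configured storage integration (the integration name is illustrative):

```sql
-- Hypothetical external stage over the existing S3 log prefix.
-- 'integration_logs_s3_int' is an assumed, separately created STORAGE INTEGRATION.
CREATE STAGE integration_logs_stage
  URL = 's3://rightcapital-prd/logs/integration/'
  STORAGE_INTEGRATION = integration_logs_s3_int
  FILE_FORMAT = (TYPE = JSON);
```

For event-driven ingestion, the pipe would also set `AUTO_INGEST = TRUE`, with S3 event notifications pointed at the pipe's SQS queue.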
Migration Strategy
Section titled “Migration Strategy”Phase 1: Dual-Write (4 weeks)
Goal: Validate Snowflake logging without disrupting the existing system
Steps:
- Create Snowflake table schema
- Implement the `SnowflakeFileLogger` class
- Enable dual-write (both S3 and Snowflake)
```php
FileLog::write($paths, $postfix, $data);             // S3
SnowflakeFileLogger::write($paths, $postfix, $data); // Snowflake
```
- Monitor for consistency and errors
- Fix any schema/data issues
Success Criteria:
- 100% of logs written to both systems
- Snowflake data matches S3 (spot-check 1000 samples)
- No performance regression
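One way to spot-check dual-write consistency is to compare daily row counts in Snowflake against object counts under the matching S3 prefix. A sketch of the Snowflake side, using the column names from the proposed schema:

```sql
-- Daily log counts per integration, to compare against
-- `aws s3 ls --recursive ... | wc -l` for the same date range.
SELECT log_date,
       integration_type,
       COUNT(*) AS row_count
FROM integration_logs
WHERE log_date >= DATEADD(day, -7, CURRENT_DATE)
GROUP BY log_date, integration_type
ORDER BY log_date DESC;

-- Random sample for field-level spot checks (Snowflake SAMPLE clause).
SELECT *
FROM integration_logs SAMPLE (1000 ROWS);
```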
Phase 2: Read from Snowflake (2 weeks)
Goal: Validate that Snowflake can replace S3 for queries
Steps:
- Update debugging tools to query Snowflake
- Train team on SQL queries
- Monitor query performance
- Optimize indexes/clustering if needed
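For the clustering-optimization step, Snowflake exposes clustering health through the `SYSTEM$CLUSTERING_INFORMATION` function; a sketch against the proposed table:

```sql
-- Report how well the table is clustered on its declared key columns;
-- useful for deciding whether the CLUSTER BY choice needs tuning.
SELECT SYSTEM$CLUSTERING_INFORMATION(
    'integration_logs',
    '(log_date, integration_type, advisor_id)'
);
```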
Success Criteria:
- All debugging queries use Snowflake
- Query response time < 5 seconds for typical cases
- No S3 downloads needed for log analysis
Phase 3: Snowflake-Only (1 week)
Goal: Remove the S3 logging dependency
Steps:
- Disable S3 writes in FileLog
- Archive existing S3 logs to Glacier
- Remove FileLog S3 code (after 1 month buffer)
Success Criteria:
- No S3 writes for integration logs
- No Sentry errors related to S3 throttling
- Team comfortable with Snowflake-only logging
Phase 4: Cleanup (1 week)
Goal: Remove legacy code and optimize
Steps:
- Delete FileLog package (or mark deprecated)
- Update documentation
- Set up Snowflake data retention policies
- Create monitoring dashboards
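The retention-policy step could be handled natively in Snowflake. A sketch, assuming a 30-day Time Travel window and a 90-day row-retention policy (the task name, warehouse name, and windows are illustrative, not decided):

```sql
-- Time Travel window for accidental-delete recovery.
ALTER TABLE integration_logs SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Scheduled purge of rows past the assumed 90-day retention policy.
CREATE TASK purge_integration_logs
  WAREHOUSE = xs_wh  -- assumed warehouse name
  SCHEDULE = 'USING CRON 0 3 * * * UTC'
AS
  DELETE FROM integration_logs
  WHERE log_date < DATEADD(day, -90, CURRENT_DATE);

-- Tasks are created suspended; resume to activate.
ALTER TASK purge_integration_logs RESUME;
```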
Rollback Plan
At any phase, rollback is simple:
```php
// Disable Snowflake write
SnowflakeFileLogger::$enabled = false;

// Continue with S3 only
FileLog::write($paths, $postfix, $data);
```

Rollback Triggers:
- Snowflake unavailability > 5 minutes
- Data loss or corruption detected
- Performance regression > 2x baseline
Quick Wins (Before Migration)
While planning the Snowflake migration, implement these immediate fixes:
1. Time-Based S3 Prefix Sharding ⭐ Immediate
```php
// Connector.php - Change paths structure
$now = CarbonImmutable::now();
$paths = [
    'integration',
    self::getVendorName(),
    $now->format('Y/m/d/H'), // Hourly partitioning
    $advisor->id ?? 'unknown',
    $household->id ?? 'unknown',
];
```

Impact: Eliminates S3 throttling immediately by spreading writes across hourly prefixes
2. AWS SDK Retry Configuration
```php
's3' => [
    'options' => [
        'http' => [
            'retries' => [
                'mode' => 'adaptive',
                'max_attempts' => 10,
            ],
        ],
    ],
],
```

Impact: Handles transient S3 errors gracefully
3. Frontend Debounce
```ts
const handleRefresh = useDebouncedCallback(
  async () => {
    await refreshYodleeAccount();
  },
  2000,
  { leading: true, trailing: false },
);
```

Impact: Reduces micro-bursts from user clicks
Cost Analysis
Section titled “Cost Analysis”Current S3 Costs (Estimated)
Assumptions:
- ~1M integration logs per month
- Average file size: 10KB
- Storage: 10GB/month cumulative
Costs:
```text
S3 Storage:    $0.023/GB × 10GB     = $0.23/month
S3 PUT:        $0.005/1000 × 1M     = $5.00/month
S3 GET:        $0.0004/1000 × 100K  = $0.04/month
Data Transfer: negligible (same region)

Total: ~$5.27/month (~$63/year)
```

Projected Snowflake Costs
Assumptions:
- 1M rows per month
- Compressed size: 2GB/month (5x compression)
- Query frequency: 100 queries/day
Costs:
```text
Storage: $40/TB/month × 0.002TB  = $0.08/month
Compute: ~$2/hour × 1 hour/month = $2.00/month
         (assuming XS warehouse for queries)

Total: ~$2.08/month (~$25/year)
```

Savings: ~$38/year + operational efficiency gains
Note: Cost savings are minimal, but operational benefits are significant:
- Faster debugging (saves engineer time)
- Better analytics (business insights)
- Reduced complexity (maintenance savings)
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Snowflake downtime | Low | High | Keep S3 dual-write for 1 month |
| Data loss during migration | Medium | High | Validate with checksums, dual-write period |
| Query performance issues | Medium | Medium | Pre-optimize schema, add clustering |
| Team learning curve | High | Low | Training sessions, SQL examples doc |
| Schema evolution needs | Medium | Medium | Use VARIANT for flexible fields |
Success Metrics
Section titled “Success Metrics”Migration Success
- ✅ 100% of logs in Snowflake
- ✅ Zero S3 throttling errors
- ✅ <5s p95 query latency
- ✅ Team adoption (>80% use Snowflake for debugging)
Operational Success
- ✅ <1 incident per month related to logging
- ✅ Log retention policy enforced automatically
- ✅ Integration log analytics dashboard in production
Timeline
| Phase | Duration | Estimated Start | Owner |
|---|---|---|---|
| Quick Wins (S3 fixes) | 1 week | 2026-03-18 | TBD |
| Phase 1: Dual-Write | 4 weeks | 2026-04-01 | TBD |
| Phase 2: Read from Snowflake | 2 weeks | 2026-05-01 | TBD |
| Phase 3: Snowflake-Only | 1 week | 2026-05-15 | TBD |
| Phase 4: Cleanup | 1 week | 2026-05-22 | TBD |
Total Estimated Duration: ~10 weeks (the phases sum to 9 weeks of work, plus scheduling gaps between phases)
Related Documentation
- RET-API-SGC Sentry Issue
- FileLog Source Code
- Yodlee Connector
- AWS S3 Performance Guidelines
- Snowflake Data Loading Best Practices
Decision Status
- Status: Proposed
- Decision Date: TBD
- Review Date: 2026-03-25 (engineering team meeting)
- Next Steps:
- Present migration plan to team
- Get budget approval
- Implement quick wins (S3 fixes)
- Begin Phase 1 (dual-write) development
Last Updated: 2026-03-16
Document Owner: TBD