FileLog Migration to Snowflake
Context
The FileLog package currently writes integration logs (Yodlee, Plaid, etc.) to S3 as JSON files. This approach has run into performance limits and architectural issues that warrant a strategic migration to Snowflake.
Trigger Issue
Sentry Issue: RET-API-SGC - S3 503 Slow Down errors
Statistics (as of 2026-03-16):
- 1,735 occurrences since 2025-04-18
- 462 users affected
- Latest burst: 20 errors in 3 seconds
Current State: S3-Based FileLog
Architecture

```text
User Request
  ↓
Yodlee API Call
  ↓
Connector::logRequestResponseToFile()
  ↓
FileLog::write(['integration', 'yodlee', $advisor_id, $household_id], ...)
  ↓
FileLog::handleStorageSaving() → dispatch()->afterResponse()
  ↓
FileLog::savePendingLogs() (Laravel termination callback)
  ↓
S3::writeStream() → rightcapital-prd bucket
```

Current Path Structure:

```text
logs/integration/yodlee/{advisor_id}/{household_id}/{timestamp}_{endpoint}.json
```

Example: `logs/integration/yodlee/16476/320463/20260315-222803_000_providerAccounts_PUT.json`

Identified Problems
1. S3 Rate Limiting
AWS S3 Limits (per prefix):
- PUT/DELETE: 3,500 req/s
- GET/HEAD: 5,500 req/s
Problem: All logs for a household share the same S3 prefix → same partition → shared rate limit
Trigger Pattern:
```text
Micro-burst: 6 concurrent requests in ~100ms
→ Exceeds S3 token bucket capacity
→ 503 Slow Down error
```

2. Poor Queryability
Current State:
- Logs stored as individual JSON files in S3
- Requires downloading files to query
- No indexing or search capabilities
- Time-consuming for debugging
Typical Query Process:
```sh
# Need to download files manually
aws s3 ls s3://rightcapital-prd/logs/integration/yodlee/16476/320463/
aws s3 cp s3://rightcapital-prd/logs/.../file.json - | jq '.'
```

3. Error Handling Gaps
Code Issues (FileLog.php:354-374):
```php
foreach (static::$pending_logs as [$file_path, $temp]) {
    rescue(fn () => use_stream(
        $temp,
        fn ($stream) => static::getStorage()->writeStream($file_path, $stream),
    ));
}
```

Problems:
- Silent failure with `rescue()`
- No retry mechanism
- No rate limiting between writes
- Uses the `sync` queue (blocks the process)
4. Maintenance Overhead
- Manual log cleanup required
- No automatic archival
- Difficult to enforce retention policies
- S3 costs accumulate over time
Proposed Solution: Migrate to Snowflake
Target Architecture
```text
User Request
  ↓
Yodlee API Call
  ↓
Connector::logRequestResponseToFile()
  ↓
SnowflakeFileLogger::write()
  ↓
Snowflake INSERT (async via Snowpipe or direct)
  ↓
INTEGRATION_LOGS table
```

Benefits
✅ Performance
- No rate limiting issues (Snowflake scales horizontally)
- Batch inserts supported
- Async ingestion via Snowpipe
✅ Queryability
```sql
-- Direct SQL queries instead of S3 downloads
SELECT *
FROM integration_logs
WHERE advisor_id = 16476
  AND household_id = 320463
  AND endpoint LIKE '%providerAccounts%'
  AND timestamp >= '2026-03-15'
ORDER BY timestamp DESC;
```

✅ Analytics Integration
- Already in the data warehouse
- Join with other tables (advisors, households, accounts)
- Use existing BI tools (Looker, Tableau)
✅ Cost Efficiency
- Snowflake compression (5-10x better than S3)
- Automatic data retention/archival
- Pay for compute only when querying
✅ Maintainability
- Single source of truth
- Schema enforcement
- Version control for structure changes
Proposed Schema
```sql
CREATE TABLE integration_logs (
    -- Identifiers
    log_id VARCHAR PRIMARY KEY,
    integration_type VARCHAR NOT NULL, -- 'yodlee', 'plaid', etc.
    advisor_id INTEGER,
    household_id INTEGER,

    -- Request/Response
    endpoint VARCHAR NOT NULL,
    http_method VARCHAR,
    request_url VARCHAR,
    request_headers VARIANT,
    request_body VARIANT,
    response_status INTEGER,
    response_headers VARIANT,
    response_body VARIANT,

    -- Metadata
    timestamp TIMESTAMP_NTZ NOT NULL,
    duration_ms INTEGER,
    error_message VARCHAR,
    trace_id VARCHAR,

    -- Partitioning
    log_date DATE NOT NULL,

    -- Audit
    created_at TIMESTAMP_NTZ DEFAULT CURRENT_TIMESTAMP()
)
CLUSTER BY (log_date, integration_type, advisor_id);
```

Migration Options
Option 1: Direct Snowflake Insert (Synchronous)
Implementation:
```php
class SnowflakeFileLogger
{
    public function write(array $paths, string $postfix, mixed $data): void
    {
        DB::connection('snowflake')->table('integration_logs')->insert([
            'log_id' => Str::uuid(),
            'integration_type' => $paths[1], // 'yodlee'
            'advisor_id' => $paths[2],
            'household_id' => $paths[3],
            'endpoint' => $postfix,
            'request_url' => $data['url'],
            'http_method' => $data['method'], // matches the schema's http_method column
            'request_body' => json_encode($data['body']),
            'response_status' => $data['response']['status'] ?? null,
            'response_body' => json_encode($data['response']['body'] ?? null),
            'timestamp' => now(),
            'log_date' => now()->toDateString(),
        ]);
    }
}
```

Pros:
- Simple implementation
- Immediate availability
Cons:
- Blocks request termination
- Network latency impact
Option 2: Queue-Based Async Insert (Recommended)
Implementation:
```php
class SaveIntegrationLogJob implements ShouldQueue
{
    public $queue = 'low-priority';
    public $tries = 5;
    public $backoff = [1, 5, 10, 30, 60];

    public function __construct(private array $logData) {}

    public function handle()
    {
        DB::connection('snowflake')
            ->table('integration_logs')
            ->insert($this->logData);
    }
}

class SnowflakeFileLogger
{
    public function write(array $paths, string $postfix, mixed $data): void
    {
        dispatch(new SaveIntegrationLogJob([
            // ... log data
        ]))->onQueue('low-priority');
    }
}
```

Pros:
- Non-blocking
- Built-in retry
- Backpressure handling
Cons:
- Requires queue infrastructure
- Slightly delayed availability
Option 3: Snowpipe + S3 Staging (Hybrid)
Implementation:
```php
// Phase 1: Write to S3 staging (minimal change)
FileLog::write($paths, $postfix, $data);
```

```sql
-- Phase 2: Snowpipe auto-ingests from S3
CREATE PIPE integration_logs_pipe
AS COPY INTO integration_logs
FROM @integration_logs_stage
FILE_FORMAT = (TYPE = JSON);
```

Pros:
- Minimal code changes
- Snowflake handles ingestion
- Can keep S3 as backup
Cons:
- Still requires S3
- Ingestion delay (minutes)
- Snowpipe costs
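The pipe definition above references `@integration_logs_stage`, which would need to be created first. A minimal sketch, assuming the existing `rightcapital-prd` bucket and a pre-configured storage integration (the integration name is illustrative):

```sql
-- Hypothetical external stage over the existing S3 log prefix.
-- 'integration_logs_s3_int' is an assumed, separately created STORAGE INTEGRATION.
CREATE STAGE integration_logs_stage
  URL = 's3://rightcapital-prd/logs/integration/'
  STORAGE_INTEGRATION = integration_logs_s3_int
  FILE_FORMAT = (TYPE = JSON);
```

For event-driven ingestion, the pipe would also set `AUTO_INGEST = TRUE`, with S3 event notifications pointed at the pipe's SQS queue.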
Migration Strategy
Section titled “Migration Strategy”Phase 1: Dual-Write (4 weeks)
Goal: Validate Snowflake logging without disrupting the existing system
Steps:
- Create Snowflake table schema
- Implement the `SnowflakeFileLogger` class
- Enable dual-write (both S3 and Snowflake)
```php
FileLog::write($paths, $postfix, $data);             // S3
SnowflakeFileLogger::write($paths, $postfix, $data); // Snowflake
```
- Monitor for consistency and errors
- Fix any schema/data issues
Success Criteria:
- 100% of logs written to both systems
- Snowflake data matches S3 (spot-check 1000 samples)
- No performance regression
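One way to spot-check dual-write consistency is to compare daily row counts in Snowflake against object counts under the matching S3 prefix. A sketch of the Snowflake side, using the column names from the proposed schema:

```sql
-- Daily log counts per integration, to compare against
-- `aws s3 ls --recursive ... | wc -l` for the same date range.
SELECT log_date,
       integration_type,
       COUNT(*) AS row_count
FROM integration_logs
WHERE log_date >= DATEADD(day, -7, CURRENT_DATE)
GROUP BY log_date, integration_type
ORDER BY log_date DESC;

-- Random sample for field-level spot checks (Snowflake SAMPLE clause).
SELECT *
FROM integration_logs SAMPLE (1000 ROWS);
```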
Phase 2: Read from Snowflake (2 weeks)
Goal: Validate that Snowflake can replace S3 for queries
Steps:
- Update debugging tools to query Snowflake
- Train team on SQL queries
- Monitor query performance
- Optimize indexes/clustering if needed
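For the clustering-optimization step, Snowflake exposes clustering health through the `SYSTEM$CLUSTERING_INFORMATION` function; a sketch against the proposed table:

```sql
-- Report how well the table is clustered on its declared key columns;
-- useful for deciding whether the CLUSTER BY choice needs tuning.
SELECT SYSTEM$CLUSTERING_INFORMATION(
    'integration_logs',
    '(log_date, integration_type, advisor_id)'
);
```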
Success Criteria:
- All debugging queries use Snowflake
- Query response time < 5 seconds for typical cases
- No S3 downloads needed for log analysis
Phase 3: Snowflake-Only (1 week)
Goal: Remove the S3 logging dependency
Steps:
- Disable S3 writes in FileLog
- Archive existing S3 logs to Glacier
- Remove FileLog S3 code (after 1 month buffer)
Success Criteria:
- No S3 writes for integration logs
- No Sentry errors related to S3 throttling
- Team comfortable with Snowflake-only logging
Phase 4: Cleanup (1 week)
Goal: Remove legacy code and optimize
Steps:
- Delete FileLog package (or mark deprecated)
- Update documentation
- Set up Snowflake data retention policies
- Create monitoring dashboards
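The retention-policy step could be handled natively in Snowflake. A sketch, assuming a 30-day Time Travel window and a 90-day row-retention policy (the task name, warehouse name, and windows are illustrative, not decided):

```sql
-- Time Travel window for accidental-delete recovery.
ALTER TABLE integration_logs SET DATA_RETENTION_TIME_IN_DAYS = 30;

-- Scheduled purge of rows past the assumed 90-day retention policy.
CREATE TASK purge_integration_logs
  WAREHOUSE = xs_wh  -- assumed warehouse name
  SCHEDULE = 'USING CRON 0 3 * * * UTC'
AS
  DELETE FROM integration_logs
  WHERE log_date < DATEADD(day, -90, CURRENT_DATE);

-- Tasks are created suspended; resume to activate.
ALTER TASK purge_integration_logs RESUME;
```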
Rollback Plan
At any phase, rollback is simple:
```php
// Disable Snowflake write
SnowflakeFileLogger::$enabled = false;

// Continue with S3 only
FileLog::write($paths, $postfix, $data);
```

Rollback Triggers:
- Snowflake unavailability > 5 minutes
- Data loss or corruption detected
- Performance regression > 2x baseline
Quick Wins (Before Migration)
While planning the Snowflake migration, implement these immediate fixes:
1. Time-Based S3 Prefix Sharding ⭐ Immediate
```php
// Connector.php - Change paths structure
$now = CarbonImmutable::now();
$paths = [
    'integration',
    self::getVendorName(),
    $now->format('Y/m/d/H'), // Hourly partitioning
    $advisor->id ?? 'unknown',
    $household->id ?? 'unknown',
];
```

Impact: Eliminates S3 throttling immediately by spreading writes across hourly prefixes
2. AWS SDK Retry Configuration
```php
's3' => [
    'options' => [
        'http' => [
            'retries' => [
                'mode' => 'adaptive',
                'max_attempts' => 10,
            ],
        ],
    ],
],
```

Impact: Handles transient S3 errors gracefully
3. Frontend Debounce
```ts
const handleRefresh = useDebouncedCallback(
  async () => {
    await refreshYodleeAccount();
  },
  2000,
  { leading: true, trailing: false },
);
```

Impact: Reduces micro-bursts from user clicks
Cost Analysis
Section titled “Cost Analysis”Current S3 Costs (Estimated)
Assumptions:
- ~1M integration logs per month
- Average file size: 10KB
- Storage: 10GB/month cumulative
Costs:
```text
S3 Storage:    $0.023/GB × 10GB     = $0.23/month
S3 PUT:        $0.005/1000 × 1M     = $5.00/month
S3 GET:        $0.0004/1000 × 100K  = $0.04/month
Data Transfer: negligible (same region)

Total: ~$5.27/month (~$63/year)
```

Projected Snowflake Costs
Assumptions:
- 1M rows per month
- Compressed size: 2GB/month (5x compression)
- Query frequency: 100 queries/day
Costs:
```text
Storage: $40/TB/month × 0.002TB  = $0.08/month
Compute: ~$2/hour × 1 hour/month = $2.00/month
         (assuming XS warehouse for queries)

Total: ~$2.08/month (~$25/year)
```

Savings: ~$38/year + operational efficiency gains
Note: Cost savings are minimal, but operational benefits are significant:
- Faster debugging (saves engineer time)
- Better analytics (business insights)
- Reduced complexity (maintenance savings)
Risks and Mitigations
| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| Snowflake downtime | Low | High | Keep S3 dual-write for 1 month |
| Data loss during migration | Medium | High | Validate with checksums, dual-write period |
| Query performance issues | Medium | Medium | Pre-optimize schema, add clustering |
| Team learning curve | High | Low | Training sessions, SQL examples doc |
| Schema evolution needs | Medium | Medium | Use VARIANT for flexible fields |
Success Metrics
Section titled “Success Metrics”Migration Success
- ✅ 100% of logs in Snowflake
- ✅ Zero S3 throttling errors
- ✅ <5s p95 query latency
- ✅ Team adoption (>80% use Snowflake for debugging)
Operational Success
- ✅ <1 incident per month related to logging
- ✅ Log retention policy enforced automatically
- ✅ Integration log analytics dashboard in production
Timeline
| Phase | Duration | Estimated Start | Owner |
|---|---|---|---|
| Quick Wins (S3 fixes) | 1 week | 2026-03-18 | TBD |
| Phase 1: Dual-Write | 4 weeks | 2026-04-01 | TBD |
| Phase 2: Read from Snowflake | 2 weeks | 2026-05-01 | TBD |
| Phase 3: Snowflake-Only | 1 week | 2026-05-15 | TBD |
| Phase 4: Cleanup | 1 week | 2026-05-22 | TBD |
Total Estimated Duration: ~10 weeks (the phases sum to 9 weeks of work, plus scheduling gaps between phases)
Related Documentation
- RET-API-SGC Sentry Issue
- FileLog Source Code
- Yodlee Connector
- AWS S3 Performance Guidelines
- Snowflake Data Loading Best Practices
Decision Status
- Status: Proposed
- Decision Date: TBD
- Review Date: 2026-03-25 (engineering team meeting)
- Next Steps:
- Present migration plan to team
- Get budget approval
- Implement quick wins (S3 fixes)
- Begin Phase 1 (dual-write) development
Last Updated: 2026-03-16
Document Owner: TBD