Collector

Collector is a standalone service that downloads data files from Vendor SFTP servers, processes them, and uploads the results to S3 for consumption by Retail API.

Repository: gitlab.rightcapital.io/integrations/collector

flowchart LR
    subgraph Vendor
        SFTP[Vendor SFTP]
    end

    subgraph Collector
        D[Downloader]
        P[Processor]
        U[Uploader]
        F[Finalizer]
    end

    subgraph AWS
        S3[(S3 Bucket)]
    end

    SFTP --> D
    D --> P
    P --> U
    U --> S3
    S3 --> F

Collector processes files through four sequential stages:

Downloader: downloads files from the Vendor SFTP server to a local temp directory.

  • Connects to Vendor SFTP using configured credentials
  • Downloads files matching expected patterns
  • Filters by date/timestamp to get only new files
  • Stores in local temp directory
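
The pattern and timestamp filtering can be sketched as a pure function over a directory listing. This is a minimal illustration, not Collector's actual code: `RemoteFile`, `select_new_files`, and the example file names and patterns are hypothetical, and the SFTP connection itself is left out.

```python
from datetime import datetime, timezone
from fnmatch import fnmatchcase
from typing import NamedTuple


class RemoteFile(NamedTuple):
    """One entry of an SFTP directory listing (hypothetical shape)."""
    name: str
    modified: datetime


def select_new_files(listing, patterns, last_processed):
    """Return names that match any pattern and are newer than last_processed."""
    return [
        f.name
        for f in listing
        if any(fnmatchcase(f.name, p) for p in patterns)
        and f.modified > last_processed
    ]


# Assumed example data: two dated archives and one unrelated file.
listing = [
    RemoteFile("accounts_20240115.zip", datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)),
    RemoteFile("accounts_20240116.zip", datetime(2024, 1, 16, 6, 0, tzinfo=timezone.utc)),
    RemoteFile("readme.txt", datetime(2024, 1, 16, 6, 0, tzinfo=timezone.utc)),
]
last_processed = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

print(select_new_files(listing, ["accounts_*.zip"], last_processed))
# ['accounts_20240116.zip']
```

Keeping the selection separate from the transfer makes it easy to test without a live SFTP server.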

Processor: prepares downloaded files for upload.

  • Decompresses archives (ZIP, GZ, etc.)
  • Validates file format and content
  • Removes original compressed files after extraction
  • Organizes files by type
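
A rough sketch of the decompress-then-remove step, using only the Python standard library. The function name `extract_archive` is hypothetical, and content validation and organizing by type are intentionally left out because their rules are vendor-specific.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def extract_archive(path: Path) -> list[Path]:
    """Decompress a ZIP or GZ file next to itself and delete the archive.

    Returns the extracted paths. Files with any other suffix are
    returned unchanged.
    """
    suffix = path.suffix.lower()
    if suffix == ".zip":
        with zipfile.ZipFile(path) as zf:
            # Skip directory entries; keep only real members.
            names = [n for n in zf.namelist() if not n.endswith("/")]
            zf.extractall(path.parent)
        path.unlink()  # remove the original archive after extraction
        return [path.parent / n for n in names]
    if suffix == ".gz":
        target = path.with_suffix("")  # accounts.csv.gz -> accounts.csv
        with gzip.open(path, "rb") as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)
        path.unlink()
        return [target]
    return [path]
```

The archive is deleted only after extraction finishes, so a failure partway through leaves the original file in place for a retry.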

Uploader: uploads processed files to S3 and cleans up local storage.

  • Uploads files to S3 with structured path
  • Updates timestamp.txt to track last processed time
  • Clears temp directory after successful upload
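
The ordering matters here: the timestamp should move forward and the temp directory should be cleared only after every upload has succeeded. The sketch below illustrates that ordering under several assumptions: `upload_and_cleanup` is a hypothetical name, `put` stands in for whatever S3 client call Collector actually uses, the temp directory is assumed to mirror the S3 layout, and the ISO 8601 content of timestamp.txt is a guess.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def upload_and_cleanup(
    temp_dir: Path,
    prefix: str,
    put: Callable[[str, bytes], None],
    now: datetime,
) -> list[str]:
    """Upload every file under temp_dir, then record the time, then clean up.

    `put(key, body)` is an injected uploader (assumption); it must raise on
    failure so that the timestamp update and cleanup are skipped.
    """
    uploaded = []
    for path in sorted(p for p in temp_dir.rglob("*") if p.is_file()):
        key = f"{prefix}/{path.relative_to(temp_dir).as_posix()}"
        put(key, path.read_bytes())
        uploaded.append(key)
    # Only reached when every upload above succeeded.
    put(f"{prefix}/timestamp.txt", now.isoformat().encode())
    shutil.rmtree(temp_dir)  # removes the temp directory and its contents
    return uploaded
```

Injecting `put` keeps the sketch independent of any particular AWS SDK and lets the ordering be tested with an in-memory dictionary.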

Finalizer: creates LATEST files (symlink-like pointers to the newest data) for easy consumption.

  • Generates LATEST files pointing to most recent data
  • These LATEST files are what Retail API reads
  • Enables consistent file paths regardless of date
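
Deciding which dated file each LATEST file should point to can be sketched as a pure function over a list of S3 keys. The name `latest_targets` is hypothetical, and the keys are assumed to follow a four-segment vendor / rep code / date / filename layout. S3 has no native symlinks, so how Collector materializes the LATEST object (for example, as a copy) is not covered here.

```python
def latest_targets(keys: list[str]) -> dict[str, str]:
    """Map each LATEST key to the key of the most recent dated copy.

    Keys that do not have exactly four segments (timestamp.txt, existing
    LATEST files) are ignored.
    """
    newest: dict[str, tuple[str, str]] = {}  # LATEST key -> (date, source key)
    for key in keys:
        parts = key.split("/")
        if len(parts) != 4:
            continue
        vendor, rep_code, date, filename = parts
        latest_key = f"{vendor}/{rep_code}/LATEST_{filename}"
        # YYYY-MM-DD strings sort chronologically, so string comparison works.
        if latest_key not in newest or date > newest[latest_key][0]:
            newest[latest_key] = (date, key)
    return {latest: source for latest, (_, source) in newest.items()}


keys = [
    "schwab/rep_code_1/2024-01-15/accounts.csv",
    "schwab/rep_code_1/2024-01-16/accounts.csv",
    "schwab/rep_code_1/2024-01-15/positions.csv",
    "schwab/timestamp.txt",
]
for latest, source in latest_targets(keys).items():
    print(latest, "->", source)
```

In this example, LATEST_accounts.csv resolves to the 2024-01-16 file, while LATEST_positions.csv stays on 2024-01-15 because no newer positions file exists.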

S3 layout:

s3://bucket/vendor-name/
├── rep_code_1/
│ ├── 2024-01-15/
│ │ ├── accounts.csv
│ │ ├── positions.csv
│ │ └── securities.csv
│ ├── 2024-01-16/
│ │ └── ...
│ ├── LATEST_accounts.csv # Symlink to latest
│ ├── LATEST_positions.csv
│ └── LATEST_securities.csv
├── rep_code_2/
│ └── ...
└── timestamp.txt

Path pattern:

{vendor}/{rep_code}/{date}/{filename}

  • vendor: Vendor identifier (e.g., schwab, fidelity)
  • rep_code: Advisor’s unique identifier in the Vendor system
  • date: File date in YYYY-MM-DD format
  • filename: Original or normalized file name

Note: Some vendors don’t use Rep Code segmentation; in that case, all advisors’ data arrives in a single file.
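
As an illustration of the path pattern, the following sketch builds and parses keys for the segmented case. `DataKey` and `parse_key` are hypothetical names, and the layout for vendors without Rep Code segmentation is deliberately not modeled because it is not described here.

```python
from datetime import date, datetime
from typing import NamedTuple


class DataKey(NamedTuple):
    """The four segments of a data file path (hypothetical helper)."""
    vendor: str
    rep_code: str
    file_date: date
    filename: str

    def to_key(self) -> str:
        # date.isoformat() yields YYYY-MM-DD, matching the documented format.
        return f"{self.vendor}/{self.rep_code}/{self.file_date.isoformat()}/{self.filename}"


def parse_key(key: str) -> DataKey:
    """Split a key into its segments; raises ValueError on a malformed key."""
    vendor, rep_code, raw_date, filename = key.split("/")
    return DataKey(vendor, rep_code, datetime.strptime(raw_date, "%Y-%m-%d").date(), filename)


key = DataKey("schwab", "rep_code_1", date(2024, 1, 15), "accounts.csv").to_key()
print(key)  # schwab/rep_code_1/2024-01-15/accounts.csv
```

Parsing the date segment with `strptime` rejects keys whose third segment is not a valid YYYY-MM-DD date, such as LATEST files or other non-dated paths.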

Common file types processed by Collector:

| File Type | Content | Typical Format |
|---|---|---|
| Accounts | Account metadata (number, type, status) | CSV |
| Positions | Holdings (security, quantity, value) | CSV |
| Securities | Security master data | CSV |
| Tax Lots | Cost basis information | CSV |
| Transactions | Trade history | CSV |

Each Vendor has a configuration defining:

  • SFTP connection details (host, port, credentials)
  • File patterns to download
  • Processing rules (decompression, parsing)
  • S3 destination paths
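
To make the shape of such a configuration concrete, here is one possible representation as a Python dataclass. Every field name and example value below is an assumption for illustration; the real configuration format lives in the Collector repository.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VendorConfig:
    """Hypothetical per-Vendor settings covering the four areas listed above."""
    name: str
    # SFTP connection details
    sftp_host: str
    sftp_username: str
    sftp_credential_ref: str  # reference to a stored secret, not the secret itself
    sftp_port: int = 22
    # File patterns to download (shell-style globs, assumed)
    file_patterns: tuple[str, ...] = ()
    # Processing rules
    decompress: bool = True
    # S3 destination
    s3_bucket: str = ""
    s3_prefix: str = ""

    def __post_init__(self) -> None:
        if not 1 <= self.sftp_port <= 65535:
            raise ValueError(f"invalid SFTP port: {self.sftp_port}")


example = VendorConfig(
    name="example-vendor",
    sftp_host="sftp.example.com",
    sftp_username="collector",
    sftp_credential_ref="example-vendor/sftp",
    file_patterns=("accounts_*.zip", "positions_*.zip"),
    s3_bucket="bucket",
    s3_prefix="vendor-name",
)
```

Storing a reference to the credential rather than the credential itself keeps secrets out of the configuration, which also simplifies rotation.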

Key metrics to monitor:

  • File arrival time: Did Vendor deliver files on schedule?
  • Download success: Any SFTP connection failures?
  • Processing errors: File format issues?
  • Upload completion: All files in S3?

Common issues:

| Issue | Symptom | Resolution |
|---|---|---|
| Missing files | LATEST not updated | Check Vendor SFTP; delivery may be delayed |
| Auth failure | Download fails | Verify SFTP credentials; check whether they were rotated |
| Corrupt file | Processing error | Contact Vendor and request a re-send |
| Wrong format | Parsing fails | Vendor may have changed the file format |