Collector
Collector is a standalone service that downloads data files from Vendor SFTP servers, processes them, and uploads them to S3 for consumption by Retail API.
Repository: gitlab.rightcapital.io/integrations/collector
Architecture
flowchart LR
subgraph Vendor
SFTP[Vendor SFTP]
end
subgraph Collector
D[Downloader]
P[Processor]
U[Uploader]
F[Finalizer]
end
subgraph AWS
S3[(S3 Bucket)]
end
SFTP --> D
D --> P
P --> U
U --> S3
S3 --> F
Processing Stages
Collector processes files through four sequential stages:
1. Downloader
Downloads files from the Vendor SFTP server to a local temp directory.
- Connects to Vendor SFTP using configured credentials
- Downloads files matching expected patterns
- Filters by date/timestamp to get only new files
- Stores in local temp directory
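The pattern and timestamp filtering described above can be sketched as a pure function over a directory listing. This is a minimal sketch, not Collector's actual code: `RemoteFile`, `select_new_files`, the glob-style patterns, and the use of a modification time as the "new file" criterion are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class RemoteFile:
    """Hypothetical listing entry: a remote file name and its modification time."""
    name: str
    modified: datetime


def select_new_files(listing, patterns, last_processed):
    """Return files that match any pattern and are newer than last_processed."""
    return [
        f for f in listing
        if any(fnmatchcase(f.name, p) for p in patterns)
        and f.modified > last_processed
    ]


# Example listing (file names are illustrative only).
listing = [
    RemoteFile("accounts_20240115.csv.gz", datetime(2024, 1, 15, 6, 0, tzinfo=timezone.utc)),
    RemoteFile("accounts_20240116.csv.gz", datetime(2024, 1, 16, 6, 0, tzinfo=timezone.utc)),
    RemoteFile("readme.txt", datetime(2024, 1, 16, 6, 0, tzinfo=timezone.utc)),
]
last_processed = datetime(2024, 1, 15, 12, 0, tzinfo=timezone.utc)

new_files = select_new_files(listing, ["accounts_*.csv.gz"], last_processed)
print([f.name for f in new_files])  # ['accounts_20240116.csv.gz']
```

Keeping the selection logic separate from the SFTP client makes it testable without a live server.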
2. Processor
Prepares downloaded files for upload.
- Decompresses archives (ZIP, GZ, etc.)
- Validates file format and content
- Removes original compressed files after extraction
- Organizes files by type
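As an illustration of the decompress-then-remove step, here is a minimal sketch using only the Python standard library. The function name `decompress_in_place` is hypothetical, it handles only flat `.zip` and `.gz` archives, and it skips the validation and organization steps, which depend on vendor-specific rules not described here.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def decompress_in_place(directory: Path) -> list[Path]:
    """Extract .zip and .gz files in directory, delete the archives,
    and return the paths of the extracted files."""
    extracted = []
    # sorted() snapshots the listing, so files created during
    # extraction are not visited by this loop.
    for archive in sorted(directory.iterdir()):
        if archive.suffix == ".zip":
            with zipfile.ZipFile(archive) as zf:
                names = zf.namelist()
                zf.extractall(directory)
            extracted.extend(directory / name for name in names)
        elif archive.suffix == ".gz":
            target = archive.with_suffix("")  # accounts.csv.gz -> accounts.csv
            with gzip.open(archive, "rb") as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            extracted.append(target)
        else:
            continue  # not an archive; leave it untouched
        archive.unlink()  # remove the original only after successful extraction
    return extracted
```

Because the archive is deleted only after extraction completes, a corrupt archive raises an exception and stays on disk for inspection.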
3. Uploader
Uploads processed files to S3 and cleans up local storage.
- Uploads files to S3 with structured path
- Updates `timestamp.txt` to track the last processed time
- Clears the temp directory after successful upload
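The ordering of these steps matters: the timestamp should advance and the temp directory should be cleared only once every file has been uploaded. A minimal sketch of that ordering follows. The `upload_and_clean` function and the injected `put_object` callable are hypothetical stand-ins for a real S3 client, and the ISO 8601 content written to `timestamp.txt` is an assumption, since the actual file format is not documented here.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def upload_and_clean(
    temp_dir: Path,
    key_prefix: str,
    put_object: Callable[[str, bytes], None],
    now: datetime,
) -> list[str]:
    """Upload every file under temp_dir, record the timestamp, then clear temp_dir."""
    uploaded = []
    for path in sorted(p for p in temp_dir.rglob("*") if p.is_file()):
        # Mirror the local layout under the vendor prefix.
        key = f"{key_prefix}/{path.relative_to(temp_dir).as_posix()}"
        put_object(key, path.read_bytes())  # an exception here aborts before cleanup
        uploaded.append(key)
    # Reached only when every upload succeeded.
    put_object(f"{key_prefix}/timestamp.txt", now.isoformat().encode())
    shutil.rmtree(temp_dir)
    temp_dir.mkdir()
    return uploaded
```

In a test, a plain dictionary works as the store: `upload_and_clean(tmp, "vendor-name", store.__setitem__, datetime.now(timezone.utc))`.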
4. Finalizer
Creates LATEST symlinks for easy consumption.
- Generates `LATEST` files pointing to the most recent data
- These `LATEST` files are what Retail API reads
- Enables consistent file paths regardless of date
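To make the "most recent data" selection concrete, the sketch below computes which dated object each `LATEST_` key should point to, given a list of existing keys. The function name `latest_targets` is hypothetical, and it assumes the four-segment key layout used elsewhere on this page. How the pointer is then materialized in S3 is not specified here, so the sketch stops at the mapping.

```python
def latest_targets(keys: list[str]) -> dict[str, str]:
    """Map each LATEST_<filename> key to the dated key it should point to."""
    newest: dict[tuple[str, str, str], str] = {}
    for key in keys:
        parts = key.split("/")
        if len(parts) != 4:
            continue  # skip timestamp.txt, existing LATEST_ files, etc.
        vendor, rep_code, file_date, filename = parts
        group = (vendor, rep_code, filename)
        # YYYY-MM-DD strings sort chronologically, so string comparison suffices.
        if group not in newest or file_date > newest[group]:
            newest[group] = file_date
    return {
        f"{vendor}/{rep_code}/LATEST_{filename}": f"{vendor}/{rep_code}/{file_date}/{filename}"
        for (vendor, rep_code, filename), file_date in newest.items()
    }


keys = [
    "schwab/rep_code_1/2024-01-15/accounts.csv",
    "schwab/rep_code_1/2024-01-16/accounts.csv",
    "schwab/rep_code_1/2024-01-15/positions.csv",
    "schwab/timestamp.txt",
]
targets = latest_targets(keys)
```

Here `targets["schwab/rep_code_1/LATEST_accounts.csv"]` resolves to the `2024-01-16` copy, while `LATEST_positions.csv` still points at `2024-01-15` because no newer positions file exists.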
S3 File Structure
s3://bucket/vendor-name/
├── rep_code_1/
│   ├── 2024-01-15/
│   │   ├── accounts.csv
│   │   ├── positions.csv
│   │   └── securities.csv
│   ├── 2024-01-16/
│   │   └── ...
│   ├── LATEST_accounts.csv    # Symlink to latest
│   ├── LATEST_positions.csv
│   └── LATEST_securities.csv
├── rep_code_2/
│   └── ...
└── timestamp.txt
Path Convention
vendor/{rep_code}/{date}/{filename}
- vendor: Vendor identifier (e.g., `schwab`, `fidelity`)
- rep_code: Advisor’s unique identifier in the Vendor system
- date: File date in `YYYY-MM-DD` format
- filename: Original or normalized file name
Note: Some vendors don’t use Rep Code segmentation; in that case, all advisors’ data is in a single file.
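The path convention can be expressed as a small value type that builds and parses keys. This is a sketch under the assumption that every key has exactly the four segments shown above. `DataKey` and `parse_key` are hypothetical names, and the layout for vendors without Rep Code segmentation is not covered because it is not documented here.

```python
from datetime import date
from typing import NamedTuple


class DataKey(NamedTuple):
    """One dated data file, addressed by the four path segments."""
    vendor: str
    rep_code: str
    file_date: date
    filename: str

    def to_key(self) -> str:
        # date.isoformat() yields YYYY-MM-DD, matching the convention.
        return f"{self.vendor}/{self.rep_code}/{self.file_date.isoformat()}/{self.filename}"


def parse_key(key: str) -> DataKey:
    """Split a key into its segments; raises ValueError on a wrong
    segment count or an invalid date."""
    vendor, rep_code, raw_date, filename = key.split("/")
    return DataKey(vendor, rep_code, date.fromisoformat(raw_date), filename)


key = DataKey("schwab", "rep_code_1", date(2024, 1, 15), "accounts.csv")
print(key.to_key())  # schwab/rep_code_1/2024-01-15/accounts.csv
```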
File Types
Common file types processed by Collector:
| File Type | Content | Typical Format |
|---|---|---|
| Accounts | Account metadata (number, type, status) | CSV |
| Positions | Holdings (security, quantity, value) | CSV |
| Securities | Security master data | CSV |
| Tax Lots | Cost basis information | CSV |
| Transactions | Trade history | CSV |
Configuration
Each Vendor has a configuration defining:
- SFTP connection details (host, port, credentials)
- File patterns to download
- Processing rules (decompression, parsing)
- S3 destination paths
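One way to picture such a configuration is as a pair of typed records. The sketch below is illustrative only: every field name, the default values, the example host `sftp.example.com`, and the choice to store an environment variable name instead of a password are assumptions, not Collector's actual schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SftpSettings:
    host: str
    username: str
    port: int = 22
    # Name of the environment variable holding the secret, not the secret itself.
    password_env: str = "VENDOR_SFTP_PASSWORD"


@dataclass(frozen=True)
class VendorConfig:
    name: str
    sftp: SftpSettings
    file_patterns: tuple[str, ...]  # glob-style patterns to download
    decompress: bool = True         # processing rule: extract archives
    s3_prefix: str = ""             # destination prefix inside the bucket

    def __post_init__(self):
        # Fail fast on configurations that could never download anything.
        if not self.file_patterns:
            raise ValueError("at least one file pattern is required")
        if not 0 < self.sftp.port < 65536:
            raise ValueError(f"invalid SFTP port: {self.sftp.port}")


config = VendorConfig(
    name="schwab",
    sftp=SftpSettings(host="sftp.example.com", username="collector"),
    file_patterns=("accounts_*.csv.gz", "positions_*.zip"),
    s3_prefix="schwab",
)
```

Validating in `__post_init__` surfaces a bad configuration at load time instead of during a nightly run.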
Monitoring
Key metrics to monitor:
- File arrival time: Did Vendor deliver files on schedule?
- Download success: Any SFTP connection failures?
- Processing errors: File format issues?
- Upload completion: All files in S3?
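The upload completion check, for example, can be reduced to comparing an expected file list against the keys present for a given date. The helper below is a hypothetical sketch: `missing_files` and the expected file names are assumptions, and a real check would obtain `present_keys` by listing the bucket.

```python
def missing_files(present_keys, vendor, rep_code, file_date, expected):
    """Return the expected file names that are absent for one rep code and date."""
    prefix = f"{vendor}/{rep_code}/{file_date}/"
    present = {k[len(prefix):] for k in present_keys if k.startswith(prefix)}
    return sorted(set(expected) - present)


present_keys = [
    "schwab/rep_code_1/2024-01-16/accounts.csv",
    "schwab/rep_code_1/2024-01-16/positions.csv",
]
gaps = missing_files(
    present_keys, "schwab", "rep_code_1", "2024-01-16",
    ["accounts.csv", "positions.csv", "securities.csv"],
)
print(gaps)  # ['securities.csv']
```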
Common Issues
| Issue | Symptom | Resolution |
|---|---|---|
| Missing files | LATEST not updated | Check Vendor SFTP; delivery may be delayed |
| Auth failure | Download fails | Verify SFTP credentials, check if rotated |
| Corrupt file | Processing error | Contact Vendor, request re-send |
| Wrong format | Parsing fails | Vendor may have changed format |
Related
- File-based Integrations - How Retail API consumes these files
- Nightly Sync - Scheduling and monitoring