Data Transfers¶
Moving data between CANFAR storage systems, external sources, and your local computer.
🎯 Transfer Methods Overview
Efficient data movement strategies:
- Web Interfaces: Simple uploads and downloads for small files
- Command-Line Tools: Efficient transfers for large datasets
- Automated Workflows: Scripted transfers and synchronisation
- Performance Optimisation: Choosing the right method for your data size
Efficient data transfer is essential for astronomy workflows. CANFAR provides multiple transfer methods optimised for different scenarios, from small file uploads to large dataset synchronisation.
🔄 Transfer Overview¶
Transfer Types by Method¶
| Method | Best For | Speed | Complexity | Interactive | Automated |
|---|---|---|---|---|---|
| Web Upload/Download | Small files (<1GB) | Slow | Simple | ✅ | ❌ |
| Direct URLs | Medium files, scripting | Medium | Simple | ⚠️ | ✅ |
| VOSpace CLI | All sizes, Vault access | Medium | Medium | ✅ | ✅ |
| SSHFS Mount | Local file operations | Medium | Medium | ✅ | ⚠️ |
| rsync via SSHFS | Large datasets, sync | Fast | Advanced | ⚠️ | ✅ |
Storage System Access¶
| Source → Destination | Method | Command Example |
|---|---|---|
| Local → ARC Projects | SSHFS, Direct URL, VOSpace | `vcp file.fits arc:projects/[project]/` |
| Local → Vault | VOSpace CLI, Web | `vcp file.fits vos:[user]/` |
| Local → Scratch | Only during sessions | `cp file.fits /scratch/` (within a session) |
| ARC → Vault | VOSpace CLI | `vcp /arc/projects/[project]/file.fits vos:[user]/` |
| Vault → ARC | VOSpace CLI | `vcp vos:[user]/file.fits /arc/projects/[project]/` |
| Scratch ↔ ARC | Direct copy | `cp /scratch/file.fits /arc/projects/[project]/` |
📤 Upload Methods¶
Small Files (<1GB): Web Interface¶
ARC Projects and Home¶
- Navigate to storage: ARC File Manager
- Select destination: Choose your home or project directory
- Upload files: Click "Add" → "Upload Files"
- Select files: Choose files from your computer
- Confirm upload: Click "Upload" then "OK"
Note: in a Notebook session you can also use the JupyterLab upload button.
Vault (VOSpace)¶
- Navigate to Vault: VOSpace File Manager
- Select destination: Browse to your space
- Upload files: Same process as ARC storage
- Set permissions: Right-click → Properties to set sharing permissions
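Permissions can also be set from the command line with `vchmod` (part of the vos tools); a minimal sketch, where `MYGROUP` is a placeholder CADC group name:

```bash
# Grant read access to a CADC group on a Vault file
vchmod g+r vos:[user]/data/shared_file.fits MYGROUP

# Make a file publicly readable
vchmod o+r vos:[user]/data/public_file.fits
```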
Medium Files (1-100GB): Command Line¶
Using Direct URLs (ARC only)¶
# Authenticate first
cadc-get-cert -u [user]
# Upload to ARC Home
curl -E ~/.ssl/cadcproxy.pem \
-T myfile.fits \
https://ws-uv.canfar.net/arc/files/home/[user]/myfile.fits
# Upload to ARC Projects
curl -E ~/.ssl/cadcproxy.pem \
-T myfile.fits \
https://ws-uv.canfar.net/arc/files/projects/[project]/myfile.fits
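The same `curl` pattern extends to several files with a shell loop; a minimal sketch, assuming the FITS files sit in the current directory:

```bash
# Upload every FITS file in the current directory to ARC Projects
for f in *.fits; do
    curl -E ~/.ssl/cadcproxy.pem \
        -T "$f" \
        "https://ws-uv.canfar.net/arc/files/projects/[project]/$f"
done
```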
Using VOSpace CLI¶
# Install VOS tools (if not already available)
pip install vos
# Authenticate
cadc-get-cert -u [user]
# Upload to Vault
vcp myfile.fits vos:[user]/data/
# Upload to ARC via VOSpace API
vcp myfile.fits arc:projects/[project]/data/
# Upload with progress monitoring
vcp --verbose myfile.fits vos:[user]/large_files/
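If the destination directory does not yet exist, create it first with `vmkdir`:

```bash
# Create the destination directory before copying into it
vmkdir vos:[user]/large_files
vcp --verbose myfile.fits vos:[user]/large_files/
```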
Large Files (>100GB): Advanced Methods¶
SSHFS Mount + rsync¶
# 1. Mount CANFAR storage locally
mkdir ~/canfar_mount
sshfs -p 64022 [user]@ws-uv.canfar.net:/ ~/canfar_mount
# 2. Sync large datasets with rsync
rsync -avzP --partial \
./large_dataset/ \
~/canfar_mount/arc/projects/[project]/data/
# 3. Unmount when complete
umount ~/canfar_mount
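For very long transfers over an unreliable connection, the rsync step can simply be re-run until it succeeds: `--partial` keeps incomplete files, so each retry resumes where the last one stopped. A minimal retry wrapper:

```bash
# Re-run rsync until it exits cleanly; --partial resumes interrupted files
until rsync -avzP --partial \
    ./large_dataset/ \
    ~/canfar_mount/arc/projects/[project]/data/; do
    echo "rsync interrupted; retrying in 60 seconds..."
    sleep 60
done
```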
VOSpace Bulk Transfer¶
# Sync entire directories
vsync ./local_data/ vos:[user]/backup/
# Parallel transfers (faster for many files)
vcp --nstreams=4 large_file.tar vos:[user]/archives/
📥 Download Methods¶
From ARC Storage¶
Web Interface¶
- Navigate: ARC File Manager
- Select files: Check boxes next to desired files
- Download options:
    - ZIP: Single archive (recommended for multiple files)
    - URL List: Generate download links for scripting (see the sketch below)
    - HTML List: Individual download links
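The URL list option pairs well with `curl` for scripted downloads; a minimal sketch, assuming the generated links were saved to a file named `url_list.txt` (the filename is a placeholder):

```bash
# Download each link from a saved URL list, authenticating with the proxy certificate
while read -r url; do
    curl -E ~/.ssl/cadcproxy.pem -O "$url"
done < url_list.txt
```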
Command Line¶
# Direct URL download
curl -E ~/.ssl/cadcproxy.pem \
https://ws-uv.canfar.net/arc/files/home/[user]/myfile.fits \
-o myfile.fits
# Via VOSpace API
vcp arc:home/[user]/myfile.fits ./
# Multiple files with wildcards
vcp "arc:projects/[project]/data/*.fits" ./local_data/
From Vault (VOSpace)¶
Command Line¶
# Single file
vcp vos:[user]/data.fits ./
# Directory with all contents
vcp vos:[user]/survey_data/ ./local_survey/
Python API¶
import vos
client = vos.Client()
# Download single file
client.copy("vos:[user]/data.fits", "./local_data.fits")
# Download with progress callback
def progress_callback(bytes_transferred, total_bytes):
percent = (bytes_transferred / total_bytes) * 100
print(f"Progress: {percent:.1f}%")
client.copy("vos:[user]/large_file.fits",
"./large_file.fits",
callback=progress_callback)
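The client can also list remote directories, which makes it easy to fetch only files missing locally; a minimal sketch using `Client.listdir` (directory names are placeholders):

```python
import os

import vos

client = vos.Client()
remote_dir = "vos:[user]/survey_data"

# Download only the files that are not already present locally
for name in client.listdir(remote_dir):
    if not os.path.exists(name):
        client.copy(f"{remote_dir}/{name}", f"./{name}")
```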
🔄 Inter-Storage Transfers¶
Moving Data Between Storage Systems¶
Scratch to ARC (Within Sessions)¶
# Process data in scratch for speed
cp /arc/projects/[project]/raw_data.fits /scratch/
python reduce_data.py /scratch/raw_data.fits
# Save results to permanent storage
cp /scratch/processed_data.fits /arc/projects/[project]/results/
cp -r /scratch/analysis_plots/ /arc/projects/[project]/figures/
ARC to Vault (Archival)¶
# Archive completed project results
vcp /arc/projects/[project]/final_results/ vos:[user]/archives/project2024/
Vault to ARC (Project Setup)¶
# Import archived data for new analysis
vcp vos:shared_project/calibrated_data/ /arc/projects/[project]/data/
# Import specific datasets
vcp "vos:public_surveys/gaia_dr3/*.fits" /arc/projects/[project]/catalogues/
Automated Workflow Example¶
#!/bin/bash
# Complete data processing workflow
set -e # Exit on error
PROJECT_DIR="/arc/projects/[project]"
SCRATCH_DIR="/scratch"
echo "Starting data processing pipeline..."
# 1. Download raw data from Vault to scratch
echo "Downloading raw data..."
vcp "vos:[user]/raw_observations/obs_*.fits" ${SCRATCH_DIR}/
# 2. Process data in scratch (fastest storage)
echo "Processing data..."
cd ${SCRATCH_DIR}
for file in obs_*.fits; do
    python calibrate.py "$file" "cal_${file}"
done
# 3. Save intermediate results to ARC
echo "Saving calibrated data..."
mkdir -p ${PROJECT_DIR}/calibrated/
cp cal_*.fits ${PROJECT_DIR}/calibrated/
# 4. Further analysis
echo "Running analysis..."
python analyze_all.py ${PROJECT_DIR}/calibrated/ > analysis_results.txt
# 5. Save final results to ARC and archive to Vault
echo "Saving final results..."
cp analysis_results.txt ${PROJECT_DIR}/results/
cp final_plots/*.png ${PROJECT_DIR}/figures/
# Archive to Vault
vcp ${PROJECT_DIR}/results/ vos:[user]/completed_projects/$(date +%Y%m%d)/
echo "Pipeline completed successfully!"
📊 Performance Optimisation¶
Transfer Speed Optimisation¶
For Many Small Files¶
# Bundle small files into archives
tar -czf analysis_scripts.tar.gz scripts/
vcp analysis_scripts.tar.gz vos:[user]/code/
# Use directory sync instead of individual copies
vsync --nstreams=4 ./many_small_files/ vos:[user]/collection/
Network Performance Tips¶
Optimal Transfer Times¶
- Best performance: Off-peak hours (evenings, weekends)
- Avoid: Peak research hours (9 AM - 5 PM Pacific)
Connection Optimisation¶
# Check network speed to CANFAR
ping ws-uv.canfar.net
# Test transfer speed with small file
time vcp test_file.fits vos:[user]/speed_test/
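To turn that timing into a throughput figure, upload a file of known size; a minimal sketch (the 100 MB size is arbitrary):

```bash
# Create a 100 MB test file and time its upload
dd if=/dev/zero of=test_file.fits bs=1M count=100
time vcp test_file.fits vos:[user]/speed_test/
# Throughput in MB/s ≈ 100 divided by the elapsed seconds reported by time
```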
🚨 Error Handling and Recovery¶
Common Transfer Issues¶
Authentication Errors¶
# Certificate expired
ERROR:: Expired cert. Update by running cadc-get-cert
# Solution: Refresh certificate
cadc-get-cert -u [user]
# Request a certificate with a longer validity period (in days)
cadc-get-cert -u [user] --days-valid 30
Network Timeouts¶
# Retry with exponential backoff
for i in {1..3}; do
    vcp file.fits vos:[user]/ && break
    sleep $((2**i))
done
Robust Transfer Script¶
#!/usr/bin/env python
"""
Robust file transfer with retry logic
"""
import sys
import time

import vos


def robust_transfer(source, destination, max_retries=3):
    """Transfer file with retry logic"""
    client = vos.Client()
    for attempt in range(max_retries):
        try:
            print(f"Transfer attempt {attempt + 1}: {source} → {destination}")
            client.copy(source, destination)
            print("✓ Transfer successful")
            return True
        except Exception as e:
            print(f"✗ Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                wait_time = 2 ** attempt  # Exponential backoff
                print(f"Waiting {wait_time} seconds before retry...")
                time.sleep(wait_time)
            else:
                print(f"Transfer failed after {max_retries} attempts")
    return False


# Usage
if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python robust_transfer.py <source> <destination>")
        sys.exit(1)
    source, destination = sys.argv[1], sys.argv[2]
    success = robust_transfer(source, destination)
    sys.exit(0 if success else 1)
📋 Transfer Checklists¶
Pre-Transfer Checklist¶
- Authentication: Valid CADC certificate (`cadc-get-cert`)
- Permissions: Write access to destination directory
- Space: Sufficient quota in destination storage
- Network: Stable connection for large transfers
- Backup: Important data backed up before moving
Post-Transfer Verification¶
# Verify file integrity
vls -l vos:[user]/transferred_file.fits # Check size and timestamp
# Compare checksums (if available)
vcp --head vos:[user]/data.fits | grep MD5
# Test file readability
python -c "from astropy.io import fits; fits.open('test_file.fits')"
Transfer Planning Template¶
## Transfer Plan: [Project Name]
**Data Description**:
- Size: ___GB
- File count: ___
- Type: Raw/Processed/Results
**Source**: _______________
**Destination**: ___________
**Method**: _______________
**Timeline**:
- Start: ____________
- Estimated completion: ___________
**Verification**:
- [ ] File count matches
- [ ] Total size matches
- [ ] Sample files readable
- [ ] Permissions set correctly
**Backup**: _______________
🔗 Integration Examples¶
Jupyter Notebook Upload¶
Within a CANFAR Jupyter session:
# Upload files using the Jupyter interface
# 1. Click the "Upload" button in file browser
# 2. Select files from your computer
# 3. Files appear in current directory
# Move uploaded files to appropriate storage
import shutil
shutil.move('uploaded_data.fits', '/arc/projects/[project]/data/')
# Or copy to scratch for processing
shutil.copy('/arc/projects/[project]/data.fits', '/scratch/')
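Inside a CANFAR session `/arc` is mounted directly, so `shutil` is enough for ARC storage; pushing the same file to Vault goes through the vos client instead. A minimal sketch:

```python
import vos

client = vos.Client()

# Copy a file from the session filesystem into Vault
client.copy("/arc/projects/[project]/data/uploaded_data.fits",
            "vos:[user]/backups/uploaded_data.fits")
```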
Batch Job Data Staging¶
#!/bin/bash
# Batch job with data staging
# Download input data
vcp vos:[project]/input_data.tar.gz /scratch/
cd /scratch
tar -xzf input_data.tar.gz
# Process data
python analysis.py input_data/
# Upload results
tar -czf results_$(date +%Y%m%d).tar.gz results/
vcp results_*.tar.gz vos:[user]/job_outputs/
# Cleanup
rm -rf /scratch/*
External Data Import¶
# Download from astronomical archives
wget -O survey_data.fits "https://archive.eso.org/..."
# Upload to CANFAR
vcp survey_data.fits vos:[user]/external_data/
# Or direct to project space
curl -E ~/.ssl/cadcproxy.pem \
-T survey_data.fits \
https://ws-uv.canfar.net/arc/files/projects/[project]/survey_data.fits