Notebook Sessions¶
Interactive Jupyter Lab sessions for data analysis and computational astronomy
🎯 What You'll Learn
- How to launch and configure Jupyter notebook sessions
- Available containers and when to use each
- File management, uploads, and storage integration
- Performance tips, collaboration, and troubleshooting
Jupyter notebooks combine code execution, rich text documentation, and inline visualisations in a single interface. CANFAR's notebook sessions include pre-configured astronomy software stacks, persistent storage access, and collaborative sharing capabilities.
📋 Overview¶
Notebook sessions provide:
- Jupyter Lab: Full-featured development environment with file browser, terminal, and extensions
- Pre-configured containers: Astronomy-specific software stacks with popular libraries
- Persistent storage: Direct access to your /arc/home/ and /arc/projects/ data
- Terminal access: Built-in terminal for command-line operations
- File transfers: Upload/download capabilities for data management
🚀 Creating a Notebook Session¶
Step 1: Access Session Creation¶
From the Science Portal dashboard, click the plus sign (+) to create a new session, then select notebook as your session type.
Step 2: Choose Your Container¶
Select a container image that includes the software you need. Each container comes pre-configured with specific tools and libraries:
Available Containers¶
Many containers are available, some maintained by the CANFAR team and others contributed by the community. Some examples:
Container | Contents | Best For |
---|---|---|
astroml ⭐ | Modern Python astronomy stack (astropy, numpy, scipy, matplotlib, pandas) | General astronomy analysis, data science |
casa-notebook | CASA + Python stack | Radio astronomy data reduction |
Container Selection
Start with astroml for most astronomy workflows. It includes the latest astronomy libraries and is actively maintained. Use CASA containers only when you specifically need CASA functionality.
Step 3: Configure Session Resources¶
Session Name¶
Choose a descriptive name that helps you identify this session later:
Good session names:
- galaxy-photometry
- pulsar-analysis
- alma-data-reduction
- exoplanet-search
Memory Allocation¶
Select the maximum amount of RAM you anticipate requiring:
Memory Guidelines:
- 4GB: Basic analysis, small datasets
- 16GB: Standard workflows, moderate datasets (recommended default)
- 32GB: Large datasets, memory-intensive operations
- 64GB+: Very large datasets, specialized workflows
Resource Sharing
Choose the smallest value reasonable for your needs. Computing resources are shared amongst all users. Excessive requests may slow or prevent session launch.
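One way to choose a memory value is to estimate the in-memory size of your largest array before launching. The following is a rough sketch, not a CANFAR tool: the function name `estimate_array_gib` and the example cube dimensions are illustrative assumptions, and real workflows usually need extra headroom for temporary copies.

```python
import math

def estimate_array_gib(shape, bytes_per_element=8):
    """Return the approximate size in GiB of a dense array.

    bytes_per_element defaults to 8 (64-bit float); use 4 for 32-bit floats.
    """
    n_elements = math.prod(shape)
    return n_elements * bytes_per_element / 1024**3

# Hypothetical example: a data cube with 100 planes of 4096 x 4096 pixels
size = estimate_array_gib((100, 4096, 4096))
print(f"Approximate size: {size:.1f} GiB")
```

If the estimate is close to a guideline boundary, processing the data in smaller pieces may be a better option than requesting the next memory tier.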
CPU Cores¶
Select the maximum number of computing cores you anticipate requiring:
CPU Guidelines:
- 1-2 cores: Most single-threaded analysis (recommended default)
- 4-8 cores: Parallel processing, multi-threaded libraries
- 16+ cores: Highly parallel workflows
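Once a session is running, you may want to check how many cores your code can actually see before configuring a parallel library. This is a best-effort sketch using only the Python standard library; the helper name `available_cores` is an assumption, and limits enforced by the container runtime may not be reflected in the number it reports.

```python
import os

def available_cores():
    """Return the number of CPU cores this process may use (best effort)."""
    try:
        # Available on Linux: respects the CPU affinity mask of the process
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # Fallback for platforms without sched_getaffinity
        return os.cpu_count() or 1

print(f"Cores visible to this process: {available_cores()}")
```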
Step 4: Launch Session¶
Click the Launch button to start your notebook session.
Wait until a notebook icon appears on your dashboard, then click it to access your session.
🧭 Using Jupyter Lab¶
Interface Overview¶
Once connected, you'll see the Jupyter Lab interface with several key areas:
- File Browser (left): Navigate your filesystem and open files
- Main Work Area (centre): Notebooks, terminals, and file editors
- Launcher: Create new notebooks, terminals, and files
- Menu Bar: File operations, edit functions, and view options
Starting Your First Notebook¶
- Click the Python 3 (ipykernel) notebook icon in the launcher
- Start coding in the first cell
- Run cells with Shift+Enter
File Management¶
Persistent Storage Locations¶
Your notebook session has access to:
/arc/home/[user]/ # Your personal 10GB space
/arc/projects/[project]/ # Shared project spaces
/scratch/ # Temporary high-speed storage
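To see which of these locations are visible from a notebook and how much space the underlying filesystem reports, a small check like the one below can help. It is a sketch under stated assumptions: `report_free_space` is a hypothetical helper, and `shutil.disk_usage` reports filesystem-level numbers, which may differ from any per-user quota.

```python
import shutil
from pathlib import Path

def report_free_space(paths):
    """Return {path: free space in GiB}, or None for paths that do not exist."""
    report = {}
    for p in paths:
        if Path(p).exists():
            report[p] = shutil.disk_usage(p).free / 1024**3
        else:
            report[p] = None
    return report

# Top-level locations taken from the list above
for path, free in report_free_space(["/arc/home", "/arc/projects", "/scratch"]).items():
    status = f"{free:.1f} GiB free" if free is not None else "not found"
    print(f"{path}: {status}")
```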
Uploading Files¶
Method 1: Direct Upload (< 100MB)
- Navigate to your target directory in the file browser
- Click the upload arrow in the top menu bar
- Select files and click "Open"
- Files appear in the browser
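Before choosing the direct upload route, you can check a file's size on your local machine. The sketch below assumes "100MB" means 100 × 10^6 bytes, which this page does not state explicitly; `fits_web_upload` and the file name `catalogue.fits` are hypothetical.

```python
from pathlib import Path

# Assumption: the 100MB limit is interpreted as 100 * 10^6 bytes
WEB_UPLOAD_LIMIT_BYTES = 100 * 1000**2

def fits_web_upload(path, limit=WEB_UPLOAD_LIMIT_BYTES):
    """Return True if the file is smaller than the web upload limit."""
    return Path(path).stat().st_size < limit

candidate = Path("catalogue.fits")  # hypothetical local file
if candidate.exists():
    route = "direct upload" if fits_web_upload(candidate) else "command-line transfer"
    print(f"{candidate}: use {route}")
```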
Method 2: Copy-Paste Text
For code snippets or small text files:
- Open a terminal by double-clicking the terminal icon
- Create/edit files using command-line editors
- Copy text from your local computer
- Paste into the editor
Working with CASA¶
If using a CASA container, you can run CASA commands directly in notebook cells:
# Import CASA tasks
import casatasks as casa
# Example: Import UV data
casa.importuvfits(fitsfile='data.uvfits', vis='data.ms')
# List measurement set contents
casa.listobs(vis='data.ms')
🔧 Advanced Features¶
Terminal Access¶
Access the built-in terminal for command-line operations:
- Click the terminal icon in the launcher
- Run commands as you would in any Linux terminal
- Install packages with pip or conda (where permissions allow)
# Example terminal commands
ls /arc/projects/[project]/
python script.py
git clone https://github.com/username/repo.git
Jupyter Extensions¶
Many useful extensions are pre-installed:
- Variable Inspector: View variable contents
- Table of Contents: Navigate large notebooks
- Git Integration: Version control directly in Jupyter
Python Package Management¶
Installing Additional Packages¶
# Prefer python -m pip for clarity; installs to user site if needed
python -m pip install --user package-name
# Check installed packages
python -m pip list | less
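From inside a notebook cell, you can also check whether a package is already present in the container before installing it. This sketch uses the standard-library `importlib.metadata` module; the helper name `installed_version` is an assumption, and `package-name` is the same placeholder used above.

```python
from importlib import metadata

def installed_version(distribution):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None

for name in ("astropy", "numpy", "package-name"):
    version = installed_version(name)
    print(f"{name}: {version if version else 'not installed'}")
```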
🤝 Collaboration¶
Focus collaboration on shared project storage and version control, not session URL sharing.
Best Practices for Collaboration¶
- Use descriptive cell comments for clarity
- Save frequently to persistent storage
- Use version control (git) for important work
- Coordinate changes via pull requests or issue tracking
Sharing Notebooks¶
# Save notebook to shared location
cp my-analysis.ipynb /arc/projects/[project]/notebooks/
# Share via git repository
git add my-analysis.ipynb
git commit -m "Add analysis notebook"
git push origin main
⚡ Performance Optimisation¶
Memory Management¶
import gc
import psutil
print(f"Memory usage: {psutil.virtual_memory().percent}%")
# Free up memory by deleting large variables
if 'large_array' in globals():
    del large_array
# Ask the garbage collector to release unreferenced objects
gc.collect()
Storage Performance¶
# Use /scratch for intensive I/O operations
import shutil, pathlib
source = '/arc/projects/[project]/large_file.fits'
target = '/scratch/large_file.fits'
shutil.copy(source, target)
# ... your analysis code ...
# Copy results back
pathlib.Path('/arc/projects/[project]/outputs/').mkdir(parents=True, exist_ok=True)
shutil.copy('/scratch/results.fits', '/arc/projects/[project]/outputs/results.fits')
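The copy-in, analyse, copy-out pattern above can be wrapped in two small helpers so it is easy to reuse across notebooks. This is a minimal sketch, assuming `/scratch` is writable in your session; `stage_to_scratch` and `save_result` are hypothetical names, and the bracketed paths in the usage comments are placeholders to replace with your own.

```python
import shutil
from pathlib import Path

def stage_to_scratch(source, scratch_dir="/scratch"):
    """Copy source into scratch_dir and return the path of the copy."""
    scratch = Path(scratch_dir)
    scratch.mkdir(parents=True, exist_ok=True)
    target = scratch / Path(source).name
    shutil.copy2(source, target)
    return target

def save_result(result, output_dir):
    """Copy a result file to output_dir, creating the directory if needed."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(result, out / Path(result).name))

# Usage (placeholders, not real paths):
# staged = stage_to_scratch('/arc/projects/[project]/large_file.fits')
# ... analysis reading from `staged` and writing to /scratch/results.fits ...
# save_result('/scratch/results.fits', '/arc/projects/[project]/outputs/')
```

Since /scratch is described as temporary storage, copy results back to persistent storage before ending the session.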
Efficient Data Loading¶
from astropy.io import fits
# Efficient FITS access with context manager and memmap
with fits.open('huge_file.fits', memmap=True) as hdul:
    header = hdul[0].header  # Primary header
    data_section = hdul[1].data  # Access required extension lazily
# If only header needed
from astropy.io.fits import getheader
primary_header = getheader('large_file.fits')
🔧 Troubleshooting¶
Common Issues¶
Kernel Not Starting¶
Problem: Python kernel fails to start
Solutions:
1. Restart the kernel: Kernel → Restart Kernel
2. Clear output: Cell → All Output → Clear
3. Check memory usage and restart session if needed
Out of Memory Errors¶
Problem: MemoryError or kernel crashes
Solutions:
1. Restart kernel and clear variables
2. Process data in smaller chunks
3. Use more memory-efficient data types
4. Launch session with more RAM
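Chunked processing and smaller data types can be illustrated with the standard library alone. The sketch below assumes a raw binary file of 64-bit floats whose size is a multiple of 8 bytes; `chunked_sum` is a hypothetical helper, and astronomy formats such as FITS would normally be read with a dedicated library instead.

```python
from array import array

def chunked_sum(path, chunk_items=1_000_000):
    """Sum float64 values from a raw binary file, one chunk at a time."""
    total = 0.0
    with open(path, "rb") as f:
        while True:
            raw = f.read(chunk_items * 8)  # 8 bytes per float64 value
            if not raw:
                break
            chunk = array("d")
            chunk.frombytes(raw)  # only this chunk is held in memory
            total += sum(chunk)
    return total

# Data types: single precision uses half the bytes per value of double precision
print(f"double: {array('d').itemsize} bytes, float: {array('f').itemsize} bytes")
```

Only one chunk is resident at a time, so peak memory is set by `chunk_items` rather than by the file size.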
Slow Performance¶
Problem: Notebooks running slowly
Solutions:
1. Check system resources with htop in terminal
2. Close unused notebooks and terminals
3. Clear notebook output: Cell → All Output → Clear
4. Use /scratch for temporary files
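If a notebook has become too large to open comfortably, its outputs can also be cleared from a terminal or another notebook. This is a minimal sketch that assumes the standard version 4 notebook JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); `clear_notebook_outputs` is a hypothetical helper, not a Jupyter API.

```python
import json
from pathlib import Path

def clear_notebook_outputs(src, dst):
    """Write a copy of the notebook at src to dst with code-cell outputs removed."""
    nb = json.loads(Path(src).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    text = json.dumps(nb, indent=1, ensure_ascii=False) + "\n"
    Path(dst).write_text(text, encoding="utf-8")
    return dst

# Usage (file names are examples):
# clear_notebook_outputs("my-analysis.ipynb", "my-analysis-clean.ipynb")
```

Writing to a separate file keeps the original intact until you have confirmed the cleaned copy opens correctly.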
File Upload Issues¶
Problem: Cannot upload files or uploads fail
Solutions:
1. Check file size (< 100MB for web upload)
2. Use command-line tools for larger files
3. Check available disk space
4. Try uploading to /scratch first, then moving