# Zero-Setup Federated Learning with Google Colab
Train machine learning models across distributed private datasets, with no local setup, directly from Google Colab.
## Overview
This tutorial demonstrates a complete federated learning workflow using the Pima Indians Diabetes dataset, split across two data owners. You'll collaboratively train a diabetes prediction model while keeping each party's data private and secure.
**Key benefit:** Raw data never leaves the data owner's environment; only model updates are shared.
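To see what "only model updates are shared" means in practice, here is a minimal FedAvg-style sketch: each owner contributes a weight vector, and the coordinator combines them weighted by partition size. The weight values and partition sizes below are made up for illustration.

```python
import numpy as np

# Hypothetical local updates: each data owner trains on its own partition
# and shares only the resulting model weights, never the raw patient rows.
rng = np.random.default_rng(0)
update_do1 = rng.normal(size=8)  # weights from Data Owner 1
update_do2 = rng.normal(size=8)  # weights from Data Owner 2

# FedAvg-style aggregation: weight each update by its partition's row count
# (illustrative sizes; an even split of the 768-row Pima dataset).
n1, n2 = 384, 384
global_update = (n1 * update_do1 + n2 * update_do2) / (n1 + n2)
```

With equal partition sizes this reduces to a simple mean; unequal partitions pull the global model toward the larger data holder.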
## The Parties
In this federated learning flow, there are three key parties:
| Party | Role | Description |
|---|---|---|
| Data Owner 1 (DO1) | Data holder | Holds partition 0 of the diabetes dataset |
| Data Owner 2 (DO2) | Data holder | Holds partition 1 of the diabetes dataset |
| Data Scientist (DS) | Coordinator | Proposes the ML project, submits jobs, and aggregates results |
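The two partitions in the table above are simply row-wise shares of one table. A toy sketch of such a split follows; the tutorial ships pre-partitioned files, so the synthetic matrix and the even split here are assumptions for illustration (the 768-row count matches the full Pima dataset).

```python
import numpy as np

# Stand-in for the full dataset: 768 rows, 8 feature columns,
# mirroring the Pima Indians Diabetes dataset's shape.
X = np.arange(768 * 8, dtype=float).reshape(768, 8)

# Row-wise split into partition 0 (DO1) and partition 1 (DO2).
partition_0, partition_1 = np.array_split(X, 2)
```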
Each party runs in a separate Google Colab notebook. You can use three different Google accounts—or invite two friends to join for a real collaborative experience!
## Prerequisites
Before starting, you'll need:
- Three Google accounts (one per party), or two friends willing to join
- Each party's notebook, downloaded and opened in Google Colab
That's it! No local Python installation, no complex setup.
## Get the Notebooks
Download the notebooks from the official repository:
## Tutorial Structure
This tutorial is divided into three parts:
1. **Setup**: Install packages and authenticate all parties
2. **Data Owner Workflow**: Create datasets and approve jobs
3. **Data Scientist Workflow**: Explore data, submit jobs, and run aggregation
## What You'll Learn
By the end of this tutorial, you will:
- Set up secure connections between data owners and data scientists
- Create Syft datasets with private and mock data paths
- Submit federated learning jobs for review and approval
- Run distributed model training with privacy guarantees
- Aggregate model updates using the Flower framework
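One concept from the list above worth previewing is the mock/private pairing: a Syft dataset couples each private asset with a mock that has the same schema but synthetic values, so the data scientist can write and debug code against the mock before any job is approved. The sketch below illustrates that pairing with plain pandas; it contains no actual Syft calls, and the column names and values are invented.

```python
import numpy as np
import pandas as pd

# "Private" data: stands in for a data owner's real diabetes records.
private = pd.DataFrame({
    "glucose": [148.0, 85.0, 183.0],
    "bmi": [33.6, 26.6, 23.3],
    "outcome": [1, 0, 1],
})

# "Mock" data: identical columns and shape, but synthetic values.
# The data scientist develops against this; only an approved job
# ever executes on `private`.
rng = np.random.default_rng(42)
mock = pd.DataFrame({
    "glucose": rng.uniform(50, 200, size=3).round(1),
    "bmi": rng.uniform(18, 45, size=3).round(1),
    "outcome": rng.integers(0, 2, size=3),
})
```

The invariant to preserve is schema equality: same column names, same shape, plausible value ranges, so code that runs on the mock runs unchanged on the private data.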
## Next Steps
Continue to the Setup guide to get started.