Part II: Data Scientist - FL Server
The Data Scientist coordinates the global training process across one or more Data Owners.
1. Discovery & Exploration
- Guest Login: Log in to the target Data Owner's datasite using the SyftBox client in "Guest" mode.
- Explore: Programmatically list the available datasets, then download and inspect the mock data to understand feature ranges and types.
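Once the mock data is downloaded, a per-column summary of types and ranges is usually enough to plan preprocessing. The sketch below assumes the mock file is a CSV with a header row and only numeric cells. `summarize_columns` is a hypothetical helper, not part of SyftBox, and the rows are made-up values whose column names follow the public Pima Indians Diabetes dataset.

```python
import csv
import io
from typing import Dict, List


def summarize_columns(csv_text: str) -> Dict[str, dict]:
    """Return inferred type, min, max and count for every column of a CSV string.

    Assumes a header row and numeric cells; a column is reported as "int"
    only if every value in it is a whole number.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    columns: Dict[str, List[float]] = {}
    for row in reader:
        for name, raw in row.items():
            columns.setdefault(name, []).append(float(raw))

    summary = {}
    for name, values in columns.items():
        summary[name] = {
            "type": "int" if all(v.is_integer() for v in values) else "float",
            "min": min(values),
            "max": max(values),
            "count": len(values),
        }
    return summary


# Made-up mock rows (not real patient data).
MOCK_CSV = """Glucose,BMI,Age,Outcome
120,30.5,45,1
90,24.0,29,0
160,35.2,52,1
"""

for column, stats in summarize_columns(MOCK_CSV).items():
    print(column, stats)
```

Large differences between column ranges (for example, glucose versus a 0/1 outcome) are a hint to standardize features inside the training code.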
2. Project Preparation
- Syft-Flwr Structure: Your project contains client_app.py (local training logic) and server_app.py (global aggregation).
- Minimal Changes: If you have a standard Flower project, simply replace the network transport layer with syft_flwr.
- Task Definition: In task.py, define your Logistic Regression model (e.g., using PyTorch or scikit-learn).
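The task definition can use PyTorch or scikit-learn; to show the idea without either dependency, here is a minimal sketch of a pure-Python logistic regression whose parameters are the same (W, b) pair that later travels to the server. The function names `predict_proba`, `log_loss` and `train_local` are hypothetical, and full-batch gradient descent is an assumed training method, not necessarily what a real task.py would use.

```python
import math
from typing import List, Tuple

Params = Tuple[List[float], float]  # (W, b)


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_proba(params: Params, x: List[float]) -> float:
    w, b = params
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def log_loss(params: Params, X: List[List[float]], y: List[int]) -> float:
    """Mean binary cross-entropy over the dataset."""
    eps = 1e-12  # clip probabilities to avoid log(0)
    total = 0.0
    for xi, yi in zip(X, y):
        p = min(max(predict_proba(params, xi), eps), 1 - eps)
        total += -(yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total / len(X)


def train_local(params: Params, X, y, lr: float = 0.1, epochs: int = 100) -> Params:
    """Full-batch gradient descent on the mean binary cross-entropy."""
    w, b = list(params[0]), params[1]
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = predict_proba((w, b), xi) - yi  # dLoss/dz for one example
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Toy usage: the label equals the second feature.
X = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
y = [1, 0, 1, 0]
start = ([0.0, 0.0], 0.0)
trained = train_local(start, X, y)
print(log_loss(start, X, y), "->", log_loss(trained, X, y))
```

With real Pima features, standardizing each column first (using the ranges found during exploration) keeps gradient descent stable.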
3. Simulation & Submission
- Local Simulation (Optional): Run flwr run . --simulation on your machine using the mock data to catch bugs in your ClientApp training loop before submitting.
- Submit Job: Push the project bundle to the Data Owner:
do_client.job.submit(code_path="./diabetes_fl_project", dataset_name="pima-indians")
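Before pushing the bundle, a cheap local check can confirm the folder contains the files the Syft-Flwr structure expects. The helper below is hypothetical and not part of syft_flwr or SyftBox. It looks for client_app.py, server_app.py and task.py as described in this guide, plus pyproject.toml, where Flower projects run with `flwr run` normally keep their app configuration. The exact file list and layout are assumptions, so adjust `REQUIRED_FILES` to your project.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed required files; edit to match your project layout.
REQUIRED_FILES = ("client_app.py", "server_app.py", "task.py", "pyproject.toml")


def missing_project_files(project_dir: str, required: Iterable[str] = REQUIRED_FILES) -> List[str]:
    """Return required file names not found anywhere under project_dir.

    Searches recursively, so files inside a package subfolder also count.
    """
    root = Path(project_dir)
    if not root.is_dir():
        raise NotADirectoryError(project_dir)
    present = {p.name for p in root.rglob("*") if p.is_file()}
    return [name for name in required if name not in present]
```

For example, `missing_project_files("./diabetes_fl_project")` should return an empty list before you call `do_client.job.submit(...)`.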
4. Aggregation & Results
- Start Server: Once the Data Owner (DO) approves the job, launch the server. It waits for the DO's ClientApp to send back model weights.
- Aggregation: The server uses a strategy like FedAvg to combine updates from multiple clients.
- Observe: Watch the loss decrease and accuracy increase across rounds (e.g., 2–5 rounds).
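FedAvg combines client updates as an average weighted by how many training examples each client used, so a client with 300 rows counts three times as much as one with 100. The sketch below applies that rule to plain Python (W, b) parameters. It illustrates the aggregation step only; it is not Flower's `FedAvg` strategy class, and the name `fedavg` is hypothetical.

```python
from typing import List, Sequence, Tuple

Params = Tuple[List[float], float]  # (W, b) from one client


def fedavg(updates: Sequence[Tuple[Params, int]]) -> Params:
    """Average client parameters, weighting each client by its example count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(updates[0][0][0])
    w_global = [0.0] * dim
    b_global = 0.0
    for (w, b), n in updates:
        share = n / total  # this client's fraction of all examples
        for j in range(dim):
            w_global[j] += share * w[j]
        b_global += share * b
    return w_global, b_global


# Two clients: the second trained on three times as many rows.
client_a = (([1.0, 0.0], 0.5), 100)
client_b = (([0.0, 1.0], -0.5), 300)
print(fedavg([client_a, client_b]))  # → ([0.25, 0.75], -0.25)
```

In each round the server would apply this to the updates returned by the DO ClientApps; with a single Data Owner it simply returns that client's parameters.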
5. Privacy Guarantees
- No Raw Access: Any attempt by submitted code to read the private file path directly is blocked.
- Parameter Exchange: Only model parameters (weights W and bias b) move over the network.
- Compute-to-Data: The training happens where the data lives, satisfying strict health data residency requirements.
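To make the parameter-exchange point concrete: a logistic regression on the 8 Pima features sends 8 weights and 1 bias per update, however many patient rows were used. The sketch below serializes such an update to JSON. The message format and the `build_update` helper are illustrative assumptions, not the wire format syft_flwr actually uses. Because FedAvg weights clients by example count, the sketch also carries that single integer.

```python
import json
from typing import List


def build_update(w: List[float], b: float, num_examples: int) -> str:
    """Serialize a model update containing only parameters and a row count."""
    if not all(isinstance(v, float) for v in w):
        raise TypeError("weights must be floats")
    return json.dumps({"W": w, "b": b, "num_examples": num_examples})


# 8 features -> 8 weights + 1 bias; 768 is the row count of the public Pima dataset.
weights = [0.1] * 8
payload = build_update(weights, -0.3, num_examples=768)
decoded = json.loads(payload)
print(len(decoded["W"]) + 1, "parameters,", len(payload), "bytes")
```

The payload size depends only on the model shape, not on the number of rows, which is what keeps the raw records on the Data Owner's side.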