
Part II: Data Scientist - FL Server

The Data Scientist coordinates the global training process across one or more Data Owners.

1. Discovery & Exploration

Guest Login: Log in to the target Data Owner's datasite using the SyftBox client in "Guest" mode.
Explore: Programmatically list available datasets. Download and inspect the Mock Data to understand feature ranges and types.
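
The inspection step can be sketched as follows. This is a minimal sketch that assumes the mock data has already been downloaded as CSV text. The column names and values are made up for illustration and are not the Data Owner's actual mock file, and summarize_columns is a hypothetical helper, not part of SyftBox.

```python
import csv
import io

# Hypothetical mock rows; the real mock file from the Data Owner will differ.
MOCK_CSV = """glucose,bmi,age,outcome
148,33.6,50,1
85,26.6,31,0
183,23.3,32,1
"""


def summarize_columns(csv_text):
    """Return {column: (inferred_type, min, max)} for numeric CSV columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    summary = {}
    for name in rows[0]:
        raw = [row[name] for row in rows]
        # Treat a column as int only if every value parses as an int;
        # otherwise fall back to float.
        try:
            values = [int(v) for v in raw]
            kind = "int"
        except ValueError:
            values = [float(v) for v in raw]
            kind = "float"
        summary[name] = (kind, min(values), max(values))
    return summary


if __name__ == "__main__":
    for column, (kind, low, high) in summarize_columns(MOCK_CSV).items():
        print(f"{column}: {kind}, range [{low}, {high}]")
```

Knowing the ranges up front helps decide whether features need scaling before they reach the training loop.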

2. Project Preparation

Syft-Flwr Structure: Your project contains client_app.py (local training logic) and server_app.py (global aggregation).
Minimal Changes: If you already have a standard Flower project, you only need to replace its network transport layer with syft_flwr.
Task Definition: In task.py, define your Logistic Regression model (e.g., using PyTorch or Scikit-Learn).
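
To make the task definition concrete, here is a minimal sketch of a Logistic Regression model written in plain Python rather than PyTorch or Scikit-Learn, so it runs without extra dependencies. The function names (predict_proba, train_epoch, log_loss) and the toy data are assumptions for illustration. They are not the layout syft_flwr or Flower requires for task.py.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, features):
    """P(y = 1 | x) = sigmoid(W . x + b)."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    return sigmoid(z)


def log_loss(weights, bias, X, y):
    """Mean binary cross-entropy over the dataset."""
    eps = 1e-12
    total = 0.0
    for features, label in zip(X, y):
        p = min(max(predict_proba(weights, bias, features), eps), 1 - eps)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(X)


def train_epoch(weights, bias, X, y, lr=0.1):
    """One step of batch gradient descent on the mean log loss."""
    n = len(X)
    grad_w = [0.0] * len(weights)
    grad_b = 0.0
    for features, label in zip(X, y):
        error = predict_proba(weights, bias, features) - label
        for j, x in enumerate(features):
            grad_w[j] += error * x / n
        grad_b += error / n
    new_weights = [w - lr * g for w, g in zip(weights, grad_w)]
    return new_weights, bias - lr * grad_b


if __name__ == "__main__":
    # Toy one-feature dataset, for illustration only.
    X = [[0.0], [1.0], [2.0], [3.0]]
    y = [0, 0, 1, 1]
    W, b = [0.0], 0.0
    print("initial loss:", log_loss(W, b, X, y))
    for _ in range(500):
        W, b = train_epoch(W, b, X, y)
    print("final loss:", log_loss(W, b, X, y))
```

The model state is just the pair (W, b), which is what the client later returns to the server.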

3. Simulation & Submission

Local Simulation (Optional): Run flwr run . --simulation on your machine against the mock data to check that your ClientApp training loop runs without errors before you submit it.
Submit Job: Push the project bundle to the Data Owner.

do_client.job.submit(code_path="./diabetes_fl_project", dataset_name="pima-indians")

4. Aggregation & Results

Start Server: Once the Data Owner (DO) approves the job, launch the server. It waits for the DO's ClientApp to send back model weights.
Aggregation: The server uses a strategy like FedAvg to combine updates from multiple clients.
Observe: Watch the loss decrease and accuracy increase across rounds (e.g., 2–5 rounds).
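
The aggregation step can be sketched as a weighted average, where each client's parameters count in proportion to the number of training examples it reports. This is a minimal plain-Python sketch of the FedAvg idea, not Flower's own strategy implementation. The fed_avg function and its tuple format are assumptions for illustration.

```python
def fed_avg(client_updates):
    """Average client parameters, weighted by each client's example count.

    client_updates: list of (weights, bias, num_examples) tuples, where
    weights is a list of floats with the same length for every client.
    """
    total = sum(n for _, _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(client_updates[0][0])
    avg_w = [0.0] * dim
    avg_b = 0.0
    for weights, bias, n in client_updates:
        share = n / total  # this client's fraction of all examples
        for j in range(dim):
            avg_w[j] += share * weights[j]
        avg_b += share * bias
    return avg_w, avg_b


if __name__ == "__main__":
    # Two hypothetical clients holding 100 and 300 examples.
    updates = [([1.0, 2.0], 0.5, 100), ([3.0, 4.0], 1.5, 300)]
    print(fed_avg(updates))  # ([2.5, 3.5], 1.25)
```

With a single Data Owner the average reduces to that client's own parameters. The weighting only matters once several clients with different dataset sizes take part.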

5. Privacy Guarantees

No Raw Access: Any attempt by submitted code to read the private file path directly is blocked.
Parameter Exchange: Only model parameters (the weights W and bias b) move over the network.
Compute-to-Data: The training happens where the data lives, satisfying strict health data residency requirements.
