
Part II: Data Scientist - FL Server

The Data Scientist coordinates the global training process across one or more Data Owners.

1. Discovery & Exploration

  • Guest Login: Log in to the target Data Owner's datasite using the SyftBox client in "Guest" mode.
  • Explore: Programmatically list available datasets. Download and inspect the Mock Data to understand feature ranges and types.
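
The inspection step can be sketched as follows. This is a minimal, standard-library-only illustration that assumes the mock data has already been downloaded as CSV text. The helper name summarize_columns, the column names, and the sample rows are all hypothetical; they are not part of the SyftBox API or the actual dataset.

```python
import csv
import io


def summarize_columns(csv_text):
    """Summarize each CSV column: numeric columns get a type, min and max;
    non-numeric columns get a type and a distinct-value count."""
    reader = csv.DictReader(io.StringIO(csv_text))
    columns = {}
    for row in reader:
        for name, raw in row.items():
            columns.setdefault(name, []).append(raw)

    summary = {}
    for name, values in columns.items():
        try:
            numbers = [float(v) for v in values]
        except ValueError:
            # At least one value is not numeric: treat the column as text.
            summary[name] = {"type": "str", "distinct": len(set(values))}
            continue
        is_int = all(n.is_integer() for n in numbers)
        summary[name] = {
            "type": "int" if is_int else "float",
            "min": min(numbers),
            "max": max(numbers),
        }
    return summary


# Hypothetical mock rows, invented for illustration only.
MOCK_CSV = """glucose,bmi,outcome
100,25.5,0
140,31.0,1
90,22.4,0
"""

summary = summarize_columns(MOCK_CSV)
for name, info in summary.items():
    print(name, info)
```

Knowing the ranges up front helps decide whether features need scaling before they reach the training loop.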

2. Project Preparation

  • Syft-Flwr Structure: Your project contains client_app.py (local training logic) and server_app.py (global aggregation).
  • Minimal Changes: If you have a standard Flower project, simply replace the network transport layer with syft_flwr.
  • Task Definition: In task.py, define your Logistic Regression model (e.g., using PyTorch or Scikit-Learn).
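
As a rough sketch of what task.py might define, the following implements Logistic Regression in pure Python so that it runs without dependencies. A real project would more likely use PyTorch or Scikit-Learn, as noted above. The class and method names (LogisticRegression, get_parameters, set_parameters, train_epoch) and the toy data are assumptions made for illustration, not the names used by syft_flwr or Flower.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LogisticRegression:
    def __init__(self, n_features):
        self.w = [0.0] * n_features  # weights W
        self.b = 0.0                 # bias b

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def get_parameters(self):
        # Only these values would be sent to the server.
        return list(self.w), self.b

    def set_parameters(self, w, b):
        # Load the global parameters received from the server.
        self.w, self.b = list(w), float(b)

    def train_epoch(self, X, y, lr=0.1):
        """One pass of stochastic gradient descent; returns the mean log loss."""
        total = 0.0
        for x, target in zip(X, y):
            p = self.predict_proba(x)
            err = p - target  # gradient of the log loss w.r.t. the logit
            self.w = [wi - lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= lr * err
            p = min(max(p, 1e-12), 1 - 1e-12)  # avoid log(0)
            total += -(target * math.log(p) + (1 - target) * math.log(1 - p))
        return total / len(X)


# Toy, linearly separable data invented for illustration.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]

model = LogisticRegression(n_features=1)
losses = [model.train_epoch(X, y) for _ in range(200)]
print(losses[0], losses[-1])
```

Keeping get_parameters and set_parameters separate from the training loop mirrors the federated flow: the client loads global parameters, trains locally, and returns only the updated W and b.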

3. Simulation & Submission

  • Local Simulation (Optional): Run flwr run . --simulation on your machine using the mock data to ensure your ClientApp training loop is bug-free.
  • Submit Job: Push the project bundle to the Data Owner.
do_client.job.submit(code_path="./diabetes_fl_project", dataset_name="pima-indians")

4. Aggregation & Results

  • Start Server: Once the Data Owner (DO) approves the job, launch the server. It waits for the DO's ClientApp to send back model weights.
  • Aggregation: The server uses a strategy like FedAvg to combine updates from multiple clients.
  • Observe: Watch the loss decrease and accuracy increase across rounds (e.g., 2–5 rounds).
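
The arithmetic behind FedAvg can be sketched as a weighted average of client parameters, where each client's weight is the number of examples it trained on. Flower ships its own FedAvg strategy, so this standalone function is only an illustration of the idea. The function name fedavg, the tuple layout, and the example numbers are hypothetical.

```python
def fedavg(client_updates):
    """Average client parameters, weighted by each client's example count.

    client_updates: list of (weights, bias, num_examples) tuples,
    where weights is a list of floats of the same length for every client.
    """
    total = sum(n for _, _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported by any client")

    n_features = len(client_updates[0][0])
    global_w = [
        sum(w[i] * n for w, _, n in client_updates) / total
        for i in range(n_features)
    ]
    global_b = sum(b * n for _, b, n in client_updates) / total
    return global_w, global_b


# Two hypothetical clients: the second has three times as much data,
# so its parameters pull the average three times as hard.
updates = [
    ([1.0, 2.0], 0.5, 100),
    ([3.0, 4.0], 1.5, 300),
]
global_w, global_b = fedavg(updates)
print(global_w, global_b)  # [2.5, 3.5] 1.25
```

With a single Data Owner the weighted average reduces to that client's own parameters, so the effect of aggregation only becomes visible once several clients participate.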

5. Privacy Guarantees

  • No Raw Access: Any attempt by submitted code to read the private file path directly is blocked.
  • Parameter Exchange: Only model parameters (the weights W and bias b) move over the network.
  • Compute-to-Data: The training happens where the data lives, satisfying strict health data residency requirements.