Data Scientist: Running FL Servers
Learn how to run and manage the aggregation server that coordinates federated learning across distributed datasites.
The final stage of your federated workflow is launching the ServerApp to coordinate training across the network. The ServerApp acts as the central orchestrator. It does not see the raw data, but it dictates which clients participate in each round and how their model weights are combined.
The server manages the lifecycle of the experiment, from selecting participants to mathematically aggregating their local updates into a global model.
1. Starting the FL Server
In a syft-flwr project, the server is initialized using the main.py entry point generated during bootstrapping.
- Execution: Run your project with the `uv run` command or the Flower CLI to start the server process.
- SyftBox Communication: On startup, the server uses SyftBox's file sync to exchange messages with the Data Owners you targeted in your job submission.
2. Server Configuration
Your server's behavior is primarily defined in server_app.py and the pyproject.toml file.
- Round Definition: Specify the total number of training rounds (e.g., `num_rounds=3`).
- Hyperparameters: You can pass a `ConfigRecord` containing settings like `batch_size` or `learning_rate` to be sent to all clients at the start of each round.
- Initial Weights: The server initializes the global model weights (often randomly or from a pre-trained checkpoint) and sends them to participants in Round 1.
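As an illustration of the per-round configuration idea, the sketch below uses a plain Python dict standing in for Flower's `ConfigRecord`. The function name `make_round_config` and the specific hyperparameter values are illustrative, not part of the syft-flwr API:

```python
def make_round_config(round_num: int) -> dict:
    """Build the settings broadcast to every client at the start of a round.

    A plain dict stands in for a ConfigRecord here; the keys shown
    (batch_size, learning_rate) are example hyperparameters only.
    """
    return {
        "server_round": round_num,
        "batch_size": 32,
        "learning_rate": 0.01,
    }

config = make_round_config(1)
```

In a real project these values would typically live in `pyproject.toml` so an experiment can be re-run without editing server code.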
3. Coordinating FL Rounds
Training happens in sequential rounds, each consisting of four main phases:
- Distribution: The server sends the current global model parameters to selected clients.
- Local Training: Clients execute the `client_app.py` logic on their private data.
- Collection: The server waits for clients to return their updated weights and local metrics (such as accuracy).
- Aggregation: The server combines these updates into a new global model for the next round.
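The four phases above can be sketched as a plain-Python simulation. This is not the syft-flwr transport or API: `local_train` is a stand-in for a client's local step, and aggregation here is a simple element-wise mean for clarity:

```python
import random

def local_train(weights, seed):
    """Stand-in for a client's local training (client_app.py):
    each weight receives a small deterministic nudge."""
    rng = random.Random(seed)
    return [w + rng.uniform(-0.1, 0.1) for w in weights]

def run_round(global_weights, client_seeds):
    # 1. Distribution: send the current global weights to selected clients.
    # 2. Local Training + 3. Collection: each client returns updated weights.
    client_weights = [local_train(list(global_weights), s) for s in client_seeds]
    # 4. Aggregation: element-wise mean of the collected updates.
    return [
        sum(cw[i] for cw in client_weights) / len(client_weights)
        for i in range(len(global_weights))
    ]

weights = [0.0, 0.0]
for rnd in range(3):  # three sequential rounds, as in num_rounds=3
    weights = run_round(weights, client_seeds=[rnd * 10 + c for c in range(4)])
```

The loop makes the structure explicit: the global model only changes between rounds, and only through aggregation of client-returned updates.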
4. Model Aggregation & Strategies
The aggregation strategy determines how the global model "learns" from participants.
- FedAvg: The standard strategy, which calculates a weighted average of client weights based on the number of local examples they trained on.
- Strategy Callbacks: You can customize strategies with callbacks such as `global_evaluate`, which tests the aggregated model on a central validation set after each round.
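The weighted average at the heart of FedAvg is easy to show directly. The sketch below is a self-contained illustration of the math, not the Flower strategy class itself:

```python
def fedavg(client_updates):
    """Weighted average of client weight vectors, weighted by the number
    of local examples each client trained on (the core of FedAvg)."""
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(dim)
    ]

# Two clients: one trained on 100 local examples, one on 300,
# so the second client's weights count three times as much.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = fedavg(updates)  # → [2.5, 3.5]
```

Weighting by example count means a Data Owner with a large dataset pulls the global model further toward its local optimum than one with only a handful of samples.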
5. Handling Client Responses & Timeouts
Real-world networks are unpredictable. The server includes logic to handle "stragglers" or disconnected clients.
- Minimum Nodes: Set `min_available_nodes` to ensure training starts only when enough Data Owners are online.
- Timeouts: Configure a `round_timeout` to prevent the entire experiment from stalling if one client is slow to return its results.
- Failure Handling: The server tracks how many results were successfully received versus how many failed, allowing it to proceed with partial updates if necessary.
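The straggler-handling logic above can be sketched as follows. This is a simulation of the idea, not the real syft-flwr transport: `pending` maps a client id to a hypothetical (response latency, result) pair, and `collect_results` is an illustrative name:

```python
def collect_results(pending, round_timeout, min_results=1):
    """Simulated collection phase: keep results that arrive within the
    timeout, record stragglers as failures, and proceed with partial
    updates if enough results were received."""
    received, failures = {}, []
    for client_id, (latency, result) in pending.items():
        if latency <= round_timeout:
            received[client_id] = result
        else:
            failures.append(client_id)  # straggler: dropped for this round
    if len(received) < min_results:
        raise RuntimeError("Too few results to aggregate this round")
    return received, failures

# One fast Data Owner and one straggler against a 60-second timeout.
pending = {"do-1": (5.0, [0.1]), "do-2": (120.0, [0.2])}
received, failures = collect_results(pending, round_timeout=60.0)
```

Tracking successes and failures separately lets the server log partial participation per round rather than silently aggregating over fewer clients.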
Next Step: To further refine your model's performance on non-identical data distributions, explore Custom Strategies to implement advanced aggregation logic.