Data Scientist: Running FL Servers
Learn how to run and manage the aggregation server that coordinates federated learning across distributed datasites.
The final stage of your federated workflow is launching the ServerApp to coordinate training across the network. The ServerApp acts as the central orchestrator. It does not see the raw data, but it dictates which clients participate in each round and how their model weights are combined.
The server manages the lifecycle of the experiment, from selecting participants to mathematically aggregating their local updates into a global model.
1. Starting the FL Server
In a syft-flwr project, the server is initialized using the main.py entry point generated during bootstrapping.
- Execution: Run your project with the `uv run` command or the Flower CLI to start the server process.
- SyftBox Communication: On startup, the server uses SyftBox's file sync to exchange messages with the Data Owners you targeted in your job submission.
2. Server Configuration
Your server's behavior is primarily defined in server_app.py and the pyproject.toml file.
- Round Definition: Specify the total number of training rounds (e.g., `num_rounds=3`).
- Hyperparameters: You can pass a `ConfigRecord` containing settings like `batch_size` or `learning_rate` to be sent to all clients at the start of each round.
- Initial Weights: The server initializes the global model weights (often randomly or from a pre-trained checkpoint) and sends them to participants in Round 1.
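As an illustration of the per-round configuration idea, the sketch below uses a plain Python dict standing in for Flower's `ConfigRecord`. The function name `make_round_config` and the specific hyperparameter values are illustrative, not part of the syft-flwr API:

```python
def make_round_config(round_num: int) -> dict:
    """Build the settings broadcast to every client at the start of a round.

    A plain dict stands in for a ConfigRecord here; the keys shown
    (batch_size, learning_rate) are example hyperparameters only.
    """
    return {
        "server_round": round_num,
        "batch_size": 32,
        "learning_rate": 0.01,
    }

config = make_round_config(1)
```

In a real project these values would typically live in `pyproject.toml` so an experiment can be re-run without editing server code.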
3. Coordinating FL Rounds
Training happens in sequential rounds, each consisting of four main phases:
- Distribution: The server sends the current global model parameters to selected clients.
- Local Training: Clients execute the `client_app.py` logic on their private data.
- Collection: The server waits for clients to return their updated weights and local metrics (such as accuracy).
- Aggregation: The server combines these updates into a new global model for the next round.
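The four phases above can be sketched as a plain-Python simulation. This is not the syft-flwr transport or API: `local_train` is a stand-in for a client's local step, and aggregation here is a simple element-wise mean for clarity:

```python
import random

def local_train(weights, seed):
    """Stand-in for a client's local training (client_app.py):
    each weight receives a small deterministic nudge."""
    rng = random.Random(seed)
    return [w + rng.uniform(-0.1, 0.1) for w in weights]

def run_round(global_weights, client_seeds):
    # 1. Distribution: send the current global weights to selected clients.
    # 2. Local Training + 3. Collection: each client returns updated weights.
    client_weights = [local_train(list(global_weights), s) for s in client_seeds]
    # 4. Aggregation: element-wise mean of the collected updates.
    return [
        sum(cw[i] for cw in client_weights) / len(client_weights)
        for i in range(len(global_weights))
    ]

weights = [0.0, 0.0]
for rnd in range(3):  # three sequential rounds, as in num_rounds=3
    weights = run_round(weights, client_seeds=[rnd * 10 + c for c in range(4)])
```

The loop makes the structure explicit: the global model only changes between rounds, and only through aggregation of client-returned updates.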
4. Model Aggregation & Strategies
The aggregation strategy determines how the global model "learns" from participants.
- FedAvg: The standard strategy, which calculates a weighted average of client weights based on the number of local examples they trained on.
- Strategy Callbacks: You can customize strategies with callbacks such as `global_evaluate`, which tests the aggregated model on a central validation set after each round.
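The weighted average at the heart of FedAvg is easy to show directly. The sketch below is a self-contained illustration of the math, not the Flower strategy class itself:

```python
def fedavg(client_updates):
    """Weighted average of client weight vectors, weighted by the number
    of local examples each client trained on (the core of FedAvg)."""
    total_examples = sum(n for _, n in client_updates)
    dim = len(client_updates[0][0])
    return [
        sum(weights[i] * n for weights, n in client_updates) / total_examples
        for i in range(dim)
    ]

# Two clients: one trained on 100 local examples, one on 300,
# so the second client's weights count three times as much.
updates = [([1.0, 2.0], 100), ([3.0, 4.0], 300)]
global_weights = fedavg(updates)  # → [2.5, 3.5]
```

Weighting by example count means a Data Owner with a large dataset pulls the global model further toward its local optimum than one with only a handful of samples.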
5. Handling Client Responses & Timeouts
Real-world networks are unpredictable. The server includes logic to handle "stragglers" or disconnected clients.
- Minimum Nodes: Set `min_available_nodes` to ensure training starts only when enough Data Owners are online.
- Timeouts: Configure a `round_timeout` to prevent the entire experiment from stalling if one client is slow to return its results.
- Failure Handling: The server tracks how many results were successfully received versus how many failed, allowing it to proceed with partial updates if necessary.
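The straggler-handling logic above can be sketched as follows. This is a simulation of the idea, not the real syft-flwr transport: `pending` maps a client id to a hypothetical (response latency, result) pair, and `collect_results` is an illustrative name:

```python
def collect_results(pending, round_timeout, min_results=1):
    """Simulated collection phase: keep results that arrive within the
    timeout, record stragglers as failures, and proceed with partial
    updates if enough results were received."""
    received, failures = {}, []
    for client_id, (latency, result) in pending.items():
        if latency <= round_timeout:
            received[client_id] = result
        else:
            failures.append(client_id)  # straggler: dropped for this round
    if len(received) < min_results:
        raise RuntimeError("Too few results to aggregate this round")
    return received, failures

# One fast Data Owner and one straggler against a 60-second timeout.
pending = {"do-1": (5.0, [0.1]), "do-2": (120.0, [0.2])}
received, failures = collect_results(pending, round_timeout=60.0)
```

Tracking successes and failures separately lets the server log partial participation per round rather than silently aggregating over fewer clients.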
Next Step: To further refine your model's performance on non-identical data distributions, explore Custom Strategies to implement advanced aggregation logic.