Data Scientist: Creating FL Projects
Learn how to structure and implement a federated learning project using the Flower framework integrated with SyftBox.
In syft-flwr, the project structure is designed to be almost identical to a standard Flower project. This allows you to leverage existing Flower knowledge while benefiting from SyftBox's privacy-preserving networking.
1. Creating a Flower Project
The easiest way to start is by using the Flower CLI to scaffold a new project.
- Initialize: Run flwr new in your terminal and select your preferred ML framework (e.g., PyTorch or TensorFlow).
- Syft-Flwr Integration: Once scaffolded, you will only need to make minor modifications to allow the project to communicate over the SyftBox network instead of standard gRPC.
2. Project Structure
A typical syft-flwr project follows this organized layout:
my-fl-project/
├── my_fl_project/
│ ├── __init__.py
│ ├── client_app.py # Logic for local training on Data Owner sites
│ ├── server_app.py # Logic for global model aggregation
│ └── task.py # Model definition and data loading logic
├── pyproject.toml # Metadata, dependencies, and app configuration
└── README.md
3. Implementing the Client App
The client_app.py defines how the model trains on the Data Owner's private data.
- ClientApp Object: You wrap your training logic in a ClientApp object.
- Loading Data: In syft-flwr, you replace standard data loading with load_syftbox_dataset to securely access the data hosted on the Data Owner's (DO's) datasite.
- Training Logic: Use the @app.train() decorator to define the local fitting process, which returns updated model parameters and metrics like accuracy.
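To make the "local fitting" step concrete, here is a minimal, framework-free sketch of what a training function computes. It does not use the Flower or syft-flwr APIs: local_train, LOCAL_DATA, and the one-parameter model are hypothetical names chosen for illustration, and the toy dataset stands in for data that would normally be loaded from the datasite. Because the toy task is a regression, it reports a loss rather than accuracy.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Data Owner's private dataset: (x, y) pairs
# drawn from y = 2x. In a real project this would come from the datasite.
LOCAL_DATA: List[Tuple[float, float]] = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]


def local_train(
    global_weight: float,
    data: List[Tuple[float, float]],
    lr: float = 0.05,
    epochs: int = 20,
) -> Tuple[float, int, Dict[str, float]]:
    """Fit y = w * x by gradient descent, starting from the global weight.

    Returns the updated weight, the number of local examples, and metrics.
    """
    w = global_weight
    n = len(data)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w
        grad = sum(2 * (w * x - y) * x for x, y in data) / n
        w -= lr * grad
    loss = sum((w * x - y) ** 2 for x, y in data) / n
    return w, n, {"train_loss": loss}


updated_weight, num_examples, metrics = local_train(0.0, LOCAL_DATA)
print(updated_weight, num_examples, metrics)
```

In an actual project, the body of the function registered with @app.train() would play this role: start from the parameters sent by the server, train on local data, and send back the updated parameters together with the example count and metrics.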
4. Implementing the Server App
The server_app.py coordinates the federated rounds and manages global model updates.
- ServerApp Object: This manages the overall orchestration of the FL process.
- server_fn: Defines the server's behavior, including the number of rounds and the aggregation strategy to use.
- Configuration: You can pass runtime hyperparameters (like learning rate) to all clients through the ServerConfig.
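The orchestration described above can be sketched without any framework. The example below is an illustration under stated assumptions, not the Flower ServerApp API: make_client, run_server, and the config dictionary are hypothetical names. It shows the server sending the same global weight and hyperparameters to every client in each round, then aggregating the results.

```python
from typing import Callable, Dict, List, Tuple

# A client takes (global weight, config) and returns (updated weight, num examples).
ClientFn = Callable[[float, Dict[str, float]], Tuple[float, int]]


def make_client(target: float, num_examples: int) -> ClientFn:
    """Build a toy client whose local optimum is `target` (hypothetical)."""

    def fit(global_weight: float, config: Dict[str, float]) -> Tuple[float, int]:
        # One step toward the local optimum, scaled by the server-provided lr
        updated = global_weight + config["lr"] * (target - global_weight)
        return updated, num_examples

    return fit


def run_server(
    clients: List[ClientFn], num_rounds: int, config: Dict[str, float]
) -> List[float]:
    """Run `num_rounds` rounds and return the global weight after each round."""
    global_weight = 0.0
    history = []
    for _ in range(num_rounds):
        # Broadcast the same weight and config to every client
        results = [fit(global_weight, config) for fit in clients]
        # Aggregate: average weighted by each client's example count
        total = sum(n for _, n in results)
        global_weight = sum(w * n for w, n in results) / total
        history.append(global_weight)
    return history


clients = [
    make_client(target=1.0, num_examples=10),
    make_client(target=3.0, num_examples=30),
]
history = run_server(clients, num_rounds=3, config={"lr": 0.5})
print(history)
```

With each round, the global weight moves toward 2.5, the example-weighted average of the two clients' local optima.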
5. Defining Aggregation Strategies
Strategies determine how individual client updates are combined into a new global model.
- FedAvg: The default and most common strategy, which averages the clients' model parameters, weighting each client by its number of training examples.
- Custom Strategies: You can extend existing classes (like FedAvg) to implement custom logic, such as dynamic learning rate adjustment or advanced privacy protections.
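As a rough illustration of both bullets, the sketch below implements an example-weighted average and then extends it with a decaying learning rate schedule. SimpleFedAvg and DecayingLrFedAvg are hypothetical classes written for this example. They are not Flower's FedAvg class, whose methods and signatures differ, but the subclassing pattern is the same idea.

```python
from typing import List, Tuple

# Each client update: (list of parameter values, number of training examples)
Update = Tuple[List[float], int]


class SimpleFedAvg:
    """Hypothetical minimal strategy: example-weighted parameter average."""

    def aggregate(self, updates: List[Update]) -> List[float]:
        total = sum(n for _, n in updates)
        num_params = len(updates[0][0])
        return [
            sum(params[i] * n for params, n in updates) / total
            for i in range(num_params)
        ]


class DecayingLrFedAvg(SimpleFedAvg):
    """Custom strategy: same aggregation, plus a per-round learning rate."""

    def __init__(self, initial_lr: float, decay: float) -> None:
        self.initial_lr = initial_lr
        self.decay = decay

    def lr_for_round(self, server_round: int) -> float:
        # Rounds are 1-indexed; round 1 uses the initial learning rate
        return self.initial_lr * self.decay ** (server_round - 1)


strategy = DecayingLrFedAvg(initial_lr=0.1, decay=0.5)
updates = [([1.0, 2.0], 10), ([3.0, 6.0], 30)]
aggregated = strategy.aggregate(updates)
round_three_lr = strategy.lr_for_round(3)
print(aggregated, round_three_lr)
```

The client with 30 examples pulls the average three times as hard as the client with 10, which is the weighting that distinguishes FedAvg from a plain mean.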
6. Configuration Files (pyproject.toml)
The pyproject.toml file is the central configuration file for your project.
- App Components: Defines the import paths for your ServerApp and ClientApp so the system knows where to find your code.
- Dependencies: Lists the necessary Python packages (e.g., flwr, torch, numpy) required for the project to run.
- Runtime Config: Use the [tool.flwr.app.config] section to set default values for your experiment, such as the number of server rounds.
Next Step: Now that your project is structured, proceed to Bootstrapping with syft-flwr to learn how to prepare your code for submission to a Data Owner.