Data Scientist: Creating FL Projects

Learn how to structure and implement a federated learning project using the Flower framework integrated with SyftBox.

In syft-flwr, the project structure is designed to be almost identical to a standard Flower project. This allows you to leverage existing Flower knowledge while benefiting from SyftBox's privacy-preserving networking.

1. Creating a Flower Project

The easiest way to start is by using the Flower CLI to scaffold a new project.

Initialize: Run flwr new in your terminal and select your preferred ML framework (e.g., PyTorch or TensorFlow).
Syft-Flwr Integration: Once scaffolded, you will only need to make minor modifications to allow the project to communicate over the SyftBox network instead of standard gRPC.

2. Project Structure

A typical syft-flwr project uses the following layout:

my-fl-project/
├── my_fl_project/
│   ├── __init__.py
│   ├── client_app.py  # Logic for local training on Data Owner sites
│   ├── server_app.py  # Logic for global model aggregation
│   └── task.py        # Model definition and data loading logic
├── pyproject.toml     # Metadata, dependencies, and app configuration
└── README.md

3. Implementing the Client App

The client_app.py defines how the model trains on the Data Owner's private data.

ClientApp Object: You wrap your training logic in a ClientApp object.
Loading Data: In syft-flwr, you replace standard data loading with load_syftbox_dataset to securely access the data hosted on the DO's datasite.
Training Logic: Use the @app.train() decorator to define the local fitting process, which returns updated model parameters and metrics like accuracy.
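To make the shape of the local fitting step concrete, here is a minimal, framework-free sketch of what a training function computes and returns: updated parameters, the number of local examples, and a metrics dictionary. It deliberately does not call flwr or syft-flwr; the names local_train and LocalData, the toy linear model, and the in-memory data are all hypothetical stand-ins for your own model and for the dataset returned by load_syftbox_dataset.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for the Data Owner's private dataset: (x, y) pairs.
LocalData = List[Tuple[float, float]]


def local_train(
    weights: List[float],
    data: LocalData,
    lr: float = 0.05,
    epochs: int = 50,
) -> Tuple[List[float], int, Dict[str, float]]:
    """Fit y = w*x + b by gradient descent on local data.

    Returns the updated parameters, the number of local examples,
    and a metrics dict, which is the information a client reports back.
    """
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w -= lr * grad_w
        b -= lr * grad_b
    train_loss = sum((w * x + b - y) ** 2 for x, y in data) / n
    return [w, b], n, {"train_loss": train_loss}


# Toy data drawn from y = 2x + 1; in a real project this stays on the datasite.
private_data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
updated, num_examples, metrics = local_train([0.0, 0.0], private_data)
```

In an actual project, logic like this would live inside the function registered on your ClientApp, with the global parameters arriving from the server rather than being hard-coded.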

4. Implementing the Server App

The server_app.py coordinates the federated rounds and manages global model updates.

ServerApp Object: This manages the overall orchestration of the FL process.
server_fn: Defines the server's behavior, including the number of rounds and the aggregation strategy to use.
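Configuration: ServerConfig holds run-level settings such as the number of rounds. In standard Flower, runtime hyperparameters for clients (like learning rate) are usually sent through the strategy's per-round configuration rather than through ServerConfig itself.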
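One common way to send hyperparameters to every client is a per-round configuration function. The sketch below is plain Python and assumes a Flower version whose FedAvg strategy accepts such a callable through its on_fit_config_fn argument; the function name fit_config, the decay schedule, and the dictionary keys are illustrative choices, not names required by Flower or syft-flwr.

```python
from typing import Dict, Union

# Value types that can be carried in a config dictionary in this sketch.
Scalar = Union[bool, bytes, float, int, str]


def fit_config(server_round: int) -> Dict[str, Scalar]:
    """Return the hyperparameters sent to every client for this round."""
    base_lr = 0.1  # hypothetical starting learning rate
    decay = 0.5    # hypothetical schedule: halve the rate every 5 rounds
    lr = base_lr * decay ** ((server_round - 1) // 5)
    return {"lr": lr, "local-epochs": 1, "server-round": server_round}


# Rounds 1-5 use the base rate, rounds 6-10 use half of it, and so on.
schedule = {rnd: fit_config(rnd)["lr"] for rnd in (1, 5, 6, 11)}
```

Under that assumption, the function would be handed to the strategy inside server_fn, and each client would read the values from the config it receives with the training request.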

5. Defining Aggregation Strategies

Strategies determine how individual client updates are combined into a new global model.

FedAvg: The default and most common strategy. It averages the clients' model parameters, weighting each client by the number of examples it trained on.
Custom Strategies: You can extend existing classes (like FedAvg) to implement custom logic, such as dynamic learning rate adjustment or advanced privacy protections.
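The weighted average behind FedAvg can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not Flower's implementation: the fedavg function and the ClientUpdate alias are hypothetical names, and parameters are represented as flat lists of floats for simplicity.

```python
from typing import List, Tuple

# (flattened model parameters, number of local training examples)
ClientUpdate = Tuple[List[float], int]


def fedavg(updates: List[ClientUpdate]) -> List[float]:
    """Average client parameters, weighting each client by its example count."""
    total_examples = sum(n for _, n in updates)
    if total_examples == 0:
        raise ValueError("at least one client must report training examples")
    dim = len(updates[0][0])
    return [
        sum(params[i] * n for params, n in updates) / total_examples
        for i in range(dim)
    ]


# A client with 30 examples pulls the average three times harder
# than a client with 10 examples.
global_params = fedavg([([1.0, 2.0], 10), ([3.0, 4.0], 30)])
```

A custom strategy would typically keep this aggregation step and add its own logic around it, for example adjusting the learning rate sent to clients in the next round.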

6. Configuration Files (`pyproject.toml`)

The pyproject.toml file is the central place for your project configuration.

App Components: Defines the import paths for your ServerApp and ClientApp so the system knows where to find your code.
Dependencies: Lists the necessary Python packages (e.g., flwr, torch, numpy) required for the project to run.
Runtime Config: Use the [tool.flwr.app.config] section to set default values for your experiment, such as the number of server rounds.

Next Step: Now that your project is structured, proceed to Bootstrapping with syft-flwr to learn how to prepare your code for submission to a Data Owner.
