Data Scientist: Submitting FL Jobs

Learn how to propose and coordinate federated learning experiments with Data Owners across the network.

In syft-flwr, the submission process is the formal bridge between your experimental code and the real-world data held by Data Owners (DOs). You don't just "run" a model; you propose a collaborative project that each Data Owner must review and approve.

1. Proposing FL Experiments

Before code execution can begin, you must establish the context of your study. This is done through a Job Proposal.

  • Project Intent: Your proposal should clearly state the goal of the research (e.g., "Diabetes Risk Factor Analysis") to help Data Owners make informed decisions.
  • Collaboration Invitations: A proposal acts as an invitation to Data Owners to join your federated network.

2. Targeting Specific Datasites

You must explicitly define which datasites in the SyftBox network will participate in your training rounds.

  • Datasite Discovery: Connect to Data Owner datasites using the init_session function with their email addresses.
  • Dataset Linking: Specify the exact Dataset Name or ID you want to use at each site (e.g., pima-indians-diabetes-database).
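
To target several datasites, you open one guest session per Data Owner. A minimal sketch, assuming the init_session signature shown in the submission example below (the helper name and email addresses are placeholders):

```python
def connect_to_datasites(do_emails, my_email):
    """Open one guest session per participating Data Owner datasite.

    Hypothetical helper: `do_emails` lists the Data Owners' addresses and
    `my_email` identifies you as the guest Data Scientist.
    """
    import syft_rds as sy  # imported here so the helper stays self-contained

    return {
        email: sy.init_session(host=email, email=my_email)
        for email in do_emails
    }
```

Each returned session can then be used to look up the dataset by name at that site before you submit the job.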

3. Job Configuration

Your job submission includes the entire bootstrapped project bundle.

  • Code Path: Point the submission tool to your local project directory containing your client_app.py and server_app.py.
  • Dependency Declaration: Ensure your pyproject.toml lists all required libraries, as these will be automatically installed in the Data Owner's secure sandbox upon approval.
  • Hyperparameters: Set global constants like the number of training rounds or learning rate that will be synced to all participants.
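
The dependency and hyperparameter declarations above typically live in your project's pyproject.toml. An illustrative fragment (the package names, version pins, and config keys are examples, not requirements, and the exact sections depend on your Flower version):

```toml
[project]
name = "fl-diabetes-prediction"
version = "1.0.0"
dependencies = [
    "flwr>=1.9.0",       # Flower framework (version pin is illustrative)
    "syft-flwr",
    "scikit-learn",      # example model dependency
]

# Run configuration synced to all participants (keys are examples)
[tool.flwr.app.config]
num-server-rounds = 3
learning-rate = 0.01
```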

4. Communicating with Data Owners

The submission process is handled asynchronously through the SyftBox file sync network.

  • Submission Command: Use the job.submit method to push your project to a Data Owner's queue.
import syft_rds as sy

# Connect to the Data Owner's datasite as a guest
do_client = sy.init_session(host="data_owner@example.com", email="your_email@example.com")

# Submit the job
job = do_client.job.submit(
    name="fl-diabetes-prediction",
    user_code_path="path/to/your/fl-project",
    dataset_name="pima-indians-diabetes-database",
    entrypoint="main.py",
)
  • Transparency: Data Owners will receive your code in a PENDING state and can view the exact scripts you submitted before they grant access.

5. Managing Approvals

You can track the status of your submitted jobs using the job.get_all() method.

  • Status Monitoring: Check do_client.job.get_all() to see job statuses. Jobs can be pending_code_review, approved, or completed.
# Check job status on a specific datasite
jobs = do_client.job.get_all()
print(jobs)
  • Coordination: Training only begins once the required minimum number of Data Owners have approved and started their local clients.
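
Because approval is asynchronous, a simple polling loop is often enough to wait on a datasite. A sketch, assuming each job object exposes a status attribute with the values listed above (the helper name and polling interval are illustrative):

```python
import time

def wait_for_approvals(do_client, poll_seconds=30):
    """Poll one datasite until no job is still pending code review.

    Assumes each job returned by job.get_all() has a `status` attribute
    (e.g. "pending_code_review", "approved", "completed").
    """
    while True:
        jobs = do_client.job.get_all()
        pending = [j for j in jobs if j.status == "pending_code_review"]
        if not pending:
            return jobs
        time.sleep(poll_seconds)
```

Run one such loop per datasite (or gather them concurrently) before starting the server, since training only begins once enough Data Owners have approved.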

Next Step: Once your jobs are approved, move to Running FL Server to start the global aggregation process and coordinate the training rounds.