Data Scientist: Submitting FL Jobs
Learn how to propose and coordinate federated learning experiments with Data Owners across the network.
In syft-flwr, the submission process is the formal bridge between your experimental code and the real-world data held by Data Owners (DOs). You don't just "run" a model; you propose a collaborative project that each Data Owner must review and approve.
1. Proposing FL Experiments
Before code execution can begin, you must establish the context of your study. This is done through a Job Proposal.
- Project Intent: Your proposal should clearly state the goal of the research (e.g., "Diabetes Risk Factor Analysis") to help Data Owners make informed decisions.
- Collaboration Invitations: A proposal acts as an invitation to Data Owners to join your federated network.
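A proposal is ultimately structured metadata about your study. The sketch below is purely illustrative: in syft-flwr this information travels with the job submission itself, so the field names are hypothetical, not a real API.

```python
# Hypothetical proposal fields a Data Owner would weigh; in practice
# this information is carried by the job submission, not a separate object.
proposal = {
    "name": "fl-diabetes-prediction",
    "intent": "Diabetes Risk Factor Analysis",
    "invited_datasites": ["do_one@example.com", "do_two@example.com"],
}
```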
2. Targeting Specific Datasites
You must explicitly define which datasites in the SyftBox network will participate in your training rounds.
- Datasite Discovery: Connect to Data Owner datasites using the `init_session` method with their email addresses.
- Dataset Linking: Specify the exact dataset name or ID you want to use at each site (e.g., `pima-indians-diabetes-database`).
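For multi-site experiments, the discovery step can be scripted. A minimal sketch, assuming the `init_session` signature shown in the submission example in this guide; the helper name and participant addresses are hypothetical:

```python
def connect_to_datasites(do_emails, my_email):
    """Open a guest session with each participating datasite."""
    import syft_rds as sy  # deferred import so the sketch reads standalone

    return {
        email: sy.init_session(host=email, email=my_email)
        for email in do_emails
    }

# Hypothetical participant addresses
participants = ["do_one@example.com", "do_two@example.com"]
# do_clients = connect_to_datasites(participants, "your_email@example.com")
```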
3. Job Configuration
Your job submission includes the entire bootstrapped project bundle.
- Code Path: Point the submission tool to your local project directory containing your `client_app.py` and `server_app.py`.
- Dependency Declaration: Ensure your `pyproject.toml` lists all required libraries, as these will be automatically installed in the Data Owner's secure sandbox upon approval.
- Hyperparameters: Set global constants, such as the number of training rounds or the learning rate, that will be synced to all participants.
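To make the dependency and hyperparameter bullets concrete, here is a hedged sketch of a `pyproject.toml`. The exact sections depend on the template your project was bootstrapped from, and the dependency pins and `[tool.flwr.app.config]` keys shown are examples, not prescribed values:

```toml
[project]
name = "fl-diabetes-prediction"
version = "1.0.0"
dependencies = [
    "flwr>=1.8.0",
    "syft-flwr",
    "scikit-learn",
]

# Hyperparameters synced to all participants (example keys)
[tool.flwr.app.config]
num-server-rounds = 3
learning-rate = 0.01
```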
4. Communicating with Data Owners
The submission process is handled asynchronously through the SyftBox file sync network.
- Submission Command: Use the `job.submit` method to push your project to a Data Owner's queue.
```python
import syft_rds as sy

# Connect to the Data Owner's datasite as a guest
do_client = sy.init_session(host="data_owner@example.com", email="your_email@example.com")

# Submit the job
job = do_client.job.submit(
    name="fl-diabetes-prediction",
    user_code_path="path/to/your/fl-project",
    dataset_name="pima-indians-diabetes-database",
    entrypoint="main.py",
)
```
- Transparency: Data Owners will receive your code in a `PENDING` state and can view the exact scripts you submitted before they grant access.
5. Managing Approvals
You can track the status of your submitted jobs using the `job.get_all()` method.
- Status Monitoring: Check `do_client.job.get_all()` to see job statuses. Jobs can be `pending_code_review`, `approved`, or `completed`.
```python
# Check job status on a specific datasite
jobs = do_client.job.get_all()
print(jobs)
```
- Coordination: Training only begins once the required minimum number of Data Owners have approved and started their local clients.
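Because approvals arrive asynchronously, it can help to poll each datasite until your job leaves the review queue. A minimal sketch: the `wait_for_approval` helper is hypothetical, and the status string it checks is taken from the statuses listed above:

```python
import time

def wait_for_approval(do_client, job_name, poll_seconds=30):
    """Poll a datasite until the named job leaves code review."""
    while True:
        jobs = do_client.job.get_all()
        job = next((j for j in jobs if j.name == job_name), None)
        if job is not None and job.status != "pending_code_review":
            return job  # no longer awaiting review
        time.sleep(poll_seconds)
```

Once every required Data Owner's job is out of review and their local clients are running, the server side can begin coordinating rounds.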
Next Step: Once your jobs are approved, move to Running FL Server to start the global aggregation process and coordinate the training rounds.