Skip to content

SLIME training backend setup

This doc describes how to train an AgentCore Runtime-deployed agent with the slime training backend. The public user surface is exactly two things: the SlimeRunner class (for launching training) and the two SGLang patch scripts (applied once to the SGLang install).

For known issues (Megatron-LM regression on 32B, norm-epsilon mismatch, etc.) see slime troubleshooting.

  • Hardware and CUDA requirements: see slime’s README and the slime docker README for tested GPU configurations per model size.
  • Python 3.10+ and uv.
  • AWS credentials with permission to invoke an AgentCore Runtime and read/write an S3 bucket.
  • An AgentCore Runtime deployment of your agent — follow the Prepare agent for RL guide. Save the resulting runtime ARN — required as the agent_runtime_arn argument on SlimeRunner below for Agent rollouts.
  • An S3 bucket for rollout result delivery — required as the s3_bucket argument on SlimeRunner below.

Follow slime’s own installation docs — either the container path (slimerl/slime:latest) or a bare-metal install. Everything below runs inside this environment.

Inside the slimerl/slime:latest container, slime and Megatron-LM ship pre-installed at /root/slime and /root/Megatron-LM — use those paths for slime_dir / megatron_dir on SlimeRunner. For a bare-metal install, point at wherever you cloned slime + Megatron-LM.

Inside the slime environment:

Terminal window
# From a clone of this repo
cd /path/to/agentcore-rl-toolkit
# Install the toolkit plus the slime-backend extras
uv pip install -e ".[slime]"

Then apply the SGLang token_ids patch — it adds prompt_token_ids / token_ids fields to chat completion responses so the gateway can capture RL training trace data. The patch is idempotent:

Terminal window
python -m agentcore_rl_toolkit.backends.slime.patches.sglang_token_ids
# Verify the patch round-trips under greedy decoding (any HF checkpoint
# works; Qwen2.5-0.5B-Instruct is the fastest to download + load)
python -m agentcore_rl_toolkit.backends.slime.patches.verify_sglang_token_ids \
--model-path /path/to/Qwen2.5-0.5B-Instruct
# Expect: "OK: 4/4 checks passed"

The training dataset is a JSONL file where each line is one rollout request. Every line has the shape:

{"prompt": "...", "metadata": { /* whatever your agent expects */ }}
  • prompt — top-level string, used by slime for length filtering only.
  • metadata — copied verbatim as the payload dict your @rollout_entrypoint function receives. Put every per-rollout config the agent needs here (user prompt, ground-truth answer, task IDs, repo URIs, etc.).

Example (GSM8K):

{"prompt": "How many ...?", "metadata": {"prompt": "How many ...?", "answer": "42"}}

SlimeRunner is the one and only entry point — a Python class that stops stale processes, starts a Ray head, submits the slime training job, and streams output. Defaults target 8 × H100 (num_gpus=8, tp_size=2, rollout_gpus_per_engine=2); tune them for your cluster.

from agentcore_rl_toolkit.backends.slime import SlimeRunner
SlimeRunner(
exp_id="gsm8k-3b-smoke",
agent_runtime_arn="arn:aws:bedrock-agentcore:...",
s3_bucket="your-bucket-name",
model_dir="/path/to/Qwen2.5-3B-Instruct",
data_path="/path/to/gsm8k_tiny.jsonl",
model_type="qwen2.5-3B",
).train(num_rollout=1) # 1 = smoke test; bump to 100 for a real run

Wandb — set WANDB_API_KEY and WANDB_ENTITY in your environment (plus wandb_project / wandb_group on the constructor) to log a run. Unset env vars skip wandb entirely.

Config-file workflow — dump kwargs to YAML and call SlimeRunner.from_yaml("my_run.yaml") instead.

SlimeRunner exposes every field most experiments tune (cluster shape, training hyperparameters, per-rollout ACR limits, extra_flags for extra arguments to be directly passed to SLIME) as constructor arguments. See the API reference or help(SlimeRunner) for the full list.