SlimeRunner
module: agentcore_rl_toolkit.backends.slime.runner
SlimeRunner — one Python entry point for slime-backed training.
Users instantiate SlimeRunner with a handful of per-experiment fields
and call .train(); the runner reproduces what train.sh does today
(stop stale processes, start a Ray head, source the slime model script,
submit the slime training job) via subprocess.
train.sh stays in the repo as the low-level escape hatch; this class
is the primary entry point.
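A minimal usage sketch of the instantiate-then-train flow described above. Every value is a placeholder chosen for illustration (the experiment ID, ARN, bucket, paths, and model type are not taken from any real setup), and the import is deferred into the hypothetical `launch` helper so the snippet can be read and inspected on a machine without the toolkit installed.

```python
# Required per-experiment fields. All values are placeholders (assumptions).
required = {
    "exp_id": "demo-exp",
    "agent_runtime_arn": "<your-agent-runtime-arn>",
    "s3_bucket": "<your-s3-bucket>",
    "model_dir": "/path/to/model",
    "data_path": "/path/to/train.jsonl",
    "model_type": "<model-type>",
}

# Optional overrides; any field left out keeps its documented default.
overrides = {
    "num_gpus": 4,
    "tp_size": 2,
    "rollout_gpus_per_engine": 2,
}


def launch(num_rollout: int = 1) -> None:
    """Hypothetical helper: build the runner and block until the job exits."""
    # Deferred import: only needed when a job is actually launched.
    from agentcore_rl_toolkit.backends.slime.runner import SlimeRunner

    runner = SlimeRunner(**required, **overrides)
    runner.train(num_rollout=num_rollout)
```

Calling `launch()` on a training node would start the job. Keeping required and optional fields in separate dicts makes it easy to see which values must change per experiment.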
class SlimeRunner
One Python entry point for slime-backed training.
Constructor
SlimeRunner(
    # --- Required: per-experiment ---
    exp_id: str,
    agent_runtime_arn: str,
    s3_bucket: str,
    model_dir: str,
    data_path: str,
    model_type: str,

    # --- Optional: cluster ---
    num_gpus: int = 8,
    tp_size: int = 2,
    rollout_gpus_per_engine: int = 2,
    slime_dir: str = '/root/slime',
    megatron_dir: str = '/root/Megatron-LM',

    # --- Optional: ACR / toolkit (forwarded to slime via custom-config yaml) ---
    model_id: str = 'default',
    acr_timeout: int = 900,
    acr_tps_limit: int = 25,
    max_concurrent: int = 100,
    gateway_port: int = 9090,
    reward_postprocessing: str = 'grpo',

    # --- Optional: training hyperparameters ---
    rollout_batch_size: int = 32,
    n_samples_per_prompt: int = 8,
    rollout_max_response_len: int = 1024,
    rollout_temperature: float = 1.0,
    lr: float = 1e-06,
    eps_clip: float = 0.2,
    eps_clip_high: float = 0.28,
    weight_decay: float = 0.1,
    adam_beta2: float = 0.98,
    sglang_mem_fraction_static: float = 0.7,
    max_tokens_per_gpu: int = 9216,

    # --- Wandb (opt-in; no defaults injected if unset) ---
    wandb_project: str | None = None,
    wandb_group: str | None = None,

    # --- Escape hatch ---
    extra_flags: list[str] = list(),
)
Methods
from_yaml(path: str | os.PathLike) -> 'SlimeRunner'
Load kwargs from a YAML file (convenience for config-file workflows).
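A sketch of the config-file workflow, assuming the YAML keys are the constructor keyword names (the description above says the file supplies kwargs, but the exact schema is not documented here). The file name `experiment.yaml`, the helpers `write_config` and `load_runner`, and all values are hypothetical.

```python
from pathlib import Path

# Assumed layout: one top-level key per constructor kwarg. Placeholder values.
# lr is written with an explicit decimal point because some YAML loaders
# read a bare "1e-06" as a string rather than a float.
CONFIG_TEXT = """\
exp_id: demo-exp
agent_runtime_arn: "<your-agent-runtime-arn>"
s3_bucket: "<your-s3-bucket>"
model_dir: /path/to/model
data_path: /path/to/train.jsonl
model_type: "<model-type>"
num_gpus: 4
lr: 1.0e-06
"""


def write_config(directory: str) -> Path:
    """Write the example config into `directory` and return its path."""
    path = Path(directory) / "experiment.yaml"
    path.write_text(CONFIG_TEXT, encoding="utf-8")
    return path


def load_runner(path: Path):
    """Hypothetical helper: build a runner from the config file."""
    # Deferred import: only needed on a machine with the toolkit installed.
    from agentcore_rl_toolkit.backends.slime.runner import SlimeRunner

    return SlimeRunner.from_yaml(path)
```

On a training node, `load_runner(write_config("."))` followed by `.train()` would be the equivalent of passing the same values as keyword arguments.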
train(num_rollout: int = 1) -> None
Run the training job. Blocks until the slime job exits.
Mirrors train.sh step-by-step: stop stale sglang/ray, start a Ray
head, source the slime model script, submit the slime training job
via ray job submit. Streams stdout/stderr to the parent process.
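The four steps above could be approximated with plain subprocess calls, as in the sketch below. This is not the runner's actual implementation: the model-script location (`scripts/models/<model_type>.sh`), the `train.py` entry point, the `--num-rollout` flag, and the job address are assumptions, and `plan_commands` and `run_plan` are hypothetical helpers.

```python
import shlex
import subprocess


def plan_commands(slime_dir: str, model_type: str, num_gpus: int, num_rollout: int):
    """Return (argv, must_succeed) pairs mirroring the documented steps."""
    # Assumption: slime keeps per-model argument scripts at this path.
    model_script = f"{slime_dir}/scripts/models/{model_type}.sh"
    # Sourcing and submitting share one shell so variables set by the
    # model script are still visible to the submit command.
    submit = (
        f"source {shlex.quote(model_script)} && "
        "ray job submit --address=http://127.0.0.1:8265 -- "
        f"python3 train.py --num-rollout {num_rollout}"  # assumed entry point and flag
    )
    return [
        (["pkill", "-9", "sglang"], False),  # may find nothing to kill
        (["ray", "stop", "--force"], False),  # may find no running cluster
        (["ray", "start", "--head", "--num-gpus", str(num_gpus)], True),
        (["bash", "-c", submit], True),
    ]


def run_plan(commands, cwd: str) -> None:
    """Run each step in order; output is inherited by the parent process."""
    for argv, must_succeed in commands:
        # With no capture arguments, stdout/stderr stream straight through.
        subprocess.run(argv, cwd=cwd, check=must_succeed)
```

Separating planning from execution lets the command list be printed or inspected before anything is stopped or started.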
Attributes
- acr_timeout (int)
- acr_tps_limit (int)
- adam_beta2 (float)
- agent_runtime_arn (str)
- data_path (str)
- eps_clip (float)
- eps_clip_high (float)
- exp_id (str)
- extra_flags (list[str])
- gateway_port (int)
- lr (float)
- max_concurrent (int)
- max_tokens_per_gpu (int)
- megatron_dir (str)
- model_dir (str)
- model_id (str)
- model_type (str)
- n_samples_per_prompt (int)
- num_gpus (int)
- reward_postprocessing (str)
- rollout_batch_size (int)
- rollout_gpus_per_engine (int)
- rollout_max_response_len (int)
- rollout_temperature (float)
- s3_bucket (str)
- sglang_mem_fraction_static (float)
- slime_dir (str)
- tp_size (int)
- wandb_group (str | None)
- wandb_project (str | None)
- weight_decay (float)