Skip to content

SlimeRunner

module: agentcore_rl_toolkit.backends.slime.runner

SlimeRunner — one Python entry point for slime-backed training.

Users instantiate SlimeRunner with a handful of per-experiment fields and call .train(); the runner reproduces what train.sh does today (stop stale processes, start a Ray head, source the slime model script, submit the slime training job) via subprocess.

train.sh stays in the repo as the low-level escape hatch; this class is the primary entry point.

One Python entry point for slime-backed training.

Constructor

SlimeRunner(
# --- Required: per-experiment ---
exp_id: str,
agent_runtime_arn: str,
s3_bucket: str,
model_dir: str,
data_path: str,
model_type: str,
# --- Optional: cluster ---
num_gpus: int = 8,
tp_size: int = 2,
rollout_gpus_per_engine: int = 2,
slime_dir: str = '/root/slime',
megatron_dir: str = '/root/Megatron-LM',
# --- Optional: ACR / toolkit (forwarded to slime via custom-config yaml) ---
model_id: str = 'default',
acr_timeout: int = 900,
acr_tps_limit: int = 25,
max_concurrent: int = 100,
gateway_port: int = 9090,
reward_postprocessing: str = 'grpo',
# --- Optional: training hyperparameters ---
rollout_batch_size: int = 32,
n_samples_per_prompt: int = 8,
rollout_max_response_len: int = 1024,
rollout_temperature: float = 1.0,
lr: float = 1e-06,
eps_clip: float = 0.2,
eps_clip_high: float = 0.28,
weight_decay: float = 0.1,
adam_beta2: float = 0.98,
sglang_mem_fraction_static: float = 0.7,
max_tokens_per_gpu: int = 9216,
# --- Wandb (opt-in; no defaults injected if unset) ---
wandb_project: str | None = None,
wandb_group: str | None = None,
# --- Escape hatch ---
extra_flags: list[str] = list(),
)

from_yaml(path: str | os.PathLike) -> 'SlimeRunner'

Section titled “from_yaml(path: str | os.PathLike) -> 'SlimeRunner'”

Load kwargs from a YAML file (convenience for config-file workflows).

Run the training job. Blocks until the slime job exits.

Mirrors train.sh step-by-step: stop stale sglang/ray, start a Ray head, source the slime model script, submit the slime training job via ray job submit. Streams stdout/stderr to the parent process.

  • acr_timeout (int)

  • acr_tps_limit (int)

  • adam_beta2 (float)

  • agent_runtime_arn (str)

  • data_path (str)

  • eps_clip (float)

  • eps_clip_high (float)

  • exp_id (str)

  • extra_flags (list[str])

  • gateway_port (int)

  • lr (float)

  • max_concurrent (int)

  • max_tokens_per_gpu (int)

  • megatron_dir (str)

  • model_dir (str)

  • model_id (str)

  • model_type (str)

  • n_samples_per_prompt (int)

  • num_gpus (int)

  • reward_postprocessing (str)

  • rollout_batch_size (int)

  • rollout_gpus_per_engine (int)

  • rollout_max_response_len (int)

  • rollout_temperature (float)

  • s3_bucket (str)

  • sglang_mem_fraction_static (float)

  • slime_dir (str)

  • tp_size (int)

  • wandb_group (str | None)

  • wandb_project (str | None)

  • weight_decay (float)