Audit score 70

explore-run

lllllllama/rigorpilot-skills

Bounded exploratory runs for deep learning research with fair-comparison caveats and candidate ranking.

What is explore-run?

explore-run executes authorized exploratory trials (small-subset validation, batch sweeps, transfer-learning probes) in isolated experiment state. Use it when a researcher explicitly requests guess-and-check or quick verification runs, not for trusted baseline execution or implicit experimentation.

  • Plan and rank candidate exploratory runs using cost, success_rate, and expected_gain heuristics
  • Execute small-subset and short-cycle training probes with configurable variant axes and budget constraints
  • Isolate experiment state from trusted baseline to preserve research integrity
  • Generate fair-comparison reports and no-overclaim summaries in explore_outputs/
  • Hand off execution to minimal-run-and-audit or run-train while maintaining ranking semantics
  • Label results as bounded evidence and flag when comparisons are not directly fair

How to install explore-run

npx skills add https://github.com/lllllllama/rigorpilot-skills --skill explore-run
Prerequisites
  • Explicit researcher authorization for exploratory runs
  • Access to the research repository with current_research baseline
  • GPU or compute resources available for trial execution
  • Variant specification with axes, subset sizes, or short-run steps defined

How to use explore-run

  1. Define variant axes (hyperparameters, architectures, data subsets) and exploratory run scale (subset_sizes, short_run_steps)
  2. Optionally specify selection_weights to rebalance cost, success_rate, and expected_gain scoring
  3. Call explore-run with the variant spec; it ranks candidates using heuristic scoring
  4. Review TOP_RUNS.md and COMPARABILITY_REPORT.md to assess fair-comparison caveats
  5. Execute selected runs via minimal-run-and-audit or run-train; explore-run updates ranking with real evidence
  6. Check explore_outputs/ for CHANGESET.md, SCIENTIFIC_CHANGELOG.md, and status.json

Use cases

Good for
  • Validate a new hyperparameter configuration on a small data subset before full training
  • Sweep batch sizes and learning rates across idle GPU capacity to rank promising candidates
  • Quick transfer-learning trial to test pretrained model on a new task
  • Short-cycle probe to estimate convergence behavior before committing to long training runs
  • Exploratory batch of model architecture variants ranked by estimated gain vs. computational cost
Who it's for
  • Deep learning researchers conducting authorized exploratory experiments
  • ML engineers validating candidate configurations before production training
  • Research teams needing bounded evidence and fair-comparison reports without full SOTA claims

explore-run FAQ

When should I use explore-run instead of run-train?

Use explore-run when the researcher explicitly authorizes bounded exploratory trials (small-subset validation, batch sweeps, quick probes). Use run-train for trusted baseline execution or conservative verification without exploratory authorization.

Does explore-run isolate experiments from the trusted baseline?

Yes. explore-run keeps experiment state isolated in explore_outputs/ and does not modify current_research, preserving baseline integrity.

How does ranking work?

Pre-execution ranking uses three factors: cost, success_rate, and expected_gain with conservative default weights. After execution, ranking switches to real evidence. You can override weights via selection_weights in the variant spec.
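The switch from heuristic to evidence-based ranking could look roughly like the sketch below. It is not taken from the skill's scripts: the candidate fields `heuristic_score` and `measured`, the `metric_goal` values `"maximize"` and `"minimize"`, and the choice to list executed runs ahead of unexecuted ones are all assumptions made for illustration.

```python
def rank_candidates(candidates, metric_goal="maximize"):
    """Order executed runs by their measured metric, then pending runs by heuristic score.

    Each candidate is a dict with a hypothetical "heuristic_score" field and an
    optional "measured" field (None until the run has actually been executed).
    """
    if metric_goal not in ("maximize", "minimize"):
        raise ValueError(f"unsupported metric_goal: {metric_goal!r}")

    # Sorting is ascending, so negate the metric when larger values are better.
    sign = -1.0 if metric_goal == "maximize" else 1.0

    executed = [c for c in candidates if c.get("measured") is not None]
    pending = [c for c in candidates if c.get("measured") is None]

    executed.sort(key=lambda c: sign * c["measured"])    # real evidence
    pending.sort(key=lambda c: -c["heuristic_score"])    # heuristic only

    return executed + pending
```

Keeping the two groups separate means a high heuristic score never outranks a run with measured results, which matches the idea that real evidence takes precedence once it exists.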

What if a comparison between runs is not fair?

explore-run flags unfair comparisons in COMPARABILITY_REPORT.md and labels results as bounded evidence rather than certified claims, with explicit caveats about subset size, run duration, or other confounds.
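As an illustration of what such a caveat check might involve, the sketch below compares two run descriptions and lists the settings that differ. The function name, the dict layout, and the default keys are hypothetical; the actual report format of COMPARABILITY_REPORT.md is not described here.

```python
def comparability_caveats(run_a, run_b, keys=("subset_size", "short_run_steps")):
    """Return one caveat string per setting that differs between two runs.

    An empty list means the runs match on every checked key; it does not
    prove the comparison is fair, only that these known confounds agree.
    """
    caveats = []
    for key in keys:
        a, b = run_a.get(key), run_b.get(key)
        if a != b:
            caveats.append(f"{key} differs: {a} vs {b}")
    return caveats
```

For example, comparing a run on 1000 samples against one on 5000 samples with the same step count would yield a single caveat about `subset_size`.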

Can explore-run orchestrate end-to-end exploration?

No. explore-run is a leaf skill for execution planning and summary only. Use ai-research-explore if you need both current_research coordination and exploratory code changes.

Full instructions (SKILL.md)

Source of truth, from lllllllama/rigorpilot-skills.


name: explore-run
description: Rigor Improve / Rigor Explore run leaf skill for bounded exploratory evidence in deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with fair-comparison caveats and no-overclaim summaries in explore_outputs/. Do not use for end-to-end exploration orchestration on top of current_research, trusted baseline execution, conservative training verification, default routing, verified SOTA claims, or implicit experimentation.


explore-run

Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug remains explore-run for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should guide candidate run planning while preserving model judgment about the active repo.

When to apply

  • When the researcher explicitly authorizes exploratory runs.
  • When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
  • When the output should rank candidate runs rather than certify trusted success.

When not to apply

  • When the user wants trusted training execution or conservative verification.
  • When there is no explicit exploratory authorization.
  • When the task is repository setup, intake, or debugging.

Clear boundaries

  • This skill owns exploratory execution planning and summary only.
  • Use ai-research-explore instead when the task spans both current_research coordination and exploratory code changes.
  • It may hand off actual command execution to minimal-run-and-audit or run-train.
  • It should keep experiment state isolated from the trusted baseline.
  • It should prefer small-subset and short-cycle checks before heavier exploratory runs.
  • It should label run results as bounded evidence and explain when a comparison is not directly fair.

Ranking Semantics

  • Pre-execution candidate selection uses three factors: cost, success_rate, and expected_gain.
  • Default weights should stay conservative unless the researcher explicitly provides selection_weights.
  • Budget pruning still applies after scoring through max_variants and max_short_cycle_runs.
  • If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
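A minimal sketch of scoring followed by budget pruning is shown below. It assumes each factor is normalized to [0, 1], that cost counts against a candidate while the other two count in its favor, and that the weights combine linearly. The equal weights are placeholders, not the skill's actual conservative defaults, and `max_short_cycle_runs` is left out because its exact semantics are not spelled out here.

```python
# Placeholder weights for illustration only; the real defaults are not documented here.
PLACEHOLDER_WEIGHTS = {"cost": 1.0, "success_rate": 1.0, "expected_gain": 1.0}


def heuristic_score(candidate, selection_weights=None):
    """Linear score: reward success_rate and expected_gain, penalize cost."""
    w = {**PLACEHOLDER_WEIGHTS, **(selection_weights or {})}
    return (
        w["success_rate"] * candidate["success_rate"]
        + w["expected_gain"] * candidate["expected_gain"]
        - w["cost"] * candidate["cost"]
    )


def select_candidates(candidates, max_variants, selection_weights=None):
    """Score every candidate, then keep only the top max_variants (budget pruning)."""
    ranked = sorted(
        candidates,
        key=lambda c: heuristic_score(c, selection_weights),
        reverse=True,
    )
    return ranked[:max_variants]
```

Pruning after scoring, rather than before, means the budget always cuts the lowest-scoring candidates under whatever weights are in effect.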

Variant Spec Hints

  • Use variant_axes to define the candidate dimension grid.
  • Use subset_sizes and short_run_steps to express exploratory run scale.
  • Use selection_weights to rebalance cost, success_rate, and expected_gain.
  • Use primary_metric and metric_goal so downstream ranking can order executed candidates consistently.
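To make the hints concrete, here is a hypothetical variant spec written as a Python dict, plus a helper that expands it into a candidate grid. The field names come from the list above, but the value shapes, the example values, and the `expand_variants` helper are assumptions; the authoritative schema lives in ../../references/explore-variant-spec.md.

```python
from itertools import product

# Hypothetical spec: field names follow the hints above, values are illustrative.
spec = {
    "variant_axes": {"lr": [1e-3, 3e-4], "batch_size": [32, 64]},
    "subset_sizes": [1000],
    "short_run_steps": [200],
    "selection_weights": {"cost": 1.0, "success_rate": 1.0, "expected_gain": 1.0},
    "primary_metric": "val_accuracy",
    "metric_goal": "maximize",
}


def expand_variants(spec):
    """Cross every axis value with every subset size and short-run step count."""
    axes = spec["variant_axes"]
    names = list(axes)
    variants = []
    for values in product(*(axes[name] for name in names)):
        for subset_size in spec["subset_sizes"]:
            for steps in spec["short_run_steps"]:
                variant = dict(zip(names, values))
                variant["subset_size"] = subset_size
                variant["short_run_steps"] = steps
                variants.append(variant)
    return variants
```

Because the grid grows multiplicatively with each axis, even a small spec can exceed the run budget quickly, which is why pruning limits such as max_variants matter.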

Output expectations

  • explore_outputs/CHANGESET.md
  • explore_outputs/SCIENTIFIC_CHANGELOG.md
  • explore_outputs/COMPARABILITY_REPORT.md
  • explore_outputs/TOP_RUNS.md
  • explore_outputs/status.json
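A quick way to confirm that a run produced all of the files listed above is sketched here. The helper name `missing_outputs` is hypothetical and is not part of scripts/write_outputs.py; it only checks that the files exist, not that their contents are valid.

```python
from pathlib import Path

EXPECTED_OUTPUTS = [
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
]


def missing_outputs(root="explore_outputs"):
    """Return the expected output files that are absent under root, in listed order."""
    base = Path(root)
    return [name for name in EXPECTED_OUTPUTS if not (base / name).is_file()]
```

An empty return value means every expected file is present.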

Notes

Use references/execution-policy.md, ../../references/explore-variant-spec.md, ../../references/deep-learning-experiment-principles.md, scripts/plan_variants.py, and scripts/write_outputs.py.