explore-run
lllllllama/rigorpilot-skills
Bounded exploratory runs for deep learning research with fair-comparison caveats and candidate ranking.
What is explore-run?
explore-run executes authorized exploratory trials (small-subset validation, batch sweeps, transfer-learning probes) in isolated experiment state. Use it when a researcher explicitly requests guess-and-check or quick verification runs, not for trusted baseline execution or implicit experimentation.
- Plan and rank candidate exploratory runs using cost, success_rate, and expected_gain heuristics
- Execute small-subset and short-cycle training probes with configurable variant axes and budget constraints
- Isolate experiment state from trusted baseline to preserve research integrity
- Generate fair-comparison reports and no-overclaim summaries in explore_outputs/
- Hand off execution to minimal-run-and-audit or run-train while maintaining ranking semantics
- Label results as bounded evidence and flag when comparisons are not directly fair
How to install explore-run
npx skills add https://github.com/lllllllama/rigorpilot-skills --skill explore-run
- Explicit researcher authorization for exploratory runs
- Access to the research repository with current_research baseline
- GPU or compute resources available for trial execution
- Variant specification with axes, subset sizes, or short-run steps defined
How to use explore-run
1. Define variant axes (hyperparameters, architectures, data subsets) and exploratory run scale (subset_sizes, short_run_steps)
2. Optionally specify selection_weights to rebalance cost, success_rate, and expected_gain scoring
3. Call explore-run with the variant spec; it ranks candidates using heuristic scoring
4. Review TOP_RUNS.md and COMPARABILITY_REPORT.md to assess fair-comparison caveats
5. Execute selected runs via minimal-run-and-audit or run-train; explore-run updates ranking with real evidence
6. Check explore_outputs/ for CHANGESET.md, SCIENTIFIC_CHANGELOG.md, and status.json
Use cases
- Validate a new hyperparameter configuration on a small data subset before full training
- Sweep batch sizes and learning rates across idle GPU capacity to rank promising candidates
- Quick transfer-learning trial to test a pretrained model on a new task
- Short-cycle probe to estimate convergence behavior before committing to long training runs
- Exploratory batch of model architecture variants ranked by estimated gain vs. computational cost
- Deep learning researchers conducting authorized exploratory experiments
- ML engineers validating candidate configurations before production training
- Research teams needing bounded evidence and fair-comparison reports without full SOTA claims
explore-run FAQ
Use explore-run when the researcher explicitly authorizes bounded exploratory trials (small-subset validation, batch sweeps, quick probes). Use run-train for trusted baseline execution or conservative verification without exploratory authorization.
Yes. explore-run keeps experiment state isolated in explore_outputs/ and does not modify current_research, preserving baseline integrity.
Pre-execution ranking uses three factors: cost, success_rate, and expected_gain with conservative default weights. After execution, ranking switches to real evidence. You can override weights via selection_weights in the variant spec.
explore-run flags unfair comparisons in COMPARABILITY_REPORT.md and labels results as bounded evidence rather than certified claims, with explicit caveats about subset size, run duration, or other confounds.
No. explore-run is a leaf skill for execution planning and summary only. Use ai-research-explore if you need both current_research coordination and exploratory code changes.
Full instructions (SKILL.md)
Source of truth, from lllllllama/rigorpilot-skills.
name: explore-run
description: Rigor Improve / Rigor Explore run leaf skill for bounded exploratory evidence in deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with fair-comparison caveats and no-overclaim summaries in explore_outputs/. Do not use for end-to-end exploration orchestration on top of current_research, trusted baseline execution, conservative training verification, default routing, verified SOTA claims, or implicit experimentation.
explore-run
Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug
remains explore-run for compatibility.
Use the shared operating principles in
../../references/agent-operating-principles.md; this skill should guide
candidate run planning while preserving model judgment about the active repo.
When to apply
- When the researcher explicitly authorizes exploratory runs.
- When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
- When the output should rank candidate runs rather than certify trusted success.
When not to apply
- When the user wants trusted training execution or conservative verification.
- When there is no explicit exploratory authorization.
- When the task is repository setup, intake, or debugging.
Clear boundaries
- This skill owns exploratory execution planning and summary only.
- Use ai-research-explore instead when the task spans both current_research coordination and exploratory code changes.
- It may hand off actual command execution to minimal-run-and-audit or run-train.
- It should keep experiment state isolated from the trusted baseline.
- It should prefer small-subset and short-cycle checks before heavier exploratory runs.
- It should label run results as bounded evidence and explain when a comparison is not directly fair.
Ranking Semantics
- Pre-execution candidate selection uses three factors: cost, success_rate, and expected_gain.
- Default weights should stay conservative unless the researcher explicitly provides selection_weights.
- Budget pruning still applies after scoring through max_variants and max_short_cycle_runs.
- If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
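As a rough illustration of the ranking semantics above, the sketch below scores candidates with a weighted sum and then prunes to a budget. The formula, the default weight values, the `Candidate` class, and the `score` and `plan` helpers are all assumptions made for this example; the actual logic in scripts/plan_variants.py may differ. Only the factor names, `selection_weights`, and `max_variants` come from the skill text.

```python
from dataclasses import dataclass

# Hypothetical defaults: the skill only says defaults should stay conservative.
DEFAULT_WEIGHTS = {"cost": 0.4, "success_rate": 0.4, "expected_gain": 0.2}


@dataclass(frozen=True)
class Candidate:
    name: str
    cost: float           # assumed normalized to [0, 1]; higher is more expensive
    success_rate: float   # assumed estimate in [0, 1] that the run completes usefully
    expected_gain: float  # assumed normalized estimate of improvement in [0, 1]


def score(candidate, selection_weights=None):
    """Assumed heuristic: reward success_rate and expected_gain, penalize cost."""
    w = {**DEFAULT_WEIGHTS, **(selection_weights or {})}
    return (
        w["success_rate"] * candidate.success_rate
        + w["expected_gain"] * candidate.expected_gain
        - w["cost"] * candidate.cost
    )


def plan(candidates, max_variants, selection_weights=None):
    """Rank by heuristic score, then apply budget pruning after scoring."""
    ranked = sorted(
        candidates, key=lambda c: score(c, selection_weights), reverse=True
    )
    return ranked[:max_variants]


# Illustrative candidates; the numbers are made up for the example.
CANDIDATES = [
    Candidate("subset-probe", cost=0.1, success_rate=0.9, expected_gain=0.3),
    Candidate("full-sweep", cost=0.8, success_rate=0.5, expected_gain=0.9),
    Candidate("short-cycle", cost=0.3, success_rate=0.7, expected_gain=0.5),
]

if __name__ == "__main__":
    for c in plan(CANDIDATES, max_variants=2):
        print(c.name, round(score(c), 3))
```

With the assumed defaults, cheap and reliable probes outrank the expensive sweep; passing `selection_weights` that favor `expected_gain` reverses that order. A `max_short_cycle_runs` cap could be applied the same way, after scoring.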
Variant Spec Hints
- Use variant_axes to define the candidate dimension grid.
- Use subset_sizes and short_run_steps to express exploratory run scale.
- Use selection_weights to rebalance cost, success_rate, and expected_gain.
- Use primary_metric and metric_goal so downstream ranking can order executed candidates consistently.
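To make the hints above concrete, here is a minimal sketch of how a spec might expand into a candidate grid and how executed runs might be ordered. The field names come from the hints; the dict layout, the example values, the `"maximize"` / `"minimize"` goal strings, and the `expand_variants` and `order_executed` helpers are assumptions for illustration, not the schema in ../../references/explore-variant-spec.md.

```python
from itertools import product

# Hypothetical spec: field names follow the hints, values and shape are illustrative.
SPEC = {
    "variant_axes": {"lr": [1e-3, 3e-4], "batch_size": [32, 64]},
    "subset_sizes": [1000],
    "short_run_steps": [200, 500],
    "primary_metric": "val_acc",
    "metric_goal": "maximize",  # assumed values: "maximize" or "minimize"
}


def expand_variants(spec):
    """Cross every axis value with every (subset size, step count) pair."""
    axes = spec["variant_axes"]
    names = list(axes)
    variants = []
    for values in product(*(axes[name] for name in names)):
        for subset, steps in product(spec["subset_sizes"], spec["short_run_steps"]):
            variant = dict(zip(names, values))
            variant["subset_size"] = subset
            variant["short_run_steps"] = steps
            variants.append(variant)
    return variants


def order_executed(results, spec):
    """Order executed runs by the primary metric, best first."""
    best_is_high = spec["metric_goal"] == "maximize"
    return sorted(
        results, key=lambda r: r[spec["primary_metric"]], reverse=best_is_high
    )


if __name__ == "__main__":
    print(len(expand_variants(SPEC)), "candidate variants")
```

The grid grows multiplicatively with each axis, which is why budget pruning after scoring matters even for small specs.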
Output expectations
- explore_outputs/CHANGESET.md
- explore_outputs/SCIENTIFIC_CHANGELOG.md
- explore_outputs/COMPARABILITY_REPORT.md
- explore_outputs/TOP_RUNS.md
- explore_outputs/status.json
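A quick way to confirm that a run produced the expected artifacts is to check the output directory for each file. The helper below is a hypothetical convenience, not part of the skill; only the file names and the explore_outputs/ directory come from the list above.

```python
from pathlib import Path

# File names taken from the output expectations above.
EXPECTED_OUTPUTS = (
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
)


def missing_outputs(output_dir="explore_outputs"):
    """Return the expected file names that are not present in output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_outputs()
    print("missing:", missing if missing else "none")
```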
Notes
Use references/execution-policy.md, ../../references/explore-variant-spec.md, ../../references/deep-learning-experiment-principles.md, scripts/plan_variants.py, and scripts/write_outputs.py.
Related skills
More from lllllllama/rigorpilot-skills and the wider catalog.
analyze-project
Read-only analysis of deep learning repositories to understand structure, configs, and suspicious patterns.
ai-research-explore
Auditable deep learning research exploration with idea gating, fair comparison, and governed experiments.
explore-code
Auditable exploratory code modifications for deep learning research on isolated branches with rollback tracking.
ai-research-reproduction
README-first deep learning repository reproduction with auditable evidence and standardized outputs.
paper-context-resolver
Resolve reproduction-critical paper details when README and repo files leave gaps.
safe-debug
Conservative diagnosis and minimal patching for deep learning training failures without automatic code mutation.