Audit score 70

explore-run

lllllllama/rigorpilot-skills

Bounded exploratory runs for deep learning research with fair-comparison caveats and candidate ranking.

What is explore-run?

explore-run executes authorized exploratory trials (small-subset validation, batch sweeps, transfer-learning probes) in isolated experiment state. Use it when a researcher explicitly requests guess-and-check or quick verification runs, not for trusted baseline execution or implicit experimentation.

Plan and rank candidate exploratory runs using cost, success_rate, and expected_gain heuristics
Execute small-subset and short-cycle training probes with configurable variant axes and budget constraints
Isolate experiment state from trusted baseline to preserve research integrity
Generate fair-comparison reports and no-overclaim summaries in explore_outputs/
Hand off execution to minimal-run-and-audit or run-train while maintaining ranking semantics
Label results as bounded evidence and flag when comparisons are not directly fair

How to install explore-run

npx skills add https://github.com/lllllllama/rigorpilot-skills --skill explore-run

Prerequisites

Explicit researcher authorization for exploratory runs
Access to the research repository with current_research baseline
GPU or compute resources available for trial execution
Variant specification with axes, subset sizes, or short-run steps defined

How to use explore-run

1. Define variant axes (hyperparameters, architectures, data subsets) and exploratory run scale (subset_sizes, short_run_steps)
2. Optionally specify selection_weights to rebalance cost, success_rate, and expected_gain scoring
3. Call explore-run with the variant spec; it ranks candidates using heuristic scoring
4. Review TOP_RUNS.md and COMPARABILITY_REPORT.md to assess fair-comparison caveats
5. Execute selected runs via minimal-run-and-audit or run-train; explore-run updates ranking with real evidence
6. Check explore_outputs/ for CHANGESET.md, SCIENTIFIC_CHANGELOG.md, and status.json
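
As a rough illustration of steps 1 and 2, a variant spec might look like the following. The field names variant_axes, subset_sizes, short_run_steps, and selection_weights come from the skill's documentation; the dictionary layout, the example values, and the helper expand_variants are assumptions made for this sketch, not the schema defined in explore-variant-spec.md.

```python
from itertools import product

# Hypothetical variant spec: the key names come from the skill docs,
# but the layout and the values are illustrative assumptions.
variant_spec = {
    "variant_axes": {
        "learning_rate": [1e-4, 3e-4],
        "batch_size": [32, 64],
    },
    "subset_sizes": [1000],
    "short_run_steps": [200],
    "selection_weights": {"cost": 0.4, "success_rate": 0.3, "expected_gain": 0.3},
}


def expand_variants(spec):
    """Expand variant_axes into one candidate dict per grid point (hypothetical helper)."""
    axes = spec["variant_axes"]
    names = list(axes)
    candidates = []
    for values in product(*(axes[name] for name in names)):
        for subset in spec["subset_sizes"]:
            for steps in spec["short_run_steps"]:
                candidate = dict(zip(names, values))
                candidate["subset_size"] = subset
                candidate["short_run_steps"] = steps
                candidates.append(candidate)
    return candidates


candidates = expand_variants(variant_spec)
print(len(candidates))  # 2 learning rates x 2 batch sizes x 1 subset x 1 step count = 4
```

The grid grows multiplicatively with each axis, which is why the skill pairs the spec with budget limits.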

Use cases

Good for

Validate a new hyperparameter configuration on a small data subset before full training
Sweep batch sizes and learning rates across idle GPU capacity to rank promising candidates
Quick transfer-learning trial to test a pretrained model on a new task
Short-cycle probe to estimate convergence behavior before committing to long training runs
Exploratory batch of model architecture variants ranked by estimated gain vs. computational cost

Who it's for

Deep learning researchers conducting authorized exploratory experiments
ML engineers validating candidate configurations before production training
Research teams needing bounded evidence and fair-comparison reports without full SOTA claims

explore-run FAQ

When should I use explore-run instead of run-train?

Use explore-run when the researcher explicitly authorizes bounded exploratory trials (small-subset validation, batch sweeps, quick probes). Use run-train for trusted baseline execution or conservative verification without exploratory authorization.

Does explore-run isolate experiments from the trusted baseline?

Yes. explore-run keeps experiment state isolated in explore_outputs/ and does not modify current_research, preserving baseline integrity.

How does ranking work?

Pre-execution ranking uses three factors: cost, success_rate, and expected_gain with conservative default weights. After execution, ranking switches to real evidence. You can override weights via selection_weights in the variant spec.
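
A minimal sketch of how a three-factor heuristic score could be combined, assuming each factor is already normalized to the range 0 to 1 and that lower cost is better. The weighted-sum formula, the default weight values, and the function name heuristic_score are assumptions for illustration; the skill's actual scoring lives in its own scripts and may differ.

```python
# Assumed conservative defaults: illustrative values, not the skill's real weights.
DEFAULT_WEIGHTS = {"cost": 0.5, "success_rate": 0.3, "expected_gain": 0.2}


def heuristic_score(candidate, selection_weights=None):
    """Weighted sum of the three factors, each assumed to lie in [0, 1]."""
    weights = dict(DEFAULT_WEIGHTS)
    if selection_weights:
        weights.update(selection_weights)  # researcher-supplied override
    total = sum(weights.values())
    # Cost counts against a candidate, so it enters as (1 - cost).
    raw = (
        weights["cost"] * (1.0 - candidate["cost"])
        + weights["success_rate"] * candidate["success_rate"]
        + weights["expected_gain"] * candidate["expected_gain"]
    )
    return raw / total


cheap = {"cost": 0.1, "success_rate": 0.8, "expected_gain": 0.3}
risky = {"cost": 0.9, "success_rate": 0.4, "expected_gain": 0.9}
print(heuristic_score(cheap) > heuristic_score(risky))  # True with the assumed defaults
```

With the assumed defaults the cheap, likely-to-succeed candidate ranks first; raising the expected_gain weight through selection_weights can reverse that order.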

What if a comparison between runs is not fair?

explore-run flags unfair comparisons in COMPARABILITY_REPORT.md and labels results as bounded evidence rather than certified claims, with explicit caveats about subset size, run duration, or other confounds.
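
One way to picture the fairness check is a pairwise comparison of run settings that reports every setting on which two runs differ. The field names subset_size, short_run_steps, and seed, along with the function comparability_caveats, are hypothetical; the checks the skill actually writes into COMPARABILITY_REPORT.md are not specified here.

```python
def comparability_caveats(run_a, run_b,
                          controlled_fields=("subset_size", "short_run_steps", "seed")):
    """Return one caveat string per controlled field on which the two runs differ."""
    caveats = []
    for field in controlled_fields:
        a, b = run_a.get(field), run_b.get(field)
        if a != b:
            caveats.append(f"{field} differs: {a} vs {b}")
    return caveats


# Illustrative runs: same step count and seed, different data subset sizes.
run_a = {"name": "lr-1e-4", "subset_size": 1000, "short_run_steps": 200, "seed": 0}
run_b = {"name": "lr-3e-4", "subset_size": 5000, "short_run_steps": 200, "seed": 0}

caveats = comparability_caveats(run_a, run_b)
label = "bounded evidence, not directly fair" if caveats else "bounded evidence"
print(label, caveats)
```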

Can explore-run orchestrate end-to-end exploration?

No. explore-run is a leaf skill for execution planning and summary only. Use ai-research-explore if you need both current_research coordination and exploratory code changes.

Full instructions (SKILL.md)

Source of truth, from lllllllama/rigorpilot-skills.

name: explore-run
description: Rigor Improve / Rigor Explore run leaf skill for bounded exploratory evidence in deep learning research repositories. Use when the researcher explicitly authorizes exploratory runs such as small-subset validation, short-cycle guess-and-check, batch sweeps, idle-GPU search, or quick transfer-learning trials, with fair-comparison caveats and no-overclaim summaries in `explore_outputs/`. Do not use for end-to-end exploration orchestration on top of `current_research`, trusted baseline execution, conservative training verification, default routing, verified SOTA claims, or implicit experimentation.

explore-run

Use this as the Rigor Improve / Rigor Explore run leaf skill. The installed slug remains explore-run for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should guide candidate run planning while preserving model judgment about the active repo.

When to apply

When the researcher explicitly authorizes exploratory runs.
When the task is a small-subset validation, short-cycle training probe, batch sweep, idle-GPU search, or quick transfer-learning trial.
When the output should rank candidate runs rather than certify trusted success.

When not to apply

When the user wants trusted training execution or conservative verification.
When there is no explicit exploratory authorization.
When the task is repository setup, intake, or debugging.

Clear boundaries

This skill owns exploratory execution planning and summary only.
Use ai-research-explore instead when the task spans both current_research coordination and exploratory code changes.
It may hand off actual command execution to minimal-run-and-audit or run-train.
It should keep experiment state isolated from the trusted baseline.
It should prefer small-subset and short-cycle checks before heavier exploratory runs.
It should label run results as bounded evidence and explain when a comparison is not directly fair.

Ranking Semantics

Pre-execution candidate selection uses three factors: cost, success_rate, and expected_gain.
Default weights should stay conservative unless the researcher explicitly provides selection_weights.
Budget pruning still applies after scoring through max_variants and max_short_cycle_runs.
If runs are executed later, downstream ranking should switch to real execution evidence, not stay purely heuristic.
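
The interaction between scoring and budget pruning can be sketched as follows. max_variants and max_short_cycle_runs are named by the skill; treating the first as a cap on ranked candidates and the second as a cap on how many of those are scheduled for short-cycle runs is an assumption, as are the function name and the precomputed score field.

```python
def prune_to_budget(scored_candidates, max_variants, max_short_cycle_runs):
    """Keep the top-scoring candidates, then cap how many get a short-cycle run."""
    ranked = sorted(scored_candidates, key=lambda c: c["score"], reverse=True)
    kept = ranked[:max_variants]
    to_run = kept[:max_short_cycle_runs]
    return kept, to_run


# Illustrative candidates with heuristic scores already attached.
scored = [
    {"name": "a", "score": 0.42},
    {"name": "b", "score": 0.75},
    {"name": "c", "score": 0.61},
    {"name": "d", "score": 0.30},
]
kept, to_run = prune_to_budget(scored, max_variants=3, max_short_cycle_runs=2)
print([c["name"] for c in kept])    # ['b', 'c', 'a']
print([c["name"] for c in to_run])  # ['b', 'c']
```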

Variant Spec Hints

Use variant_axes to define the candidate dimension grid.
Use subset_sizes and short_run_steps to express exploratory run scale.
Use selection_weights to rebalance cost, success_rate, and expected_gain.
Use primary_metric and metric_goal so downstream ranking can order executed candidates consistently.
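
Once runs have real results, primary_metric and metric_goal give enough information to order them. The sketch below assumes metric_goal takes the values "maximize" or "minimize" and that each executed run carries a metrics dictionary; both are illustrative assumptions rather than the documented spec, and rank_executed is a hypothetical helper.

```python
def rank_executed(runs, primary_metric, metric_goal):
    """Order executed runs by the primary metric; runs missing the metric go last."""
    if metric_goal not in ("maximize", "minimize"):
        raise ValueError(f"unsupported metric_goal: {metric_goal!r}")
    with_metric = [r for r in runs if primary_metric in r["metrics"]]
    without_metric = [r for r in runs if primary_metric not in r["metrics"]]
    with_metric.sort(
        key=lambda r: r["metrics"][primary_metric],
        reverse=(metric_goal == "maximize"),
    )
    return with_metric + without_metric


runs = [
    {"name": "a", "metrics": {"val_loss": 0.91}},
    {"name": "b", "metrics": {"val_loss": 0.74}},
    {"name": "c", "metrics": {}},  # e.g. a run that stopped before evaluation
]
ranked = rank_executed(runs, primary_metric="val_loss", metric_goal="minimize")
print([r["name"] for r in ranked])  # ['b', 'a', 'c']
```

Placing runs without the metric at the end, rather than dropping them, keeps failed probes visible in the ranking.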

Output expectations

explore_outputs/CHANGESET.md
explore_outputs/SCIENTIFIC_CHANGELOG.md
explore_outputs/COMPARABILITY_REPORT.md
explore_outputs/TOP_RUNS.md
explore_outputs/status.json
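
A small check like the one below can confirm that a run produced every expected file. The file names are the five listed above; the function name missing_outputs and the idea of running such a check are assumptions, not part of the skill.

```python
from pathlib import Path

EXPECTED_OUTPUTS = (
    "CHANGESET.md",
    "SCIENTIFIC_CHANGELOG.md",
    "COMPARABILITY_REPORT.md",
    "TOP_RUNS.md",
    "status.json",
)


def missing_outputs(output_dir="explore_outputs"):
    """Return the expected output files that are absent from output_dir."""
    root = Path(output_dir)
    return [name for name in EXPECTED_OUTPUTS if not (root / name).is_file()]


print(missing_outputs())  # an empty list means all five files are present
```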

Notes

Refer to references/execution-policy.md, ../../references/explore-variant-spec.md, and ../../references/deep-learning-experiment-principles.md, and use scripts/plan_variants.py and scripts/write_outputs.py.

Related skills

More from lllllllama/rigorpilot-skills and the wider catalog.

analyze-project

lllllllama/rigorpilot-skills

Read-only analysis of deep learning repositories to understand structure, configs, and suspicious patterns.

84k installs · Audited

ai-research-explore

lllllllama/rigorpilot-skills

Auditable deep learning research exploration with idea gating, fair comparison, and governed experiments.

84k installs

explore-code

lllllllama/rigorpilot-skills

Auditable exploratory code modifications for deep learning research on isolated branches with rollback tracking.

84k installs

ai-research-reproduction

lllllllama/rigorpilot-skills

README-first deep learning repository reproduction with auditable evidence and standardized outputs.

84k installs · Audited

paper-context-resolver

lllllllama/rigorpilot-skills

Resolve reproduction-critical paper details when README and repo files leave gaps.

84k installs

safe-debug

lllllllama/rigorpilot-skills

Conservative diagnosis and minimal patching for deep learning training failures without automatic code mutation.

84k installs · Audited