Audit score 70

minimal-run-and-audit

lllllllama/rigorpilot-skills

Execute and audit deep learning repo smoke tests with standardized evidence capture and patch tracking.

What is minimal-run-and-audit?

Minimal-run-and-audit is a Rigor Run skill for README-first deep learning repository reproduction. Use it after a reproduction target and setup plan exist to execute documented smoke tests, inference runs, or evaluation commands and generate standardized repro_outputs/ files with full audit trails and patch notes.

Execute selected smoke tests, inference runs, or evaluation commands with full evidence capture
Generate standardized repro_outputs/ files with execution results and normalized outputs
Create SCIENTIFIC_CHANGELOG.md documenting changes to evaluation, preprocessing, checkpoints, or metrics
Produce COMPARABILITY_REPORT.md assessing README/paper/baseline alignment
Track repository file changes in PATCHES.md with clear scientific impact distinction
Distinguish between verified, partial, and blocked execution states

How to install minimal-run-and-audit

npx skills add https://github.com/lllllllama/rigorpilot-skills --skill minimal-run-and-audit

Prerequisites

Selected reproduction target and setup plan already defined
Runnable commands or smoke commands identified
Environment and asset assumptions documented


How to use minimal-run-and-audit

1. Provide the selected reproduction goal and the specific command to execute
2. Specify environment and asset assumptions for the run
3. Execute the command using the skill's execution framework
4. Review generated repro_outputs/ files and audit reports
5. Check SCIENTIFIC_CHANGELOG.md for any changes affecting evaluation or metrics
6. Verify COMPARABILITY_REPORT.md for alignment with README and paper claims
7. Review PATCHES.md if repository files were modified during execution
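The execute-and-capture part of these steps can be pictured with a minimal Python sketch. This is not the skill's own `scripts/run_command.py`, whose interface is not shown on this page; the function name `run_with_evidence`, the `evidence.json` filename, and the recorded fields are all assumptions made for illustration, using only the standard library.

```python
import json
import subprocess
import time
from pathlib import Path


def run_with_evidence(command, out_dir="repro_outputs", timeout=600):
    """Run one command and record what happened (hypothetical helper)."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    started = time.time()
    exit_code, stdout, stderr, note = None, "", "", ""
    try:
        # Capture both streams as text so they can be stored as evidence.
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
        exit_code, stdout, stderr = proc.returncode, proc.stdout, proc.stderr
    except subprocess.TimeoutExpired:
        note = f"timed out after {timeout} seconds"
    except OSError as exc:
        # Covers a missing executable or a permission error at launch.
        note = f"could not start: {exc}"

    evidence = {
        "command": command,
        "exit_code": exit_code,  # None means the run never finished
        "note": note,
        "duration_seconds": round(time.time() - started, 3),
        "stdout": stdout,
        "stderr": stderr,
    }
    # "evidence.json" is an assumed filename, not one the skill documents.
    (out / "evidence.json").write_text(json.dumps(evidence, indent=2))
    return evidence
```

Calling `run_with_evidence(["python", "-c", "print('smoke ok')"])` would leave a JSON record under `repro_outputs/` that the later review steps can refer to. Recording failures to launch alongside normal exits keeps a missing binary from looking like a run that was never attempted.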

Use cases

Good for

Running a documented inference command on a pre-trained model and capturing outputs for reproducibility verification
Executing a smoke test suite after environment setup to validate repo functionality without full training
Auditing evaluation runs against baseline metrics and documenting any deviations or code changes
Normalizing execution evidence from multiple runs into standardized reports for comparison
Capturing inference outputs with patch notes when repository files were modified during execution

Who it's for

ML researchers validating deep learning repository reproducibility
Research engineers executing and auditing smoke tests and evaluation runs
Teams documenting evidence for paper reproducibility claims
Scientists needing standardized audit trails for computational experiments

minimal-run-and-audit FAQ

When should I use this skill vs. other reproduction skills?

Use minimal-run-and-audit after you have already selected a specific reproduction target and setup plan. It handles execution and evidence capture only—not target selection, environment setup, or training orchestration.

What happens if repository files change during execution?

The skill tracks all file changes in PATCHES.md and documents their scientific impact in SCIENTIFIC_CHANGELOG.md. Changes affecting evaluation, preprocessing, checkpoints, or metrics are flagged explicitly rather than hidden.
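One way to detect such changes is to hash the repository before and after the run. The sketch below is an illustration under stated assumptions, not the skill's implementation: the helper names and the `SCIENTIFIC_HINTS` keyword list are hypothetical, and a filename heuristic can only triage paths for review, not decide their scientific impact.

```python
import hashlib
from pathlib import Path

# Assumed keyword list for illustration; the skill's real criteria are in
# its reporting policy and are not reproduced here.
SCIENTIFIC_HINTS = ("eval", "preprocess", "checkpoint", "metric")


def snapshot(repo_dir):
    """Map each file's relative path to a SHA-256 digest of its contents."""
    root = Path(repo_dir)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file() and ".git" not in p.relative_to(root).parts
    }


def changed_files(before, after):
    """Return paths added, removed, or modified between two snapshots."""
    paths = set(before) | set(after)
    return sorted(p for p in paths if before.get(p) != after.get(p))


def needs_scientific_review(path):
    """Flag paths whose names suggest evaluation-relevant code (heuristic)."""
    lowered = path.lower()
    return any(hint in lowered for hint in SCIENTIFIC_HINTS)
```

Taking `snapshot()` before and after the command, then passing both results to `changed_files()`, yields the list of paths a patch note would need to cover. Any path for which `needs_scientific_review()` returns true is a candidate for a changelog entry.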

Can this skill run training jobs?

No. This skill is designed for smoke tests, inference runs, and evaluation commands—not for training execution or long-running training state management.

What output files does this skill generate?

It generates standardized repro_outputs/ files, SCIENTIFIC_CHANGELOG.md (for changes in scientific meaning and evidence status), COMPARABILITY_REPORT.md (for README/paper/baseline comparability), and PATCHES.md (only when repository files changed).
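The conditional nature of PATCHES.md can be shown with a short sketch. The report filenames come from this page, but everything else is assumed for illustration: the `write_reports` function, its parameters, the Markdown layout, and the choice to place the reports inside the output directory are not taken from the skill's `scripts/write_outputs.py`.

```python
from pathlib import Path


def write_reports(out_dir, state, scientific_changes, comparability_notes,
                  patched_files):
    """Write the named reports; the file layout here is a guess."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []

    def emit(name, title, items):
        # Each report carries the run state so it can be read on its own.
        lines = [f"# {title}", "", f"Status: {state}", ""]
        lines += [f"- {item}" for item in items]
        (out / name).write_text("\n".join(lines) + "\n", encoding="utf-8")
        written.append(name)

    emit("SCIENTIFIC_CHANGELOG.md", "Scientific changelog",
         scientific_changes or ["No changes to scientific meaning recorded."])
    emit("COMPARABILITY_REPORT.md", "Comparability report",
         comparability_notes or ["No comparability notes recorded."])
    # PATCHES.md is only produced when repository files changed.
    if patched_files:
        emit("PATCHES.md", "Patches", patched_files)
    return written
```

Writing an explicit "nothing recorded" line instead of skipping the changelog makes the absence of scientific changes a stated result, not a gap in the evidence.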

What if the command fails or produces partial results?

The skill clearly distinguishes between verified, partial, and blocked execution states in its reports, allowing you to assess what succeeded and what needs further investigation.
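This page does not define the exact criteria for each state, so the following sketch uses assumed rules purely to show how a three-way distinction might be encoded. `RunState` and `classify` are hypothetical names, and the thresholds are an illustration, not the skill's reporting policy.

```python
from enum import Enum


class RunState(Enum):
    VERIFIED = "verified"
    PARTIAL = "partial"
    BLOCKED = "blocked"


def classify(started, exit_code, expected_outputs, found_outputs):
    """Assumed rules:
    - blocked: never started, or failed without any expected output
    - verified: exit code 0 and every expected output present
    - partial: anything in between
    """
    expected = set(expected_outputs)
    found = expected & set(found_outputs)
    if not started or (exit_code != 0 and not found):
        return RunState.BLOCKED
    if exit_code == 0 and found == expected:
        return RunState.VERIFIED
    return RunState.PARTIAL
```

Under these assumed rules, a command that exits cleanly but produces only one of two expected files is reported as partial, not verified, which keeps a successful exit code from overstating what was reproduced.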

Full instructions (SKILL.md)

Source of truth, from lllllllama/rigorpilot-skills.

name: minimal-run-and-audit
description: Rigor Run skill for README-first deep learning repo reproduction. Use when the task is specifically to capture or normalize evidence from the selected smoke test or documented inference or evaluation command and write standardized `repro_outputs/` files, including patch notes when repository files changed. Do not use for training execution, initial repo intake, generic environment setup, paper lookup, target selection, hidden scientific-meaning changes, or end-to-end orchestration by itself.

minimal-run-and-audit

Use this as the Rigor Run skill. The installed slug remains minimal-run-and-audit for compatibility.

Use the shared operating principles in ../../references/agent-operating-principles.md; this skill should make run evidence auditable without turning every command into a rigid protocol.

When to apply

After a reproduction target and setup plan exist.
When the main skill needs execution evidence and normalized outputs.
When a smoke test, documented inference run, documented evaluation run, or other short non-training verification is appropriate.
When the user already knows what command should be attempted and wants execution plus reporting only.

When not to apply

During initial repo scanning.
When environment or assets are still undefined enough to make execution meaningless.
When the task is a literature lookup rather than repository execution.
When the user is still deciding which reproduction target should count as the main run.

Clear boundaries

This skill owns normalized reporting for an attempted command.
It may receive execution evidence from the main skill or a thin helper.
It does not choose the overall target on its own.
It does not perform broad paper analysis.
It does not own training startup, resume, or long-running training state.
It should not normalize risky code edits into acceptable practice.
It must not hide changes that alter evaluation, preprocessing, checkpoints, metrics, or other scientific meaning.

Input expectations

selected reproduction goal
runnable commands or smoke commands
environment and asset assumptions
optional patch metadata
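As an illustration only, the four inputs above could be gathered into one structure and checked before anything runs. The `RunRequest` class, its field names, and the `problems()` check are hypothetical and not part of the skill. The check reflects the earlier guidance that execution is not meaningful while the environment and assets are still undefined.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunRequest:
    """Hypothetical container for the skill's expected inputs."""
    goal: str
    commands: list[list[str]]
    environment_assumptions: list[str] = field(default_factory=list)
    asset_assumptions: list[str] = field(default_factory=list)
    patch_metadata: Optional[dict] = None  # optional, may stay unset

    def problems(self):
        """List reasons the request is not ready to execute."""
        issues = []
        if not self.goal.strip():
            issues.append("no reproduction goal selected")
        if not self.commands:
            issues.append("no runnable or smoke command given")
        if not (self.environment_assumptions or self.asset_assumptions):
            issues.append("environment and asset assumptions undocumented")
        return issues
```

An empty list from `problems()` means the request carries every required input; a non-empty list gives concrete reasons to return to target selection or setup before attempting a run.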

Output expectations

execution result summary
standardized repro_outputs/ files
SCIENTIFIC_CHANGELOG.md for changed scientific meaning and evidence status
COMPARABILITY_REPORT.md for README/paper/baseline comparability
clear distinction between verified, partial, and blocked states
PATCHES.md when repo files changed

Notes

Use references/reporting-policy.md, ../../references/research-rigor-principles.md, scripts/run_command.py, and scripts/write_outputs.py.

Related skills

More from lllllllama/rigorpilot-skills and the wider catalog.

analyze-project

lllllllama/rigorpilot-skills

Read-only analysis of deep learning repositories to understand structure, configs, and suspicious patterns.

84k installs · Audited

ai-research-explore

lllllllama/rigorpilot-skills

Auditable deep learning research exploration with idea gating, fair comparison, and governed experiments.

84k installs

explore-code

lllllllama/rigorpilot-skills

Auditable exploratory code modifications for deep learning research on isolated branches with rollback tracking.

84k installs

ai-research-reproduction

lllllllama/rigorpilot-skills

README-first deep learning repository reproduction with auditable evidence and standardized outputs.

84k installs · Audited

paper-context-resolver

lllllllama/rigorpilot-skills

Resolve reproduction-critical paper details when README and repo files leave gaps.

84k installs

safe-debug

lllllllama/rigorpilot-skills

Conservative diagnosis and minimal patching for deep learning training failures without automatic code mutation.

84k installs · Audited