How to install m10-performance
npx skills add https://github.com/zhanghandong/rust-skills --skill m10-performanceFull instructions (SKILL.md)
Source of truth, from zhanghandong/rust-skills.
name: m10-performance description: "CRITICAL: Use for performance optimization. Triggers: performance, optimization, benchmark, profiling, flamegraph, criterion, slow, fast, allocation, cache, SIMD, make it faster, 性能优化, 基准测试" user-invocable: false
Performance Optimization
Layer 2: Design Choices
Core Question
What's the bottleneck, and is optimization worth it?
Before optimizing:
- Have you measured? (Don't guess)
- What's the acceptable performance?
- Will optimization add complexity?
Performance Decision → Implementation
| Goal | Design Choice | Implementation |
|---|---|---|
| Reduce allocations | Pre-allocate, reuse | with_capacity, object pools |
| Improve cache | Contiguous data | Vec, SmallVec |
| Parallelize | Data parallelism | rayon, threads |
| Avoid copies | Zero-copy | References, Cow<T> |
| Reduce indirection | Inline data | smallvec, arrays |
Thinking Prompt
Before optimizing:
-
Have you measured?
- Profile first → flamegraph, perf
- Benchmark → criterion, cargo bench
- Identify actual hotspots
-
What's the priority?
- Algorithm (10x-1000x improvement)
- Data structure (2x-10x)
- Allocation (2x-5x)
- Cache (1.5x-3x)
-
What's the trade-off?
- Complexity vs speed
- Memory vs CPU
- Latency vs throughput
Trace Up ↑
To domain constraints (Layer 3):
"How fast does this need to be?"
↑ Ask: What's the performance SLA?
↑ Check: domain-* (latency requirements)
↑ Check: Business requirements (acceptable response time)
| Question | Trace To | Ask |
|---|---|---|
| Latency requirements | domain-* | What's acceptable response time? |
| Throughput needs | domain-* | How many requests per second? |
| Memory constraints | domain-* | What's the memory budget? |
Trace Down ↓
To implementation (Layer 1):
"Need to reduce allocations"
↓ m01-ownership: Use references, avoid clone
↓ m02-resource: Pre-allocate with_capacity
"Need to parallelize"
↓ m07-concurrency: Choose rayon or threads
↓ m07-concurrency: Consider async for I/O-bound
"Need cache efficiency"
↓ Data layout: Prefer Vec over HashMap when possible
↓ Access patterns: Sequential over random access
Quick Reference
| Tool | Purpose |
|---|---|
cargo bench | Micro-benchmarks |
criterion | Statistical benchmarks |
perf / flamegraph | CPU profiling |
heaptrack | Allocation tracking |
valgrind / cachegrind | Cache analysis |
Optimization Priority
1. Algorithm choice (10x - 1000x)
2. Data structure (2x - 10x)
3. Allocation reduction (2x - 5x)
4. Cache optimization (1.5x - 3x)
5. SIMD/Parallelism (2x - 8x)
Common Techniques
| Technique | When | How |
|---|---|---|
| Pre-allocation | Known size | Vec::with_capacity(n) |
| Avoid cloning | Hot paths | Use references or Cow<T> |
| Batch operations | Many small ops | Collect then process |
| SmallVec | Usually small | smallvec::SmallVec<[T; N]> |
| Inline buffers | Fixed-size data | Arrays over Vec |
Common Mistakes
| Mistake | Why Wrong | Better |
|---|---|---|
| Optimize without profiling | Wrong target | Profile first |
| Benchmark in debug mode | Meaningless | Always --release |
| Use LinkedList | Cache unfriendly | Vec or VecDeque |
Hidden .clone() | Unnecessary allocs | Use references |
| Premature optimization | Wasted effort | Make it work first |
Anti-Patterns
| Anti-Pattern | Why Bad | Better |
|---|---|---|
| Clone to avoid lifetimes | Performance cost | Proper ownership |
| Box everything | Indirection cost | Stack when possible |
| HashMap for small sets | Overhead | Vec with linear search |
| String concat in loop | O(n^2) | String::with_capacity or format! |
Related Skills
| When | See |
|---|---|
| Reducing clones | m01-ownership |
| Concurrency options | m07-concurrency |
| Smart pointer choice | m02-resource |
| Domain requirements | domain-* |
Related skills
More from zhanghandong/rust-skills and the wider catalog.
m15-anti-pattern
Use when reviewing code for anti-patterns. Keywords: anti-pattern, common mistake, pitfall, code smell, bad practice, code review, is this an anti-pattern, better way to do this, common mistake to avoid, why is this bad, idiomatic way, beginner mistake, fighting borrow checker, clone everywhere, unwrap in production, should I refactor, 反模式, 常见错误, 代码异味, 最佳实践, 地道写法
coding-guidelines
Use when asking about Rust code style or best practices. Keywords: naming, formatting, comment, clippy, rustfmt, lint, code style, best practice, P.NAM, G.FMT, code review, naming convention, variable naming, function naming, type naming, 命名规范, 代码风格, 格式化, 最佳实践, 代码审查, 怎么命名
m07-concurrency
CRITICAL: Use for concurrency/async. Triggers: E0277 Send Sync, cannot be sent between threads, thread, spawn, channel, mpsc, Mutex, RwLock, Atomic, async, await, Future, tokio, deadlock, race condition, 并发, 线程, 异步, 死锁
m06-error-handling
CRITICAL: Use for error handling. Triggers: Result, Option, Error, ?, unwrap, expect, panic, anyhow, thiserror, when to panic vs return Result, custom error, error propagation, 错误处理, Result 用法, 什么时候用 panic
rust-refactor-helper
Safe Rust refactoring with LSP analysis. Triggers on: /refactor, rename symbol, move function, extract, 重构, 重命名, 提取函数, 安全重构
m01-ownership
CRITICAL: Use for ownership/borrow/lifetime issues. Triggers: E0382, E0597, E0506, E0507, E0515, E0716, E0106, value moved, borrowed value does not live long enough, cannot move out of, use of moved value, ownership, borrow, lifetime, 'a, 'static, move, clone, Copy, 所有权, 借用, 生命周期