# AI Skills
AI skills package repeatable agent behavior into small, discoverable folders. A good skill does not just describe a task; it defines when to activate, what context to load, what artifacts to create, and how to verify the result.
## Minimal Skill Shape
```
my-skill/
  SKILL.md
  references/
    decisions.md
  scripts/
    validate.sh
  examples/
    expected-output.md
```
The common pattern is SKILL.md as the loading surface, with extra files pulled in only when needed. This keeps the first context load small while preserving detailed guidance for complex work.
## What Goes In SKILL.md
```markdown
---
name: issue-review
description: Investigate a GitHub issue and produce an implementation-ready analysis.
---

# Issue Review

Use this skill when the user asks to investigate an issue, triage a bug, or decide whether a repo problem is worth fixing.

## Workflow

1. Fetch the issue and linked pull requests.
2. Inspect the relevant code path.
3. Reproduce or explain the failure mode.
4. Write findings, risks, and next steps.

## Verification

- Link every claim to source evidence.
- Mark unknowns as follow-up instead of guessing.
```
The description is operational metadata. It should name trigger conditions, not just summarize the topic.
Gotcha: A vague description makes the skill hard to route.
"Helps with planning" is weaker than "Convert proposal.md into executable phase folders with specs, prompts, and validation gates."
## Folder Anatomy
| Part | Use it for |
|---|---|
| SKILL.md | Activation rules, workflow, artifact contract, verification rules. |
| references/ | Stable background docs that are too detailed for the main prompt. |
| scripts/ | Deterministic checks, generation, conversion, browser actions, or validation. |
| examples/ | Expected inputs, outputs, fixtures, and review cases. |
| assets/ | Images, templates, or files the skill needs to copy or transform. |
Keep the main skill readable. Move rarely used detail into references and load it explicitly from the workflow.
## Helper Scripts
Use scripts when prose is likely to drift from reality.
```bash
./scripts/check-skill.sh my-skill
```
A small script can verify structure, required sections, links, generated files, or expected outputs. This is more reliable than asking every future agent to remember the same checklist.
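A minimal sketch of such a script, assuming the folder shape from earlier; the required section names are an assumption for this example, not a fixed contract:

```bash
#!/usr/bin/env bash
# Sketch of scripts/check-skill.sh: structural checks for one skill folder.
# The required section names below are illustrative assumptions.
set -eu

skill_dir="${1:?usage: check-skill.sh <skill-dir>}"
skill_md="$skill_dir/SKILL.md"

# Structure: SKILL.md exists and carries the required metadata.
[[ -f "$skill_md" ]] || { echo "missing $skill_md" >&2; exit 1; }
grep -q '^name:' "$skill_md" || { echo "missing name metadata" >&2; exit 1; }
grep -q '^description:' "$skill_md" || { echo "missing description metadata" >&2; exit 1; }

# Required sections: the workflow and verification headings must be present.
for section in '## Workflow' '## Verification'; do
  grep -qF "$section" "$skill_md" || { echo "missing section: $section" >&2; exit 1; }
done

# Links: every references/, scripts/, or examples/ path named in SKILL.md must exist.
grep -oE '(references|scripts|examples)/[A-Za-z0-9._/-]+' "$skill_md" | sort -u |
while read -r path; do
  [[ -e "$skill_dir/$path" ]] || { echo "broken link: $path" >&2; exit 1; }
done

echo "ok: $skill_dir"
```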
Good script targets:
| Target | Example check |
|---|---|
| Structure | SKILL.md exists and has required metadata. |
| Links | Referenced files exist. |
| Artifacts | The workflow created the promised output files. |
| Regression | A fixture input still produces the expected shape. |
| Safety | The skill refuses destructive commands unless explicitly approved. |
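The safety row deserves special mention: destructive steps should be opt-in. A small sketch of that pattern, where SKILL_ALLOW_DESTRUCTIVE is an assumed convention for this example rather than any standard variable:

```bash
#!/usr/bin/env bash
# Sketch of a safety gate: refuse destructive commands unless explicitly
# approved. SKILL_ALLOW_DESTRUCTIVE is an assumed convention, not a standard.
set -eu

target="${1:?usage: clean-artifacts.sh <generated-dir>}"

if [[ "${SKILL_ALLOW_DESTRUCTIVE:-no}" != "yes" ]]; then
  echo "refusing to delete $target; set SKILL_ALLOW_DESTRUCTIVE=yes to approve" >&2
  exit 1
fi

rm -rf -- "$target"
echo "removed $target"
```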
## Evals And Quality Gates
Skills become safer when they have examples that can fail.
```yaml
scenario: proposal_to_plan
input: examples/proposal.md
expected:
  - creates: initial-plan.md
  - includes: verification steps
  - rejects: direct implementation from proposal.md
```
You do not need a large eval harness on day one. Start with one scenario per risky behavior, then add regression cases when a failure repeats.
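A scenario like the one above can start life as a short shell script. A sketch, where `agent run` and the proposal-to-plan skill name are hypothetical placeholders; substitute however your setup actually invokes a skill:

```bash
#!/usr/bin/env bash
# Sketch of an eval runner for the proposal_to_plan scenario. `agent run`
# and the skill name are hypothetical placeholders.
set -eu

workdir="$(mktemp -d)"
cp examples/proposal.md "$workdir/proposal.md"

# Exercise the real workflow path end to end.
(cd "$workdir" && agent run proposal-to-plan proposal.md)

# creates: initial-plan.md
[[ -f "$workdir/initial-plan.md" ]] || { echo "fail: no initial-plan.md" >&2; exit 1; }

# includes: verification steps
grep -qi 'verification' "$workdir/initial-plan.md" || { echo "fail: no verification steps" >&2; exit 1; }

# rejects: direct implementation (rough proxy: no source files were written)
if compgen -G "$workdir/src/*" > /dev/null; then
  echo "fail: skill jumped to implementation" >&2; exit 1
fi

echo "pass: proposal_to_plan"
```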
Tip: Evaluate the real workflow path, not a shortcut. If the production skill reads files, invokes subagents, and writes artifacts, the eval should exercise those same seams.
## Composing Skills Into Workflows
Skills compose best when each handoff has a concrete artifact.
```mermaid
flowchart LR
    Brainstorm[brainstorm.md] --> Proposal[proposal.md]
    Proposal --> Plan[initial-plan.md]
    Plan --> Breakdown[phase folders]
    Breakdown --> Implementation[commits]
    Implementation --> Review[review findings]
    Review --> Fixup[fixup commits]
```
What to make explicit at every handoff:
| Handoff | Contract |
|---|---|
| Brainstorm to proposal | User decisions are captured without inventing settled requirements. |
| Proposal to plan | The plan preserves goals, non-goals, and unresolved assumptions. |
| Plan to breakdown | Every phase has scope, acceptance checks, and rollback-safe instructions. |
| Breakdown to implementation | Agents know what to edit, what not to edit, and how to verify. |
| Implementation to review | Reviewers see the intended behavior, not just the diff. |
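One way to keep those contracts honest is to gate every stage on its artifact, so a broken handoff stops the pipeline instead of propagating. A sketch, again using hypothetical skill names and the hypothetical `agent run` placeholder:

```bash
#!/usr/bin/env bash
# Sketch of artifact-gated composition: each stage must leave its handoff
# artifact behind before the next stage starts. Skill names and `agent run`
# are hypothetical placeholders.
set -eu

require() {
  [[ -e "$1" ]] || { echo "handoff broken: missing $1" >&2; exit 1; }
}

agent run brainstorm
require brainstorm.md

agent run proposal
require proposal.md

agent run plan
require initial-plan.md

agent run breakdown
require phases          # the phase folders promised by the plan
# ...implementation, review, and fixup gate on commits and findings the same way
```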
## Upgrading An Existing Skill
Upgrade from evidence, not vibes.
1. Identify the failure mode.
2. Find source patterns that solve the same class of problem.
3. Write a proposal before changing the skill.
4. Add the smallest workflow or artifact change that fixes the failure.
5. Add a structural check or scenario so the failure stays fixed.
Use this sequence when improving a local skill:
| Step | Question |
|---|---|
| Audit | Where does the skill currently fail or create ambiguity? |
| Evidence | Which source pattern is relevant: helper scripts, generated outputs, evals, role workflow, or curation? |
| Proposal | What should change, and what is explicitly out of scope? |
| Implementation | What is the smallest edit that improves reliability? |
| Verification | How will a future agent detect regression? |
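For the Verification row, the smallest durable answer is often a fixture diff: rerun the skill on a stored input and compare the shape of the output against a stored expectation. A sketch, reusing the hypothetical `agent run` placeholder and the examples/ files from the folder shape above:

```bash
#!/usr/bin/env bash
# Sketch of a regression guard: a fixture input must keep producing the
# expected output shape after every skill edit. `agent run` is hypothetical.
set -eu

workdir="$(mktemp -d)"
cp examples/proposal.md "$workdir/"
(cd "$workdir" && agent run proposal-to-plan proposal.md)

# Compare section headings only, so cosmetic rewording does not trip the gate.
diff <(grep '^#' "$workdir/initial-plan.md") \
     <(grep '^#' examples/expected-output.md) ||
  { echo "regression: plan shape changed" >&2; exit 1; }
```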
Gotcha: Do not copy a large external skill system wholesale. Most local reliability gains come from sharper activation rules, clearer artifact contracts, and one or two targeted checks.
## Source Evidence
The durable research lives under skills research. Start there when you need source-backed details: