# AI Skills

AI skills package repeatable agent behavior into small, discoverable folders. A good skill does not just describe a task; it defines when to activate, what context to load, what artifacts to create, and how to verify the result.

## Minimal Skill Shape

```text
my-skill/
  SKILL.md
  references/
    decisions.md
  scripts/
    validate.sh
  examples/
    expected-output.md
```

The common pattern is SKILL.md as the loading surface, with extra files pulled in only when needed. This keeps the first context load small while preserving detailed guidance for complex work.

## What Goes In SKILL.md

```markdown
---
name: issue-review
description: Investigate a GitHub issue and produce an implementation-ready analysis.
---

# Issue Review

Use this skill when the user asks to investigate an issue, triage a bug, or decide whether a repo problem is worth fixing.

## Workflow

1. Fetch the issue and linked pull requests.
2. Inspect the relevant code path.
3. Reproduce or explain the failure mode.
4. Write findings, risks, and next steps.

## Verification

- Link every claim to source evidence.
- Mark unknowns as follow-up instead of guessing.
```

The description is operational metadata. It should name trigger conditions, not just summarize the topic.

Gotcha: A vague description makes the skill hard to route. "Helps with planning" is weaker than "Convert proposal.md into executable phase folders with specs, prompts, and validation gates."

## Folder Anatomy

| Part | Use it for |
| --- | --- |
| SKILL.md | Activation rules, workflow, artifact contract, verification rules. |
| references/ | Stable background docs that are too detailed for the main prompt. |
| scripts/ | Deterministic checks, generation, conversion, browser actions, or validation. |
| examples/ | Expected inputs, outputs, fixtures, and review cases. |
| assets/ | Images, templates, or files the skill needs to copy or transform. |

Keep the main skill readable. Move rarely used detail into references and load it explicitly from the workflow.

## Helper Scripts

Use scripts when prose is likely to drift from reality.

```shell
./scripts/check-skill.sh my-skill
```

A small script can verify structure, required sections, links, generated files, or expected outputs. This is more reliable than asking every future agent to remember the same checklist.

Good script targets:

| Target | Example check |
| --- | --- |
| Structure | SKILL.md exists and has required metadata. |
| Links | Referenced files exist. |
| Artifacts | The workflow created the promised output files. |
| Regression | A fixture input still produces the expected shape. |
| Safety | The skill refuses destructive commands unless explicitly approved. |
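The structure and link checks above can be sketched as a small shell function. The layout and metadata fields it checks mirror the example skill earlier in this page; the exact grep patterns are an assumption, not a standard:

```shell
# Sketch of a structural check for one skill folder (illustrative,
# assuming the my-skill layout and YAML frontmatter shown earlier).
check_skill() {
  dir="$1"
  # Structure: SKILL.md exists and carries name/description metadata.
  [ -f "$dir/SKILL.md" ] || { echo "missing SKILL.md"; return 1; }
  grep -q '^name:' "$dir/SKILL.md" || { echo "missing name: metadata"; return 1; }
  grep -q '^description:' "$dir/SKILL.md" || { echo "missing description: metadata"; return 1; }
  # Links: every references/ path mentioned in SKILL.md resolves to a real file.
  for ref in $(grep -o 'references/[A-Za-z0-9._-]*' "$dir/SKILL.md"); do
    [ -f "$dir/$ref" ] || { echo "broken link: $ref"; return 1; }
  done
  echo "ok"
}
```

Wiring this into `scripts/check-skill.sh` gives every future agent the same checklist for free.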

## Evals And Quality Gates

Skills become safer when they have examples that can fail.

```yaml
scenario: proposal_to_plan
input: examples/proposal.md
expected:
  - creates: initial-plan.md
  - includes: verification steps
  - rejects: direct implementation from proposal.md
```

You do not need a large eval harness on day one. Start with one scenario per risky behavior, then add regression cases when a failure repeats.

Tip: Evaluate the real workflow path, not a shortcut. If the production skill reads files, invokes subagents, and writes artifacts, the eval should exercise those same seams.
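A single-scenario check for the `proposal_to_plan` example above can be a short shell function. The artifact names come from the scenario; the `src/` check standing in for "direct implementation" is a hypothetical choice:

```shell
# Sketch of one eval scenario check (file names from the example above;
# the src/ heuristic for "rejects implementation" is an assumption).
run_scenario() {
  work="$1"
  # creates: the skill must have produced initial-plan.md.
  [ -f "$work/initial-plan.md" ] || { echo "FAIL: initial-plan.md missing"; return 1; }
  # includes: the plan must contain verification steps.
  grep -qi 'verification' "$work/initial-plan.md" || { echo "FAIL: no verification steps"; return 1; }
  # rejects: nothing should have been implemented directly from the proposal.
  [ ! -d "$work/src" ] || { echo "FAIL: implementation started from proposal.md"; return 1; }
  echo "PASS: proposal_to_plan"
}
```

Run it against the directory the real workflow wrote into, so the eval exercises the same seams as production.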

## Composing Skills Into Workflows

Skills compose best when each handoff has a concrete artifact.

```mermaid
flowchart LR
  Brainstorm[brainstorm.md] --> Proposal[proposal.md]
  Proposal --> Plan[initial-plan.md]
  Plan --> Breakdown[phase folders]
  Breakdown --> Implementation[commits]
  Implementation --> Review[review findings]
  Review --> Fixup[fixup commits]
```

What to make explicit at every handoff:

| Handoff | Contract |
| --- | --- |
| Brainstorm to proposal | User decisions are captured without inventing settled requirements. |
| Proposal to plan | The plan preserves goals, non-goals, and unresolved assumptions. |
| Plan to breakdown | Every phase has scope, acceptance checks, and rollback-safe instructions. |
| Breakdown to implementation | Agents know what to edit, what not to edit, and how to verify. |
| Implementation to review | Reviewers see the intended behavior, not just the diff. |
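One way to make a handoff contract enforceable is a tiny gate that refuses to start a stage without its input artifact. This is a sketch under the artifact names from the flow above; the function name is hypothetical:

```shell
# Sketch: block a workflow stage until its input artifact exists and
# is non-empty (artifact names from the flow above are illustrative).
require_artifact() {
  file="$1"; stage="$2"
  if [ ! -s "$file" ]; then
    echo "blocked: $stage needs $file" >&2
    return 1
  fi
  echo "$stage: $file present"
}

# Example: the plan stage must not start without proposal.md.
# require_artifact proposal.md "proposal-to-plan"
```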

## Upgrading An Existing Skill

Upgrade from evidence, not vibes.

1. Identify the failure mode.
2. Find source patterns that solve the same class of problem.
3. Write a proposal before changing the skill.
4. Add the smallest workflow or artifact change that fixes the failure.
5. Add a structural check or scenario so the failure stays fixed.

Use this sequence when improving a local skill:

| Step | Question |
| --- | --- |
| Audit | Where does the skill currently fail or create ambiguity? |
| Evidence | Which source pattern is relevant: helper scripts, generated outputs, evals, role workflow, or curation? |
| Proposal | What should change, and what is explicitly out of scope? |
| Implementation | What is the smallest edit that improves reliability? |
| Verification | How will a future agent detect regression? |

Gotcha: Do not copy a large external skill system wholesale. Most local reliability gains come from sharper activation rules, clearer artifact contracts, and one or two targeted checks.

## Source Evidence

The durable research lives under skills research. Start there when you need source-backed details: