Tree-sitter Internals
How NeoVim’s tree-sitter integration works: the LanguageTree abstraction, query system, language injection, incremental parsing, and the highlighter.
LanguageTree
The central abstraction is LanguageTree (runtime/lua/vim/treesitter/languagetree.lua). It wraps tree-sitter’s parser and manages parse trees, including nested languages (like JavaScript inside HTML).
-- Get the LanguageTree for a buffer
local lt = vim.treesitter.get_parser(0, "lua")
-- Parse and get the tree
local trees = lt:parse() -- Returns list of trees (one per region)
local root = trees[1]:root()
-- The tree is immutable. After buffer changes, re-parse:
lt:parse() -- Incremental - only re-parses changed regions
Key LanguageTree concepts:
| Concept | What it is |
|---|---|
| Regions | Portions of the buffer this parser handles |
| Children | Nested LanguageTrees for injected languages |
| Valid | Whether the tree is up to date; an invalid tree needs re-parsing |
| Callbacks | Hooks for parse events (on_bytes, on_changedtree, on_child_added, etc.) |
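The concepts in the table can be read off a LanguageTree directly. The sketch below assumes a hypothetical helper name, describe_tree; the methods it calls (lang, is_valid, included_regions, children) are LanguageTree methods.

```lua
-- Hypothetical helper: summarize the state of a LanguageTree.
local function count(tbl)
  local n = 0
  for _ in pairs(tbl) do
    n = n + 1
  end
  return n
end

local function describe_tree(lt)
  return {
    lang = lt:lang(),                       -- language this tree parses
    valid = lt:is_valid(),                  -- false means parse() has work to do
    regions = count(lt:included_regions()), -- regions of the buffer it covers
    children = count(lt:children()),        -- injected languages found so far
  }
end

-- Usage inside Neovim:
-- local lt = vim.treesitter.get_parser(0, "lua")
-- lt:parse()
-- print(vim.inspect(describe_tree(lt)))
```

Calling it before and after editing the buffer should show valid flipping to false and back to true once parse() runs again.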
Incremental Parsing
Tree-sitter parsers are incremental. When you type a character, NeoVim doesn’t re-parse the entire file. Instead:
1. Buffer change notification (on_bytes callback)
2. LanguageTree marks affected regions as invalid
3. On next parse(), only invalid regions are re-parsed
4. Tree-sitter reuses unchanged subtrees from the old tree and reports the ranges that changed
5. Highlighting updates only the changed screen lines
This is why tree-sitter highlighting stays fast even in large files. A keystroke in a 10,000-line file only re-parses the affected function, not the whole file.
-- You can observe this with callbacks
local parser = vim.treesitter.get_parser(0, "lua")
parser:register_cbs({
  -- Called on every buffer edit with the position of the change
  on_bytes = function(_, _, start_row, start_col, _, old_end_row, old_end_col, _, new_end_row, new_end_col, _)
    print(string.format("Changed at %d:%d", start_row, start_col))
  end,
  -- Called after a re-parse with the ranges whose syntax changed
  on_changedtree = function(ranges)
    print("Re-parsed; " .. #ranges .. " changed ranges")
  end,
})
Language Injection
A key feature: one buffer can contain multiple languages. HTML files have CSS and JavaScript. Markdown has code blocks. Tree-sitter handles this through injection queries.
;; runtime/queries/html/injections.scm (simplified)
;; Inject CSS into <style> tags
(style_element
  (raw_text) @injection.content
  (#set! injection.language "css"))
;; Inject JavaScript into <script> tags
(script_element
  (raw_text) @injection.content
  (#set! injection.language "javascript"))
NeoVim’s implementation:
LanguageTree (HTML)
|
+-- LanguageTree (CSS) -- for <style> regions
|
+-- LanguageTree (JavaScript) -- for <script> regions
Each child LanguageTree only parses its regions of the buffer. The parent coordinates which regions belong to which child.
-- Inspect injected languages
local parser = vim.treesitter.get_parser(0)
parser:parse(true) -- true also parses injections, which creates the children
print("Root language:", parser:lang())
for lang, child in pairs(parser:children()) do
  print("Injected:", lang)
  for _, region in pairs(child:included_regions()) do
    print("  Region:", vim.inspect(region))
  end
end
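To ask which language owns a given position, rather than listing every child, LanguageTree offers language_for_range. The sketch below wraps it in a hypothetical helper, lang_at, that takes a 0-based row and column.

```lua
-- Hypothetical helper: name of the language that owns a buffer position.
local function lang_at(parser, row, col)
  -- language_for_range takes { start_row, start_col, end_row, end_col }
  -- and returns the innermost LanguageTree containing that range.
  local tree = parser:language_for_range({ row, col, row, col })
  return tree:lang()
end

-- Usage inside Neovim (the cursor row is 1-based, tree-sitter rows are 0-based):
-- local parser = vim.treesitter.get_parser(0)
-- parser:parse(true)
-- local pos = vim.api.nvim_win_get_cursor(0)
-- print(lang_at(parser, pos[1] - 1, pos[2]))
```

In an HTML buffer this should report the injected language when the cursor sits inside a style or script element, and html elsewhere.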
The Query System
Queries are S-expressions that pattern-match against tree-sitter nodes. The query system (runtime/lua/vim/treesitter/query.lua) compiles, caches, and executes these patterns.
Query Structure
;; A query pattern
(function_declaration            ;; Match this node type
  name: (identifier) @func_name  ;; Capture the name child as @func_name
  parameters: (parameters        ;; Navigate into parameters
    (identifier) @param))        ;; Capture each parameter
;; Predicates filter matches (the pattern and its predicate share one group)
((string_content) @string.special
  (#lua-match? @string.special "^%w+$")) ;; Only match alphanumeric strings
;; Directives attach metadata to a match
((comment) @comment
  (#set! priority 200)) ;; Highlighting priority
Built-in Predicates
| Predicate | Purpose |
|---|---|
| #eq? | Exact string match |
| #match? | Vim regex match (alias: #vim-match?) |
| #any-of? | Match against a set of strings |
| #has-ancestor? | Check the type of an enclosing node |
| #not-eq? | Negated string match |
| #contains? | Substring check |
| #lua-match? | Lua pattern match |
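As an illustration of a predicate in use, the sketch below runs a query with #any-of? over a string of Lua source instead of a buffer. The helper name find_calls is hypothetical, and the node and field names (function_call, name) are assumed to match the Lua grammar that ships with Neovim.

```lua
-- Hypothetical helper: list calls to print() or error() in a Lua source string.
local function find_calls(src)
  -- A string parser works like a buffer parser, but over a Lua string
  local parser = vim.treesitter.get_string_parser(src, "lua")
  local root = parser:parse()[1]:root()

  local query = vim.treesitter.query.parse("lua", [[
    (function_call
      name: (identifier) @name
      (#any-of? @name "print" "error"))
  ]])

  local found = {}
  for _, node in query:iter_captures(root, src) do
    found[#found + 1] = vim.treesitter.get_node_text(node, src)
  end
  return found
end

-- Usage inside Neovim:
-- print(vim.inspect(find_calls('print("a") foo() error("b")')))
```

Calls whose name is not in the #any-of? list, such as foo() above, are filtered out before the loop ever sees them.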
Query Execution in Lua
local query = vim.treesitter.query.parse("lua", [[
  (function_declaration
    name: (identifier) @name
    body: (block) @body)
]])
local parser = vim.treesitter.get_parser(0, "lua")
local tree = parser:parse()[1]
-- iter_captures: iterate individual captures
for id, node, metadata in query:iter_captures(tree:root(), 0) do
  local capture_name = query.captures[id]
  local row1, col1, row2, col2 = node:range()
  print(capture_name, row1, col1, vim.treesitter.get_node_text(node, 0))
end
-- iter_matches: iterate full pattern matches
-- Each capture ID maps to a list of nodes (the default since Neovim 0.11;
-- on 0.10, pass { all = true } as the fifth argument to get this shape)
for pattern, match, metadata in query:iter_matches(tree:root(), 0) do
  for id, nodes in pairs(match) do
    for _, node in ipairs(nodes) do
      print(query.captures[id], vim.treesitter.get_node_text(node, 0))
    end
  end
end
The Highlighter
The highlighter (runtime/lua/vim/treesitter/highlighter.lua) connects tree-sitter queries to NeoVim’s display engine.
How It Works
1. vim.treesitter.start() creates a highlighter for the buffer
2. Highlighter registers a decoration provider with NeoVim
3. When NeoVim redraws a line range, it calls the provider
4. Provider runs highlights.scm query over visible lines only
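5. Each @capture maps to a highlight group of the same name, tried first with the language as a suffix (e.g., @function -> @function.lua, falling back to @function)
6. Ephemeral extmarks are placed for each captured range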
The key optimization: queries only run on visible lines. Scrolling to a new region triggers a query, but off-screen regions are not processed.
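Step 1 is usually triggered from a FileType autocommand. The following is a minimal sketch, assuming a hypothetical helper name, enable_ts_highlight; vim.treesitter.start and nvim_create_autocmd are the real entry points.

```lua
-- Hypothetical helper: start tree-sitter highlighting for the given filetypes.
local function enable_ts_highlight(filetypes)
  vim.api.nvim_create_autocmd("FileType", {
    pattern = filetypes,
    callback = function(args)
      -- start() raises an error when no parser is available for the
      -- buffer's language, so guard the call instead of letting it propagate.
      local ok = pcall(vim.treesitter.start, args.buf)
      if not ok then
        vim.notify("No tree-sitter parser for " .. args.match, vim.log.levels.DEBUG)
      end
    end,
  })
end

-- Usage inside Neovim:
-- enable_ts_highlight({ "lua", "c", "markdown" })
```

The pcall keeps a missing parser from interrupting buffer loading; the buffer then falls back to whatever highlighting was already active.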
Highlight Groups
Tree-sitter captures map to highlight groups with the @ prefix:
@variable - Variables
@function - Function definitions
@function.call - Function calls
@keyword - Keywords (if, for, return)
@string - String literals
@comment - Comments
@type - Type names
@operator - Operators (+, -, =)
@punctuation.bracket - Brackets
@punctuation.delimiter - Commas, semicolons
These groups link to standard highlight groups. Your colorscheme defines what @function looks like:
" Check what a highlight resolves to:
" with the cursor on a token, this lists the highlight groups that apply to it
:Inspect
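Outside a colorscheme, the same groups can be set with nvim_set_hl. This is a small sketch with a hypothetical function name, style_functions; the link targets are only examples.

```lua
-- Hypothetical helper: style function captures without a full colorscheme.
local function style_functions()
  -- Applies to every language whose queries capture @function
  vim.api.nvim_set_hl(0, "@function", { link = "Function" })
  -- A language-suffixed group overrides the generic one for that language only
  vim.api.nvim_set_hl(0, "@function.lua", { link = "Special" })
end

-- Usage inside Neovim:
-- style_functions()
```

Because the suffixed group wins, Lua function definitions would pick up the second link while other languages keep the first.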
Priority
When multiple highlights overlap, priority determines which wins:
;; Higher priority wins
((comment) @comment (#set! priority 200))
Default priorities:
- Regex syntax highlighting: 50
- Tree-sitter highlighting: 100
- Semantic tokens (from LSP): 125
- User-defined: 200
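The rule "higher priority wins" can be modeled in a few lines of plain Lua. This is an illustrative sketch, not Neovim's decoration code: winning_group is a hypothetical function, and ties simply keep the first entry, which Neovim does not necessarily do.

```lua
-- Priority values taken from the list above
local priorities = { treesitter = 100, semantic_tokens = 125, user = 200 }

-- Hypothetical helper: given highlights covering the same text,
-- return the group with the highest priority.
local function winning_group(highlights)
  local best
  for _, hl in ipairs(highlights) do
    if not best or hl.priority > best.priority then
      best = hl
    end
  end
  return best and best.group
end

local winner = winning_group({
  { group = "@variable.lua", priority = priorities.treesitter },
  { group = "@lsp.type.parameter.lua", priority = priorities.semantic_tokens },
})
print(winner) -- @lsp.type.parameter.lua
```

This is why an LSP semantic token overrides a tree-sitter capture on the same identifier unless a query raises its own priority with #set!.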
Tree-sitter vs Regex Highlighting
| Aspect | Regex (:syntax) | Tree-sitter |
|---|---|---|
| Accuracy | Heuristic, often wrong | Structural, parse-correct |
| Speed (small files) | Slightly faster | Slightly slower initial parse |
| Speed (large files) | Can be slow (regex backtracking) | Incremental, stays fast |
| Nested languages | Fragile, breaks often | First-class via injection |
| Extensibility | Vim syntax files (cryptic) | S-expression queries (readable) |
| Error recovery | None (falls apart on syntax errors) | Built-in (tree-sitter recovers gracefully) |
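Tip: You can run both simultaneously if needed. vim.treesitter.start() turns off regex syntax for the buffer by default; setting vim.bo.syntax = "on" afterwards re-enables it. Tree-sitter highlighting takes priority where it applies. Regex highlighting fills in gaps for languages without tree-sitter parsers.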
Folding Internals
Tree-sitter folding (runtime/lua/vim/treesitter/_fold.lua) computes fold levels from the parse tree:
-- The fold expression (window-local; only used when foldmethod is "expr")
vim.wo.foldmethod = "expr"
vim.wo.foldexpr = "v:lua.vim.treesitter.foldexpr()"
-- Internally, this:
-- 1. Runs the folds.scm query over the parse tree
-- 2. Records where each @fold range starts and ends
-- 3. Returns the fold level of each line based on nesting depth
folds.scm defines what’s foldable:
;; runtime/queries/lua/folds.scm (simplified)
[
  (function_declaration)
  (if_statement)
  (for_statement)
  (while_statement)
  (table_constructor)
] @fold
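To make "fold level based on nesting depth" concrete, here is a plain-Lua sketch that turns a list of @fold ranges into one level per line. It is a simplification under stated assumptions, not the code in _fold.lua: fold_levels is a hypothetical name, ranges are { start_row, end_row } pairs (0-based, inclusive), and single-line ranges are ignored.

```lua
-- Hypothetical helper: per-line nesting depth from @fold ranges.
local function fold_levels(ranges, line_count)
  -- delta[i] is the change in depth when entering 1-based line i
  local delta = {}
  for i = 1, line_count + 1 do
    delta[i] = 0
  end
  for _, r in ipairs(ranges) do
    local start_row, end_row = r[1], r[2]
    if end_row > start_row then
      delta[start_row + 1] = delta[start_row + 1] + 1 -- depth rises on the first line
      delta[end_row + 2] = delta[end_row + 2] - 1     -- and drops after the last line
    end
  end
  local levels, depth = {}, 0
  for i = 1, line_count do
    depth = depth + delta[i]
    levels[i] = depth
  end
  return levels
end

-- A function on rows 0-4 containing an if statement on rows 1-3, in a 6-line file
local levels = fold_levels({ { 0, 4 }, { 1, 3 } }, 6)
print(table.concat(levels, ",")) -- 1,2,2,2,1,0
```

Lines inside both ranges get level 2, lines inside only the function get level 1, and the trailing line outside every range gets 0.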
Performance Considerations
Tree-sitter is fast, but there are limits:
- Initial parse: The first parse of a large file takes time. Subsequent parses are incremental and fast.
- Injection overhead: Each injected language gets its own child LanguageTree, and every injected region is parsed separately. A Markdown file with 50 code blocks adds 50 regions to parse and keep up to date.
- Query complexity: Complex queries with many predicates are slower. Highlight queries are optimized to run per-line.
- Memory: Each parsed tree lives in memory. Very large files (100K+ lines) use significant memory.
Gotcha: If NeoVim feels slow in a specific file, check :InspectTree to see if there are excessive injected languages or an unusually deep parse tree.
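For a quick numeric check alongside :InspectTree, the injected regions can be counted per language. The sketch below uses a hypothetical helper, injection_stats, built on the children and included_regions methods of LanguageTree.

```lua
-- Hypothetical helper: count injected regions per language, recursively.
local function injection_stats(parser)
  local stats = {}
  local function visit(lt)
    for lang, child in pairs(lt:children()) do
      local regions = 0
      for _ in pairs(child:included_regions()) do
        regions = regions + 1
      end
      stats[lang] = (stats[lang] or 0) + regions
      visit(child) -- injections can nest, e.g. HTML inside Markdown
    end
  end
  visit(parser)
  return stats
end

-- Usage inside Neovim:
-- local parser = vim.treesitter.get_parser(0)
-- parser:parse(true)
-- print(vim.inspect(injection_stats(parser)))
```

A language with an unexpectedly large region count is a reasonable first suspect when a single file feels slow.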