Дерево (структура данных)

В информатике , А дерево широко используется абстрактный тип данных , который имитирует иерархическую древовидную структуру , со значением корня и поддеревами детей с родительским узлом , представленное в виде набора связанных узлов .

Общая, и поэтому небинарная, несортированная, с дублированием некоторых меток, произвольная диаграмма дерева. На этой диаграмме узел с меткой 7 имеет трех дочерних узлов с меткой 2, 10 и 6 и одного родителя с меткой 2. Корневой узел наверху не имеет родителя.

Структура данных дерева может быть определена рекурсивно как набор узлов (начиная с корневого узла), где каждый узел представляет собой структуру данных, состоящую из значения вместе со списком ссылок на узлы («дочерние элементы») с ограничения, что никакая ссылка не дублируется и ни одна не указывает на корень.

В качестве альтернативы дерево может быть определено абстрактно в целом (глобально) как упорядоченное дерево со значением, присвоенным каждому узлу. Обе эти точки зрения полезны: в то время как дерево может быть проанализировано математически в целом, когда оно фактически представлено как структура данных, оно обычно представляется и обрабатывается отдельно по узлам (а не как набор узлов и список смежности ребер между узлами). , например, как орграф ). Например, рассматривая дерево в целом, можно говорить о «родительском узле» данного узла, но в целом, как структура данных, данный узел содержит только список своих дочерних узлов, но не содержит ссылки. своему родителю (если есть).

Общее использование

Представление иерархических данных, таких как:
- Абстрактные синтаксические деревья для компьютерных языков
- Разбирать деревья для человеческих языков
- Объект Document Модели из XML и HTML документов
- Документы JSON и YAML обрабатываются
В деревьях поиска данные хранятся таким образом, что возможен эффективный алгоритм поиска с помощью обхода дерева.
- Бинарное дерево поиска представляет собой тип бинарного дерева
Представление отсортированных списков данных
Как рабочий процесс для компоновки цифровых изображений для визуальных эффектов ^{[ необходима ссылка ]}
Хранение деревьев Barnes-Hut, используемых для моделирования галактик

Терминология

Узел представляет собой структуру , которая может содержать значение или состояние, или представлять собой отдельную структуру данных (который может быть деревом самостоятельно). Каждый узел в дереве имеет ноль или более дочерних узлов , которые находятся под ним в дереве (по соглашению деревья рисуются растущими вниз). Узел, у которого есть дочерний узел, называется родительским узлом дочернего (или вышестоящим ). Узел имеет не более одного родителя, но возможно много узлов-предков , таких как родительский родительский узел . Дочерние узлы с одним и тем же родителем являются узлами-братьями .

Внутренний узел (также известный как внутренний узел , индексный дескриптор для короткого или узла ветви ) является любым узлом дерева , который имеет дочерние узлы. Точно так же внешний узел (также известный как внешний узел , листовой узел или конечный узел ) - это любой узел, не имеющий дочерних узлов.

Самый верхний узел в дереве называется корневым узлом . В зависимости от определения может потребоваться, чтобы у дерева был корневой узел (в этом случае все деревья непустые), или ему может быть разрешено быть пустым, и в этом случае оно не обязательно имеет корневой узел. Будучи самым верхним узлом, корневой узел не будет иметь родителя. Это узел, с которого начинаются алгоритмы в дереве, поскольку в качестве структуры данных можно переходить только от родителей к потомкам. Обратите внимание, что некоторые алгоритмы (например, поиск в глубину после упорядочения) начинаются с корня, но сначала посещают конечные узлы (получают доступ к значению конечных узлов), посещают только корень последним (т. Е. Сначала обращаются к дочерним элементам корня). , но доступ только к значению корня последним). Все остальные узлы могут быть доступны из него по ребрам или ссылкам . (В формальном определении каждый такой путь также уникален.) На диаграммах корневой узел обычно рисуется вверху. В некоторых деревьях, например в кучах , корневой узел имеет особые свойства. Каждый узел в дереве можно рассматривать как корневой узел поддерева, укорененного в этом узле.

Высота узла длина самого длинного пути вниз к листу из этого узла. Высота корня - это высота дерева. Глубина узла является длиной пути к корню (т.е. его корень путь ). Это обычно требуется при манипулировании различными самобалансирующимися деревьями, в частности деревьями AVL . Корневой узел имеет нулевую глубину, листовые узлы имеют нулевую высоту, а дерево только с одним узлом (следовательно, и корнем, и листом) имеет нулевую глубину и высоту. Обычно пустое дерево (дерево без узлов, если таковые разрешены) имеет высоту -1.

Поддерево дерева $T$ представляет собой дерево , состоящее из узла $Т$ и всех его потомков в $T$ . ^[a]^[1] Таким образом, узлы соответствуют поддеревьям (каждый узел соответствует своему поддереву и всем его потомкам) - поддерево, соответствующее корневому узлу, является всем деревом, а каждый узел является корневым узлом поддерева, которое он определяет. ; поддерево, соответствующее любому другому узлу, называется собственным поддеревом (по аналогии с собственным подмножеством ).

Другие термины, используемые с деревьями:

Сосед: Родитель или ребенок.
Предок: Узел, достижимый повторным переходом от дочернего к родительскому.
Потомок: Узел, достижимый повторным переходом от родителя к потомку. Также известен как дочерний ребенок .
Узел ответвления
Внутренний узел: Узел по крайней мере с одним дочерним элементом.
Степень: Для данного узла его количество дочерних элементов. Лист обязательно имеет нулевую степень.
Степень дерева: Степень дерева - это максимальная степень узла в дереве.
Расстояние: Количество ребер на кратчайшем пути между двумя узлами.
Уровень: Уровень узла - это количество ребер на уникальном пути между ним и корневым узлом. ^[2]
Ширина: Количество узлов на уровне.
Ширина: Количество листьев.
лес: Набор из n ≥ 0 непересекающихся деревьев.
Заказанное дерево: Корневое дерево, в котором для дочерних элементов каждой вершины указан порядок.
Размер дерева: Количество узлов в дереве.

Предварительное определение

Не дерево : ненаправленный цикл 1-2-4-3. 4 имеет более одного родителя (входящий край).

Не дерево : цикл B → C → E → D → B. B имеет более одного родителя (входящий край).

Не дерево : цикл A → A. A - это корень, но у него также есть родитель.

Каждый линейный список тривиально представляет собой дерево

Дерево - это нелинейная структура данных по сравнению с массивами, связанными списками, стеками и очередями, которые являются линейными структурами данных. Дерево может быть пустым без узлов или дерево представляет собой структуру, состоящую из одного узла, называемого корнем, и нуля или одного или нескольких поддеревьев.

Рисование деревьев

Деревья часто рисуют в плоскости. Упорядоченные деревья могут быть представлены по существу уникальным образом на плоскости и поэтому называются плоскими деревьями следующим образом: если зафиксировать обычный порядок (скажем, против часовой стрелки) и расположить дочерние узлы в этом порядке (сначала входящее родительское ребро, затем первое дочернее ребро и т. д.), это дает вложение дерева в плоскость, единственное с точностью до объемлющей изотопии . И наоборот, такое вложение определяет порядок дочерних узлов.

Если поместить корень наверху (родители выше детей, как в семейном дереве ) и разместить все узлы, которые находятся на заданном расстоянии от корня (с точки зрения количества ребер: «уровень» дерева) на заданном горизонтальной линией получается стандартный рисунок дерева. Для двоичного дерева первый дочерний элемент находится слева («левый узел»), а второй дочерний элемент - справа («правый узел»).

Общие операции

Перечисление всех предметов
Перечисление раздела дерева
Поиск предмета
Добавление нового элемента в определенное место в дереве
Удаление предмета
Обрезка : удаление целого участка дерева.
Прививка : добавление целого участка к дереву
Нахождение корня для любого узла
Нахождение наименьшего общего предка двух узлов

Методы обхода и поиска

Переход по элементам дерева посредством связей между родителями и детьми называется ходьбой по дереву , а действие - обходом дерева. Часто операция может выполняться, когда указатель достигает определенного узла. Обход, при котором каждый родительский узел проходит до его дочерних узлов, называется обходом по предварительному заказу ; прогулка, при которой дети проходят путь до того, как пройдут их родители, называется прогулкой после заказа ; обход левого поддерева узла, затем самого узла и, наконец, его правого поддерева, называется обходом по порядку . (Этот последний сценарий, относящийся ровно к двум поддеревьям, левому поддереву и правому поддереву, предполагает, в частности, двоичное дерево .) Обход по уровням эффективно выполняет поиск в ширину по всему дереву; узлы проходят уровень за уровнем, где сначала посещается корневой узел, затем следуют его прямые дочерние узлы и их братья и сестры, затем идут его внучатые узлы и их братья и сестры, и т. д., пока не будут пройдены все узлы в дереве.

Представления

Есть много разных способов представить деревья; общие представления представляют узлы как динамически выделяемые записи с указателями на их дочерние элементы, их родители или и то, и другое, или как элементы в массиве , отношения между которыми определяются их позициями в массиве (например, двоичная куча ).

Действительно, двоичное дерево может быть реализовано как список списков (список, в котором значения являются списками): заголовок списка (значение первого члена) - это левый дочерний элемент (поддерево), а хвост (список второго и последующих членов) является правым потомком (поддеревом). Это также можно изменить, чтобы разрешить значения, как в S-выражениях Лиспа , где заголовок (значение первого члена) - это значение узла, заголовок хвоста (значение второго члена) - это левый дочерний элемент, а хвостик хвоста (список третьего и последующих терминов) - правый ребенок.

Как правило, узел в дереве не имеет указателей на своих родителей, но эта информация может быть включена (расширена структура данных, чтобы также включить указатель на родительский элемент) или сохранена отдельно. В качестве альтернативы восходящие ссылки могут быть включены в данные дочернего узла, как в многопоточном двоичном дереве .

Обобщения

Диграфы

Если ребра (к дочерним узлам) рассматриваются как ссылки, тогда дерево является частным случаем орграфа, и древовидная структура данных может быть обобщена для представления ориентированных графов , удалив ограничения, согласно которым узел может иметь не более одного родителя, и что никакие циклы не допускаются. Ребра по-прежнему абстрактно рассматриваются как пары узлов, однако термины $родительский$ и $дочерний$ обычно заменяются другой терминологией (например, $источник$ и $цель$ ). Существуют разные стратегии реализации : орграф может быть представлен той же локальной структурой данных, что и дерево (узел со значением и список дочерних элементов), при условии, что «список дочерних элементов» является списком ссылок, или глобально такими структурами, как списки смежности. .

В теории графов , А дерево является связным ациклическим графом ; если не указано иное, в теории графов деревья и графы считаются неориентированными. Между такими деревьями и деревьями, как структура данных, нет однозначного соответствия. Мы можем взять произвольное неориентированное дерево, произвольно выбрать одну из его вершин в качестве $корня$ , сделать все его ребра направленными, направив их в сторону от корневого узла, создавая древообразование, и назначить порядок всем узлам. Результат соответствует древовидной структуре данных. Выбор другого корня или другой порядок приводит к другому.

Для данного узла в дереве его дочерние элементы определяют упорядоченный лес (объединение поддеревьев, заданных всеми дочерними элементами, или, что эквивалентно, взятие поддерева, заданное самим узлом, и стирание корня). Подобно тому, как поддеревья естественны для рекурсии (как при поиске в глубину), леса естественны для коркурсии (как при поиске в ширину).

Посредством взаимной рекурсии лес можно определить как список деревьев (представленных корневыми узлами), где узел (дерева) состоит из значения и леса (его дочерние элементы):

f: [n [1], ..., n [k]]n: vf

Тип данных по сравнению со структурой данных

Существует различие между деревом как абстрактным типом данных и конкретной структурой данных, аналогичное различию между списком и связанным списком .

Как тип данных, дерево имеет значение и дочерние элементы, а дочерние элементы сами являются деревьями; значение и дочерние элементы дерева интерпретируются как значение корневого узла и поддеревьев дочерних элементов корневого узла. Чтобы разрешить конечные деревья, нужно либо разрешить список дочерних элементов быть пустым (в этом случае может потребоваться, чтобы деревья были непустыми, а «пустое дерево» вместо этого было представлено лесом из нулевых деревьев), либо разрешить деревьям быть быть пустым, и в этом случае список дочерних элементов может иметь фиксированный размер ( коэффициент ветвления , особенно 2 или «двоичный»), если желательно.

В качестве структуры данных связанное дерево представляет собой группу узлов , где каждый узел имеет значение и список ссылок на другие узлы (его дочерние элементы). Также существует требование, чтобы никакие две «нисходящие» ссылки не указывали на один и тот же узел. На практике узлы в дереве обычно включают в себя и другие данные, такие как следующие / предыдущие ссылки, ссылки на их родительские узлы или почти что угодно.

Из-за использования $ссылок$ на деревья в структуре данных связанного дерева деревья часто неявно обсуждаются, предполагая, что они представлены ссылками на корневой узел, поскольку часто именно так они и реализуются. Например, вместо пустого дерева может быть пустая ссылка: дерево всегда непусто, но ссылка на дерево может быть пустой.

Рекурсивный

Рекурсивно, как тип данных, дерево определяется как значение (некоторого типа данных, возможно, пустое) вместе со списком деревьев (возможно, пустой список), поддеревья его дочерних элементов; символически:

t: v [t [1], ..., t [k]]

(Дерево $t$ состоит из значения $v$ и списка других деревьев.)

Более элегантно, с помощью взаимной рекурсии , одним из основных примеров которой является дерево, дерево может быть определено в терминах леса (списка деревьев), где дерево состоит из значения и леса (поддеревья его дети):

f: [t [1], ..., t [k]]

т: vf

Обратите внимание, что это определение дано в терминах значений и подходит для функциональных языков (оно предполагает ссылочную прозрачность ); разные деревья не имеют связей, поскольку представляют собой просто списки значений.

В качестве структуры данных дерево определяется как узел (корень), который сам состоит из значения (некоторого типа данных, возможно, пустого) вместе со списком ссылок на другие узлы (список, возможно, пустой, ссылки, возможно, нулевые) ); символически:

n: v [& n [1], ..., & n [k]]

(Узел $n$ состоит из значения $v$ и списка ссылок на другие узлы.)

Эта структура данных определяет ориентированный граф ^[b], и для того, чтобы он был деревом, необходимо добавить условие к его глобальной структуре (его топологии), а именно, что не более одной ссылки может указывать на любой данный узел (узел имеет не более единственный родитель), и ни один узел в дереве не указывает на корень. Фактически, каждый узел (кроме корня) должен иметь ровно одного родителя, а корень не должен иметь родителей.

Действительно, учитывая список узлов и для каждого узла список ссылок на его дочерние элементы, нельзя сказать, является ли эта структура деревом или нет, не проанализировав ее глобальную структуру и что она на самом деле является топологически деревом, как определено ниже.

Теория типов

В качестве абстрактного типа данных тип абстрактного дерева $T$ со значениями некоторого типа $E$ определяется с использованием типа абстрактного леса $F$ (список деревьев) функциями:

значение:

T

→

E

дети:

T

→

F

ноль: () →

F

узел:

E

×

F

→

T

с аксиомами:

значение (узел (

e

,

f

)) =

e

дети (узел (

e

,

f

)) =

f

С точки зрения теории типов , дерево - это индуктивный тип, определяемый конструкторами $nil$ (пустой лес) и $node$ (дерево с корневым узлом с заданным значением и дочерними элементами).

Математическая терминология

В целом древовидная структура данных представляет собой упорядоченное дерево , обычно со значениями, прикрепленными к каждому узлу. Конкретно это (если требуется, чтобы он был непустым):

Корневое дерево с направлением «от корня» (более узкое понятие является « древовидный »), что означает:
- Ориентированный граф ,
- базовый неориентированный граф которого является деревом (любые две вершины соединены ровно одним простым путем),
- с выделенным корнем (одна вершина обозначается как корень),
- который определяет направление на ребрах (стрелки указывают от корня; для данного ребра узел, на который указывает ребро, называется $родительским,$ а узел, на который указывает ребро, называется $дочерним$ ),

вместе с:

упорядочение дочерних узлов данного узла и
значение (некоторого типа данных) на каждом узле.

Часто деревья имеют фиксированный (точнее, ограниченный) коэффициент ветвления ( исходящую степень ), в частности, всегда есть два дочерних узла (возможно, пустые, следовательно, не более двух непустых дочерних узлов), следовательно, «двоичное дерево».

Разрешение пустых деревьев делает некоторые определения более простыми, некоторые - более сложными: корневое дерево должно быть непустым, следовательно, если пустые деревья разрешены, приведенное выше определение вместо этого становится «пустым деревом или корневым деревом такое, что ...». С другой стороны, пустые деревья упрощают определение фиксированного коэффициента ветвления: с разрешенными пустыми деревьями двоичное дерево - это дерево, в котором каждый узел имеет ровно два дочерних элемента, каждый из которых является деревом (возможно, пустым). Полный набор операций с деревом должен включать операцию разветвления.

Математическое определение

Неупорядоченное дерево

Математически неупорядоченное дерево ^[3] (или «алгебраическое дерево») ^[4] может быть определено как алгебраическая структура $(X, родительский элемент),$ где $X$ - непустой несущий набор узлов, а $родительский$ элемент - это функция на $X,$ которая назначает каждый узел $x$ его "родительский" узел, $parent (x)$ . Структура подчиняется условию, что каждая непустая подалгебра должна иметь одну и ту же неподвижную точку . То есть должен быть уникальный «корневой» узел $r$ , такой, что $parent (r) = r,$ и для каждого узла $x$ некоторый $родительский элемент$ итеративного приложения $(parent (\dots parent ($ $x$ $) \dots))$ равен $r$ .

Есть несколько эквивалентных определений.

В качестве ближайшей альтернативы можно определить неупорядоченные деревья как частичные алгебры $(X, родительские),$ которые получаются из полных алгебр, описанных выше, если $родительский элемент (r)$ не определен. То есть корень $r$ является единственным узлом, на котором $родительская$ функция не определена, и для каждого узла $x$ корень доступен из $x$ в ориентированном графе $(X, родительский)$ . Это определение фактически совпадает с определением антидревесценции . В книге TAoCP используется термин ориентированное дерево . ^[5]

Неупорядоченное дерево представляет собой структура $(X, ≺)$ , где $Х$ представляет собой набор узлов и $≺$ является ребенок к родителю связь между узлами таким образом, что:
(1)	$X$ is non-empty.
(2)	$X$ is weakly connected in $≺$ .
(3)	$≺$ is functional.
(4)	$≺$ satisfies ACC: there is no infinite sequence $x 1 ≺ x 2 ≺ x 3 ≺ \dots$ .

The box on the right describes the partial algebra $(X, parent)$ as a relational structure $(X, ≺)$ . If (1) is replaced by

$X$ contains exactly one $≺$ -maximal node.

then condition (2) becomes redundant.

Using this definition, dedicated terminology can be provided for generalizations of unordered trees that correspond to distinguished subsets of the listed conditions:

(1,2,3) – directed pseudotree,
(3) – directed pseudoforest,
(3,4) – unordered forest (whose components are unordered trees),
(4) – directed acyclic graph, assumed that $X$ is finite,
(1',4) – acyclic accessible pointed graph (then condition (2) holds implicitly).

Another equivalent definition of an unordered tree is that of a set-theoretic tree that is singly-rooted and whose height is at most ω (a "finite-ish" tree).^[6] That is, the algebraic structures $(X, parent)$ are equivalent to partial orders $(X, \leq)$ that have a top element $r$ and whose every principal upset (aka principal filter) is a finite chain. To be precise, we should speak about an inverse set-theoretic tree since the set-theoretic definition usually employs opposite ordering.

The correspondence between $(X, parent)$ and $(X, \leq)$ is established via reflexive transitive closure / reduction, with the reduction resulting in the "partial" version without the root cycle.

The definition of trees in descriptive set theory (DST) utilizes the representation of partial orders $(X, \geq)$ as prefix orders between finite sequences. In turns out that up to isomorphism, there is a one-to-one correspondence between the (inverse of) DST trees and the tree structures defined so far.

We can refer to the four equivalent characterizations as to tree as an algebra, tree as a partial algebra, tree as a partial order, and tree as a prefix order. There is also a fifth equivalent definition – that of a graph-theoretic rooted tree which is just a connected acyclic rooted graph.

The expression of trees as (partial) algebras (also called functional graphs) $(X, parent)$ follows directly the implementation of tree structures using parent pointers. Typically, the partial version is used in which the root node has no parent defined. However, in some implementations or models even the $parent(r) = r$ circularity is established. Notable examples:

The Linux VFS where "The root dentry has a d_parent that points to itself".^[7]
The concept of an instantiation tree^[8]^[9]^[10]

from object-oriented programming. In this case, the root node is the top metaclass – the only class that is a direct instance of itself.

Note that the above definition admits infinite trees. This allows for the description of infinite structures supported by some implementations via lazy evaluation. A notable example is the infinite regress of eigenclasses from the Ruby object model.^[11] In this model, the tree established via superclass links between non-terminal objects is infinite and has an infinite branch (a single infinite branch of "helix" objects – see the diagram).

Sibling sets

In every unordered tree $(X, parent)$ there is a distinguished partition of the set $X$ of nodes into sibling sets. Two non-root nodes $x$ , $y$ belong to the same sibling set if $parent(x) = parent(y)$ . The root node $r$ forms the singleton sibling set ${r$ }.^[c] A tree is said to be locally finite or finitely branching if each of its sibling sets is finite.

Each pair of distinct siblings is incomparable in $\leq$ . This is why the word unordered is used in the definition. Such a terminology might become misleading when all sibling sets are singletons, i.e. when the set $X$ of all nodes is totally ordered (and thus well-ordered) by $\leq$ In such a case we might speak about a singly-branching tree instead.

Using set inclusion

As with every partially ordered set, tree structures $(X, \leq)$ can be represented by inclusion order – by set systems in which $\leq$ is coincident with $\subseteq$ , the induced inclusion order. Consider a structure $(U, ℱ)$ such that $U$ is a non-empty set, and $ℱ$ is a set of subsets of $U$ such that the following are satisfied (by Nested Set Collection definition):

$\emptyset \notin ℱ$ . (That is, $(U, ℱ)$ is a hypergraph.)
$U \in ℱ$ .
For every $X$ , $Y$ from $ℱ$ , $X \cap Y \in {\emptyset, X, Y$ }. (That is, $ℱ$ is a laminar family.)^[12]
For every $X$ from $ℱ$ , there are only finitely many $Y$ from $ℱ$ such that $X \subseteq Y$ .

Then the structure $(ℱ, \subseteq)$ is an unordered tree whose root equals $U$ . Conversely, if $(U, \leq)$ is an unordered tree, and $ℱ$ is the set ${↓ x | x \in U$ } of all principal ideals of $(U, \leq)$ then the set system $(U, ℱ)$ satisfies the above properties.

Tree as a laminar system of sets (Copied from Nested set model)

The set-system view of tree structures provides the default semantic model – in the majority of most popular cases, tree data structures represent containment hierarchy. This also offers a justification for order direction with the root at the top: The root node is a greater container than any other node. Notable examples:

Directory structure of a file system. A directory contains its sub-directories.
DOM tree. The document parts correspondent to DOM nodes are in subpart relation according to the tree order.
Single inheritance in object-oriented programming. An instance of a class is also an instance of a superclass.
Hierarchical taxonomy such as the Dewey Decimal Classification with sections of increasing specificity.
BSP trees, quadtrees, octrees, R-trees and other tree data structures used for recursive space partitioning.

Well-founded trees

An unordered tree $(X, \leq)$ is well-founded if the strict partial order $<$ is a well-founded relation. In particular, every finite tree is well-founded. Assuming the axiom of dependent choice a tree is well-founded if and only if it has no infinite branch.

Well-founded trees can be defined recursively – by forming trees from a disjoint union of smaller trees. For the precise definition, suppose that $X$ is a set of nodes. Using the reflexivity of partial orders, we can identify any tree $(Y, \leq)$ on a subset of $X$ with its partial order $(\leq)$ – a subset of $X \times X$ . The set $ℛ$ of all relations $R$ that form a well-founded tree $(Y, R)$ on a subset $Y$ of $X$ is defined in stages $ℛ i$ , so that $ℛ = ⋃{ℛ i | i is ordinal$ }. For each ordinal number $i$ , let $R$ belong to the $i$ -th stage $ℛ i$ if and only if $R$ is equal to

⋃ℱ \cup ((dom(⋃ℱ) \cup {x}) \times {x})

where $ℱ$ is a subset of $⋃{ℛ k | k < i}$ such that elements of $ℱ$ are pairwise disjoint, and $x$ is a node that does not belong to $dom(⋃ℱ)$ . (We use $dom(S)$ to denote the domain of a relation $S$ .) Observe that the lowest stage $ℛ 0$ consists of single-node trees {( $x$ , $x$ )} since only empty $ℱ$ is possible. In each stage, (possibly) new trees $R$ are built by taking a forest $⋃ℱ$ with components $ℱ$ from lower stages and attaching a new root $x$ atop of $⋃ℱ$ .

In contrast to the tree height which is at most ω, the rank of well-founded trees is unlimited,^[13] see the properties of "unfolding".

Using recursive pairs

In computing, a common way to define well-founded trees is via recursive ordered pairs $(F, x)$ : a tree is a forest $F$ together with a "fresh" node $x$ .^[14] A forest $F$ in turn is a possibly empty set of trees with pairwise disjoint sets of nodes. For the precise definition, proceed similarly as in the construction of names used in the set-theoretic technique of forcing. Let $X$ be a set of nodes. In the superstructure over $X$ , define sets $T$ , $ℱ$ of trees and forests, respectively, and a map $nodes : T \to ℘(X)$ assigning each tree $t$ its underlying set of nodes so that:

(trees over $X$ )	$t \in T$	↔	$t$ is a pair $(F, x)$ from $ℱ \times X$ such that for all $s \in F$ , $x \notin nodes(s)$ ,
(forests over $X$ )	$F \in ℱ$	↔	$F$ is a subset of $T$ such that for every $s, t \in F, s \neq t$ , $nodes(s) \cap nodes(t) = \emptyset$ ,
(nodes of trees)	$y \in nodes(t)$	↔	$t = (F, x) \in T$ and either $y = x$ or $y \in nodes(s)$ for some $s \in F$ .

Circularities in the above conditions can be eliminated by stratifying each of $T$ , $ℱ$ and $nodes$ into stages like in the previous subsection. Subsequently, define a "subtree" relation $\leq$ on $T$ as the reflexive transitive closure of the "immediate subtree" relation $≺$ defined between trees by

s ≺ t

↔

s \in π 1 (t)

where $π 1 (t)$ is the projection of $t$ onto the first coordinate; i.e., it is the forest $F$ such that $t = (F, x)$ for some $x \in X$ . It can be observed that $(T, \leq)$ is a multitree: for every $t \in T$ , the principal ideal $↓ t$ ordered by $\leq$ is a well-founded tree as a partial order. Moreover, for every tree $t \in T$ , its "nodes"-order structure $(nodes(t), \leq t)$ is given by $x \leq t y$ if and only if there are forests $F, G \in ℱ$ such that both $(F, x)$ and $(G, y)$ are subtrees of $t$ and $(F, x) \leq (G, y)$ .

Using arrows

Another formalization as well as generalization of unordered trees can be obtained by reifying parent-child pairs of nodes. Each such ordered pair can be regarded as an abstract entity – an "arrow". This results in a multidigraph $(X, A, s, t)$ where $X$ is the set of nodes, $A$ is the set of arrows, and $s$ and $t$ are functions from $A$ to $X$ assigning each arrow its source and target, respectively. The structure is subject to the following conditions:

(1)

(A, s ○ t -1)

is an unordered tree, as a total algebra.

(2) The

t

map is a bijection between arrows and nodes.

In (1), the composition symbol ○ is to be interpreted left-to-right. The condition says that inverse consecutivity of arrows^[d] is a total child-to-parent map. Let this parent map between arrows be denoted $p$ , i.e. $p = s ○ t -1$ . Then we also have $s = p ○ t$ , thus a multidigraph satisfying (1,2) can also be axiomatized as $(X, A, p, t)$ , with the parent map $p$ instead of $s$ as a definitory constituent. Observe that the root arrow is necessarily a loop, i.e. its source and target coincide.

Dentry structure

Arrow tree: the hard-link structure of VFS

An important generalization of the above structure is established by allowing the target map $t$ to be many-to-one. This means that (2) is weakened to

(2') The

t

map is surjective – each node is the target of some arrow.

Note that condition (1) asserts that only leaf arrows are allowed to have the same target. That is, the restriction of $t$ to the range of $p$ is still injective.

Multidigraphs satisfying (1,2') can be called "arrow trees" – their tree characteristics is imposed on arrows rather than nodes. These structures can be regarded as the most essential abstraction of the Linux VFS because they reflect the hard-link structure of filesystems. Nodes are called inodes, arrows are dentries (or hard links). The parent and target maps $p$ and $t$ are respectively represented by d_parent and d_inode fields in the dentry data structure.^[15] Each inode is assigned a fixed file type, of which the directory type plays a special role of "designed parents":

(a) only directory inodes can appear as hard-link source, and

(b) a directory inode cannot appear as the target of more than one hard-link.

Using dashed style for the first half of the root loop indicates that, similarly to the parent map, there is a partial version for the source map $s$ in which the source of the root arrow is undefined. This variant is employed for further generalization, see #Using paths in a multidigraph.

Using paths in a digraph

Unfolding of set membership $\in$ . For every set $x$ , the relational structures $(x \cup {x}, \in)$ and $(TC({x}), \in)$ are both apgs.^{[apg 1]}

Folded von Neumann ordinal 5

(5, \in)

folded

Unfolded von Neumann ordinal 5

(5, \in)

unfolded

$5 = 4 \cup {4} = TC({4}) = {0,1,2,3,4$ } is a von Neumann ordinal number.

Unordered trees naturally arise by "unfolding" of accessible pointed graphs.^[16]

Let $ℛ = (X, R, r)$ be a pointed relational structure, i.e. such that $X$ is the set of nodes, $R$ is a relation between nodes (a subset of $X \times X$ ), and $r$ is a distinguished "root" node. Assume further that $ℛ$ is accessible, which means that $X$ equals the preimage of ${r$ } under the reflexive transitive closure of $R$ , and call such a structure an accessible pointed graph or apg for short.^{[apg 2]} Then one can derive another apg $ℛ' = (X', R', r')$ – the unfolding of $ℛ$ – as follows:

$X'$ is the set of reversed paths to $r$ , i.e. the set of non-empty finite sequences $p$ of nodes (elements of $X$ ) such that (a) consecutive members of $p$ are inversely $R$ -related, and (b) the first member of $p$ is the root $r$ ,
$R'$ is a relation between paths from $X'$ such that paths $p$ and $q$ are $R'$ -related if and only if $p = q ⁎ [x]$ for some node $x$ (i.e. $q$ is a maximal proper prefix of $p$ , the "popped" $p$ ), and
$r'$ is the one-element sequence $[r]$ .

Apparently, the structure $(X', R')$ is an unordered tree in the "partial-algebra" version: $R'$ is a partial map that relates each non-root element of $X'$ to its parent by path popping. The root element is obviously $r'$ . Moreover, the following properties are satisfied:

$ℛ$ is isomorphic to its unfolding $ℛ'$ if and only if $ℛ$ is a tree.^{[apg 3]} (In particular, unfolding is idempotent, up to isomorphism.)
Unfolding preserves well-foundedness: If $R$ is well-founded then so is $R'$ .
Unfolding preserves rank: If $R$ is well-founded then the ranks of $R$ and $R'$ coincide.

Notes:

^ Acyclic relations like the set membership $\in$ only allow at most one possible root $r$ to form an apg. The root is then given implicitly and can be left out of the signature.
^ To establish a concordancy between $R$ and the $parent$ map, the presented definition uses reversed accessibility: $r$ is reachable from any node. This approach is also used in nLab – see membership graphs. In the original definition by P. Aczel, every node is reachable from $r$ (thus, instead of "preimage", the word "image" applies).
^ We have implicitly introduced a definition of unordered trees as apgs: call an apg $ℛ = (X, R, r)$ a tree if the reduct $(X, R)$ is an unordered tree as a partial algebra. This can be translated as: Every node is accessible by exactly one path.

Using paths in a multidigraph

As shown on the example of hard-link structure of file systems, many data structures in computing allow multiple links between nodes. Therefore, in order to properly exhibit the appearance of unordered trees among data structures it is necessary to generalize accessible pointed graphs to multidigraph setting. To simplify the terminology, we make use of the term quiver which is an established synonym for "multidigraph".

Accessible pointed quiver

Accessible pointed quiver (apq): generalization of apg to multidigraphs.

Let an accessible pointed quiver or apq for short be defined as a structure

ℳ = (X, A, s, t)

where $X$ is a set of nodes, $A$ is a set of arrows, $s$ is a partial function from $A$ to $X$ (the source map), and $t$ is a total function from $A$ to $X$ (the target map). Thus, $ℳ$ is a "partial multidigraph".

The structure is subject to the following conditions:

There is exactly one "root" arrow, $a r$ , whose source $s (a r)$ is undefined.
Every node $x \in X$ is reachable via a finite sequence of consecutive arrows starting with the root arrow $a r$ .

$ℳ$ is said to be a tree if the target map $t$ is a bijection between arrows and nodes. The unfolding of $ℳ$ is formed by the sequences mentioned in (2) – which are the accessibility paths (cf. Path algebra). As an apq, the unfolding can be written as $ℳ' = (X', A', s', t')$ where $X'$ is the set of accessibility paths, $A'$ coincides with $X'$ , $s'$ coincides with path popping, and $t'$ is the identity on $X'$ . Like with apgs, unfolding is idempotent and always results in a tree.

The underlying apg is obtained as the structure $(X, R, t (a r))$ where $R = {(t(a),s(a)) | a ∈ A \ {ar}}$ .

The diagram above shows an example of an apq with 1 + 14 arrows. In JavaScript, Python or Ruby, the structure can be created by the following (exactly the same) code:

r = {}; r[1] = {}; r[2] = r[1]; r[3] = {}; r[4] = {}; r[1][5] = {}; r[1][14] = r[1][5];r[3][7] = {}; r[3][8] = r[3][7]; r[3][13] = {};r[4][9] = r[4]; r[4][10] = r[4]; r[4][11] = {};r[3][7][6] = r[3]; r[3][7][12] = r[1][5];

Using names

Unordered trees and their generalizations form the essence of naming systems. There are two prominent examples of naming systems: file systems and (nested) associative arrays. The multidigraph-based structures from previous subsections provided anonymous abstractions for both cases. To obtain naming capabilities, arrows are to be equipped with names as identifiers. A name must be locally unique – within each sibling set of arrows^[e] there can be at most one arrow labelled by a given name.

`source`	`name`	`target`
$s (a)$	$σ (a)$	$t (a)$

This can be formalized as a structure

ℰ = (X, Σ, A, s, σ, t)

where $X$ is a set of nodes, $Σ$ is a set of names, $A$ is a set of arrows, $s$ is a partial function from $A$ to $X$ , $σ$ is a partial function from $A$ to $Σ$ , and $t$ is a total function from $A$ to $X$ . For an arrow $a$ , constituents of the triple $(s (a), σ (a), t (a))$ are respectively $a$ 's source, name and target.

The structure is subject to the following conditions:

The reduct $(X, A, s, t)$ is an accessible pointed quiver (apq) as defined previously.
The name function $σ$ is undefined just for the source-less root arrow.
The name function $σ$ is injective in the restriction to every sibling set of arrows, i.e. for every non-root arrows $a$ , $b$ , if $s (a) = s (b)$ and $σ (a) = σ (b)$ then $a = b$ .

This structure can be called a nested dictionary or named apq. In computing, such structures are ubiquitous. The table above shows that arrows can be considered "un-reified" as the set $A' = {(s(a), σ(a), t(a)) | a ∈ A \ {ar}}$ of source-name-target triples. This leads to a relational structure $(X, Σ, A')$ which can be viewed as a relational database table. Underlines in source and name indicate primary key.

The structure can be rephrased as a deterministic labelled transition system: $X$ is a set of "states", $Σ$ is a set of "labels", $A'$ is a set of "labelled transitions". (Moreover, the root node $r = t (a r)$ is an "initial state", and the accessibility condition means that every state is reachable from the initial state.)

Nested dictionary

The diagram on the right shows a nested dictionary $ℰ$ that has the same underlying multidigraph as the example in the previous subsection. The structure can be created by the code below. Like before, exactly the same code applies for JavaScript, Python and Ruby.

First, a substructure, $ℰ 0$ , is created by a single assignment of a literal {...} to r. This structure, depicted by full lines, is an "arrow tree" (therefore, it is a spanning tree). The literal in turn appears to be a JSON serialization of $ℰ 0$ .

Subsequently, the remaining arrows are created by assignments of already existing nodes. Arrows that cause cycles are displayed in blue.

r = {"a":{"a":5,"b":5},"c":{"a":{"w":5},"c":{}},"d":{"w":1.3}}r["b"] = r["a"]; r["c"]["b"] = r["c"]["a"]r["c"]["a"]["p"] = r["c"]; r["d"]["e"] = r["d"]["self"] = r["d"]

In the Linux VFS, the name function $σ$ is represented by the d_name field in the dentry data structure.^[15] The $ℰ 0$ structure above demonstrates a correspondence between JSON-representable structures and hard-link structures of file systems. In both cases, there is a fixed set of built-in types of "nodes" of which one type is a container type, except that in JSON, there are in fact two such types – Object and Array. If the latter one is ignored (as well as the distinction between individual primitive data types) then the provided abstractions of file-systems and JSON data are the same – both are arrow trees equipped with naming $σ$ and a distinction of container nodes.

See #Nested data for the formal description of tree structures ("JSON-trees") with distinction and bipartition of container nodes.

Pathnames

The naming function $σ$ of a nested dictionary $ℰ$ naturally extends from arrows to arrow paths. Each sequence $p = [a 1, ..., a n]$ of consecutive arrows is implicitly assigned a pathname (cf. Pathname) – the sequence $σ (p) = [σ (a 1), ..., σ (a n)]$ of arrow names.^[f] Local uniqueness carries over to arrow paths: different sibling paths have different pathnames. In particular, the root-originating arrow paths are in one-to-one correspondence with their pathnames. This correspondence provides a "symbolic" representation of the unfolding of $ℰ$ via pathnames – the nodes in $ℰ$ are globally identified via a tree of pathnames.

Ordered tree

The structures introduced in the previous subsection form just the core "hierarchical" part of tree data structures that appear in computing. In most cases, there is also an additional "horizontal" ordering between siblings. In search trees the order is commonly established by the "key" or value associated with each sibling, but in many trees that is not the case. For example, XML documents, lists within JSON files, and many other structures have order that does not depend on the values in the nodes, but is itself data — sorting the paragraphs of a novel alphabetically would lose information.^{[dubious – discuss]}

The correspondent expansion of the previously described tree structures $(X, \leq)$ can be defined by endowing each sibling set with a linear order as follows.^[17]^[18]

An alternative definition according to Kuboyama^[3] is presented in the next subsection.

An ordered tree is a structure $(X, \leq V, \leq S)$ where $X$ is a non-empty set of nodes and $\leq V$ and $\leq S$ are relations on $X$ called vertical (or also hierarchical^[3]) order and sibling order, respectively. The structure is subject to the following conditions:

$(X, \leq V)$ is a partial order that is an unordered tree as defined in the previous subsection.
$(X, \leq S)$ is a partial order.
Distinct nodes are comparable in $< S$ if and only if they are siblings:
$(< S) \cup (> S) = ((≺ V) ○ (≻ V)) ∖ idX$ .
Every node has only finitely many preceding siblings, i.e. every principal ideal of $(X, \leq S)$ is finite. (This condition can be omitted in the case of finite trees.)

Conditions (2) and (3) say that $(X, \leq S)$ is a component-wise linear order, each component being a sibling set. Condition (4) asserts that if a sibling set $S$ is infinite then $(S, \leq S)$ is isomorphic to $(ℕ, \leq)$ , the usual ordering of natural numbers.

Given this, there are three (another) distinguished partial orders which are uniquely given by the following prescriptions:

$(< H)$	$=$	$(\leq V) ○ (< S) ○ (\geq V)$	(the horizontal order),
$(< L⁻)$	$=$	$(> V) \cup (< H)$	(the "discordant" linear order),
$(< L⁺)$	$=$	$(< V) \cup (< H)$	(the "concordant" linear order).

This amounts to a "V-S-H-L^±" system of five partial orders $\leq V$ , $\leq S$ , $\leq H$ , $\leq L⁺$ , $\leq L⁻$ on the same set $X$ of nodes, in which, except for the pair ${ \leq S, \leq H$ }, any two relations uniquely determine the other three, see the determinacy table.

Notes about notational conventions:

The relation composition symbol ○ used in this subsection is to be interpreted left-to-right (as $\circ _{l}$ ).
Symbols $<$ and $\leq$ express the strict and non-strict versions of a partial order.
Symbols $>$ and $\geq$ express the converse relations.
The $≺$ symbol is used for the covering relation of $\leq$ which is the immediate version of a partial order.

This yields six versions $≺, <, \leq, ≻, >, \geq$ for a single partial order relation. Except for $≺$ and $≻$ , each version uniquely determines the others. Passing from $≺$ to $<$ requires that $<$ be transitively reducible. This is always satisfied for all of $< V$ , $< S$ and $< H$ but might not hold for $< L⁺$ or $< L⁻$ if $X$ is infinite.

The partial orders $\leq V$ and $\leq H$ are complementary:

(< V) ⊎ (> V) ⊎ (< H) ⊎ (> H) = X \times X ∖ id X

.

As a consequence, the "concordant" linear order $< L⁺$ is a linear extension of $< V$ . Similarly, $< L⁻$ is a linear extension of $> V$ .

The covering relations $≺ L⁻$ and $≺ L⁺$ correspond to pre-order traversal and post-order traversal, respectively. If $x ≺ L⁻ y$ then, according to whether $y$ has a previous sibling or not, the $x$ node is either the "rightmost" non-strict descendant of the previous sibling of $y$ or, in the latter case, $x$ is the first child of $y$ . Pairs $(x,y)$ of the latter case form the relation $(≺ L⁻) ∖ (< H)$ which is a partial map that assigns each non-leaf node its first child node. Similarly, $(≻ L⁺) ∖ (> H)$ assigns each non-leaf node with finitely many children its last child node.

Definition using horizontal order

The Kuboyama's definition of "rooted ordered trees"^[3] makes use of the horizontal order $\leq H$ as a definitory relation.^[g] (See also Suppes.^[19])

Using the notation and terminology introduced so far, the definition can be expressed as follows.

An ordered tree is a structure $(X, \leq V, \leq H)$ such that conditions (1–5) are satisfied:

$(X, \leq V)$ is a partial order that is an unordered tree. (The vertical order.)
$(X, \leq H)$ is a partial order. (The horizontal order.)
The partial orders $\leq V$ and $\leq H$ are complementary: $(< V) ⊎ (> V) ⊎ (< H) ⊎ (> H) = X \times X ∖ id X$ .
(That is, pairs of nodes that are incomparable in $(< V)$ are comparable in $(< H)$ and vice versa.)
The partial orders $\leq V$ and $\leq H$ are "consistent": $(< H) = (\leq V) ○ (< H) ○ (\geq V)$ .
(That is, for every nodes $x$ , $y$ such that $x < H y$ , all descendants of $x$ must precede all the descendants of $y$ .)
Every node has only finitely many preceding siblings. (That is, for every infinite sibling set $S$ , $(S, \leq H)$ has the order type of the natural numbers.) (Like before, this condition can be omitted in the case of finite trees.)

The sibling order $(\leq S)$ is obtained by $(< S) = (< H) \cap ((≺ V) ○ (≻ V))$ , i.e. two distinct nodes are in sibling order if and only if they are in horizontal order and are siblings.

Determinacy table

The following table shows the determinacy of the "V-S-H-L^±" system. Relational expressions in the table's body are equal to one of $< V$ , $< S$ , $< H$ , $< L⁻$ , or $< L⁺$ according to the column. It follows that except for the pair ${ \leq S, \leq H$ }, an ordered tree $(X, ...)$ is uniquely determined by any two of the five relations.

	$< V$	$< S$	$< H$	$< L⁻$	$< L⁺$
$V,S$			$(\leq V) ○ (< S) ○ (\geq V)$
$V,H$		$(< H) \cap ((≺ V)○(≻ V))$		$(> V) \cup (< H)$	$(< V) \cup (< H)$
$V,L⁻$		$(< L⁻) \cap ((≺ V)○(≻ V))$	$(< L⁻) ∖ (> V)$
$V,L⁺$		$(< L⁺) \cap ((≺ V)○(≻ V))$	$(< L⁺) ∖ (< V)$
$H,L⁻$	$(> L⁻) ∖ (< H)$
$H,L⁺$	$(< L⁺) ∖ (< H)$
$L⁻,L⁺$	$(> L⁻) \cap (< L⁺)$		$(< L⁻) \cap (< L⁺)$
$S,L⁻$	$x ≺ V y \leftrightarrow y = inf L⁻ (Y)$ where $Y$ is the image of ${x$ } under $(\geq S)○(≻ L⁻)$
$S,L⁺$	$x ≺ V y \leftrightarrow y = sup L⁺ (Y)$ where $Y$ is the image of ${x$ } under $(\leq S)○(≺ L⁺)$

In the last two rows, $inf L⁻ (Y)$ denotes the infimum of $Y$ in $(X, \leq L⁻)$ , and $sup L⁺ (Y)$ denotes the supremum of $Y$ in $(X, \leq L⁺)$ . In both rows, $(\leq S)$ resp. $(\geq S)$ can be equivalently replaced by the sibling equivalence $(\leq S)○(\geq S)$ . In particular, the partition into sibling sets together with either of $\leq L⁻$ or $\leq L⁺$ is also sufficient to determine the ordered tree. The first prescription for $≺ V$ can be read as: the parent of a non-root node $x$ equals the infimum of the set of all immediate predecessors of siblings of $x$ , where the words "infimum" and "predecessors" are meant with regard to $\leq L⁻$ . Similarly with the second prescription, just use "supremum", "successors" and $\leq L⁺$ .

The relations $\leq S$ and $\leq H$ obviously cannot form a definitory pair. For the simplest example, consider an ordered tree with exactly two nodes – then one cannot tell which of them is the root.

XPath axes

XPath axis	Relation
`ancestor`	$< V$
`ancestor-or-self`	$\leq V$
`child`	$≻ V$
`descendant`	$> V$
`descendant-or-self`	$\geq V$
`following`	$< H$
`following-sibling`	$< S$
`parent`	$≺ V$
`preceding`	$> H$
`preceding-sibling`	$> S$
`self`	$id X$

The table on the right shows a correspondence of introduced relations to XPath axes, which are used in structured document systems to access nodes that bear particular ordering relationships to a starting "context" node. For a context node^[20] $x$ , its axis named by the specifier in the left column is the set of nodes that equals the image of ${x$ } under the correspondent relation. As of XPath 2.0, the nodes are "returned" in document order, which is the "discordant" linear order $\leq L⁻$ . A "concordance" would be achieved, if the vertical order $\leq V$ was defined oppositely, with the bottom-up direction outwards the root like in set theory in accordance to natural trees.^[h]

Traversal maps

Below is the list of partial maps that are typically used for ordered tree traversal.^[21] Each map is a distinguished functional subrelation of $\leq L⁻$ or of its opposite.

$≺ V$ ... the parent-node partial map,
$≻ S$ ... the previous-sibling partial map,
$≺ S$ ... the next-sibling partial map,
$(≺ L⁻) ∖ (< H)$ ... the first-child partial map,
$(≻ L⁺) ∖ (> H)$ ... the last-child partial map,
$≻ L⁻$ ... the previous-node partial map,
$≺ L⁻$ ... the next-node partial map.

Generating structure

The traversal maps constitute a partial unary algebra^[22] $(X, parent, previousSibling, ..., nextNode)$ that forms a basis for representing trees as linked data structures. At least conceptually, there are parent links, sibling adjacency links, and first / last child links. This also applies to unordered trees in general, which can be observed on the dentry data structure in the Linux VFS.^[23]

Similarly to the "V-S-H-L^±" system of partial orders, there are pairs of traversal maps that uniquely determine the whole ordered tree structure. Naturally, one such generating structure is $(X, ≺ V, ≺ S)$ which can be transcribed as $(X, parent, nextSibling)$ – the structure of parent and next-sibling links. Another important generating structure is $(X, firstChild, nextSibling)$ known as left-child right-sibling binary tree. This partial algebra establishes a one-to-one correspondence between binary trees and ordered trees.

Definition using binary trees

The correspondence to binary trees provides a concise definition of ordered trees as partial algebras.

An ordered tree is a structure $(X,lc,rs)$ where $X$ is a non-empty set of nodes, and $lc$ , $rs$ are partial maps on $X$ called left-child and right-sibling, respectively. The structure is subject to the following conditions:

The partial maps $lc$ and $rs$ are disjoint, i.e. $(lc) \cap (rs) = \emptyset$ .
The inverse of $(lc) \cup (rs)$ is a partial map $p$ such that the partial algebra $(X, p)$ is an unordered tree.

The partial order structure $(X, \leq V, \leq S)$ is obtained as follows:

$(≺ S)$	$=$	$(rs)$ ,
$(≻ V)$	$=$	$(lc) ○ (\leq S)$ .

Per-level ordering

Dashed line indicates the

< B⁻

ordering)

As a possible expansion of the "V-S-H-L^±" system, another distinguished relations between nodes can be defined, based on the tree's level structure. First, let us denote by $\sim E$ the equivalence relation defined by $x \sim E y$ if and only if $x$ and $y$ have the same number of ancestors. This yields a partition of the set of nodes into levels $L 0, L 1, ... (, L n)$ – a coarsement of the partition into sibling sets. Then define relations $< E$ , $< B⁻$ and $< B⁺$ by

{\begin{aligned}\left(<_{\mathrm {E} }\right)&=\left(<_{\mathrm {H} }\right)\cap \left(\sim _{\mathrm {E} }\right)&{\text{((per) level order),}}\\\left(<_{\mathrm {B} ^{-}}\right)&=\left(<_{\mathrm {E} }\right)\cup \left(\left(\sim _{\mathrm {E} }\right)\circ \left(>_{\mathrm {V} }\right)\right)&{\text{(breadth-first order, BFS ordering),}}\\\left(<_{\mathrm {B} ^{+}}\right)&=\left(<_{\mathrm {E} }\right)\cup \left(\left({<}_{\mathrm {V} }\right)\circ \left(\sim _{\mathrm {E} }\right)\right)&{\text{(breadth-first post-order).}}\end{aligned}}

It can be observed that $< E$ is a strict partial order and $< B⁻$ and $< B⁺$ are strict total orders. Moreover, there is a similarity between the "V-S-L^±" and "V-E-B^±" systems: $< E$ is component-wise linear and orthogonal to $< V$ , $< B⁻$ is linear extension of $< E$ and of $> V$ , and $< B⁺$ is a linear extension of $< E$ and of $< V$ .

Encoding by sequences

Ordered trees can be naturally encoded by finite sequences of natural numbers.^[24]^[i] Denote $ℕ ⁎$ the set of all finite sequences of natural numbers. A non-empty subset $W$ of $ℕ ⁎$ is called a tree domain^[25]^[26] if for all $u$ , $v$ from $ℕ ⁎$ and all $i$ , $j$ from $ℕ$ the following holds ( $⁎$ is the concatenation operator):

If $u ⁎ v \in W$ then $u \in W$ . ( $W$ is prefix-closed.)
If $u ⁎ [i] \in W$ and $j < i$ then $u ⁎ [j] \in W$ . ( $W$ is left-sibling-closed.)

The induced structure on $W$ gives rise to an ordered tree: take the prefix order for $\geq V$ and the lexicographical order for $\leq L⁻$ .^[j]

Conversely, for an ordered tree $T = (X, \leq V, \leq L⁻)$ assign each node $x$ the sequence $w (x)$ of sibling indices, i.e. the root is assigned the empty sequence and for every non-root node $x$ , let $w (x) = w (parent(x)) ⁎ [i]$ where $i$ is the number of preceding siblings of $x$ . Put $W = {w (x) | x \in X}$ . Then $W$ is a tree domain with its induced structure isomorphic to $T$ .

Nested list

r = [[5,5],[[5],[]],[[1.3]]]r[3] = r[0] r[1][2] = r[1][0]r[1][0][1] = r[1] r[2][2] = r[2][1] = r[2]

Ordering of siblings can be naturally applied to multidigraph generalizations that have been introduced for unordered trees. Moveover, sibling indices can be viewed as special names. A tree domain is just a tree of pathnames. As a result, one obtains a nested list as a counterpart to a nested dictionary.

The example shown on the right provides a nested-list version of the nested dictionary presented before. Like before, there is an initial structure (an arrow tree, depicted by full lines) that is created by a single assignment of a literal [...] to r. The structure is subsequently modified by assignments that introduce additional links.

The code is applicable to JavaScript and Ruby. In Python, the modifying assignments are disallowed because of the use of sibling indices that are not present in the initial structure.^[k]

Nested data

The notions of a nested dictionary and a nested list (which are generalizations of unordered / ordered trees, respectively) can be combined into the unifying concept of nested data. Such structures are most popular in connection with the JSON data format. These structures are multidigraphs with a distinguished set of container nodes, each container being either a dictionary or a list. In the text below, the sets of dictionaries and lists are respectively denoted $X O$ and $X A$ . This is according to the JSON terminology in which the corresponding two types of containers are called Object and Array. The complementary set of non-container nodes represents "primitive values". JSON-specific formalizations^[27]^[28] provide further refinement according to the supported data types.

Nested data can be formalized as a structure $ℳ = (X, Σ, A, X O, X A, s, t, σ, ι)$ where $X$ is a set of nodes, $Σ$ is a set of names, $A$ is a set of arrows, $X O$ and $X A$ are distinguished subsets of $X$ , $s$ is a partial function from $A$ to $X$ (the source map), $t$ is a total function from $A$ to $X$ (the target map), $σ$ is a partial function from $A$ to $Σ$ assigning an arrow its name, and $ι$ is a partial function from $A$ to the set $ℕ$ of natural numbers assigning an arrow its sibling index.

Let $ℳ$ be called a nested data tree if the following conditions are satisfied:

There is exactly one "root" arrow, $a r$ , whose source $s (a r)$ is undefined.
Every node is reachable via a path of arrows starting with the root arrow $a r$ .
Every node from $X O \cup X A$ is the target of exactly one arrow.
Every node that is the source of an arrow is from $X O \cup X A$ .
The sets $X O$ and $X A$ are disjoint.
The arrow-naming function σ satisfies the following, for every arrows a, b.
1. $σ (a)$ is defined if and only if $s (a) \in X O$ .
2. If $σ (a)$ and $σ (b)$ are both defined and equal then either $a = b$ or $s (a) \neq s (b)$ .
The arrow-indexing function ι satisfies the following, for every arrows a, b.
1. $ι (a)$ is defined if and only if $s (a) \in X A$ .
2. If $ι (a)$ and $ι (b)$ are both defined and equal then either $a = b$ or $s (a) \neq s (b)$ .
3. If $ι (a)$ is defined and non-zero then $ι (a) = ι (a') + 1$ for some arrow $a'$ such that $s (a) = s (a')$ .

Note in particular that (1) and (2) define accessible pointed quivers $(X, A, s, t)$ . Conditions (1–4) provide axiomatization of arrow trees with distinction of containers $X O \cup X A$ . By (3) there are unique links to containers and by (4) non-containers must be leaf nodes (cf. conditions (b) and (a) for hard-links in file systems). An important consequence of (1–4) is the condition of acyclicity:

There are no circular paths of consecutive arrows.

This condition provides a weak alternative to (3).

Total ordering

Although dictionaries have the semantics of unordered collections, in programming environments they are often equipped with some intrinsic ordering. Such ordering is supported in all of Python, Ruby and JavaScript.^[l]

Thus, it is worthwhile to also consider ordered nested data trees, a refinement of nested data trees in which all sibling sets are ordered. (Condition (7a) is altered to let $ι (a)$ be defined for every non-root arrow $a$ .)

Nomenclature

By considering just particular subsets of the above conditions and/or particular constituents of $ℳ$ we obtain a nomenclature of tree data structures. Accessible pointed quivers $(X, A, s, t)$ form the "lowest common denominator" – conditions (1) and (2) are always required. Conditions (4–7) are imposed appropriately to whether $X O$ , $X A$ , $σ$ or $ι$ are defined.

The "tree" attribution is established by condition (3).

Require

Define

Distinguish

Uniqueness
(3)

Acyclicity
(3′)

Naming

σ

Ordering

ι

X O

X A

Semantic

Total

Accessible pointed quiver

Unordered tree

Ordered tree

Arrow tree
with distinction of containers

X O \cup X A

Nested dictionary

Python:	dictionary
Ruby:	hash
JavaScript:	object

General

Acyclic

Tree

Ordered

Nested list

Python:	list
Ruby, JavaScript:	array

General

Acyclic

Tree

Nested data

General

Acyclic

Tree

Ordered

Смотрите также

Tree structure
Tree (graph theory)
Tree (set theory)
Cardinal tree and Ordinal tree
Hierarchy (mathematics)
Dialog tree
Single inheritance
Generative grammar
Hierarchical clustering
Binary space partition tree
Recursion
Fenwick tree

Other trees

Trie
Day–Stout–Warren algorithm
Enfilade
Left child-right sibling binary tree
Hierarchical temporal memory
Integral Tree ^[29]

Заметки

^ This is different from the formal definition of subtree used in graph theory, which is a subgraph that forms a tree – it need not include all descendants. For example, the root node by itself is a subtree in the graph theory sense, but not in the data structure sense (unless there are no descendants).
^ Properly, a rooted, ordered directed graph.
^ Alternatively, a "partial" version can be employed by excluding ${r}$ .
^ Arrows $a$ and $b$ are said to be consecutive, respectively, if $t (a) = s (b)$ .
^ I.e. arrows that have the same source node.
^ Here we assume that the root arrow $a r$ is not in $p$ .
^ Unfortunately, the author uses the term sibling order for the horizontal order relation. This is non-standard, if not a misnomer.
^ This would also establish a concordance of the two possible directions of inequality symbols with the categorization of XPath axes into forward axes and reverse axes.
^ In general, any alphabet equipped with ordering that is isomorphic to that of natural numbers can be used.
^ This statement is valid regardless of condition (ⅱ).
^ This can be solved by e.g. r = [[5,5],[[5,0],[],0],[[1.3],0,0],0].
^ As of Python 3.7, Ruby 1.9 and ECMAScript 2020 (tc39.es/proposal-for-in-order).

дальнейшее чтение

Donald Knuth. The Art of Computer Programming: Fundamental Algorithms, Third Edition. Addison-Wesley, 1997. ISBN 0-201-89683-4 . Section 2.3: Trees, pp. 308–423.
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms, Second Edition. MIT Press and McGraw-Hill, 2001. ISBN 0-262-03293-7 . Section 10.4: Representing rooted trees, pp. 214–217. Chapters 12–14 (Binary Search Trees, Red-Black Trees, Augmenting Data Structures), pp. 253–320.

Внешние ссылки

Data Trees as a Means of Presenting Complex Data Analysis by Sally Knipe on August 8, 2013
Description from the Dictionary of Algorithms and Data Structures
CRAN - Package data.tree implementation of a tree data structure in the R programming language
WormWeb.org: Interactive Visualization of the C. elegans Cell Tree – Visualize the entire cell lineage tree of the nematode C. elegans (javascript)
Binary Trees by L. Allison

[ApgImplicitRoot-20] Acyclic relations like the set membership $\in$ only allow at most one possible root $r$ to form an apg. The root is then given implicitly and can be left out of the signature.

[ApgOrientation-22] To establish a concordancy between $R$ and the $parent$ map, the presented definition uses reversed accessibility: $r$ is reachable from any node. This approach is also used in nLab – see membership graphs. In the original definition by P. Aczel, every node is reachable from $r$ (thus, instead of "preimage", the word "image" applies).

[UnorderedTreeAsApg-23] We have implicitly introduced a definition of unordered trees as apgs: call an apg $ℛ = (X, R, r)$ a tree if the reduct $(X, R)$ is an unordered tree as a partial algebra. This can be translated as: Every node is accessible by exactly one path.

[1] This is different from the formal definition of subtree used in graph theory, which is a subgraph that forms a tree – it need not include all descendants. For example, the root node by itself is a subtree in the graph theory sense, but not in the data structure sense (unless there are no descendants).

[4] Properly, a rooted, ordered directed graph.

[14] Alternatively, a "partial" version can be employed by excluding ${r}$ .

[18] Arrows $a$ and $b$ are said to be consecutive, respectively, if $t (a) = s (b)$ .

[27] I.e. arrows that have the same source node.

[28] Here we assume that the root arrow $a r$ is not in $p$ .

[31] Unfortunately, the author uses the term sibling order for the horizontal order relation. This is non-standard, if not a misnomer.

[34] This would also establish a concordance of the two possible directions of inequality symbols with the categorization of XPath axes into forward axes and reverse axes.

[39] In general, any alphabet equipped with ordering that is isomorphic to that of natural numbers can be used.

[42] This statement is valid regardless of condition (ⅱ).

[43] This can be solved by e.g. r = [[5,5],[[5,0],[],0],[[1.3],0,0],0].

[46] As of Python 3.7, Ruby 1.9 and ECMAScript 2020 (tc39.es/proposal-for-in-order).

[2] Weisstein, Eric W. "Subtree". MathWorld.

[3] Susanna S. Epp (Aug 2010). Discrete Mathematics with Applications. Pacific Grove, CA: Brooks/Cole Publishing Co. p. 694. ISBN 978-0-495-39132-6.

[Kuboyama2007-5] Tetsuji Kuboyama (2007). "Matching and learning in trees" (PDF). Doctoral Thesis, University of Tokyo.

[6] "The Linux VFS Model: Naming structure".

[TAoCP_oriented_trees-7] Donald Knuth. The Art of Computer Programming, Volume 1: Fundamental Algorithms, Third Edition. Addison-Wesley, 1997. Section 2.3.4.2: Oriented trees.

[Unger2012-8] Unger, Spencer (2012). "Trees in Set Theory" (PDF). Cite journal requires |journal= (help)

[9] Bruce Fields. "Notes on the Linux kernel".

[10] Pierre Cointe (1987). "Metaclasses are First Class: the ObjVlisp Model". Proceeding OOPSLA '87 Conference Proceedings on Object-oriented Programming Systems, Languages and Applications. North-Holland.

[11] Wolfgang Klas, Michael Schrefl (1995). Metaclasses and Their Application: Data Model Tailoring and Database Integration. Springer.

[12] "What Is a Metaclass?".

[13] "The Ruby Object Model: Data structure in detail".

[15] B. Korte, and J. Vygen (2012). Combinatorial optimization. Springer, Heidelberg.

[Dasgupta2014-16] Dasgupta, Abhiit (2014). Set theory: with an introduction to real point sets. New York: Birkhäuser.

[Makinson2012-17] Makinson, David (2012). Sets, logic and maths for computing. Springer Science & Business Media. ISBN 9781447124993.

[Bovet2005-19] Bovet, Daniel; Cesati, Marco (2005). Understanding the Linux Kernel. O'Reilly. ISBN 9780596554910.

[Aczel1988-21] Aczel, Peter (1988), Non-well-founded sets., CSLI Lecture Notes, 14, Stanford, CA: Stanford University, Center for the Study of Language and Information, ISBN 0-937073-22-9, MR 0940014

[Hidders2005-29] Jan Hidders; Philippe Michiels; Roel Vercammen (2005). "Optimizing sorting and duplicate elimination in XQuery path expressions" (PDF). Cite journal requires |journal= (help)

[Dau2007-30] Frithjof Dau; Mark Sifer (2007). "A formalism for navigating and editing XML document structure" (PDF). International Workshop on Databases in Networked Information Systems. Springer, Berlin, Heidelberg.

[Suppes1973-32] Suppes, Patrick (1973). "Semantics of context-free fragments of natural languages". Approaches to Natural Language. Springer, Dordrecht: 370–394. CiteSeerX 10.1.1.398.2289. doi:10.1007/978-94-010-2506-5_21. ISBN 978-90-277-0233-3.

[33] "XML Path Language (XPath) 3.1". World Wide Web Consortium. 21 March 2017.

[35] "Document Object Model Traversal". W3C. 2000.

[36] "Unary Algebras".

[37] J.T. Mühlberg, G. Lüttgen (2009). "Verifying compiled file system code". Formal Methods: Foundations and Applications: 12th Brazilian Symposium on Formal Methods. Springer, Berlin, Heidelberg. CiteSeerX 10.1.1.156.7781. Cite journal requires |journal= (help)

[Afanasiev2005-38] L. Afanasiev; P. Blackburn; I. Dimitriou; B. Gaiffe; E. Goris; M. Marx; M. de Rijke (2005). "PDL for ordered trees" (PDF). Journal of Applied Non-Classical Logics. 15 (2): 115–135. doi:10.3166/jancl.15.115-135. S2CID 1979330.

[Michaelis2001-40] Michaelis, Jens (2001). "Transforming linear context-free rewriting systems into minimalist grammars". International Conference on Logical Aspects of Computational Linguistics. Springer, Berlin, Heidelberg. pp. 228–244.

[Morawietz2008-41] Morawietz, Frank (2008). Two-Step Approaches to Natural Language Formalism. Walter de Gruyter.

[Bourhis2020-44] Bourhis, P.; Reutter, J.L.; Vrgoč, D. (2020). "JSON: Data model and query languages". Information Systems. 89.

[Botoeva2018-45] Botoeva, E.; Calvanese, D.; Cogrel, B.; Xiao, G. (2018). "Expressivity and complexity of MongoDB queries". 21st International Conference on Database Theory (ICDT 2018). Schloss Dagstuhl-Leibniz-Zentrum für Informatik.

[47] Miltiadou, Milto; Campbell, Neill D. F.; Cosker, Darren; Grant, Michael G. (January 2021). "A Comparative Study about Data Structures Used for Efficient Management of Voxelised Full-Waveform Airborne LiDAR Data during 3D Polygonal Model Creation". Remote Sensing. 13 (4): 559. doi:10.3390/rs13040559.

[a]

Дерево (структура данных)

Общее использование

Терминология

Предварительное определение

Рисование деревьев

Общие операции

Методы обхода и поиска

Представления

Обобщения

Диграфы

Тип данных по сравнению со структурой данных

Рекурсивный

Теория типов

Математическая терминология

Математическое определение

Неупорядоченное дерево

Sibling sets

Using set inclusion

Well-founded trees

Using recursive pairs

Using arrows

Using paths in a digraph

Using paths in a multidigraph

Using names

Pathnames

Ordered tree

Definition using horizontal order

Determinacy table

XPath axes

Traversal maps

Generating structure

Definition using binary trees

Per-level ordering

Encoding by sequences

Nested list

Nested data

Total ordering

Nomenclature

Смотрите также

Other trees

Заметки

Рекомендации

дальнейшее чтение

Внешние ссылки