From Wikipedia, the free encyclopedia
Estimation of distribution algorithm. For each iteration i, a random draw is performed for a population P from a distribution PDu. The distribution parameters PDe are then estimated using the selected points PS. The illustrated example optimizes a continuous objective function f(X) with a unique optimum O. The sampling (following a normal distribution N) concentrates around the optimum as the algorithm proceeds.

Estimation of distribution algorithms (EDAs), sometimes called probabilistic model-building genetic algorithms (PMBGAs),[1] are stochastic optimization methods that guide the search for the optimum by building and sampling explicit probabilistic models of promising candidate solutions. Optimization is viewed as a series of incremental updates of a probabilistic model, starting with the model encoding an uninformative prior over admissible solutions and ending with the model that generates only the global optima.[2][3][4]

EDAs belong to the class of evolutionary algorithms. The main difference between EDAs and most conventional evolutionary algorithms is that evolutionary algorithms generate new candidate solutions using an implicit distribution defined by one or more variation operators, whereas EDAs use an explicit probability distribution encoded by a Bayesian network, a multivariate normal distribution, or another model class. Similarly to other evolutionary algorithms, EDAs can be used to solve optimization problems defined over a number of representations, from vectors to LISP S-expressions, and the quality of candidate solutions is often evaluated using one or more objective functions.

A general EDA procedure is outlined below:

t := 0
initialize model M(0) to represent a uniform distribution over admissible solutions
while (termination criteria not met) do
    P := generate N > 0 candidate solutions by sampling M(t)
    F := evaluate all candidate solutions in P
    M(t + 1) := adjust_model(P, F, M(t))
    t := t + 1
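The loop above can be sketched directly in Python. This is a minimal, generic skeleton, not any particular published EDA; the model is left abstract as a trio of sample/evaluate/adjust callables, and the function names are illustrative.

```python
import random

def eda(init_model, sample, evaluate, adjust, n, generations):
    """Generic EDA loop: sample candidates from the model, evaluate them,
    and adjust the model toward the better candidates."""
    model = init_model()
    for t in range(generations):
        pop = [sample(model) for _ in range(n)]
        fits = [evaluate(x) for x in pop]
        model = adjust(pop, fits, model)
    return model

# Minimal instantiation: a univariate model over 5 bits, adjusted toward the
# top half of the population, with OneMax (count of ones) as the objective.
def adjust(pop, fits, model):
    best = [x for _, x in sorted(zip(fits, pop), reverse=True)[:len(pop) // 2]]
    return [sum(x[i] for x in best) / len(best) for i in range(len(model))]

random.seed(4)
model = eda(lambda: [0.5] * 5,
            lambda m: [1 if random.random() < pi else 0 for pi in m],
            sum, adjust, n=20, generations=25)
```

The concrete EDAs described below differ mainly in what `model` is and how `adjust` estimates it from the selected solutions.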

The use of explicit probabilistic models in optimization allowed EDAs to feasibly solve optimization problems that were notoriously difficult for most conventional evolutionary algorithms and traditional optimization techniques, such as problems with high levels of epistasis[citation needed]. Nonetheless, a further advantage of EDAs is that these algorithms provide an optimization practitioner with a series of probabilistic models that reveal a lot of information about the problem being solved. This information can in turn be used to design problem-specific neighborhood operators for local search, to bias future runs of EDAs on a similar problem, or to create an efficient computational model of the problem.

For example, if the population is represented by bit strings of length 4, the EDA can represent the population of promising solutions using a single vector of four probabilities (p1, p2, p3, p4), where each component p defines the probability of that position being a 1. Using this probability vector, an arbitrary number of candidate solutions can be created.
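This sampling step can be sketched in a few lines of Python; the function name and the probability values are illustrative, not from the source.

```python
import random

def sample_solutions(p, n):
    """Draw n candidate bit strings from a univariate probability vector p:
    position i is set to 1 with probability p[i], independently."""
    return [[1 if random.random() < pi else 0 for pi in p] for _ in range(n)]

# The four-component vector from the text; the values here are illustrative.
p = [0.9, 0.1, 0.8, 0.5]
population = sample_solutions(p, 6)
```

Each sampled string tends to have a 1 in positions with high probability (here the first and third) and a 0 elsewhere.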

Estimation of distribution algorithms (EDAs)

This section describes the models built by some well-known EDAs of different levels of complexity. Throughout, a population P(t) at generation t, a selection operator S, a model-building operator α and a sampling operator β are assumed.

Univariate factorizations

The simplest EDAs assume that the decision variables are independent, i.e. p(X_1, X_2) = p(X_1) · p(X_2). Therefore, univariate EDAs rely only on univariate statistics, and multivariate distributions must be factorized as the product of N univariate probability distributions,

p(X_1, …, X_N) = ∏_{i=1}^{N} p(X_i).

Such factorizations are used in many different EDAs; next we describe some of them.

Univariate marginal distribution algorithm (UMDA)

UMDA[5] is a simple EDA that uses an operator α_UMDA to estimate marginal probabilities from a selected population S(P(t)). Assuming S(P(t)) contains λ elements, α_UMDA produces the probabilities:

p_{t+1}(x_i) = (1/λ) Σ_{x ∈ S(P(t))} x_i,  for all i ∈ {1, …, N}.

Each step of UMDA can then be described as follows:

P(t+1) = β_μ(α_UMDA(S(P(t)))).
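A minimal sketch of one UMDA generation in Python, assuming binary variables and truncation selection of the μ best; the function names and the OneMax toy objective are illustrative, not from the source.

```python
import random

def umda_step(population, fitness, mu):
    """One UMDA generation: select the mu best solutions, estimate the
    marginal probability of a 1 at each position, and resample."""
    n = len(population[0])
    selected = sorted(population, key=fitness, reverse=True)[:mu]
    # Marginal frequency of a 1 at each position in the selected set.
    p = [sum(x[i] for x in selected) / mu for i in range(n)]
    # Sample a new population of the same size from the estimated model.
    new_pop = [[1 if random.random() < p[i] else 0 for i in range(n)]
               for _ in range(len(population))]
    return new_pop, p

# Toy run on OneMax (fitness = number of ones).
random.seed(0)
pop = [[random.randint(0, 1) for _ in range(8)] for _ in range(40)]
for _ in range(30):
    pop, p = umda_step(pop, sum, mu=10)
```

On this toy objective the marginals drift toward 1, so later populations concentrate on strings with many ones.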

Population-based incremental learning (PBIL)

PBIL[6] represents the population implicitly by its model, from which it samples new solutions and updates the model. At each generation, μ individuals are sampled and λ ≤ μ of them are selected. These individuals are then used to update the model as follows:

p_{t+1}(x_i) = (1 − γ) p_t(x_i) + (γ/λ) Σ_{x ∈ S(P(t))} x_i,  for all i ∈ {1, …, N},

where γ ∈ (0, 1] is a parameter defining the learning rate; a small value means that the previous model is only slightly modified by the newly sampled solutions. PBIL can be described as

P(t+1) = β_μ(α_PBIL(S(P(t)))).
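A minimal PBIL sketch in Python, again assuming binary variables and a OneMax toy objective; the function names and parameter values are illustrative.

```python
import random

def pbil_step(p, mu, lam, fitness, gamma=0.1):
    """One PBIL generation: sample mu individuals from the model, keep the
    lam best, and blend their per-position mean into the probability vector."""
    sampled = [[1 if random.random() < pi else 0 for pi in p] for _ in range(mu)]
    selected = sorted(sampled, key=fitness, reverse=True)[:lam]
    # Convex combination of the old model and the selected solutions' mean.
    return [(1 - gamma) * pi + gamma * sum(x[i] for x in selected) / lam
            for i, pi in enumerate(p)]

# Toy run on OneMax: the probability vector drifts toward all ones.
random.seed(1)
p = [0.5] * 10
for _ in range(200):
    p = pbil_step(p, mu=20, lam=5, fitness=sum)
```

Because the update is a convex combination, each component stays in [0, 1]; a smaller `gamma` slows the drift, as the text describes.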

Compact genetic algorithm (cGA)

The cGA[7] also relies on implicit populations defined by univariate distributions. At each generation t, two individuals x, y are sampled. The population is then sorted in decreasing order of fitness, with u being the best and v being the worst solution. The cGA estimates univariate probabilities as follows:

p_{t+1}(x_i) = p_t(x_i) + γ (u_i − v_i),  for all i ∈ {1, …, N},

where γ ∈ (0, 1] is a constant defining the learning rate, usually set to γ = 1/N. The cGA can be defined as

P(t+1) = β_2(α_cGA(S_sort(P(t)))).
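A minimal cGA sketch in Python: only the probability vector persists between generations, so the memory footprint is independent of the virtual population size. Function names, the clamping detail and the OneMax toy objective are illustrative.

```python
import random

def cga_step(p, fitness, gamma):
    """One cGA generation: sample two individuals, compare them, and shift
    each probability toward the winner wherever the two disagree."""
    x = [1 if random.random() < pi else 0 for pi in p]
    y = [1 if random.random() < pi else 0 for pi in p]
    u, v = (x, y) if fitness(x) >= fitness(y) else (y, x)  # u best, v worst
    # Where u_i == v_i the update is zero; clamp to keep probabilities valid.
    return [min(1.0, max(0.0, pi + gamma * (u[i] - v[i])))
            for i, pi in enumerate(p)]

# Toy run on OneMax with a virtual population of 50 (gamma = 1/50).
random.seed(2)
p = [0.5] * 12
for _ in range(5000):
    p = cga_step(p, sum, gamma=0.02)
```

Each tournament nudges the disagreeing positions by γ toward the winner, so the vector gradually commits to the better alleles.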

Bivariate factorizations

Although univariate models can be computed efficiently, in many cases they are not representative enough to provide better performance than GAs. To overcome this drawback, the use of bivariate factorizations was proposed in the EDA community, in which dependencies between pairs of variables can be modeled. A bivariate factorization can be defined as

p(X_1, …, X_N) = ∏_{i=1}^{N} p(X_i | π_i),

where π_i contains a possible variable X_i depends on, i.e. π_i is either empty or contains exactly one other variable.

Bivariate and multivariate distributions are usually represented as probabilistic graphical models (PGMs), i.e. graphs in which edges denote statistical dependencies (or conditional probabilities) and vertices denote variables. To learn the structure of a PGM from data, linkage-learning is employed.

Mutual information maximizing input clustering (MIMIC)

MIMIC[8] factorizes the joint probability distribution in a chain-like model representing successive dependencies between variables. It finds a permutation of the decision variables, i_1, i_2, …, i_N, that minimizes the Kullback–Leibler divergence with respect to the true probability distribution. MIMIC then models the distribution

p(X_1, …, X_N) = p(X_{i_1}) · p(X_{i_2} | X_{i_1}) ⋯ p(X_{i_N} | X_{i_{N−1}}).

New solutions are sampled from the leftmost to the rightmost variable; the first is generated independently and the others according to their conditional probabilities. Since the estimated distribution must be recomputed each generation, MIMIC uses concrete populations in the following way:

P(t+1) = β_μ(α_MIMIC(S(P(t)))).
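The chain-sampling step can be sketched as follows, assuming binary variables; the function name, data layout and toy parameters are illustrative, not from the source.

```python
import random

def sample_chain(order, p_first, p_cond):
    """Sample one solution from a chain factorization.

    order   : permutation (i_1, ..., i_N) of the variable indices
    p_first : P(X_{i_1} = 1)
    p_cond  : p_cond[k][b] = P(X_{i_k} = 1 | X_{i_{k-1}} = b), for k >= 1
    """
    x = [0] * len(order)
    # The first variable in the chain is sampled unconditionally ...
    x[order[0]] = 1 if random.random() < p_first else 0
    # ... and each subsequent one is conditioned on its predecessor's value.
    for k in range(1, len(order)):
        prev = x[order[k - 1]]
        x[order[k]] = 1 if random.random() < p_cond[k][prev] else 0
    return x

# Toy chain over 3 variables in which each variable tends to copy its predecessor.
order = [2, 0, 1]
p_cond = {1: [0.1, 0.9], 2: [0.1, 0.9]}
x = sample_chain(order, 0.5, p_cond)
```

With these toy conditionals, sampled solutions are strongly correlated along the chain (mostly all-zeros or all-ones strings).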

Bivariate marginal distribution algorithm (BMDA)

BMDA[9] factorizes the joint probability distribution into bivariate distributions. First, a randomly chosen variable is added as a node in a graph. Then, the variable most dependent on one of those already in the graph is chosen from among those not yet in it; this procedure is repeated until no remaining variable depends on any variable in the graph (verified according to a threshold value).

The resulting model is a forest with multiple trees rooted at the initially chosen nodes. BMDA estimates a factorized distribution in which the root variables can be sampled independently, whereas all the others must be conditioned on their parent variable.

Each step of BMDA is defined as follows:

P(t+1) = β_μ(α_BMDA(S(P(t)))).

Multivariate factorizations

The next stage of EDA development was the use of multivariate factorizations. In this case, the joint probability distribution is usually factorized into a number of components of limited size.

Learning PGMs encoding multivariate distributions is a computationally expensive task; therefore, it is usual for EDAs to estimate multivariate statistics from bivariate statistics. This relaxation allows the PGM to be built in time polynomial in the number of variables; however, it also limits the generality of such EDAs.

Extended compact genetic algorithm (eCGA)

The eCGA[10] was one of the first EDAs to employ multivariate factorizations, in which high-order dependencies among decision variables can be modeled. It factorizes the joint probability distribution into a product of multivariate marginal distributions. Assume T = {τ_1, …, τ_K} is a set of disjoint linkage sets, each linkage set τ containing |τ| variables. The factorized joint probability distribution is represented as

p(X_1, …, X_N) = ∏_{τ ∈ T} p(X_τ).

The eCGA popularized the term "linkage-learning" as denoting procedures that identify linkage sets. Its linkage-learning procedure relies on two measures: (1) the Model Complexity (MC) and (2) the Compressed Population Complexity (CPC). The MC quantifies the model representation size in terms of the number of bits required to store all the marginal probabilities:

MC = log_2(λ + 1) Σ_{τ ∈ T} (2^{|τ|} − 1).

The CPC, on the other hand, quantifies the data compression in terms of the entropy of the marginal distributions over all partitions:

CPC = λ Σ_{τ ∈ T} H(τ),

where λ is the selected population size, |τ| is the number of decision variables in the linkage set τ and H(τ) is the joint entropy of the variables in τ.

The linkage-learning in eCGA works as follows: (1) insert each variable into its own cluster, (2) compute CCC = MC + CPC of the current linkage sets, (3) verify the change in CCC obtained by joining pairs of clusters, (4) effectively join the pair of clusters with the highest CCC improvement (i.e. the largest reduction in CCC). This procedure is repeated until no CCC improvements are possible, and produces a linkage model T. The eCGA works with concrete populations; therefore, using the factorized distribution modeled by eCGA, it can be described as

P(t+1) = β_μ(α_eCGA(S(P(t)))).
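The combined criterion can be computed empirically from a selected population. The sketch below assumes binary variables and the standard definitions of MC and CPC as stated above; function names and the toy data are illustrative. It shows that, for two perfectly linked bits, merging them into one linkage set lowers the CCC, which is exactly the kind of merge the greedy procedure performs.

```python
import math
from collections import Counter

def joint_entropy(selected, tau):
    """Empirical joint entropy (in bits) of the variables indexed by tau."""
    n = len(selected)
    counts = Counter(tuple(x[i] for i in tau) for x in selected)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def ccc(selected, partition):
    """Combined criterion CCC = MC + CPC for a candidate linkage partition."""
    lam = len(selected)
    mc = math.log2(lam + 1) * sum(2 ** len(tau) - 1 for tau in partition)
    cpc = lam * sum(joint_entropy(selected, tau) for tau in partition)
    return mc + cpc

# Bits 0 and 1 are perfectly linked (always equal); bit 2 is independent.
pop = [[0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 1, 1]] * 8
fine = ccc(pop, [[0], [1], [2]])       # every variable in its own cluster
merged = ccc(pop, [[0, 1], [2]])       # linked pair merged into one set
```

Here `merged < fine`: the extra model-complexity cost of the pairwise marginal is outweighed by the entropy saved by capturing the dependency.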

Bayesian optimization algorithm (BOA)

BOA[11][12][13] uses Bayesian networks to model and sample promising solutions. Bayesian networks are directed acyclic graphs, with nodes representing variables and edges representing conditional dependencies between pairs of variables. The value of a variable can be conditioned on at most a fixed number of other variables, its parents in the network. BOA builds a PGM encoding a factorized joint distribution in which the parameters of the network, i.e. the conditional probabilities, are estimated from the selected population using the maximum likelihood estimator.

The Bayesian network structure, on the other hand, must be built iteratively (linkage-learning). It starts with a network without edges and, at each step, adds the edge that most improves some scoring metric (e.g. the Bayesian information criterion (BIC) or the Bayesian-Dirichlet metric with likelihood equivalence (BDe)).[14] The scoring metric evaluates the network structure according to its accuracy in modeling the selected population. From the built network, BOA samples new promising solutions as follows: (1) it computes the ancestral ordering of the variables, each node being preceded by its parents; (2) each variable is sampled conditionally on its parents. Given this scenario, every BOA step can be defined as

P(t+1) = β_μ(α_BOA(S(P(t)))).
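The two sampling steps, ancestral ordering and conditional sampling, can be sketched as follows for binary variables. The data layout (a parent list plus a conditional probability table per variable) and all names are illustrative assumptions, not BOA's actual implementation.

```python
import random

def ancestral_order(parents):
    """Topological (ancestral) order: every node appears after its parents.
    parents[v] is the list of parent indices of variable v (a DAG)."""
    order, placed = [], set()
    while len(order) < len(parents):
        for v in range(len(parents)):
            if v not in placed and all(u in placed for u in parents[v]):
                order.append(v)
                placed.add(v)
    return order

def sample_network(parents, cpt):
    """Sample one solution: each variable is drawn conditionally on the
    already-sampled values of its parents. cpt[v] maps a tuple of parent
    values to P(X_v = 1)."""
    x = [0] * len(parents)
    for v in ancestral_order(parents):
        p1 = cpt[v][tuple(x[u] for u in parents[v])]
        x[v] = 1 if random.random() < p1 else 0
    return x

# Toy 3-variable network X0 -> X1, X0 -> X2; the CPT values are illustrative.
parents = [[], [0], [0]]
cpt = [{(): 0.5}, {(0,): 0.2, (1,): 0.8}, {(0,): 0.9, (1,): 0.1}]
x = sample_network(parents, cpt)
```

Because the root X0 is sampled first, X1 and X2 always see a concrete value for their parent when their conditional probabilities are looked up.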

Linkage-tree Genetic Algorithm (LTGA)

The LTGA[15] differs from most EDAs in that it does not explicitly model a probability distribution, only a linkage model called a linkage-tree. A linkage-tree is a set of linkage sets with no probability distribution associated; therefore, there is no way to sample new solutions directly from it. The linkage model is a linkage-tree stored as a Family Of Subsets (FOS).

The linkage-tree learning procedure is a hierarchical clustering algorithm, which works as follows. At each step the two closest clusters are merged; this procedure repeats until only one cluster remains, and each subtree is stored as a subset of the FOS.

The LTGA uses the linkage-tree to guide an "optimal mixing" procedure, which resembles a recombination operator but only accepts improving moves. The procedure transfers the genetic material indexed by a linkage set τ from a donor solution to a receiver solution.

Algorithm Gene-pool optimal mixing
    Input: A family of subsets T and a population P
    Output: A population P
    for each x in P do
        for each τ in T do
            choose a random y in P
            x′ := x
            x′_τ := y_τ
            if f(x′) > f(x) then x := x′
    return P
  • "←" denotes assignment. For instance, "largest ← item" means that the value of largest changes to the value of item.
  • "return" terminates the algorithm and outputs the following value.

The LTGA does not implement typical selection operators; instead, selection is performed during recombination. Similar ideas have usually been applied in local-search heuristics, and in this sense the LTGA can be seen as a hybrid method. In summary, one step of the LTGA applies gene-pool optimal mixing, guided by the linkage-tree learned from the current population, to produce the next population.
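The mixing step above can be sketched in Python. This is a minimal sketch: the FOS is hand-written rather than learned by hierarchical clustering, the objective is a OneMax toy, and all names are illustrative.

```python
import random

def gene_pool_optimal_mixing(population, fos, fitness):
    """Gene-pool optimal mixing: for each receiver, repeatedly copy the genes
    indexed by a linkage set from a random donor, keeping improving moves only."""
    result = []
    for x in population:
        x = list(x)
        for tau in fos:
            donor = random.choice(population)
            trial = list(x)
            for i in tau:
                trial[i] = donor[i]          # transfer the genes indexed by tau
            if fitness(trial) > fitness(x):  # accept improving moves only
                x = trial
        result.append(x)
    return result

# Toy run on OneMax; the FOS below is a hand-written stand-in for a learned linkage-tree.
random.seed(3)
fos = [[0], [1], [2], [3], [0, 1], [2, 3], [0, 1, 2, 3]]
pop = [[random.randint(0, 1) for _ in range(4)] for _ in range(10)]
for _ in range(5):
    pop = gene_pool_optimal_mixing(pop, fos, sum)
```

Since only improving moves are accepted, each individual's fitness is non-decreasing across mixing rounds, which is how selection happens inside recombination.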

Other

  • Probability collectives (PC)[16][17]
  • Hill climbing with learning (HCwL)[18]
  • Estimation of multivariate normal algorithm (EMNA)[citation needed]
  • Estimation of Bayesian networks algorithm (EBNA)[citation needed]
  • Stochastic hill climbing with learning by vectors of normal distributions (SHCLVND)[19]
  • Real-coded PBIL[citation needed]
  • Selfish Gene Algorithm (SG)[20]
  • Compact Differential Evolution (cDE)[21] and its variants[22][23][24][25][26][27]
  • Compact Particle Swarm Optimization (cPSO)[28]
  • Compact Bacterial Foraging Optimization (cBFO)[29]
  • Probabilistic incremental program evolution (PIPE)[30]
  • Estimation of Gaussian networks algorithm (EGNA)[citation needed]
  • Estimation multivariate normal algorithm with thresheld convergence[31]
  • Dependency Structure Matrix Genetic Algorithm (DSMGA)[32][33]

Related

  • CMA-ES
  • Cross-entropy method

References

  1. ^ Pelikan, Martin (2005-02-21), "Probabilistic Model-Building Genetic Algorithms", Hierarchical Bayesian Optimization Algorithm, Studies in Fuzziness and Soft Computing, 170, Springer Berlin Heidelberg, pp. 13–30, doi:10.1007/978-3-540-32373-0_2, ISBN 9783540237747
  2. ^ Pedro Larrañaga; Jose A. Lozano (2002). Estimation of Distribution Algorithms a New Tool for Evolutionary Computation. Boston, MA: Springer US. ISBN 978-1-4615-1539-5.
  3. ^ Jose A. Lozano; Larrañaga, P.; Inza, I.; Bengoetxea, E. (2006). Towards a new evolutionary computation advances in the estimation of distribution algorithms. Berlin: Springer. ISBN 978-3-540-32494-2.
  4. ^ Pelikan, Martin; Sastry, Kumara; Cantú-Paz, Erick (2006). Scalable optimization via probabilistic modeling : from algorithms to applications ; with 26 tables. Berlin: Springer. ISBN 978-3540349532.
  5. ^ Mühlenbein, Heinz (1 September 1997). "The Equation for Response to Selection and Its Use for Prediction". Evol. Computation. 5 (3): 303–346. doi:10.1162/evco.1997.5.3.303. ISSN 1063-6560. PMID 10021762.
  6. ^ Baluja, Shummet (1 January 1994). "Population-Based Incremental Learning: A Method for Integrating Genetic Search Based Function Optimization and Competitive Learning". Carnegie Mellon University.
  7. ^ Harik, G.R.; Lobo, F.G.; Goldberg, D.E. (1999). "The compact genetic algorithm". IEEE Transactions on Evolutionary Computation. 3 (4): 287–297. doi:10.1109/4235.797971.
  8. ^ Bonet, Jeremy S. De; Isbell, Charles L.; Viola, Paul (1 January 1996). "MIMIC: Finding Optima by Estimating Probability Densities". Advances in Neural Information Processing Systems: 424. CiteSeerX 10.1.1.47.6497.
  9. ^ Pelikan, Martin; Muehlenbein, Heinz (1 January 1999). The Bivariate Marginal Distribution Algorithm. Advances in Soft Computing. pp. 521–535. CiteSeerX 10.1.1.55.1151. doi:10.1007/978-1-4471-0819-1_39. ISBN 978-1-85233-062-0.
  10. ^ Harik, Georges Raif (1997). Learning Gene Linkage to Efficiently Solve Problems of Bounded Difficulty Using Genetic Algorithms. University of Michigan.
  11. ^ Pelikan, Martin; Goldberg, David E.; Cantu-Paz, Erick (1 January 1999). "BOA: The Bayesian Optimization Algorithm". Morgan Kaufmann: 525–532. CiteSeerX 10.1.1.46.8131.
  12. ^ Pelikan, Martin (2005). Hierarchical Bayesian optimization algorithm : toward a new generation of evolutionary algorithms (1st ed.). Berlin [u.a.]: Springer. ISBN 978-3-540-23774-7.
  13. ^ Wolpert, David H.; Rajnarayan, Dev (1 January 2013). "Using Machine Learning to Improve Stochastic Optimization". Proceedings of the 17th AAAI Conference on Late-Breaking Developments in the Field of Artificial Intelligence. Aaaiws'13-17: 146–148.
  14. ^ Larrañaga, Pedro; Karshenas, Hossein; Bielza, Concha; Santana, Roberto (21 August 2012). "A review on probabilistic graphical models in evolutionary computation". Journal of Heuristics. 18 (5): 795–819. doi:10.1007/s10732-012-9208-4.
  15. ^ Thierens, Dirk (11 September 2010). The Linkage Tree Genetic Algorithm. Parallel Problem Solving from Nature, PPSN XI. pp. 264–273. doi:10.1007/978-3-642-15844-5_27. ISBN 978-3-642-15843-8.
  16. ^ WOLPERT, DAVID H.; STRAUSS, CHARLIE E. M.; RAJNARAYAN, DEV (December 2006). "Advances in Distributed Optimization Using Probability Collectives". Advances in Complex Systems. 09 (4): 383–436. CiteSeerX 10.1.1.154.6395. doi:10.1142/S0219525906000884.
  17. ^ Pelikan, Martin; Goldberg, David E.; Lobo, Fernando G. (2002). "A Survey of Optimization by Building and Using Probabilistic Models". Computational Optimization and Applications. 21 (1): 5–20. doi:10.1023/A:1013500812258.
  18. ^ Rudlof, Stephan; Köppen, Mario (1997). "Stochastic Hill Climbing with Learning by Vectors of Normal Distributions". CiteSeerX 10.1.1.19.3536.
  19. ^ Rudlof, Stephan; Köppen, Mario (1997). "Stochastic Hill Climbing with Learning by Vectors of Normal Distributions": 60–70. CiteSeerX 10.1.1.19.3536.
  20. ^ Corno, Fulvio; Reorda, Matteo Sonza; Squillero, Giovanni (1998-02-27). The selfish gene algorithm: a new evolutionary optimization strategy. ACM. pp. 349–355. doi:10.1145/330560.330838. ISBN 978-0897919692.
  21. ^ Mininno, Ernesto; Neri, Ferrante; Cupertino, Francesco; Naso, David (2011). "Compact Differential Evolution". IEEE Transactions on Evolutionary Computation. 15 (1): 32–54. doi:10.1109/tevc.2010.2058120. ISSN 1089-778X.
  22. ^ Iacca, Giovanni; Caraffini, Fabio; Neri, Ferrante (2012). "Compact Differential Evolution Light: High Performance Despite Limited Memory Requirement and Modest Computational Overhead". Journal of Computer Science and Technology. 27 (5): 1056–1076. doi:10.1007/s11390-012-1284-2. ISSN 1000-9000.
  23. ^ Iacca, Giovanni; Neri, Ferrante; Mininno, Ernesto (2011), "Opposition-Based Learning in Compact Differential Evolution", Applications of Evolutionary Computation, Springer Berlin Heidelberg, pp. 264–273, doi:10.1007/978-3-642-20525-5_27, ISBN 9783642205248
  24. ^ Mallipeddi, Rammohan; Iacca, Giovanni; Suganthan, Ponnuthurai Nagaratnam; Neri, Ferrante; Mininno, Ernesto (2011). Ensemble strategies in Compact Differential Evolution. 2011 IEEE Congress of Evolutionary Computation (CEC). IEEE. doi:10.1109/cec.2011.5949857. ISBN 9781424478347.
  25. ^ Neri, Ferrante; Iacca, Giovanni; Mininno, Ernesto (2011). "Disturbed Exploitation compact Differential Evolution for limited memory optimization problems". Information Sciences. 181 (12): 2469–2487. doi:10.1016/j.ins.2011.02.004. ISSN 0020-0255.
  26. ^ Iacca, Giovanni; Mallipeddi, Rammohan; Mininno, Ernesto; Neri, Ferrante; Suganthan, Pannuthurai Nagaratnam (2011). Global supervision for compact Differential Evolution. 2011 IEEE Symposium on Differential Evolution (SDE). IEEE. doi:10.1109/sde.2011.5952051. ISBN 9781612840710.
  27. ^ Iacca, Giovanni; Mallipeddi, Rammohan; Mininno, Ernesto; Neri, Ferrante; Suganthan, Pannuthurai Nagaratnam (2011). Super-fit and population size reduction in compact Differential Evolution. 2011 IEEE Workshop on Memetic Computing (MC). IEEE. doi:10.1109/mc.2011.5953633. ISBN 9781612840659.
  28. ^ Neri, Ferrante; Mininno, Ernesto; Iacca, Giovanni (2013). "Compact Particle Swarm Optimization". Information Sciences. 239: 96–121. doi:10.1016/j.ins.2013.03.026. ISSN 0020-0255.
  29. ^ Iacca, Giovanni; Neri, Ferrante; Mininno, Ernesto (2012), "Compact Bacterial Foraging Optimization", Swarm and Evolutionary Computation, Springer Berlin Heidelberg, pp. 84–92, doi:10.1007/978-3-642-29353-5_10, ISBN 9783642293528
  30. ^ Salustowicz, Rafal; Schmidhuber, Jürgen (1997). "Probabilistic incremental program evolution". Evolutionary Computation. 5 (2): 123–141. doi:10.1162/evco.1997.5.2.123. ISSN 1530-9304. PMID 10021756.
  31. ^ Tamayo-Vera, Dania; Bolufe-Rohler, Antonio; Chen, Stephen (2016). Estimation multivariate normal algorithm with thresheld convergence. 2016 IEEE Congress on Evolutionary Computation (CEC). IEEE. doi:10.1109/cec.2016.7744223. ISBN 9781509006236.
  32. ^ Yu, Tian-Li; Goldberg, David E.; Yassine, Ali; Chen, Ying-Ping (2003), "Genetic Algorithm Design Inspired by Organizational Theory: Pilot Study of a Dependency Structure Matrix Driven Genetic Algorithm", Genetic and Evolutionary Computation — GECCO 2003, Springer Berlin Heidelberg, pp. 1620–1621, doi:10.1007/3-540-45110-2_54, ISBN 9783540406037
  33. ^ Hsu, Shih-Huan; Yu, Tian-Li (2015-07-11). Optimization by Pairwise Linkage Detection, Incremental Linkage Set, and Restricted / Back Mixing: DSMGA-II. ACM. pp. 519–526. arXiv:1807.11669. doi:10.1145/2739480.2754737. ISBN 9781450334723.