透過 Genetic Algorithm 解函數極大化問題

基因演算法

基因演算法藉由模擬生物演化的過程，透過複製、交配、突變、淘汰等等，藉由逐代迭代，找出能極大化目標函數的解。

基因演算法的過程也涉及到「產生隨機數」，因此可以視為 Monte Carlo 方法的一種優化（見此篇文章的討論）。這篇筆記記錄了基因演算法對於一個簡單習題的應用：

目標函數

$$ 2x^3 – 25x^2 + 18x + 3000 – xsin(x)$$ where $$ -8 \leq x \leq 8$$

參數設定
- 每個世代有 30 組基因組
- 迭代 30 個世代
- 突變機率為 5%
- 每個基因組為 8 位元

R code

# Genetic Algorithm
rm(list = ls())

# find non-zero fitness for selection
get_fitness <- function(pred){
  return(pred + 1e-3 - min(pred))
}

# convert binary DNA to decimal and normalize it to a specific range
translateDNA <- function(pop){
  return((pop %*% (2^((DNA_SIZE-1):0))) / (2^DNA_SIZE - 1) * X_BOUND[2])
}


# nature selection wrt pop's fitness
select <- function(pop, fitness){
  weights <- fitness/sum(fitness)
  idx <- sample(1:POP_SIZE, size = POP_SIZE, replace = TRUE, prob = weights)
  return(pop[idx,])
}

# mating process (genes crossover)
crossover <- function(parent, pop){
  if(runif(1) < CROSS_RATE){
    i_ <- sample(1:POP_SIZE, size = 1)
    cross_points <- sample(c(TRUE, FALSE), size = DNA_SIZE, replace = TRUE)
    parent[cross_points] <- pop[i_, cross_points]
  }
  return(parent)
}

mutate <- function(child){
  for(point in 1:DNA_SIZE){
    if(runif(1) < MUTATION_RATE){
      if(child[point] == 0){
        child[point] <- 1
      } else {
        child[point] <- 0
      }
    }
  }
  return(child)
}

Fvalue <- function(x){
  return(2*x^3 - 25*x^2 + 18*x + 3000 - x*sin(x))
}


# Setting Parameters
DNA_SIZE = 8             # DNA length
POP_SIZE = 30            # population size
CROSS_RATE = 1           # mating probability (DNA crossover)
MUTATION_RATE = 0.05     # mutation probability
N_GENERATIONS = 30
X_BOUND = c(-8, 8)       # x upper and lower bounds


# execute GA
set.seed(1234)
pop <- sample(c(0, 1), size = DNA_SIZE * POP_SIZE, replace = TRUE)
pop <- matrix(pop, nrow = POP_SIZE, ncol = DNA_SIZE)

for(i in 1:N_GENERATIONS){
  F_values = Fvalue(translateDNA(pop))
  
  fitness <- get_fitness(F_values)
  print("Most fitted DNA:")
  mostFittedDNA <- pop[which(fitness == max(fitness)),]
  print(mostFittedDNA)
  print("Corresponding x value:")
  print(translateDNA(mostFittedDNA))
  print("Corresponding function value")
  print(Fvalue(translateDNA(mostFittedDNA)))
  pop <- select(pop, fitness)
  pop_copy <- pop
  for(row in 1:POP_SIZE){
    parent <- pop[row,]
    child <- crossover(parent, pop_copy)
    child <- mutate(child)
    parent <- child
    
    # renew the whole population
    pop[row, ] <- parent
  }
}

執行結果

我們可以先透過 wolframe alpha 偷看一下這個極大化問題的答案，約在 x = 0.3618 時，函數值最大

而我們透過基因演算法迭代 30 代後所得到「適應程度」最高的基因組為：00001100，轉換為十進位表示法並縮放至 x 的範圍後為 0.3764706。離解析解 0.361865 雖近，但仍有一絲差距

有意思的是，這裡如果我們繼續增加迭代的代數或者每代的基因組數量，例如將 N_GENERATIONS 以及 POP_SIZE 都分別從 30 修改為 300，並不會讓我們的「解」更加靠近正解，而依舊是 0.3764706，這是為什麼呢？原因在於我們只透過 8 個 bit 的基因組來表達定義域（即 x 的 precision）。若是我們將程式稍加修改，例如將 DNA_SIZE 增加，例如修改為 16 ，那麼在相同迭代 300 代，並且每代有 300 組基因組下，解便更新為 0.3627985 ，稍加靠近正解了一些。不過隨著 DNA_SIZE 的增加，實際上也就引入了更高的 flexibility ，意味著 model 本身的 variance 更大，很容易可以發現我們拿掉 seed 後，在 N_GENERATIONS 跟 POP_SIZE 都為 300 的情況下，所得出的解並不總是就離正解近一些。

參考資料

莫煩的 python code

部落格

透過 Genetic Algorithm 解函數極大化問題

基因演算法

目標函數

R code

執行結果

參考資料

發佈留言取消回覆

基因演算法

目標函數

R code

執行結果

參考資料

發佈留言 取消回覆

發佈留言取消回覆