# SmellsLikeML

#### AlphaGo Zero: Learn with Search

11/2/17

Having considered the Go environment and how AlphaGo Zero perceives game play through deep convnets, we now turn to how decisions are made.

AlphaGo Zero was designed with neural net 'heads' for policy and value calculations. Essentially, each head takes the shared convnet output and applies additional layers to learn a policy or value function from gameplay experience.

The policy head applies a 2-filter 1x1 convolution with stride 1, batch normalization, and a relu activation, followed by a 362-unit dense layer outputting logits over the 362 actions: the 361 board positions plus a pass. In TensorFlow, something like:


```python
# Hedged sketch (TensorFlow 1.x layers API; tensor names like
# shared_out and is_training are assumptions)
net = tf.layers.conv2d(shared_out, filters=2, kernel_size=1, strides=1)
net = tf.layers.batch_normalization(net, training=is_training)
net = tf.nn.relu(net)
policy_logits = tf.layers.dense(tf.layers.flatten(net), 362)  # 361 points + pass
```
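
At play time, the 362 logits are turned into move probabilities with a softmax. A minimal NumPy sketch of that step (logit values and the flat-index-to-board mapping are illustrative assumptions):

```python
import numpy as np

def softmax(logits):
    # subtract the max for numerical stability before exponentiating
    z = np.exp(logits - logits.max())
    return z / z.sum()

# hypothetical logits: 361 board points plus one pass move
logits = np.zeros(362)
logits[3 * 19 + 4] = 2.0          # favor the point at row 3, col 4

probs = softmax(logits)           # a proper distribution over all 362 actions
best = probs.argmax()
row, col = divmod(best, 19)       # flat index -> (row, col) board coordinates
is_pass = best == 361             # index 361 is reserved for the pass move
```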


The value head uses a 1-filter 1x1 convolution with stride 1, batch normalization, and a relu activation; the result is flattened, passed through a 256-unit dense layer with relu, then reduced to a scalar with a tanh activation:


```python
# Hedged sketch (TensorFlow 1.x layers API; tensor names like
# shared_out and is_training are assumptions)
net = tf.layers.conv2d(shared_out, filters=1, kernel_size=1, strides=1)
net = tf.layers.batch_normalization(net, training=is_training)
net = tf.nn.relu(net)
net = tf.layers.dense(tf.layers.flatten(net), 256, activation=tf.nn.relu)
value = tf.tanh(tf.layers.dense(net, 1))  # scalar position value in [-1, 1]
```
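
To make the value head's shape arithmetic concrete, here is a hedged NumPy forward pass through its final layers, with random weights and a 19x19 board assumed:

```python
import numpy as np

rng = np.random.default_rng(0)
board = 19

# stand-in for the 1-filter 1x1 conv + batchnorm + relu output: one 19x19 plane
plane = np.maximum(rng.standard_normal((board, board)), 0.0)

# flatten the plane, then a 256-unit dense layer with relu
w1 = rng.standard_normal((board * board, 256)) * 0.01
h = np.maximum(plane.reshape(-1) @ w1, 0.0)

# 256 -> scalar; tanh bounds the value estimate in [-1, 1],
# roughly +1 for a sure win and -1 for a sure loss
w2 = rng.standard_normal((256, 1)) * 0.01
value = np.tanh(h @ w2)[0]
```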