Starting from random play, and given no domain knowledge except the game rules, AlphaZero achieved within 24 hours a superhuman level of play in the games of chess and shogi (Japanese chess) as well as Go, and convincingly defeated a world-champion program in each case.
Self-play games are always generated using the latest parameters of this single neural network; the evaluation step and the selection of a best player are omitted.
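The contrast with AlphaGo Zero's evaluation-gated loop can be sketched as follows. This is a hypothetical illustration, not the actual training code: `self_play_game` and `train_step` are placeholder names standing in for MCTS-guided game generation and a gradient update.

```python
# Hypothetical sketch: AlphaZero-style continual loop. The latest
# parameters are always used for self-play; there is no evaluation
# match and no "best player" gating (unlike AlphaGo Zero, which only
# promoted a new network after it beat the current best by a margin).
import random

def self_play_game(params):
    # Placeholder for MCTS-guided self-play; returns (states, outcome).
    return [f"state-{params['step']}"], random.choice([-1, 0, 1])

def train_step(params, game):
    # Placeholder gradient update: just advance a step counter.
    return {"step": params["step"] + 1}

params = {"step": 0}
for _ in range(5):
    game = self_play_game(params)   # always the LATEST parameters
    params = train_step(params, game)

print(params["step"])  # 5
```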
AlphaGo Zero tuned the hyper-parameters of its search by Bayesian optimization. AlphaZero instead reuses the same hyper-parameters for all games, without any game-specific tuning. The sole exception is the noise that is added to the prior policy to ensure exploration; this is scaled in proportion to the typical number of legal moves for that game type.
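A minimal sketch of this exploration noise, under assumed shapes and an illustrative constant: Dirichlet noise is mixed into the prior policy at the root, and one common reading of the move-count scaling is to set the Dirichlet concentration α inversely proportional to the typical number of legal moves, so that comparable noise mass is spread over a larger move set. The helper name and the constant `c` below are assumptions, not values from the paper.

```python
# Illustrative sketch of root-prior exploration noise:
#   P'(a) = (1 - eps) * P(a) + eps * Dirichlet(alpha)
# with alpha chosen inversely proportional to the typical number of
# legal moves for the game type (an assumed scaling rule).
import numpy as np

def noisy_root_prior(prior, typical_legal_moves, eps=0.25, c=10.0):
    # c is an illustrative constant; eps = 0.25 mixes 25% noise.
    alpha = c / typical_legal_moves
    noise = np.random.dirichlet([alpha] * len(prior))
    return (1 - eps) * prior + eps * noise

prior = np.ones(35) / 35                      # uniform prior, ~35 moves
mixed = noisy_root_prior(prior, typical_legal_moves=35)
print(abs(mixed.sum() - 1.0) < 1e-9)          # still a distribution
```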
As in AlphaGo Zero, the board state is encoded by spatial planes based only on the basic rules of each game. Actions are encoded by either spatial planes or a flat vector, again based only on the basic rules of each game.
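The spatial-plane encoding can be illustrated with a toy example, assuming a simple layout of one binary 8x8 plane per (piece type, colour) for chess; the function name, input format, and plane ordering here are all assumptions for illustration, not the paper's exact feature set.

```python
# Illustrative sketch: encode a chess position as stacked binary
# spatial planes, one 8x8 plane per piece type per colour.
import numpy as np

PIECES = ["P", "N", "B", "R", "Q", "K"]

def encode_chess_position(white, black, size=8):
    # white/black: dicts mapping piece letter -> list of (row, col).
    planes = np.zeros((2 * len(PIECES), size, size), dtype=np.float32)
    for colour_idx, side in enumerate((white, black)):
        for piece_idx, piece in enumerate(PIECES):
            for (r, c) in side.get(piece, []):
                planes[colour_idx * len(PIECES) + piece_idx, r, c] = 1.0
    return planes

white = {"R": [(0, 0), (0, 7)], "K": [(0, 4)],
         "P": [(1, c) for c in range(8)]}
planes = encode_chess_position(white, {})
print(planes.shape, int(planes.sum()))  # (12, 8, 8) 11
```

Move encodings follow the same spirit: either a stack of planes indexed by (from-square, move direction) or a flat one-hot vector over all legal move labels.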