How to play Lights Out with a Neural Network in Java

Robert Hildreth
5 min read · Oct 27, 2021

Part 2 (of 2): Training and Losing

The example will utilize a basic Neural Network similar in essence to this one. It will have a single, relatively wide hidden layer and 25 outputs corresponding to the different clicks. The current state of the project can play on 13-click boards with a > 99% success rate in finding a solution. These solutions are often not direct, but they are interesting to watch nonetheless and will run anywhere from 13 to 20 or more clicks. By ‘13-click board’ I mean a board whose minimum number of clicks required to solve it is 13.* This makes for a very difficult and unintuitive board, and I encourage exploring the game yourself at various ‘click depths.’
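For concreteness, here is a minimal sketch of the shapes I am assuming: a 5×5 board flattened into 25 floats for the input, and 25 outputs, one per clickable cell. The class and method names here are illustrative only and are not taken from the project.

```java
// Sketch only: the 5x5 board as 25 input floats and 25 outputs (one per cell).
public final class BoardEncoding {
    public static final int SIZE = 5;             // 5x5 Lights Out board
    public static final int CELLS = SIZE * SIZE;  // 25 inputs and 25 outputs

    // Encode the board as a flat float array: 1.0f = lit, 0.0f = unlit.
    public static float[] encode(boolean[][] lights) {
        float[] input = new float[CELLS];
        for (int r = 0; r < SIZE; r++)
            for (int c = 0; c < SIZE; c++)
                input[r * SIZE + c] = lights[r][c] ? 1.0f : 0.0f;
        return input;
    }

    // Decode one output index back into a board coordinate (row, column).
    public static int[] toRowCol(int outputIndex) {
        return new int[] { outputIndex / SIZE, outputIndex % SIZE };
    }
}
```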

Having recreated this project a handful of times over the years with different networks, I have found Lights Out to be a next-to-unsolvable problem if approached by simply generating a dataset, training over it, and then unleashing the network into the wild to play whimsically. That approach almost never works well, and it is a bit disheartening to implement if you want a network capable of solving anything beyond 5-click boards. This is also why I decided to write this how-to (and how-why), and to share the project.

Instead of creating and training over a dataset of board samples, I find it extraordinarily more useful to ‘throw the network into the deep end’: give it a board to work on and train it along the way as it makes its decisions, that is to say ‘let it swim.’ As written, the algorithm will query the network with the state of the board up to 10 times before moving on to the next board, and will move on early if the board becomes solved. After each answer from the network, each positive output is interpreted as a command to click, and each click is carried out in succession before the network is asked for another set of moves. This means the network makes decisions in ‘click chunks.’ During training, however, the network is shown the board at every click along the way of the ‘click chunk’ and is trained at every step it takes.
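Here is a rough sketch of that loop under stated assumptions: a hypothetical Network with answer() and train() methods, and a hypothetical Board exposing encode(), click(), isSolved(), and the clickHistory() oracle mentioned below. Only clickHistory() is a name from the actual project; everything else is illustrative.

```java
import java.awt.Point;
import java.util.List;

// Hypothetical stand-ins for the project's real types; only clickHistory() mirrors the project.
interface Network {
    float[] answer(float[] input);             // forward pass over the 25-cell board state
    void train(float[] input, float[] target); // one supervised update toward a target
}

interface Board {
    float[] encode();            // board state as 25 floats
    boolean isSolved();
    void click(Point p);         // toggle a cell and its orthogonal neighbours
    List<Point> clickHistory();  // a shortest list of clicks that solves it (the oracle)
}

final class ChunkTrainer {
    // Query the network up to 10 times on one board, training at every click it takes.
    static void trainOnBoard(Network net, Board board) {
        for (int query = 0; query < 10 && !board.isSolved(); query++) {
            float[] output = net.answer(board.encode());
            for (int i = 0; i < output.length && !board.isSolved(); i++) {
                if (output[i] > 0f) {                      // positive output = a click command
                    // Show the network the board as it looks right now, with the oracle's answer...
                    net.train(board.encode(), targetFor(board.clickHistory()));
                    // ...then carry the click out before reading the next command in the chunk.
                    board.click(new Point(i % 5, i / 5));
                }
            }
        }
    }

    // Turn the oracle's list of points into a 25-float training target.
    static float[] targetFor(List<Point> solution) {
        float[] target = new float[25];
        for (Point p : solution) target[p.y * 5 + p.x] = 1.0f;
        return target;
    }
}
```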

This style of training will continue until 30,000 boards have been solved. Do not be quick to get discouraged if running this program as-is, as the first solution may take one to several minutes to appear. After that, the solution rate should only increase, until solutions are being counted by the hundreds. If more immediate results are desired, the training boards can be changed from requiring 13 clicks to something fewer, like 6. The downside of decreasing the complexity of the main training boards is typically poorer performance on more complex boards after the main training phase. That said, given a moderately complex training board, the network can be taught to ‘punch up’: training on 7-click boards can be sufficient for stellar performance on 15-click boards, whereas training on something like 5-click difficulty, while quick, is not. In addition, during this main training period the window is free to interact with, so you can play the game yourself.
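In sketch form, the outer schedule might look like the following, reusing the hypothetical Network/Board types from the previous sketch; BoardFactory is an assumed source of boards scrambled to the chosen click depth, not part of the real project.

```java
// Sketch of the main training phase, reusing the hypothetical types sketched above.
final class TrainingPhase {
    interface BoardFactory { Board scrambled(int clickDepth); }  // assumed board source

    static void run(Network net, BoardFactory boards) {
        final int targetSolved = 30_000; // solved boards before the network is "deployed"
        final int clickDepth = 13;       // drop this to 6 or 7 for much quicker first results
        int solved = 0;
        while (solved < targetSolved) {
            Board board = boards.scrambled(clickDepth); // scramble a fresh board to the chosen depth
            ChunkTrainer.trainOnBoard(net, board);      // the chunked training loop sketched above
            if (board.isSolved()) solved++;
        }
    }
}
```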

Once this training period is over, the window will begin to update based on what the network is currently working on, and you can watch the trained network play. Once the network solves the board within its newly allotted 15 instruction chunks, or eventually fails in trying, the board will visually pause momentarily before the next challenge board is constructed. In the eventual case that the network gets stuck, one more instance of training takes place before the board is discarded. This last training sample seems to ‘fine-tune’ the network as it plays, allowing it to jump from a likely 96% to upwards of 99% success rate as presented.
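Sketched with the same hypothetical types, the play phase could look roughly like this; the 15-chunk limit and the single extra training pass on failure follow the behaviour described above, but the structure is an assumption, not the project's actual code.

```java
import java.awt.Point;

// Sketch of the play phase: 15 click chunks per board, one last training pass if it gets stuck.
final class PlayPhase {
    static boolean playBoard(Network net, Board board) {
        for (int chunk = 0; chunk < 15 && !board.isSolved(); chunk++) {
            float[] output = net.answer(board.encode());
            for (int i = 0; i < output.length && !board.isSolved(); i++) {
                if (output[i] > 0f) board.click(new Point(i % 5, i / 5));
            }
        }
        if (!board.isSolved()) {
            // A single "fine-tune" sample on the failed board before it is discarded.
            net.train(board.encode(), ChunkTrainer.targetFor(board.clickHistory()));
            return false;
        }
        return true;
    }
}
```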

Remember: clickHistory() actually returns a shortest list of points to the solution from any given position (our oracle).

The game is designed to have a small delay between actions taken on the network’s behalf, currently 123 milliseconds between clicks. This allows the user to actually see what is happening; without it, the game would play so quickly that the window would blur between colors. However, this delay is only applied if the window has not been minimized. If the window is minimized, play continues with no delay. This way you can ‘jump to the future’ by minimizing the window for a few seconds; deiconification (restoring the window) reinstates the delay, and you will see the network playing after several thousand boards have cycled through, either solved or failed. This is a feature I use mostly for performance monitoring, as I keep track of the solution-to-failure ratio over the network’s deployment.
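A minimal sketch of that pacing rule, assuming a standard AWT/Swing Frame; the 123 ms figure is the one mentioned above, and the class and method names are illustrative.

```java
import java.awt.Frame;

// Sketch: sleep between clicks only while the window is actually visible on screen.
final class Pacing {
    static final long CLICK_DELAY_MILLIS = 123;

    static void pause(Frame window) throws InterruptedException {
        // When the window is iconified (minimized), skip the delay and let play race ahead.
        if ((window.getExtendedState() & Frame.ICONIFIED) == 0) {
            Thread.sleep(CLICK_DELAY_MILLIS);
        }
    }
}
```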

The only thing required to get this up and running is a neural network (notice the two imports from trust.net at the top of the Player class). The one I’m using here operates on and returns float arrays in Java. The easiest drop-in replacement should therefore do something similar. If the network you use operates on different data primitives or in different formats, some light surgery may be needed to fit it in place of the sample API shown.
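If the network you pick speaks doubles rather than floats, the ‘light surgery’ usually amounts to a thin adapter; below is a sketch against the hypothetical Network interface from the earlier sketch, with DoubleNet standing in for whatever type your library actually provides.

```java
// Placeholder for a library that works on doubles instead of floats (names are assumptions).
interface DoubleNet {
    double[] forward(double[] input);
    void fit(double[] input, double[] target);
}

// Thin adapter exposing the float-array contract used in the sketches above.
final class DoubleNetAdapter implements Network {
    private final DoubleNet inner;
    DoubleNetAdapter(DoubleNet inner) { this.inner = inner; }

    public float[] answer(float[] input) {
        double[] out = inner.forward(widen(input));
        float[] result = new float[out.length];
        for (int i = 0; i < out.length; i++) result[i] = (float) out[i];
        return result;
    }

    public void train(float[] input, float[] target) {
        inner.fit(widen(input), widen(target));
    }

    private static double[] widen(float[] a) {
        double[] d = new double[a.length];
        for (int i = 0; i < a.length; i++) d[i] = a[i];
        return d;
    }
}
```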

PS — I have decided to include the full code of the project, including the neural network that drives it, in a folder on GitHub.

  • Note that simply clicking a solved board 13 times may not result in a true 13-click board, as the result may be solvable in fewer clicks than created it. Creating a genuine 13-click board by random clicks is therefore actually unlikely, and it gets more taxing as the click count increases. This does not stop me from doing it, but it is a prime point of available optimization; a sketch of this seeding approach follows below.
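As a sketch of that seeding approach (assuming the standard 5×5 toggle rule), scrambling a board looks something like this; the class name is illustrative, not the project's.

```java
import java.util.Random;

// Sketch: seed a "k-click" board by clicking a solved board k times at random.
// The true minimum solution may be fewer than k clicks, as noted above.
final class BoardSeeder {
    private static final Random RNG = new Random();

    static boolean[][] scramble(int clicks) {
        boolean[][] lights = new boolean[5][5];   // all lights off = solved
        for (int i = 0; i < clicks; i++) {
            click(lights, RNG.nextInt(5), RNG.nextInt(5));
        }
        return lights;
    }

    // Standard Lights Out rule: toggle the clicked cell and its four orthogonal neighbours.
    static void click(boolean[][] b, int r, int c) {
        toggle(b, r, c);
        toggle(b, r - 1, c);
        toggle(b, r + 1, c);
        toggle(b, r, c - 1);
        toggle(b, r, c + 1);
    }

    private static void toggle(boolean[][] b, int r, int c) {
        if (r >= 0 && r < 5 && c >= 0 && c < 5) b[r][c] = !b[r][c];
    }
}
```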
