Observations

The main purpose of BackPropagationVisualization is to let you observe how the weights change as the neural network is trained. It also lets you compare different sets of parameters (e.g. varying the learning rate) and see their effect on training (speed of convergence, success of convergence). Additionally, you can see how different sets of training data produce different patterns of training. Some observations on the training patterns of the 3 available logical operators are discussed below.
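
As a rough mental model of what the tool is training, here is a minimal sketch of a bias-free 2-2-1 sigmoid network trained by mini-batch backpropagation on a logical operator's truth table. It is written in Python rather than the project's own code, and the architecture, the absence of bias units, and all names are assumptions for illustration, so its exact convergence behaviour may differ from the tool's.

```python
# Minimal sketch: 2 inputs, 2 hidden sigmoid units, 1 sigmoid output,
# no bias terms, trained by mini-batch backpropagation on a truth table.
# Everything here is an illustrative assumption, not the tool's actual code.
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def make_net(seed=None):
    rng = random.Random(seed)
    return {
        # w_hidden[j][i]: weight from input i to hidden unit j
        "w_hidden": [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)],
        # w_out[j]: weight from hidden unit j to the output unit
        "w_out": [rng.uniform(-1, 1) for _ in range(2)],
    }

def forward(net, x):
    h = [sigmoid(sum(w * xi for w, xi in zip(ws, x))) for ws in net["w_hidden"]]
    y = sigmoid(sum(w * hj for w, hj in zip(net["w_out"], h)))
    return h, y

def train(net, table, epochs=10000, learning_rate=0.5, batch_size=4):
    """table: list of ((x1, x2), target) pairs, e.g. a truth table."""
    for _ in range(epochs):
        for start in range(0, len(table), batch_size):
            # accumulate gradients over one mini-batch, then update once
            g_out = [0.0, 0.0]
            g_hidden = [[0.0, 0.0], [0.0, 0.0]]
            for x, t in table[start:start + batch_size]:
                h, y = forward(net, x)
                d_out = (y - t) * y * (1 - y)  # squared-error output delta
                for j in range(2):
                    g_out[j] += d_out * h[j]
                    d_hid = d_out * net["w_out"][j] * h[j] * (1 - h[j])
                    for i in range(2):
                        g_hidden[j][i] += d_hid * x[i]
            for j in range(2):
                net["w_out"][j] -= learning_rate * g_out[j]
                for i in range(2):
                    net["w_hidden"][j][i] -= learning_rate * g_hidden[j][i]
    return net

AND = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR  = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]
XOR = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

if __name__ == "__main__":
    net = train(make_net(), AND, epochs=20000, learning_rate=0.2, batch_size=4)
    for x, t in AND:
        print(x, "->", round(forward(net, x)[1], 3), "target", t)
```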

AND

The AND operator rarely converges, often resulting in an output activation value of around 0.5 for the {true, true} training case. Whether it converges seems to depend on certain patterns of initialized weights: correct convergence usually coincides with one of the hidden-to-output weights being positive, whereas non-convergence leaves all of the hidden-to-output weights negative. The most success I had getting AND to converge was using a batch size of 4 with a low learning rate (~0.2). Interestingly, I was able to get it to converge more reliably when I had a bug in the Train() method which performed a weight update after every epoch. If I selected a batch size of 3 in this case (meaning a weight update occurred after the first 3 training examples and then again after the last {true, true} example in every epoch), the AND operator would consistently converge.
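
To make the batch-size-3 behaviour concrete, here is a small sketch contrasting a schedule that only updates every batch_size examples, carrying any remainder into the next epoch (my assumption about what the fixed Train() method does), with the buggy schedule that also flushes the pending partial batch at the end of every epoch.

```python
# Sketch of the two update schedules described above. How the fixed Train()
# method really groups examples across epoch boundaries is an assumption.
def update_schedule(examples_per_epoch, batch_size, epochs, flush_each_epoch):
    """Return, for each weight update, the per-epoch example indices whose
    gradients were accumulated into that update."""
    updates, pending = [], []
    for _ in range(epochs):
        for i in range(examples_per_epoch):
            pending.append(i)
            if len(pending) == batch_size:
                updates.append(pending)
                pending = []
        if flush_each_epoch and pending:
            # the 'bug': an extra update at the end of every epoch, here
            # covering only the {true, true} example when batch_size is 3
            updates.append(pending)
            pending = []
    return updates

# 4 training examples per epoch (the truth table), batch size 3, 2 epochs
print(update_schedule(4, 3, 2, flush_each_epoch=False))
# -> [[0, 1, 2], [3, 0, 1], [2, 3]]   (batches straddle epoch boundaries)
print(update_schedule(4, 3, 2, flush_each_epoch=True))
# -> [[0, 1, 2], [3], [0, 1, 2], [3]] ({true, true} gets its own update)
```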

OR

With a high learning rate, OR tends to converge correctly quite quickly (within 1000-2000 epochs). The same result seems to occur regardless of the batch size.

XOR

Using a batch size of 2 and a high learning rate tends to cause the weights to all move towards 0 (and hence the network never converges to a result). A batch size of 1 tends to converge to a result quickly, but the result for {true, true} is incorrect (~0.5). Batch size 3 tends to produce the same result, but converges more slowly. With batch size 4, the weights often 'hover' around very small values before suddenly jumping to strong values and converging correctly after between 5000 and 25000 epochs (try batch size 4 with a learning rate of 0.9).
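
To try those last settings outside the tool, the hypothetical helpers and XOR table from the sketch near the top of this section can be reused (the epoch count is simply the upper end of the range above):

```python
# Reuses make_net/train/forward and the XOR table from the earlier sketch.
net = train(make_net(), XOR, epochs=25000, learning_rate=0.9, batch_size=4)
for x, t in XOR:
    print(x, "->", round(forward(net, x)[1], 3), "target", t)
```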

You can also experiment with the effect of unusual settings and actions, for example changing the batch size or even the logical operator while training. Changing the logical operator is particularly interesting, as it shows whether the neural network can 'adapt' to different training data after already converging to a result (this can work by first training on OR, then switching to XOR and continuing to train).
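
A sketch of that OR-then-XOR experiment, again reusing the hypothetical helpers and truth tables from the first code block (the phase lengths, batch sizes and learning rates are illustrative picks based on the numbers above):

```python
# Phase 1: train on OR until it has converged, then keep the same weights
# and continue training on XOR to see whether the network can adapt.
net = make_net()
net = train(net, OR, epochs=2000, learning_rate=0.9, batch_size=1)
net = train(net, XOR, epochs=25000, learning_rate=0.9, batch_size=4)
print("{true, true} ->", round(forward(net, (1, 1))[1], 3), "(XOR target 0)")
```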