How I scored a 97% on MNIST(hw) from scratch in Java

Robert Hildreth
5 min read · Apr 2, 2021

Some time in the past…

I set out to design, develop, test, and experiment with a convolutional neural network. In designing and tuning networks I have learned a few things about how they behave as they approach their resting place (e.g. how adjusting things such as learning rate or momentum shapes the network’s approach to a solution). I know 92 percent correct isn’t much compared to how most modern approaches perform on MNIST, but a unique approach yielded a unique result, and I did all of it entirely from scratch.

I developed a convolutional network (or ‘net’ hereafter) that works directly on the 28x28-pixel images of handwritten digits 0–9 that make up the 60,000 training images of the MNIST handwritten dataset, followed by testing across 10,000 new instances. The architecture opens the first conv-layer with 6x6 kernels at stride 2, then a 2x2 layer at stride 1, then 3x3 and 5x5 layers at stride 1, and closes with one more 5x5 layer at stride 1. *** These feature maps were never flattened and never fed to a decision network. The result is 1x1xZ, where Z is the number of traits, and results are read directly from heat.

(L1) 6x6 s2 (L2) 2x2 s1 (L3) 3x3 s1 (L4) 5x5 s1 (L5) 5x5 s1
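For anyone checking the arithmetic, each layer is a plain ‘valid’ convolution, so the spatial size after each layer is (input - kernel) / stride + 1. A quick sketch of that bookkeeping (this is just the standard formula applied to the numbers above, not output from my network):

// Valid-convolution sizing: out = (in - kernel) / stride + 1
final int[] kernels = {6, 2, 3, 5, 5};
final int[] strides = {2, 1, 1, 1, 1};
int size = 28; // MNIST images are 28x28
for (int i = 0; i < kernels.length; i++) {
    size = (size - kernels[i]) / strides[i] + 1;
    System.out.println("Layer " + (i + 1) + ": " + size + "x" + size);
}
// Prints 12, 11, 9, 5, 1. The last layer is 1x1x10, and the hottest
// of those ten channels is taken as the predicted digit.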

What follows is the use and conceptual form of my network.

I wrote this in Java. There is plenty of help available in Java when it comes to learning frameworks and bindings for mechanisms such as TensorFlow or PyTorch. I would add references here, but I have not personally tried any of them (other than wonderful interactions with somebody’s DataFrame manager I was using for .csv file management, which can be useful for importing real-world data), as I am a semi-fluent programmer and wondered if I could do it myself. I failed immediately and had to use another’s work to translate the IDX files, acquired by simply googling MNIST and visiting the first result (at the time of writing), into literal .png images in my workspace. From there, however, I successfully multi-threaded my image-file-to-Image-object translation method, sent the kernel size, dimension, and striding length vectors (hardcoded, currently) to my own wonderful creation of a network to build with various layer activations (don’t get your hopes up here: I get lazy as I grow older), seeded the network, and began training before finally running it through the testing set for final scoring. This process follows (somebody please help me embed code in this editor):

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Image, ImgNet, and Dimension are this project's own classes.
public static final void main(final String[] args) {

    List<Image> trainImages = new LinkedList<>();
    List<Image> testImages = new LinkedList<>();

    // Walk the training directory and load every regular file as an Image, in parallel.
    try (final Stream<Path> paths = Files.walk(Paths.get("G:/Users/me//NetBeansProjects/ImgNet/imgRepo/MNIST/Train/"))) {
        trainImages = paths.parallel().filter(Files::isRegularFile)
                .map(Image::new).collect(Collectors.toList());
    } catch (final IOException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }

    // Same again for the 10,000 test images.
    try (final Stream<Path> paths = Files.walk(Paths.get("G:/Users/me//NetBeansProjects/ImgNet/imgRepo/MNIST/Test/"))) {
        testImages = paths.parallel().filter(Files::isRegularFile)
                .map(Image::new).collect(Collectors.toList());
    } catch (final IOException e) {
        System.out.println(e.getMessage());
        System.exit(0);
    }

    // Kernel size per layer; the trailing comments give the resulting spatial size.
    final Dimension[] layerDims = new Dimension[]{
            new Dimension(6, 6),  //-> 12
            new Dimension(2, 2),  //-> 11
            new Dimension(3, 3),  //-> 9
            new Dimension(5, 5),  //-> 5
            new Dimension(5, 5)}; //-> 1

    // Channel depth and stride per layer; the final depth of 10 is one channel per digit.
    final int[] depths = new int[]{6, 7, 22, 87, 10};
    final int[] strides = new int[]{2, 1, 1, 1, 1};

    final ImgNet network = new ImgNet(
            layerDims, depths, strides, trainImages.get(0).channels()
    );

    network.seed(); // initialize the weights

    train(network, trainImages);
    test(network, testImages);

}
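The train and test helpers aren’t shown above, and their exact shape depends on ImgNet’s internals, which I haven’t pasted here. As a rough sketch only (learn, classify, and label are placeholder method names, not ImgNet’s real API), they boil down to something like:

// Hypothetical sketch: loop the training set for a few epochs, then score the test set.
static void train(final ImgNet network, final List<Image> images) {
    for (int epoch = 0; epoch < 10; epoch++) {
        Collections.shuffle(images);          // fresh presentation order each epoch
        for (final Image img : images) {
            network.learn(img, img.label());  // placeholder: one forward pass + one weight update
        }
    }
}

static void test(final ImgNet network, final List<Image> images) {
    int correct = 0;
    for (final Image img : images) {
        if (network.classify(img) == img.label()) { // placeholder: hottest output channel vs. true digit
            correct++;
        }
    }
    System.out.printf("Accuracy: %.2f%%%n", 100.0 * correct / images.size());
}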

The network itself is something I’ve come to be proud of, in that it has successfully performed two tasks. One, telling apart these two photographs:

Two, scoring wonderfully on the MNIST dataset.

I have admittedly not trotted this very far, but consider: this was an old concept architecturally, and while very thoughtfully streamlined, it succumbs to its own nature. Otherwise semi-reasonable disturbances to the initial weights can leave it wandering without converging for epochs (a problem admittedly since solved), and it must be custom-tailored per usage. Beyond weight initialization issues, it is designed to run on a multi-core CPU, not a GPU (a CPU cannot make up time against the Fast Fourier Transform matrix multiplication performed on a GPU, so it is best suited to private usage should one bear the time), and it does not make use of any adaptive learning rate, including momentum. Nor does it use pooling layers or specialized convolutions such as dilated convolutions or their ilk.
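For reference, the momentum update this net skips is only a couple of lines; this is the textbook rule (not code from my net), where lr is the learning rate and mu the momentum coefficient:

// Plain SGD updates each weight by -lr * gradient; momentum adds a per-weight velocity.
static void momentumStep(final double[] weights, final double[] grads,
                         final double[] velocity, final double lr, final double mu) {
    for (int i = 0; i < weights.length; i++) {
        velocity[i] = mu * velocity[i] - lr * grads[i]; // decaying running sum of past gradients
        weights[i] += velocity[i];                      // step along the smoothed direction
    }
}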

MY OWN “NETWORK”:

What follows is wholeheartedly a personal endeavor, and I cannot promise that it functions exactly as your standard Vanilla Convolutional Neural Network would, but it behaves like one and has passed all unit tests and, at least seemingly, all functionality testing to date.

***Since I’ve had trouble lifting some of my networks off the ground using ReLU (forgive me) as an activation function in the past, I was skeptical of not sticking with my good old trustworthy TanH (for I have sinned), paired with a hot mixing of carefully tuned weights. This seemingly self-imposed hindrance obliterated any concept I had of building a network from the usually advised 3x3 reductions (show me any reference on convolutional networks and I’ll show you their 3x3 convolution example; disparagers receive free cookies), because it would also produce too ‘deep’ a network to really squeeze out all the TanH helpfulness.
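The TanH-depth worry is easy to see in one number: the gradient flowing back through a TanH unit gets multiplied by 1 - tanh²(x), which peaks at 1 and collapses toward zero once the unit saturates, so every extra layer risks shrinking the signal again. A tiny illustration:

// Derivative of tanh(x): equals 1 only at x = 0, and nearly 0 once the unit saturates.
static double tanhPrime(final double x) {
    final double t = Math.tanh(x);
    return 1.0 - t * t;
}
// tanhPrime(0.0) == 1.0, while tanhPrime(3.0) is roughly 0.01; stack enough
// saturated TanH layers and the backpropagated signal all but vanishes.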

I wonder to myself if I have to publish this in order to take a coffee break…
