Convolutional Neural Network Workbench by Filip D’haene (from http://www.codeproject.com/Articles/140631/Convolutional-Neural-Network-MNIST-Workbench)

Following are my notes on a great Neural Network project written in C# that is widely used as a benchmarking and learning reference.

Keywords

Convolutional Neural Network – a neural network whose layers use local connections and shared weights, which greatly reduces the number of free parameters compared to a fully connected network.

MNIST dataset of handwritten digits – The MNIST database of handwritten digits, available from http://yann.lecun.com/exdb/mnist/, has a training set of 60,000 examples, and a test set of 10,000 examples. It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image. It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal efforts on preprocessing and formatting.

CIFAR-10 dataset of 10 different natural objects – The CIFAR-10 dataset consists of 60,000 32×32 colour images in 10 classes (airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck), with 6,000 images per class: 50,000 training images and 10,000 test images.

Background

Article by Mike O’Neill on The Code Project

Dr. Yann LeCun’s paper: Gradient-Based Learning Applied to Document Recognition

An award-winning project by another developer

Key Programming Mechanisms

Create a new Neural Network – NeuralNetworks network = new NeuralNetworks(..

Create a new layer – network.Layers.Add(new Layers(..

Assign initial (random) weights – network.InitWeights(.. (these steps are sketched together below)

Using lists (as opposed to arrays) to manage items –

List<bool> listname = new List<bool>(row * column)
{
    member00, member01, member02, ..
    member10, member11, member12, ..
};
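
Put together, the flow looks something like the sketch below. This is a simplified stand-in for illustration only: the workbench's actual NeuralNetworks and Layers classes take different parameters than the ones shown here.

using System;
using System.Collections.Generic;

// Simplified stand-in types; the workbench's real classes differ.
class Layer
{
    public int NeuronCount;
    public double[] Weights;
    public Layer(int neuronCount) { NeuronCount = neuronCount; }
}

class NeuralNetwork
{
    public List<Layer> Layers = new List<Layer>();

    // Assign initial random weights between every pair of adjacent layers.
    public void InitWeights(Random rng)
    {
        for (int i = 1; i < Layers.Count; i++)
        {
            int count = Layers[i].NeuronCount * Layers[i - 1].NeuronCount;
            Layers[i].Weights = new double[count];
            for (int w = 0; w < count; w++)
                Layers[i].Weights[w] = 2.0 * rng.NextDouble() - 1.0;  // range [-1, 1)
        }
    }
}

class WorkbenchSketch
{
    static void Main()
    {
        var network = new NeuralNetwork();       // create a new network
        network.Layers.Add(new Layer(32 * 32));  // input layer (32x32 image)
        network.Layers.Add(new Layer(120));      // hidden layer
        network.Layers.Add(new Layer(10));       // output layer (10 digits)
        network.InitWeights(new Random(42));     // assign initial weights
        Console.WriteLine($"Layers: {network.Layers.Count}");
    }
}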

Compare lists to other types of Collections
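
For that comparison, a quick sketch of how List<T> sits between a plain array and a Dictionary:

using System;
using System.Collections.Generic;

class CollectionsDemo
{
    static void Main()
    {
        // Array: fixed size, fastest indexed access.
        bool[] flags = new bool[4];

        // List<T>: indexed like an array, but resizable, with Add/Remove helpers.
        List<bool> list = new List<bool>(4) { true, false, true, false };
        list.Add(true);  // grows automatically beyond its initial capacity

        // Dictionary<TKey, TValue>: lookup by key instead of by position.
        var lookup = new Dictionary<string, bool> { ["trained"] = false };

        Console.WriteLine($"{flags.Length} {list.Count} {lookup["trained"]}");
    }
}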

Persistence

DataProvider class – for loading MNIST and CIFAR-10 data

NeuralNetworkDataSet – for loading and saving Neural Network definitions (with weights) to disk files; a rough sketch of the idea follows
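
The workbench's exact save/load methods are not reproduced in these notes, so the helper names below are hypothetical; the underlying idea is simply serializing the weight values to disk and reading them back:

using System;
using System.IO;

class WeightPersistence
{
    // Hypothetical helpers: the real NeuralNetworkDataSet class wraps this
    // idea, but its actual method names and file format are not shown here.
    static void SaveWeights(string path, double[] weights)
    {
        using var writer = new BinaryWriter(File.Create(path));
        writer.Write(weights.Length);
        foreach (double w in weights) writer.Write(w);
    }

    static double[] LoadWeights(string path)
    {
        using var reader = new BinaryReader(File.OpenRead(path));
        var weights = new double[reader.ReadInt32()];
        for (int i = 0; i < weights.Length; i++) weights[i] = reader.ReadDouble();
        return weights;
    }

    static void Main()
    {
        SaveWeights("net.bin", new double[] { 0.1, -0.4, 0.7 });
        Console.WriteLine(string.Join(", ", LoadWeights("net.bin")));
    }
}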

Architecture

This application is developed using the Model-View-ViewModel (MVVM) pattern, as applied to a Windows Presentation Foundation (WPF) project.
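
As a reminder of what MVVM looks like in WPF (a generic sketch, not code from the workbench): the view data-binds to properties on a view-model, and the view-model raises PropertyChanged so the view refreshes automatically.

using System;
using System.ComponentModel;

// Generic view-model sketch: in the real application a WPF view would bind
// to Status; here a console handler stands in for the view.
class TrainingViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string status = "Idle";
    public string Status
    {
        get => status;
        set
        {
            status = value;
            PropertyChanged?.Invoke(this, new PropertyChangedEventArgs(nameof(Status)));
        }
    }
}

class MvvmDemo
{
    static void Main()
    {
        var vm = new TrainingViewModel();
        vm.PropertyChanged += (s, e) => Console.WriteLine($"Changed: {e.PropertyName}");
        vm.Status = "Training";  // prints "Changed: Status"
    }
}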

Moving Forward

The developer of this project expressed his hopes that..

“..there’s someone out there who can actually use this code and improve on it. Extend it with an unsupervised learning stage for example (encoder/decoder construction), or implement a better loss-function (negative log likelihood instead of MSE); extend to more test databases; make use of more advanced squashing functions, etc.”



Artificial Intelligence For Credit Approvals

We are coming back to a previous article (Build Neural Network With MS Excel) to examine one of the several excellent examples given by www.xlexpert.com: the use of a Neural Network with a Microsoft Excel spreadsheet to determine the credit risk of loan applicants.

Although the article presents excellent real-world applications of Neural Networks at an introductory level, it is probably aimed at a technical audience for whom the mathematical and algorithmic details would not be too daunting. What we are going to do here is attempt to re-package the presentation for a business audience, in a way that lets them understand, evaluate and hopefully appreciate the benefits offered by Neural Networks in this particular area of business application.

The model accepts 10 parameters for each loan applicant, all of them numerical or otherwise expressed in numerical terms (an encoded example follows the lists below).

  • Age
  • Marital status (1 for married and 0 for single)
  • Occupation (0=unemployed, 0.3=professional, 0.45=blue collar, 0.6=manager, 0.75=office, 0.9=principal, 1=retired)
  • Gender (1=male, 0=female)
  • Address time
  • Job time
  • Has a checking account (1=yes, 0=no)
  • Savings balance amount
  • Payment history
  • Home ownership (1=own, 0=rented)

The expected outcome of the evaluation process is a single value:

  • Credit risk level (0=low, 1=high)
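
To make this concrete, one applicant's record could be encoded as a plain numeric vector before being fed into the model. The values below are made-up examples following the coding scheme above; the Payment history coding is assumed, since the article's exact scale is not repeated here.

using System;

class ApplicantEncoding
{
    static void Main()
    {
        // A fictional applicant: 35, married, office worker, male, 8 years at
        // current address, 5 years in current job, has a checking account,
        // $12,000 in savings, good payment history, owns a home. In practice
        // the raw values (age, times, savings) would also be scaled into [0, 1].
        double[] applicant =
        {
            35,    // Age
            1,     // Marital status: married
            0.75,  // Occupation: office
            1,     // Gender: male
            8,     // Address time (years)
            5,     // Job time (years)
            1,     // Checking account: yes
            12000, // Savings balance amount
            1,     // Payment history (assumed: 1 = good)
            1      // Home ownership: own
        };
        Console.WriteLine($"{applicant.Length} inputs, one expected output (risk)");
    }
}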

Getting into a little bit of technicality, this is how the Neural Network model is going to be constructed:

[Diagram: inputs I1 to I10 feed hidden nodes H1 to H6, which feed a single output O1]

What the diagram shows is that the model is constructed in 3 layers:

  • Layer 1 – I1 to I10 – represents the input parameters fed into the evaluation
  • Layer 2 – H1 to H6 – is the “hidden” processing layer, unseen by the outside world
  • Layer 3 – O1 only – represents the evaluation result output by the system

The diagram shows part of the connections between the layers. Note that H1 receives its input from all the input parameters I1 to I10. Similarly, each of H2 to H6 receives its inputs from all the input parameters; for simplicity's sake, not all of these connections are shown in the diagram.

The diagram also shows that O1 is connected to all of H1 to H6.
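
In code, this 10-6-1 structure boils down to two fully connected steps, each a weighted sum passed through a squashing function. The sketch below uses the common sigmoid and omits bias terms for brevity; it illustrates the structure rather than reproducing the spreadsheet's exact formulas.

using System;

class CreditNet
{
    const int Inputs = 10, Hidden = 6;
    double[,] w1 = new double[Hidden, Inputs];  // I1..I10 -> H1..H6
    double[] w2 = new double[Hidden];           // H1..H6  -> O1

    static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // Forward pass: every hidden node sees all ten inputs, and the single
    // output node sees all six hidden values.
    public double Evaluate(double[] input)
    {
        var hidden = new double[Hidden];
        for (int h = 0; h < Hidden; h++)
        {
            double sum = 0;
            for (int i = 0; i < Inputs; i++)
                sum += w1[h, i] * input[i];
            hidden[h] = Sigmoid(sum);
        }
        double output = 0;
        for (int h = 0; h < Hidden; h++)
            output += w2[h] * hidden[h];
        return Sigmoid(output);  // 0 = low risk, 1 = high risk
    }

    static void Main()
    {
        var net = new CreditNet();
        Console.WriteLine($"Risk: {net.Evaluate(new double[Inputs]):F2}");  // 0.50 before training
    }
}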

Each connecting line represents the flow of evaluation data between connected points. Each connecting line actually carries a weightage which represents the importance of that connection in the evaluation process. The connection weightage will be adjusted in what is called a Learning Process.

During the learning process, actual data with known outcomes is fed into the system, and the weightage of each connection is adjusted until a sufficient threshold of correct evaluations on the sample data presented is achieved.

In other words, during the early stages of the learning process the model is expected to produce outcomes that differ from the expected outcome for the data used. The learning process, involving calibration of the weightages, is carried out over several learning cycles until a set of weightages is obtained that produces the expected outcome. This is then tested against a large number of sample data to ensure its accuracy across a wide range of possibilities.

The calibration of weightages does not occur randomly. It uses a process called Back Propagation, in which the error in the outcome is traced back through the network to measure how much a change in each weightage value changes the outcome. This measurement is then used to guide the direction in which the weightage values are refined over repetitive cycles.

Initially, random values ranging between –1 and 1 are used as the weightage for each connection. A value of –1 represents a perfectly negative influence (in other words, an inverse relationship) between the input value and the outcome, while 1 represents a perfectly positive influence on the outcome.

A few other parameters are also input into the process. These additional parameters define the limits of the repetitive refinement process; otherwise the model may just keep running in search of a perfect solution. They typically define the degree of inaccuracy in prediction that the user is willing to tolerate, which in turn determines the number of repetitions the model will carry out. Often the trade-off is between a lower-accuracy but faster training process and a higher-accuracy but slower one.
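
Putting the last three points together, below is a compact sketch of such a training loop: the weightages start at random values between –1 and 1, each cycle nudges them against the prediction error (the Back Propagation step, reduced here to a single layer for brevity; the full model also propagates the error back through the hidden layer), and the loop stops when either the error tolerance or the repetition limit is reached. The learning rate and limits are illustrative values, not ones from the article.

using System;

class TrainingSketch
{
    static void Main()
    {
        var rng = new Random(1);
        const int inputs = 10;
        double[] weights = new double[inputs];

        // Initial weightages: random values between -1 and 1.
        for (int i = 0; i < inputs; i++)
            weights[i] = 2.0 * rng.NextDouble() - 1.0;

        // Illustrative training data: one applicant (inputs already scaled
        // into [0, 1]) with a known outcome.
        double[][] samples = { new double[] { 0.35, 1, 0.75, 1, 0.8, 0.5, 1, 0.12, 1, 1 } };
        double[] targets = { 0.0 };  // 0 = known low risk

        double learningRate = 0.1;  // how far each correction moves a weightage
        double tolerance = 0.01;    // acceptable mean squared error
        int maxCycles = 10000;      // hard limit on repetitions

        for (int cycle = 0; cycle < maxCycles; cycle++)
        {
            double totalError = 0;
            for (int s = 0; s < samples.Length; s++)
            {
                // Forward: weighted sum through a sigmoid squashing function.
                double sum = 0;
                for (int i = 0; i < inputs; i++) sum += weights[i] * samples[s][i];
                double output = 1.0 / (1.0 + Math.Exp(-sum));

                // Backward: error times the sigmoid's slope gives the correction.
                double error = targets[s] - output;
                double delta = error * output * (1 - output);
                for (int i = 0; i < inputs; i++)
                    weights[i] += learningRate * delta * samples[s][i];

                totalError += error * error;
            }
            if (totalError / samples.Length < tolerance)
                break;  // accurate enough; stop refining
        }
        Console.WriteLine(string.Join(", ", weights));
    }
}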

At the end of the training session, the model would have within itself a set of acceptable weightages which produce the expected outcomes within the tolerated range of accuracy.

Having trained the model, the next step is to apply it to new and untested data. This is where the model applies the weightages that it has learned to the untested data, and produces a prediction of whether the untested data represents a good or a bad credit risk.

Models like this can continue to be trained over time. This ensures that the weightages applied by the model keep abreast of current realities in the marketplace, and it improves accuracy by learning from a larger number of samples. For example, singles may have represented a high credit risk 10 years ago, but trends change, and presently the married may represent a higher credit risk than singles. New data with known outcomes (after several years, the actual outcomes of credit risk predictions will be known) can be fed into a continuous learning process.

Records of the learning process can be kept for future reference too.