Transactions on Machine Learning and Artificial Intelligence – Vol. 10, No. 4
Publication Date: August 25, 2022
DOI: 10.14738/tmlai.104.13100
Citation: De Groff, D., & Neelakanta, P. S. (2022). A Novel Deep Learning ANN Supported on Langevin-Neelakanta Machine. Transactions on Machine Learning and Artificial Intelligence, 10(4), 27-45.
A Novel Deep Learning ANN Supported on Langevin-Neelakanta
Machine
Dolores De Groff
Department of Electrical Engineering & Computer Science
Florida Atlantic University, Boca Raton, Fl 33431, United States
Perambur S. Neelakanta
Department of Electrical Engineering & Computer Science
Florida Atlantic University, Boca Raton, Fl 33431, United States
ABSTRACT
In the contexts of deep learning (DL) considered in artificial intelligence (AI) efforts,
relevant machine learning (ML) algorithms adopted refer to using a class of deep
artificial neural network (ANN) that supports a learning process exercised with an
enormous set of input data (labeled and/or unlabeled) so as to predict, at the output, accurate features of labeled data present in the input data set. In the
present study, a deep ANN is proposed, conceived with certain novel considerations: the proposed deep architecture consists of a large number of consecutively placed paired-layer structures. Each layer hosts an identical number of neuronal units for computation, and the neuronal units are massively
interconnected across the entire network. Further, each paired-layer is
independently subjected to unsupervised learning (USL). Hence, commencing from
the input layer-pair, the excitatory (input) data supplied flows across the
interconnected neurons of paired layers, terminating eventually at the final pair of
layers, where the output is recovered. That is, the converged neuronal states of any given pair are iteratively passed on to the next pair, and so on. The USL suite involves
collectively gathering the details of neural information across a pair of the layers
constituting the network. This summed data is then limited with a specific choice of
a squashing (sigmoidal) function; and, the resulting scaled value is used to adjust
the coefficients of interconnection weights seeking a convergence criterion. The
associated learning rate on weight adjustment is uniquely designed to facilitate fast
learning towards convergence. The unique aspects of deep learning proposed here
refer to: (i) Deducing the learning coefficient with a compatible algorithm so as to
realize a fast convergence; and, (ii) the adopted sigmoidal function in the USL loop
conforms to the heuristics of the so-called Langevin-Neelakanta machine. The paper
describes the proposed deep ANN architecture with necessary details on structural
considerations, sigmoidal selection, prescribing required learning rate and
operational (training and predictive phase) routines. Results are furnished to
demonstrate the performance efficacy of the test ANN.
Keywords: Deep learning, Machine learning, Artificial Intelligence, Artificial neural
network, Logistic learning, Langevin-Neelakanta machine.
INTRODUCTION
In the contexts of artificial intelligence (AI), the scope of the present study is focused on considering
deep learning (DL) schemes exercised using deep neural learning (or deep neural networking)
algorithms adopted with deep ANN architectures. As well known, the art of DL is a subset of
machine learning (ML), which refers to the process wherein computers are rendered capable of learning with a labeled and/or unlabeled set of training input data and hence of making adjusted
predictions on the details of the supplied data. A relevant approach of DL involves using
massively interconnected switches (computational neural units) constituting the general
notions of artificial neural network (ANN), which can be designed in the image of real neural
complex to learn and recognize patterns akin to the functions of a human brain and nervous
system [1]-[3]. Relevantly, conceiving the deep ANN is inspired by the neural cognitive
structure of enormous complexity framed in the image of human brain and associated massive
neural connectivity.
The present study refers to using a novel ANN developed exclusively to simulate neural
computations consistent with the paradigm of DL exercised towards framing an AI domain
wherein, the human-like intelligent behavior is conceivably exhibited by a machine or a system
in extracting pattern details pertinent to a variety of complex entities such as pictures, texts,
sounds, etc. In order to produce accurate and adjusted insights in such predictions, the focused
approach of DL is to simulate the AI ambient to learn and process the information via deep ANN.
The paper is organized as follows: Commensurate with the background considerations on AI
and DL, the motivational considerations on proposed deep ANN architecture are presented in
the present introduction section (Section 1). The next section (Section 2) describes how the test
deep ANN is conceived vis-a-vis the notions of ANN with certain novel changes and ad hoc
additions necessarily introduced in conformance with unsupervised learning (USL) suite
towards compatible feature extraction and classification exercises involved. Section 3 is an
annotative note narrating the operational details of the proposed deep ANN and pertinent
heuristics are indicated. The architecture of the test deep ANN used in simulation studies is
illustratively explained in Section 4. Following the results and discussion outlined in Section 5,
concluding remarks are presented in Section 6.
In all, the proposed study refers to developing a deep ANN architecture for use in DL contexts
with certain novel details on its structural considerations, unsupervised training schedule,
deciding learning rate, adjusting the weights of neural interconnections with a coefficient
scaled by an appropriate sigmoidal squashing function and complete operational routines
towards predicting and recovering the required pattern in conformance with the input data.
DEEP ANN: WHAT IS IT?
Presented in this section are basic considerations defining a deep ANN and how it can be
conceived vis-a-vis the notions of traditional ANN. As an implied objective of this study, certain
novel changes and ad hoc additions necessarily introduced on the deep ANN are narrated in
conformance with unsupervised learning (USL) suite adopted towards compatible feature
extraction and classification exercises in the contexts of deep learning (DL) considered in
artificial intelligence (AI) efforts, exercised with an enormous set of input data (labeled and/or
unlabeled) so as to predict accurate features of labeled data present in the input data set.
Essentially, as is well known, an ANN refers to a mathematical model evolved as a neuromorphic
computational tool based on the image of biological neural complex as elaborated by
Neelakanta and De Groff in [1][2]. There are several versions of ANN in vogue and a compatible
structure can be chosen and modified on ad hoc basis to form the deep learning architecture
useful in DL contexts. The architecture pursued thereof in this study is similar to neocognitron
heuristics proposed by Fukushima in 1980 [4]. The “neocognitron” network is a refined scheme
of a self-organizing multilayered neural network called “cognitron” developed by Fukushima in
1975 [5]. The relevant network is self-organized by "learning without a teacher" and acquires an
ability to recognize the stimulus patterns based on any labeled (earmarked) similarity. Its
underlying Hebbian learning of a real neural complex conforms to “the synapse from
neuron x to neuron y is reinforced when x fires provided that no neuron in the vicinity of y is
firing stronger than y”. Hence, in cognitron (as well as, in neocognitron), the self-organization
favorably progresses layer-by-layer without having a “teacher” to instruct in all particulars how
the individual neural unit in each layer responds. In big-data contexts, the network becomes
self-organized such that, the receptive fields of the neural computational units/cells become
relatively larger in the deeper sets of layers; and, after repetitive presentations of several
stimulus/excitatory patterns, each neural unit in any layer integrates the information from
whole parts of the previous layer and selectively responds to a specific stimulus pattern or a
feature in the output.
Proposed deep ANN architecture
In the present study a novel deep ANN is conceived as illustrated in Figure 1. It is a hierarchical network, which comprises an even number (k) of layers: L1, L2, ..., Lk.
Figure 1: Proposed deep ANN architecture constructed with k layers, each layer having N neuronal units: one input layer, one output layer and (k − 2) hidden layers. The set of k layers is paired into k/2 pairs, denoted as P1, P2, ..., Pij, ..., Pk/2
Each of the layers houses an identical number of a specified set of neuronal (computational) units. Further, the layers are paired to form the set {P1, P2, ..., Pk/2} as shown; and, there is a pattern of connectivity locally implied in each pair of layers. In all, the network (as shown in Figure 1) is divided into a connected set of paired layers. Considering any given (ij)th pair, when the neuronal units in its first layer (the ith layer) are addressed with a particular pattern, the neurons of the succeeding interconnected layer (the jth layer) sum up the contents and prepare the summed-up details towards a converged self-organization. (The layers Li and Lj correspond to the S- and C-cells of the neocognitron [4]). This self-organization is accomplished via unsupervised learning (USL), briefly outlined below with reference to the illustrative details of Figures 1 and 2.
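Purely as an illustrative aid (not the authors' code), the hierarchical pairing of Figure 1 can be captured by a minimal container holding k layers of N units grouped into k/2 pairs, each pair carrying an N × N interconnection-weight matrix; the class name PairedLayerANN and the random initialization below are assumptions made for this sketch.

```python
import numpy as np

class PairedLayerANN:
    """Minimal container for the deep ANN of Figure 1: k layers of N neuronal
    units, grouped into k/2 pairs P1 ... Pk/2, each pair holding an N x N
    interconnection-weight matrix {w_ij} coupling its two layers."""

    def __init__(self, n_layers: int, n_units: int, seed: int = 0):
        assert n_layers % 2 == 0, "the architecture uses an even number of layers"
        rng = np.random.default_rng(seed)
        self.n_units = n_units
        # one N x N weight matrix per paired layer (small random initial values)
        self.pair_weights = [0.1 * rng.random((n_units, n_units))
                             for _ in range(n_layers // 2)]

# Example: a 6-layer, 7-units-per-layer instance (the test network of a later section).
net = PairedLayerANN(n_layers=6, n_units=7)
```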
Figure 2: An expanded illustration of USL
Focusing on any (ij)th pair of layers with interconnected neurons identified as Pij: Li ↔ Lj in
Figure 1, the state vectors of the associated set of neurons, {n1, n2, ..., nN}, in each layer can be
represented as, {(zi)1, (zi)2, ..., (zi)N} for the layer Li and {(zj)1, (zj)2, ..., (zj)N} for the layer Lj;
further, the N-components of neuronal states are interconnected by the set of weight states
{wij}. Hence, the following summation, yin can be specified:
y!" = ∑ [w!#
$
"%& ]" × z" (1)
The summed-up details of equation (1) are then adopted in the USL towards realizing the converged state of self-organization in the pair, Pij: Li ↔ Lj, as shown in Figure 2. A summary of the tasks involved is as follows (more details will be furnished in later sections): The summed entity yin is next subjected to a nonlinear activation with bipolar limits of (−1 and +1). That is, yin is squashed with a sigmoidal function, F(yin), to yield an output yo = F(yin). The next step involved in the USL refers to building a prorated value (κ)ij equal to: α × [yave − yo × wij(old)].
Here, yave is the mean value of the excitatory inputs given by:
$y_{ave} = \left(\frac{1}{N}\right) \sum_{n=1}^{N} y_n$  (2)
The scaling constant (κ)ij refers to the change in the value of wij (expressed as ±Δwij) applied iteratively to the interconnections, so that the existing weight, wij(old), modifies to a new value, wij(new), until yin converges to a minimum value.
In short, [wij(new) − wij(old)] = [κ = ±Δwij]; and, this change is governed by the cybernetics of self-organization pivoted on yave seeking the goal of convergence or global minimum observed as [yin]min. And, the coefficient α defines the learning rate, which can be optimally chosen as will be explained later.
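Purely for concreteness, the following minimal Python/NumPy sketch illustrates one such USL weight-adjustment iteration for a single paired layer, using equations (1) and (2). The function name usl_step, the reading of the "excitatory inputs" in equation (2) as the states presented by layer Li, and the use of np.tanh as a stand-in squashing function are illustrative assumptions, not the authors' implementation (their L_Q sigmoid is introduced later).

```python
import numpy as np

def usl_step(z, w, alpha, F=np.tanh):
    """One illustrative USL weight-adjustment iteration for a paired layer Pij.

    z     : neuronal states {z_n} presented by layer Li (length-N vector)
    w     : interconnection weights {w_ij} (length-N vector), i.e. w_ij(old)
    alpha : learning rate
    F     : squashing (sigmoidal) function; np.tanh is a placeholder here
    """
    y_in = np.sum(w * z)                 # equation (1): weighted sum gathered at layer Lj
    y_o = F(y_in)                        # squashed output, y_o = F(y_in)
    y_ave = np.mean(z)                   # equation (2): mean of the excitatory inputs (assumed = states of Li)
    kappa = alpha * (y_ave - y_o * w)    # scaling term kappa_ij = +/- delta(w_ij)
    return w + kappa, y_in               # w_ij(new) = w_ij(old) + delta(w_ij)

# Illustrative usage: iterate until y_in settles (a simple convergence proxy).
rng = np.random.default_rng(0)
z = rng.random(7)                        # hypothetical excitatory inputs at layer Li (N = 7)
w = 0.1 * rng.random(7)                  # hypothetical initial weights of the pair Pij
for _ in range(200):
    w, y_in = usl_step(z, w, alpha=0.1620)
```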
OPERATIONAL DETAILS OF THE PROPOSED DEEP ANN
With reference to the architectural details of deep ANN in Figures 1 and 2, the steps of operation
can be stated as a sequential preprocessing of raw, excitatory input data, which is adopted to
train progressively the (k/2) pairs of layers, {P1, P2, ..., Pk/2}, with each pair subjected to USL
towards a specified goal of convergence. The USL performed independently on each unit of the
paired-layer set {[P1], [P2], ..., [Pk/2]} refers to a machine learning process of inferring the hidden
patterns from a set of some historical data addressed as input to a paired set of layers; hence,
the learning model tries to find any similarities, differences, patterns, and structure in the data
by itself. No prior human intervention is needed [6] [7].
Commencing from the input pair of layers of the multilayer deep architecture, any (input)
excitatory data supplied at the first input layer L1 progresses forward across interconnected
neurons of paired hidden layers; eventually, the output is observed as a converged map on the
N neurons at the final layer, Lk, pertinent to the input details. Relevant output enables predictive classification of the labeled and unlabeled constituents of the stimulus/excitatory input. That is,
the required features in the stimulus input data are extracted at the terminal pair of layers of
the deep architecture as the resulting converged map of predictive details on labeled and
unlabeled contents.
The algorithm on unsupervised training pursued in the proposed deep ANN is based on two
analytical considerations:
(i) The first one refers to formulating a matrix of interconnection weights across identical
number of N neuronal units present in the adjacent pair of layers, Li and Lj framing the unit, Pi,j.
The [N × N] matrix relevant to the layers, Li and Lj corresponds to the set of weights {wij}.
Further, inasmuch as this matrix is symmetric, it can be specified in terms of a diagonal set of associated eigenvalues (λij). The learning coefficient (α) is then specified in terms of (λij) towards achieving fast convergence. In all, the associated multidimensional Hessian of the
interconnection matrix has eigenvalues denoting a measure of steepness of a surface of
convergence with iteratively adjusted values of the weights. A large eigenvalue would signify a
steep curvature implicitly depicting a small learning rate towards convergence of the said error
reaching a minimum value. That is, the learning rate should be inversely proportional to the eigenvalue. The learning rate was indicated before as a part of the scaling factor κ prescribed as the weight-adjustment parameter. In summary, an appropriate learning rate (α) is prescribed by formulating a matrix of interconnection weights {wij} across the identical number of N neuronal units present in the adjacent pair of layers, Li and Lj (framing the unit, Pij); and, this interconnection matrix [N × N]ij relevant to the layers Li and Lj, being symmetric, is specified in terms of a diagonal set of associated eigenvalues {λij}. The learning rate (α) is then linked to λij.
(ii) The second part of USL algorithm refers to specifying an appropriate squashing function,
F(yin) on the collective details (yin) of neural unit information gathered across the pair, (i, j) as
shown in Figure 1. The squashing function proposed thereof conforms to the heuristics of the
so-called Langevin-Neelakanta machine [1]-[3]. In summary, the summed value yin in Figure 1,
corresponds to ensemble of neural information at the layer, Lj; and, the proposed scheme
involves prescribing a squashing transfer function, F(yin) on the collective neural information
gathered at the layer, Lj so that the resulting output (yo) remains limited (typically between ±
1). Conventionally, a simple hyperbolic tangent (sigmoidal) function [1] is adopted as the
squashing function; however, in the present study, the sigmoidal function chosen conforms to
the heuristics of the so-called Langevin-Neelakanta machine [1]-[3] as will be detailed later.
In the following subsections, details on aforesaid algorithms are explained.
Learning rate (α) algorithm
As indicated above, the learning/training schedule pursued conforms to unsupervised
(Hebbian) learning (USL) applied at each unit of interconnected pair of layers. Considering any
such paired-layer, Pij: (Li ↔ Lj), the weighted sum of neural information gathered from the set
of neurons, {n1, n2, ..., nN}, in layer Lj is given by equation (1) and denoted as yin in Figure 1. It is then limited or compressed to a value, yo, by a sigmoidal function, F(.); hence, yo = F(yin), which is used to construct a scaling constant, (κ)ij equal to: α × [yave − yo × wij(old)], where yave is the mean value of the excitatory inputs given by equation (2). As stated before, this scaling constant (κ)ij refers to ±Δwij denoting the differential change in the value of wij applied iteratively to the interconnections, until yin converges to a minimum value. And, the coefficient α defines the learning rate, which can be optimally chosen as follows: Proposed thereof is a novel fast-convergence algorithm towards establishing the learning rate (α) based on eigenvalues of the
associated Hessian matrix of interconnection weights between N neurons in the pair of layers,
(Li ↔ Lj) [8] [9]. The learning rate so adopted varies dynamically with the input data used for
training and enables fast-convergence of learning at the USL.
In any iteration cycle of the USL, considering the neuronal states {z1,z2,..., zN} at the layer Li, the
associated multidimensional interconnection mentioned earlier is a Hessian matrix
(corresponding to an N × N square matrix); and this Hessian matrix can be put into a diagonal
form, denoted as [HD]. Because of the symmetry of the Hessian matrix, it has a unique, single
eigenvalue, λHD, in the diagonal form as shown below (with all other eigenvalues being zero):
$$[H_D] = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{HD} \end{bmatrix} \qquad (3)$$
Hence, in each iteration of the USL schedule, the differential change in wij (expressed as ±Δwij) exercised on the interconnections enables a corresponding error function ε(wij) deciding the value of yin. It is given by:

$\varepsilon(w_{ij}) = \sum_{n=1}^{N}\left[(z_n)_i - \lambda_{ij} \times F\!\left(w_{ij} \times (z_n)_j\right)\right]$  (4)

and its corresponding derivative, $\mathrm{d}\varepsilon(w_{ij})/\mathrm{d}(w_{ij})$, can be determined as follows:
Typically, the underlying data and the resulting values of the weights in the iterations of USL
schedule are mostly random entities. Relevantly, it is indicated in [2] that the associated F(.) is
a solution of a stochastic Bernoulli-Riccati differential equation developed in the context of
nonlinear input-output response of neural interaction. That is, pertinent dynamics refers to the
interacting neurons being regarded as a stochastic system of dipole activity. Hence, Neelakanta
et al. [3] extended Langevin’s theory of dipole polarization and obtained the following explicit
function for, F(.):
$F(X) \equiv L_Q(X) = \left(1 + \tfrac{1}{2Q}\right)\coth\!\left[\left(1 + \tfrac{1}{2Q}\right) X\right] - \left(\tfrac{1}{2Q}\right)\coth\!\left[\left(\tfrac{1}{2Q}\right) X\right]$  (8)

where the parameter Q denotes the stochastic state of the interacting entities bounded by the limits (1/2 ≤ Q < ∞). The lower bound Q = 1/2 depicts a totally anisotropic state of the statistically interacting units; and the other limit, Q → ∞, implies a totally isotropic state of the stochastic interactions involved. Hence, in the contexts of ANN, the configuration of neural interaction of disorder states can be characterized in the bounding limits of (1/2 ≤ Q < ∞) as outlined and
summarized in Table 5.3 of [1]. Usage of LQ(.) as ANN-implied sigmoid leads to Langevin
machine concept as detailed in [1]-[3]. (Framing neuromorphic functioning via Langevin
dynamics and hence, a neural network based on stochastically justifiable sigmoidal function
was proposed in [1] and relevant considerations are extended to discrete Langevin machine in
[3]). In summary, as regards the deep ANN study pursued currently, it is proposed to adopt the
aforesaid LQ(yin) sigmoidal squashing in the USL loop of Figure 1.
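A minimal sketch of the squashing function of equation (8) is given below. The small-argument guard is an implementation detail added here because coth(u) is singular at u = 0 (with L_Q(0) = 0 in the limit), and the sanity check exploits the fact that L_Q reduces to tanh(X) at Q = 1/2 and approaches the classical Langevin function coth(X) − 1/X as Q → ∞.

```python
import numpy as np

def L_Q(x, Q):
    """Langevin-Neelakanta sigmoid of equation (8), bounded between -1 and +1.

    L_Q(x) = (1 + 1/2Q) coth[(1 + 1/2Q) x] - (1/2Q) coth[(1/2Q) x],  1/2 <= Q < inf.
    """
    x = np.asarray(x, dtype=float)
    a = 1.0 + 1.0 / (2.0 * Q)
    b = 1.0 / (2.0 * Q)
    coth = lambda u: np.cosh(u) / np.sinh(u)
    small = np.abs(x) < 1e-8                 # guard: coth is singular at 0, L_Q(0) = 0
    xs = np.where(small, 1.0, x)             # placeholder argument, masked out below
    return np.where(small, 0.0, a * coth(a * xs) - b * coth(b * xs))

# Sanity check: at Q = 1/2 the expression reduces to the hyperbolic tangent.
x = np.linspace(-5.0, 5.0, 11)
assert np.allclose(L_Q(x, Q=0.5), np.tanh(x), atol=1e-6)
```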
TEST DEEP ANN FOR SIMULATION STUDIES
The efficacy of the proposed deep ANN method is cross-verified against details of simulated
results obtained with a test architecture illustrated in Figure 3. Relevant architectural details,
data adopted towards simulations and for cross-verification towards predictions and
operational routines are described in the following sections:
The test architecture of deep ANN shown in Figure 3 is consistent with the general structure
depicted in Figure 1 and constructed with 6 layers as shown: One input layer, one output layer
and four hidden layers. Each layer in the network of Figure 3 has seven neuronal units, and the
set of six layers are paired into three pairs represented as: P1: (L1 ↔ L2); P2: (L3 ↔ L4) and P3:
(L5 ↔ L6). The implied USL is exercised on each of the three pairs and illustrated in expanded
scale in Figure 2.
Simulations performed on the test network and the details of data used in the training and
prediction phases, are as follows: As well known, the ANN has the ability to learn or get trained
from experience (or from the existing data). In general, the exhaustively known details adopted
for ANN training would conform to a big-data universe wherein the information/pattern to be
predicted remains obscured as an admixture of random constituents of labeled and unlabeled
in the stochastic framework of large data. The learning/training exercise imposed on the test
ANN as seen earlier, involves adjustment of weight coefficients pertinent to massively
interconnected neuronal units; and, the adjustment is based either on capturing the randomness (stochastic aspects) of the stimuli (depicting the input training data) and/or on
using information present in a set of tagged (labeled and unlabeled) classification flags present
in the input data. Eventually, by virtue of trained details, the test ANN is rendered to predict the
desired pattern or labeled details in a given set of data. Further, it will be shown (in a later
In all, objectively the tasks in deep neural networking imply processing a set of raw input data
so as to extract required features or the desired/labeled data from it. The extracted details are
based on the inherent capability of classification in an arbitrary input-output mapping and
specify a predictive classification of the desired data at the output. That is, the ANN is trained
to classify the labeled and unlabeled data supplied as the input and spot the labeled pattern as
predictive output. Relevant training scheme in the proposed deep ANN simulation is based on
unsupervised learning (USL) method indicated earlier in Figures 1 and 2.
Consistent with the algorithmic suites described earlier for the test deep ANN, relevant
simulation studies indicated here conform to a set of the input data {y1, y2, ..., y7} addressed at
the input layer and constructing a corresponding transpose [y1, y2, ..., y7]T; hence, a [7 × 7] Hessian
matrix [H] is specified as follows: [HD] = [y1, y2, ..., y7]T. [y1, y2, ..., y7]. The input data set used in the
simulations consists of normalized values attributed to seven observed data pertinent to an
arbitrary, random set of epochs. Corresponding [HD] refers to a symmetric square [7 × 7]
matrix yielding one non-vanishing eigenvalue as a diagonal element. That is, for a given
ensemble of input data set there is one non-vanishing eigenvalue (lij) for the pair of
interconnected layers Li and Lj. The learning coefficient adopted towards weight adjustments
is linked to this eigenvalue as will be indicated later. For training the test ANN of Figure 3, four
sets of independent data (each set consisting of six unlabeled details plus one labeled datum) are
gathered from an observed universe of random epochs as listed in Table 1. While the unlabeled
entries refer to stochastic details of the random epochs, the labeled entity, marked (bold) as X, is deterministic information on the pattern or desired details of focused relevance in the
stochastic occurrence of epochs. The discrete value of X can be addressed randomly at any one
of the input neurons.
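For illustration, the rank-one Hessian construction described above and the resulting learning coefficient can be computed as in the sketch below. Since the Table 1 training values are not reproduced here, the data vector is hypothetical, so the printed α will not equal the paper's 0.1620.

```python
import numpy as np

# Hypothetical normalized 7-point input ensemble (a stand-in for a Table 1 training set).
y = np.array([0.21, 0.35, 0.48, 0.57, 1.00, 0.66, 0.74])

H = np.outer(y, y)                 # [7 x 7] Hessian: [H] = [y1, ..., y7]^T x [y1, ..., y7]
eigvals = np.linalg.eigvalsh(H)    # symmetric matrix => real eigenvalues
lam_max = eigvals.max()            # single non-vanishing eigenvalue of the rank-one matrix
alpha = 1.0 / lam_max              # learning coefficient, inversely proportional to lambda_max

# For a rank-one outer product the non-zero eigenvalue equals ||y||^2.
assert np.isclose(lam_max, np.dot(y, y))
print(f"lambda_max = {lam_max:.4f}, alpha = {alpha:.4f}")
```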
Table 2: Example data set, {y1, y2, ..., y7}: SET P containing unlabeled and labeled entities considered for prediction-phase simulations

Data SET P (values of {yn} and C):
  n = 1: y1 = 0.138655049
  n = 2: y2 = 0.164597563
  n = 3: y3 = 0.688897953
  n = 4: y4 = 0.623615765
  n = 5: C  = 1.000000000
  n = 6: y6 = 0.730366133
  n = 7: y7 = 0.860958783

Parameter used in the USL schedule: learning coefficient α = 0.1620 (as specified by the maximum eigenvalue used in the training phase).

Note: Data C is the labeled entity to be classified and identified in the output set of the prediction phase. (It can be present randomly at any of the input neurons, n = 1, 2, ..., 7.)
Perform
→ With the initializations as above and the prediction-phase data set SET P of Table 2, do the USL operations on the three paired layers as done in the training phase.
Output
→ With the completion of the USL implemented on the three paired layers of Figure 3, the observed set of {y1, y2, ..., y7} on layer L6 of the last (output) pair, L5 ↔ L6, is noted. It depicts the required prediction details in the output.
Result
← That is, the predicted results on the input data of Table 2 refer to the converged values on the neural units {y1, y2, ..., y7} of layer L6.
→ The relevant output data set, SET R, is presented in Table 3.
Table 3: Result: SET R containing the mix of unlabeled and labeled entities observed as output data in layer L6 with the SET P details addressed as the input

RESULT: Data SET R (values of {yn} at the output)
  n    Entry    Output (Q = 0.5)    Output (Q = 1.0)
  1    y1       0.9867              0.9129
  2    y2       0.9867              0.9129
  3    y3       0.9867              0.9129
  4    y4       0.9867              0.9129
  5    y5       0.9867              0.9129
  6    y6       0.9867              0.9129
  7    y7       0.9867              0.9129

Identified output corresponding to the labeled entity C in the input data of Table 2: labeled value classified as output = 0.9867; percentage error = [(1.0000 − 0.9867)/1] × 100 = 1.33 %.

Notes:
1. Learning coefficient: α = 0.1620 (as specified by the maximum eigenvalue used in the training phase).
2. Learning curves obtained are shown in Figures 4 and 5.
3. The output values (0.9867) of {yn} shown are close to the data C = 1 of the input set of details in SET P used in the prediction phase. It confirms the required classification as the labeled entity by the test deep ANN.
Next
Finding the data to be classified as the pattern details sought:
← List the output data set {yn} of the prediction phase at the output layer L6 as SET R, shown in Table 3.
→ One or more values of {yn} in SET R conform to the required data C (corresponding to the input set of details in SET P used in the prediction phase), classified as the labeled entity.
End
RESULTS AND DISCUSSION
The present study refers to a multilayer deep ANN network, which uses an unsupervised learning (USL) scheme across a hierarchical set of paired layers with a pattern of local connectivity within those layers. In the following subsections, a summary of the operational details is outlined: In the
test multilayer deep ANN network, the USL or training scheme progresses along the paired
stack of layers. The inputs are addressed at the first layer and USL is performed on the first pair
of layers. The resulting interconnection weights of this input pair, upon convergence, are stored frozen. The converged values of the neuronal states in the second layer are then transferred to the
third layer. Hence, the pair of third and fourth layers are trained and converged details on
weights are stored; and, the converged neuronal states at the fourth layer are passed on to the next (fifth) layer, and so on. This procedure is carried out across the entire set of paired layers of the
designed architecture and adopted in both training and prediction phases. The output being sought refers to the final converged details of the last (output) layer corresponding to the
prediction phase. The internal (local) calculations between the adjacent layers in any paired set
depend upon the neuronal state details coming from the previous layers.
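The pair-by-pair cascade summarized above can be sketched as follows. The per-pair update mirrors the earlier USL sketch, while the way the converged states of a pair are formed before being handed to the next pair is not spelled out in this excerpt and is therefore taken here, purely for illustration, as the squashed per-neuron products F(w·z); names such as train_pair and train_deep_ann are assumptions of this sketch.

```python
import numpy as np

def train_pair(z_in, w, alpha, F=np.tanh, n_iter=200):
    """USL on one paired layer: adjust the weights iteratively, then return the
    frozen weights and the converged states handed to the next pair (sketch)."""
    for _ in range(n_iter):
        y_in = np.sum(w * z_in)             # equation (1)
        y_o = F(y_in)                       # squashed output
        y_ave = np.mean(z_in)               # equation (2), taken over the pair's inputs
        w = w + alpha * (y_ave - y_o * w)   # weight adjustment kappa = alpha (y_ave - y_o w)
    z_out = F(w * z_in)                     # assumed hand-off of converged states
    return w, z_out

def train_deep_ann(x, pair_weights, alpha, F=np.tanh):
    """Cascade the USL over the stack of paired layers; each pair is trained
    independently and its converged weights are stored frozen."""
    z, frozen = x, []
    for w in pair_weights:
        w_new, z = train_pair(z, w, alpha, F)
        frozen.append(w_new)
    return frozen, z                        # z from the last pair is the predicted output map

# Example: three paired layers (a 6-layer network) with N = 7 units per layer.
rng = np.random.default_rng(1)
pair_weights = [0.1 * rng.random(7) for _ in range(3)]
frozen, output_map = train_deep_ann(rng.random(7), pair_weights, alpha=0.1620)
```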
The proposed deep ANN conceived for ML in big data environments of AI has the following
uniquely specified novel features: (i) The algorithm prescribes at each pair of layers a learning coefficient inversely proportional to the maximum of the computed Hessian eigenvalues evaluated for
the ensemble data sets processed across the paired set of layers in training and prediction
phases. This prescription enables a desirable fast convergence of learning involved; and, (ii)
necessary sigmoidal squashing is adopted while performing the USL, which conforms to
Langevin-Neelakanta machine concepts. The choices as above rely on the motivational
considerations towards fast convergence of the test deep ANN even under big data ambient,
yielding the result on correct classification of labeled entity present in the input data analyzed.
Figure 4: Learning curves depicting the observed error e in yin versus the number of iterations in a USL loop of Figure 3, obtained (normalized) with the ensemble of data SET P in Table 2 for two cases, using an optimum value of α = 0.1620 (as deduced by the proposed method): (A) with Q = 0.5 and (B) with Q = 1.0
Presented in Figures 4 and 5 are examples of learning-curves obtained for the exemplar
ensemble of input data (of SET P) of the prediction phase. The learning curves presented are
pertinent to two cases: (i) with the learning coefficient, α = 0.1620, inversely proportional to the maximum value of the computed Hessian eigenvalues evaluated for the ensemble data sets; and, (ii) with α equal to an arbitrary value, 0.001. As seen in Figures 4 and 5, the convergence is faster with the learning rate based on α = (1/λmax) = 0.1620. Also, the convergence is influenced
by the value of Q adopted in the USL loop. Further, the chosen value of Q decides the accuracy
of predicted output with respect to the labeled value implied at the input as in Table 3.
Figure 5: Learning curves depicting the observed (normalized) error e in yin versus the number of iterations in a USL loop of Figure 3, obtained with the ensemble of data SET P in Table 2 for two cases, using an arbitrary value of α = 0.001: (X) with Q = 0.5 and (Y) with Q = 1.0
CONCLUDING REMARKS
The present study demonstrates a promising application of deep ANN for classifying and
predicting a labeled pattern present as an admixture of details in a big data environment. A
multilayered deep ANN is described thereof with an architecture of successively placed set of
paired layers, each containing N neuronal units that are massively interconnected. Each pair of
layers is independently trained via unsupervised learning schedule. The underlying training
and prediction phases are implemented with specifically defined algorithms on sigmoidal
squashing involved in defining the weight adjustment parameter and learning rate pursued
independently in the USL loop of each pair of layers.
To illustrate the efficacy of the proposal, simulated details on training a test architecture and
evaluating the prediction phase data are furnished. Results on fast prediction of labeled data
(that could be obscurely present in the stochastic mix of unlabeled and labeled patterns in the
input) concur with the novel heuristics and suggested algorithms proposed. In essence, the
following can be indicated as salient features of the novel deep learning ANN supported on
Langevin-Neelakanta machine:
Machine learning in a big data environment objectively requires a deep learning ANN architecture to handle enormous data and achieve fast convergence. The computational
power of traditional recurrent neural networks depends ultimately on the complexity of
massive interconnections of the network. Relevant implications of information processing are
decided by Kolmogorov complexity associated with the size of the network architecture and
extents of weights present. Any neural net learning algorithm aims at finding simple nets that assimilate the training (test) data robustly and lead to better generalization on test data in
the prediction phase [10][11]. This complexity issue grows significantly with increased number
of hidden layers, manifesting as a computational bottleneck in handling large data and involving
excessive processing time towards convergence. This issue is obviously serious in big data
environment; as such, novelty in conceiving better deep ANN architecture is a motivated goal
towards ML efforts in AI contexts. Hence, the present study on the deep ANN has indicated the feasibility of accommodating a large number of cascaded sets of independent paired layers, with each pair trained via USL towards self-organization. In the relevant effort, the algorithmic suites pursued enable fast convergence with a uniquely specified learning rate, as well as better (more accurate) results on prediction, by resorting to the Langevin-Neelakanta machine concept with the associated squashing function in the USL loop. The architecture of using a cascaded set of individualized paired layers is akin to that of Fukushima's cognitron and neocognitron [4][5], but modified to match the motivated context of the present study.
Simulation results presented establish the desirable efficacy of the proposed concept.
References
1. Neelakanta, P. S., De Groff, D., Neural Network Modeling: Statistical Mechanics and Cybernetic Perspectives,
1994, CRC Press, Boca Raton, Fl: USA.
2. Neelakanta, P. S., (Editor), Information-Theoretic Aspects of Neural Networks, 1999, CRC Press, Boca Raton,
Fl: USA.
3. Neelakanta, P. S., Sudhakar, R., and De Groff, D., Langevin machine: A neural network based on stochastically justifiable sigmoidal function, Biological Cybernetics, 1991, 65, p. 331-338.
4. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern
recognition unaffected by shift in position. Biological Cybernetics, 1980, 36, p.193-202.
5. Fukushima, K. Cognitron: A self-organizing multilayered neural network. Biological Cybernetics, 1975, 20,
p.121-136.
6. Becker, S., Plumbley, M. Unsupervised neural network learning procedures for feature extraction and classification. Applied Intelligence, 1996, 6, p. 185-203.
7. Wittek, P., Unsupervised learning in quantum machine learning, Elsevier Inc/Morgan Kaufmann, 2014,
Cambridge, MA: USA.
8. De Groff, D., Neelakanta, P. S., Faster convergent neural networks, International Journal of Computers and
Technology, 2018, 17(1), p.7126-7132.
9. Magoulas, G. D., Vrahatis, M. N., Androulakis, G. S. Improving the convergence of the backpropagation
algorithm using learning rate adaptation methods, Neural Computation, 1999, 11(7), p. 1769-1796.
10. Schmidhuber, J. Discovering neural nets with low Kolmogorov complexity and high generalization
capability. Neural Networks, 1997, 10(5), p. 857-873.
11. Balcázar, J. L. Computational power of neural networks: A characterization in terms of Kolmogorov
complexity. IEEE Transactions on Information Theory, 1997, 43(4), p.1175-1183.