Transactions on Machine Learning and Artificial Intelligence – Vol. 10, No. 4
Publication Date: August 25, 2022
DOI: 10.14738/tmlai.104.13100
Citation: De Groff, D., & Neelakanta, P. S. (2022). A Novel Deep Learning ANN Supported on Langevin-Neelakanta Machine. Transactions on Machine Learning and Artificial Intelligence, 10(4), 27-45.
A Novel Deep Learning ANN Supported on Langevin-Neelakanta
Machine
Dolores De Groff
Department of Electrical Engineering & Computer Science
Florida Atlantic University, Boca Raton, Fl 33431, United States
Perambur S. Neelakanta
Department of Electrical Engineering & Computer Science
Florida Atlantic University, Boca Raton, Fl 33431, United States
ABSTRACT
In the contexts of deep learning (DL) considered in artificial intelligence (AI) efforts,
relevant machine learning (ML) algorithms adopted refer to using a class of deep
artificial neural network (ANN) that supports a learning process exercised with an
enormous set of input data (labeled and/or unlabeled) so as to predict, at the output, accurate features of labeled data present in the input data set. In the
present study, a deep ANN is proposed, conceived with certain novel considerations: the proposed deep architecture consists of a large number of consecutively placed paired-layer structures. Each layer hosts an identical number of neuronal units for computation, and the neuronal units are massively
interconnected across the entire network. Further, each paired-layer is
independently subjected to unsupervised learning (USL). Hence, commencing from
the input layer-pair, the excitatory (input) data supplied flows across the
interconnected neurons of paired layers, terminating eventually at the final pair of
layers, where the output is recovered. That is, the converged neuronal states of any given pair are iteratively passed on to the next pair, and so on. The USL suite involves
collectively gathering the details of neural information across a pair of the layers
constituting the network. This summed data is then limited with a specific choice of
a squashing (sigmoidal) function; and, the resulting scaled value is used to adjust
the coefficients of interconnection weights seeking a convergence criterion. The
associated learning rate on weight adjustment is uniquely designed to facilitate fast
learning towards convergence. The unique aspects of deep learning proposed here
refer to: (i) Deducing the learning coefficient with a compatible algorithm so as to
realize a fast convergence; and, (ii) the adopted sigmoidal function in the USL loop
conforms to the heuristics of the so-called Langevin-Neelakanta machine. The paper
describes the proposed deep ANN architecture with necessary details on structural
considerations, sigmoidal selection, prescribing required learning rate and
operational (training and predictive phase) routines. Results are furnished to
demonstrate the performance efficacy of the test ANN.
Keywords: Deep learning, Machine learning, Artificial Intelligence, Artificial neural
network, Logistic learning, Langevin-Neelakanta machine.
INTRODUCTION
In the contexts of artificial intelligence (AI), the scope of the present study is focused on considering
deep learning (DL) schemes exercised using deep neural learning (or deep neural networking)
algorithms adopted with deep ANN architectures. As well known, the art of DL is a subset of
machine learning (ML), which refers to the process wherein computers are rendered capable of learning with a labeled and/or unlabeled set of training input data and hence of making adjusted
predictions on the details of the supplied data. A relevant approach of DL involves using
massively interconnected switches (computational neural units) constituting the general
notions of artificial neural network (ANN), which can be designed in the image of real neural
complex to learn and recognize patterns akin to the functions of a human brain and nervous
system [1]-[3]. Relevantly, conceiving the deep ANN is inspired by the neural cognitive
structure of enormous complexity framed in the image of human brain and associated massive
neural connectivity.
The present study refers to using a novel ANN developed exclusively to simulate neural
computations consistent with the paradigm of DL exercised towards framing an AI domain
wherein, the human-like intelligent behavior is conceivably exhibited by a machine or a system
in extracting pattern details pertinent to a variety of complex entities such as pictures, texts,
sounds, etc. In order to produce accurate and adjusted insights in such predictions, the focused
approach of DL is to simulate the AI ambient to learn and process the information via deep ANN.
The paper is organized as follows: Commensurate with the background considerations on AI
and DL, the motivational considerations on proposed deep ANN architecture are presented in
the present introduction section (Section 1). The next section (Section 2) describes how the test
deep ANN is conceived vis-a-vis the notions of ANN with certain novel changes and ad hoc
additions necessarily introduced in conformance with unsupervised learning (USL) suite
towards compatible feature extraction and classification exercises involved. Section 3 is an
annotative note narrating the operational details of the proposed deep ANN and pertinent
heuristics are indicated. The architecture of the test deep ANN used in simulation studies is
illustratively explained in Section 4. Following the results and discussion outlined in Section 5,
concluding remarks are presented in Section 6.
In all, the proposed study refers to developing a deep ANN architecture for use in DL contexts
with certain novel details on its structural considerations, unsupervised training schedule,
deciding learning rate, adjusting the weights of neural interconnections with a coefficient
scaled by an appropriate sigmoidal squashing function and complete operational routines
towards predicting and recovering the required pattern in conformance with the input data.
DEEP ANN: WHAT IS IT?
Presented in this section are basic considerations defining a deep ANN and how it can be
conceived vis-a-vis the notions of traditional ANN. As an implied objective of this study, certain
novel changes and ad hoc additions necessarily introduced on the deep ANN are narrated in
conformance with unsupervised learning (USL) suite adopted towards compatible feature
extraction and classification exercises in the contexts of deep learning (DL) considered in
artificial intelligence (AI) efforts, exercised with an enormous set of input data (labeled and/or
unlabeled) so as to predict accurate features of labeled data present in the input data set.
Essentially, as is well known, an ANN refers to a mathematical model evolved as a neuromorphic
computational tool based on the image of biological neural complex as elaborated by
Neelakanta and De Groff in [1][2]. There are several versions of ANN in vogue and a compatible
structure can be chosen and modified on ad hoc basis to form the deep learning architecture
useful in DL contexts. The architecture pursued thereof in this study is similar to neocognitron
heuristics proposed by Fukushima in 1980 [4]. The “neocognitron” network is a refined scheme
of a self-organizing multilayered neural network called “cognitron” developed by Fukushima in
1975 [5]. The relevant network is self-organized by "learning without a teacher" and acquires an
ability to recognize the stimulus patterns based on any labeled (earmarked) similarity. Its
underlying Hebbian learning of a real neural complex conforms to “the synapse from
neuron x to neuron y is reinforced when x fires provided that no neuron in the vicinity of y is
firing stronger than y”. Hence, in cognitron (as well as, in neocognitron), the self-organization
favorably progresses layer-by-layer without having a “teacher” to instruct in all particulars how
the individual neural unit in each layer responds. In big-data contexts, the network becomes
self-organized such that, the receptive fields of the neural computational units/cells become
relatively larger in the deeper sets of layers; and, after repetitive presentations of several
stimulus/excitatory patterns, each neural unit in any layer integrates the information from
whole parts of the previous layer and selectively responds to a specific stimulus pattern or a
feature in the output.
Proposed deep ANN architecture
In the present study a novel deep ANN is conceived as illustrated in Figure 1. It is a hierarchical network, which comprises an even number (k) of layers: L1, L2, ..., Lk.
Figure 1: Proposed deep ANN architecture constructed with k layers, each layer having N neuronal units: one input layer, one output layer and (k − 2) hidden layers. The set of k layers is paired into k/2 pairs, denoted as P1, P2, ..., Pij, ..., Pk/2
Each of the layers houses an identical number of a specified set of neuronal (computational) units. Further, the layers are paired to form the set {P1, P2, ..., Pk/2} as shown; and, there is a pattern of connectivity locally implied in each pair of layers. In all, the network (as shown in Figure 1) is divided into a connected set of paired layers. Considering any given (ij)th pair, when the neuronal units in its first layer (the ith layer) are addressed with a particular pattern, the neurons of the succeeding interconnected layer (the jth layer) sum up the contents and prepare the summed-up details towards a converged self-organization. (The layers Li and Lj correspond to the S- and C-cells of the neocognitron [4]). This self-organization is accomplished via unsupervised learning (USL), briefly outlined below with reference to the illustrative details of Figures 1 and 2.
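Purely as an illustrative aid (not the authors' code), the hierarchical pairing of Figure 1 can be captured by a minimal container holding k layers of N units grouped into k/2 pairs, each pair carrying an N × N interconnection-weight matrix; the class name PairedLayerANN and the random initialization below are assumptions made for this sketch.

```python
import numpy as np

class PairedLayerANN:
    """Minimal container for the deep ANN of Figure 1: k layers of N neuronal
    units, grouped into k/2 pairs P1 ... Pk/2, each pair holding an N x N
    interconnection-weight matrix {w_ij} coupling its two layers."""

    def __init__(self, n_layers: int, n_units: int, seed: int = 0):
        assert n_layers % 2 == 0, "the architecture uses an even number of layers"
        rng = np.random.default_rng(seed)
        self.n_units = n_units
        # one N x N weight matrix per paired layer (small random initial values)
        self.pair_weights = [0.1 * rng.random((n_units, n_units))
                             for _ in range(n_layers // 2)]

# Example: a 6-layer, 7-units-per-layer instance (the test network of a later section).
net = PairedLayerANN(n_layers=6, n_units=7)
```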
Figure 2: An expanded illustration of USL
Focusing on any (ij)th pair of layers with interconnected neurons identified as Pij: Li ↔ Lj in
Figure 1, the state vectors of the associated set of neurons, {n1, n2, ..., nN}, in each layer can be
represented as, {(zi)1, (zi)2, ..., (zi)N} for the layer Li and {(zj)1, (zj)2, ..., (zj)N} for the layer Lj;
further, the N-components of neuronal states are interconnected by the set of weight states
{wij}. Hence, the following summation, yin can be specified:
y!" = ∑ [w!#
$
"%& ]" × z" (1)
The summed-up details of equation (1) are then adopted in the USL towards realizing the converged state of self-organization in the pair, Pij: Li ↔ Lj, as shown in Figure 2. A summary of the tasks involved is as follows (more details will be furnished in later sections): The summed entity yin is next subjected to a nonlinear activation with bipolar limits of (−1 and +1). That is, yin is squashed with a sigmoidal function, F(yin), to yield an output yo = F(yin). The next step involved in the USL refers to building a prorated value (κ)ij equal to: α × [yave − yo × wij(old)].
Here, yave is the mean value of the excitatory inputs given by:
$y_{ave} = \left(\frac{1}{N}\right) \sum_{n=1}^{N} y_n$  (2)
The scaling constant (κ)ij refers to the change in the value of wij (expressed as ±Δwij) applied iteratively to the interconnections, so that the existing weight, wij(old), modifies to a new value, wij(new), until yin converges to a minimum value.
In short, [wij(new) − wij(old)] = [κ = ±Δwij]; and, this change is governed by the cybernetics of self-organization pivoted on yave seeking the goal of convergence or global minimum observed as [yin]min. And, the coefficient α defines the learning rate, which can be optimally chosen as will be explained later.
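Purely for concreteness, the following minimal Python/NumPy sketch illustrates one such USL weight-adjustment iteration for a single paired layer, using equations (1) and (2). The function name usl_step, the reading of the "excitatory inputs" in equation (2) as the states presented by layer Li, and the use of np.tanh as a stand-in squashing function are illustrative assumptions, not the authors' implementation (their L_Q sigmoid is introduced later).

```python
import numpy as np

def usl_step(z, w, alpha, F=np.tanh):
    """One illustrative USL weight-adjustment iteration for a paired layer Pij.

    z     : neuronal states {z_n} presented by layer Li (length-N vector)
    w     : interconnection weights {w_ij} (length-N vector), i.e. w_ij(old)
    alpha : learning rate
    F     : squashing (sigmoidal) function; np.tanh is a placeholder here
    """
    y_in = np.sum(w * z)                 # equation (1): weighted sum gathered at layer Lj
    y_o = F(y_in)                        # squashed output, y_o = F(y_in)
    y_ave = np.mean(z)                   # equation (2): mean of the excitatory inputs (assumed = states of Li)
    kappa = alpha * (y_ave - y_o * w)    # scaling term kappa_ij = +/- delta(w_ij)
    return w + kappa, y_in               # w_ij(new) = w_ij(old) + delta(w_ij)

# Illustrative usage: iterate until y_in settles (a simple convergence proxy).
rng = np.random.default_rng(0)
z = rng.random(7)                        # hypothetical excitatory inputs at layer Li (N = 7)
w = 0.1 * rng.random(7)                  # hypothetical initial weights of the pair Pij
for _ in range(200):
    w, y_in = usl_step(z, w, alpha=0.1620)
```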
OPERATIONAL DETAILS OF THE PROPOSED DEEP ANN
With reference to the architectural details of deep ANN in Figures 1 and 2, the steps of operation
can be stated as a sequential preprocessing of raw, excitatory input data, which is adopted to
train progressively the (k/2) pairs of layers, {P1, P2, ..., Pk/2}, with each pair subjected to USL
towards a specified goal of convergence. The USL performed independently on each unit of the
paired-layer set {[P1], [P2], ..., [Pk/2]} refers to a machine learning process of inferring the hidden
patterns from a set of some historical data addressed as input to a paired set of layers; hence,
the learning model tries to find any similarities, differences, patterns, and structure in the data
by itself. No prior human intervention is needed [6] [7].
Commencing from the input pair of layers of the multilayer deep architecture, any (input)
excitatory data supplied at the first input layer L1 progresses forward across interconnected
neurons of paired hidden layers; eventually, the output is observed as a converged map on the
N neurons at the final layer, Lk, pertinent to the input details. Relevant output enables predictive classification of the labeled and unlabeled constituents of the stimulus/excitatory input. That is,
the required features in the stimulus input data are extracted at the terminal pair of layers of
the deep architecture as the resulting converged map of predictive details on labeled and
unlabeled contents.
The algorithm on unsupervised training pursued in the proposed deep ANN is based on two
analytical considerations:
(i) The first one refers to formulating a matrix of interconnection weights across identical
number of N neuronal units present in the adjacent pair of layers, Li and Lj framing the unit, Pi,j.
The [N × N] matrix relevant to the layers, Li and Lj corresponds to the set of weights {wij}.
Further, inasmuch as this matrix is symmetric, it can be specified in terms of a diagonal set of associated eigenvalues (λij). The learning coefficient (α) is then specified in terms of (λij) towards achieving fast convergence. In all, the associated multidimensional Hessian of the
interconnection matrix has eigenvalues denoting a measure of steepness of a surface of
convergence with iteratively adjusted values of the weights. A large eigenvalue would signify a
steep curvature implicitly depicting a small learning rate towards convergence of the said error
reaching a minimum value. That is, the learning rate should be inversely proportional to the eigenvalue. The learning rate was indicated before as a part of the scaling factor κ prescribed as the weight-adjustment parameter. In summary, an appropriate learning rate (α) is prescribed by formulating a matrix of interconnection weights {wij} across the identical number of N neuronal units present in the adjacent pair of layers, Li and Lj (framing the unit, Pij); and, this interconnection matrix [N × N]ij relevant to the layers Li and Lj, being symmetric, is specified in terms of a diagonal set of associated eigenvalues {λij}. The learning rate (α) is then linked to λij.
(ii) The second part of USL algorithm refers to specifying an appropriate squashing function,
F(yin) on the collective details (yin) of neural unit information gathered across the pair, (i, j) as
shown in Figure 1. The squashing function proposed thereof conforms to the heuristics of the
so-called Langevin-Neelakanta machine [1]-[3]. In summary, the summed value yin in Figure 1,
corresponds to ensemble of neural information at the layer, Lj; and, the proposed scheme
involves prescribing a squashing transfer function, F(yin) on the collective neural information
gathered at the layer, Lj so that the resulting output (yo) remains limited (typically between ±
1). Conventionally, a simple hyperbolic tangent (sigmoidal) function [1] is adopted as the
squashing function; however, in the present study, the sigmoidal function chosen conforms to
the heuristics of the so-called Langevin-Neelakanta machine [1]-[3] as will be detailed later.
In the following subsections, details on aforesaid algorithms are explained.
Learning rate (α) algorithm
As indicated above, the learning/training schedule pursued conforms to unsupervised
(Hebbian) learning (USL) applied at each unit of interconnected pair of layers. Considering any
such paired-layer, Pij: (Li ↔ Lj), the weighted sum of neural information gathered from the set
of neurons, {n1, n2, ..., nN}, in layer Lj is given by equation (1) and denoted as yin in Figure 1. It is then limited or compressed to a value, yo, by a sigmoidal function, F(.); hence, yo = F(yin), which is used to construct a scaling constant, (κ)ij equal to: α × [yave − yo × wij(old)], where yave is the mean value of the excitatory inputs given by equation (2). As stated before, this scaling constant (κ)ij refers to ±Δwij denoting the differential change in the value of wij applied iteratively to the interconnections, until yin converges to a minimum value. And, the coefficient α defines the learning rate, which can be optimally chosen as follows: Proposed thereof is a novel fast-convergence algorithm towards establishing the learning rate (α) based on eigenvalues of the
associated Hessian matrix of interconnection weights between N neurons in the pair of layers,
(Li ↔ Lj) [8] [9]. The learning rate so adopted varies dynamically with the input data used for
training and enables fast-convergence of learning at the USL.
In any iteration cycle of the USL, considering the neuronal states {z1,z2,..., zN} at the layer Li, the
associated multidimensional interconnection mentioned earlier is a Hessian matrix
(corresponding to an N × N square matrix); and this Hessian matrix can be put into a diagonal
form, denoted as [HD]. Because of the symmetry of the Hessian matrix, it has a unique, single
eigenvalue, λHD, in the diagonal form as shown below (with all other eigenvalues being zero):
$$[H_D] = \begin{bmatrix} 0 & 0 & \cdots & 0 \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_{HD} \end{bmatrix} \qquad (3)$$
Hence, in each iteration of the USL schedule, the differential change in wij (expressed as ±Δwij) exercised on the interconnections enables a corresponding error function ε(wij) deciding the value of yin. It is given by:

$\varepsilon(w_{ij}) = \sum_{n=1}^{N}\left[(z_n)_i - \lambda_{ij} \times F\!\left(w_{ij} \times (z_n)_j\right)\right]$  (4)

and its corresponding derivative, $\mathrm{d}\varepsilon(w_{ij})/\mathrm{d}(w_{ij})$, can be determined as follows:
Typically, the underlying data and the resulting values of the weights in the iterations of USL
schedule are mostly random entities. Relevantly, it is indicated in [2] that the associated F(.) is
a solution of a stochastic Bernoulli-Riccati differential equation developed in the context of
nonlinear input-output response of neural interaction. That is, pertinent dynamics refers to the
interacting neurons being regarded as a stochastic system of dipole activity. Hence, Neelakanta
et al. [3] extended Langevin’s theory of dipole polarization and obtained the following explicit
function for, F(.):
$F(X) \equiv L_Q(X) = \left(1 + \tfrac{1}{2Q}\right)\coth\!\left[\left(1 + \tfrac{1}{2Q}\right) X\right] - \left(\tfrac{1}{2Q}\right)\coth\!\left[\left(\tfrac{1}{2Q}\right) X\right]$  (8)

where the parameter Q denotes the stochastic state of the interacting entities bounded by the limits (1/2 ≤ Q < ∞). The lower bound Q = 1/2 depicts a totally anisotropic state of the statistically interacting units; and the other limit, Q → ∞, implies a totally isotropic state of the stochastic interactions involved. Hence, in the contexts of ANN, the configuration of neural interaction of disorder states can be characterized in the bounding limits of (1/2 ≤ Q < ∞) as outlined and
summarized in Table 5.3 of [1]. Usage of LQ(.) as ANN-implied sigmoid leads to Langevin
machine concept as detailed in [1]-[3]. (Framing neuromorphic functioning via Langevin
dynamics and hence, a neural network based on stochastically justifiable sigmoidal function
was proposed in [1] and relevant considerations are extended to discrete Langevin machine in
[3]). In summary, as regards the deep ANN study pursued currently, it is proposed to adopt the
aforesaid LQ(yin) sigmoidal squashing in the USL loop of Figure 1.
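A minimal sketch of the squashing function of equation (8) is given below. The small-argument guard is an implementation detail added here because coth(u) is singular at u = 0 (with L_Q(0) = 0 in the limit), and the sanity check exploits the fact that L_Q reduces to tanh(X) at Q = 1/2 and approaches the classical Langevin function coth(X) − 1/X as Q → ∞.

```python
import numpy as np

def L_Q(x, Q):
    """Langevin-Neelakanta sigmoid of equation (8), bounded between -1 and +1.

    L_Q(x) = (1 + 1/2Q) coth[(1 + 1/2Q) x] - (1/2Q) coth[(1/2Q) x],  1/2 <= Q < inf.
    """
    x = np.asarray(x, dtype=float)
    a = 1.0 + 1.0 / (2.0 * Q)
    b = 1.0 / (2.0 * Q)
    coth = lambda u: np.cosh(u) / np.sinh(u)
    small = np.abs(x) < 1e-8                 # guard: coth is singular at 0, L_Q(0) = 0
    xs = np.where(small, 1.0, x)             # placeholder argument, masked out below
    return np.where(small, 0.0, a * coth(a * xs) - b * coth(b * xs))

# Sanity check: at Q = 1/2 the expression reduces to the hyperbolic tangent.
x = np.linspace(-5.0, 5.0, 11)
assert np.allclose(L_Q(x, Q=0.5), np.tanh(x), atol=1e-6)
```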
TEST DEEP ANN FOR SIMULATION STUDIES
The efficacy of the proposed deep ANN method is cross-verified against details of simulated
results obtained with a test architecture illustrated in Figure 3. Relevant architectural details,
data adopted towards simulations and for cross-verification towards predictions and
operational routines are described in the following sections:
The test architecture of deep ANN shown in Figure 3 is consistent with the general structure
depicted in Figure 1 and constructed with 6 layers as shown: One input layer, one output layer
and four hidden layers. Each layer in the network of Figure 3 has seven neuronal units, and the
set of six layers are paired into three pairs represented as: P1: (L1 ↔ L2); P2: (L3 ↔ L4) and P3:
(L5 ↔ L6). The implied USL is exercised on each of the three pairs and illustrated in expanded
scale in Figure 2.
Simulations performed on the test network and the details of data used in the training and
prediction phases, are as follows: As well known, the ANN has the ability to learn or get trained
from experience (or from the existing data). In general, the exhaustively known details adopted
for ANN training would conform to a big-data universe wherein the information/pattern to be
predicted remains obscured as an admixture of random constituents of labeled and unlabeled
in the stochastic framework of large data. The learning/training exercise imposed on the test
ANN as seen earlier, involves adjustment of weight coefficients pertinent to massively
interconnected neuronal units; and, the adjustment is based either on capturing the randomness (stochastic aspects) of the stimuli (depicting the input training data) and/or on
using information present in a set of tagged (labeled and unlabeled) classification flags present
in the input data. Eventually, by virtue of trained details, the test ANN is rendered to predict the
desired pattern or labeled details in a given set of data. Further, it will be shown (in a later
In all, objectively the tasks in deep neural networking imply processing a set of raw input data
so as to extract required features or the desired/labeled data from it. The extracted details are
based on the inherent capability of classification in an arbitrary input-output mapping and
specify a predictive classification of the desired data at the output. That is, the ANN is trained
to classify the labeled and unlabeled data supplied as the input and spot the labeled pattern as
predictive output. Relevant training scheme in the proposed deep ANN simulation is based on
unsupervised learning (USL) method indicated earlier in Figures 1 and 2.
Consistent with the algorithmic suites described earlier for the test deep ANN, relevant
simulation studies indicated here conform to a set of the input data {y1, y2, ..., y7} addressed at
the input layer and constructing a corresponding transpose [y1, y2, ..., y7]T; hence, a [7 × 7] Hessian
matrix [H] is specified as follows: [HD] = [y1, y2, ..., y7]T. [y1, y2, ..., y7]. The input data set used in the
simulations consists of normalized values attributed to seven observed data pertinent to an
arbitrary, random set of epochs. Corresponding [HD] refers to a symmetric square [7 × 7]
matrix yielding one non-vanishing eigenvalue as a diagonal element. That is, for a given
ensemble of input data set there is one non-vanishing eigenvalue (lij) for the pair of
interconnected layers Li and Lj. The learning coefficient adopted towards weight adjustments
is linked to this eigenvalue as will be indicated later. For training the test ANN of Figure 3, four
sets of independent data (each set consisting of six unlabeled details plus one labeled datum) are
gathered from an observed universe of random epochs as listed in Table 1. While the unlabeled
entries refer to stochastic details of the random epochs, the labeled entity, marked (bold) as X, is deterministic information on the pattern or desired details of focused relevance in the
stochastic occurrence of epochs. The discrete value of X can be addressed randomly at any one
of the input neurons.
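For illustration, the rank-one Hessian construction described above and the resulting learning coefficient can be computed as in the sketch below. Since the Table 1 training values are not reproduced here, the data vector is hypothetical, so the printed α will not equal the paper's 0.1620.

```python
import numpy as np

# Hypothetical normalized 7-point input ensemble (a stand-in for a Table 1 training set).
y = np.array([0.21, 0.35, 0.48, 0.57, 1.00, 0.66, 0.74])

H = np.outer(y, y)                 # [7 x 7] Hessian: [H] = [y1, ..., y7]^T x [y1, ..., y7]
eigvals = np.linalg.eigvalsh(H)    # symmetric matrix => real eigenvalues
lam_max = eigvals.max()            # single non-vanishing eigenvalue of the rank-one matrix
alpha = 1.0 / lam_max              # learning coefficient, inversely proportional to lambda_max

# For a rank-one outer product the non-zero eigenvalue equals ||y||^2.
assert np.isclose(lam_max, np.dot(y, y))
print(f"lambda_max = {lam_max:.4f}, alpha = {alpha:.4f}")
```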
Table 2: Example data set, {y1, y2, ..., y7}: SET P containing unlabeled and labeled entities considered for prediction-phase simulations

Data SET P (values of {yn} and C):
  n = 1: y1 = 0.138655049
  n = 2: y2 = 0.164597563
  n = 3: y3 = 0.688897953
  n = 4: y4 = 0.623615765
  n = 5: C  = 1.000000000
  n = 6: y6 = 0.730366133
  n = 7: y7 = 0.860958783

Parameter used in the USL schedule: learning coefficient α = 0.1620 (as specified by the maximum eigenvalue used in the training phase).

Note: Data C is the labeled entity to be classified and identified in the output set of the prediction phase. (It can be present randomly at any of the input neurons, n = 1, 2, ..., 7.)
Perform
→ With the initializations as above and the prediction-phase data set SET P of Table 2, do the USL operations on the three paired layers as done in the training phase.
Output
→ With the completion of the USL implemented on the three paired layers of Figure 3, the observed set of {y1, y2, ..., y7} on layer L6 of the last (output) pair, L5 ↔ L6, is noted. It depicts the required prediction details in the output.
Result
← That is, the predicted results on the input data of Table 2 refer to the converged values on the neural units {y1, y2, ..., y7} of layer L6.
→ The relevant output data set, SET R, is presented in Table 3.
Table 3: Result: SET R containing the mix of unlabeled and labeled entities observed as output data in layer L6 with the SET P details addressed as the input

RESULT: Data SET R (values of {yn} at the output)
  n    Entry    Output (Q = 0.5)    Output (Q = 1.0)
  1    y1       0.9867              0.9129
  2    y2       0.9867              0.9129
  3    y3       0.9867              0.9129
  4    y4       0.9867              0.9129
  5    y5       0.9867              0.9129
  6    y6       0.9867              0.9129
  7    y7       0.9867              0.9129

Identified output corresponding to the labeled entity C in the input data of Table 2: labeled value classified as output = 0.9867; percentage error = [(1.0000 − 0.9867)/1] × 100 = 1.33 %.

Notes:
1. Learning coefficient: α = 0.1620 (as specified by the maximum eigenvalue used in the training phase).
2. Learning curves obtained are shown in Figures 4 and 5.
3. The output values (0.9867) of {yn} shown are close to the data C = 1 of the input set of details in SET P used in the prediction phase. It confirms the required classification as the labeled entity by the test deep ANN.
Next
Finding the data to be classified as the pattern details sought:
← List the output data set {yn} of the prediction phase at the output layer L6 as SET R, shown in Table 3.
→ One or more values of {yn} in SET R conform to the required data C (corresponding to the input set of details in SET P used in the prediction phase), classified as the labeled entity.
End
RESULTS AND DISCUSSION
The present study refers to a multilayer deep ANN network, which uses an unsupervised learning (USL) scheme across a hierarchical set of paired layers with a pattern of local connectivity within those layers. In the following subsections, a summary of the operational details is outlined: In the
test multilayer deep ANN network, the USL or training scheme progresses along the paired
stack of layers. The inputs are addressed at the first layer and USL is performed on the first pair
of layers. The resulting interconnection weights of this input pair, upon convergence, are stored frozen. The converged values of the neuronal states in the second layer are then transferred to the
third layer. Hence, the pair of third and fourth layers are trained and converged details on
weights are stored; and, the converged neuronal states at the fourth layer are passed on to the next (fifth) layer, and so on. This procedure is carried out across the entire set of paired layers of the
designed architecture and adopted in both training and prediction phases. The output being sought refers to the final converged details of the last (output) layer corresponding to the
prediction phase. The internal (local) calculations between the adjacent layers in any paired set
depend upon the neuronal state details coming from the previous layers.
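The pair-by-pair cascade summarized above can be sketched as follows. The per-pair update mirrors the earlier USL sketch, while the way the converged states of a pair are formed before being handed to the next pair is not spelled out in this excerpt and is therefore taken here, purely for illustration, as the squashed per-neuron products F(w·z); names such as train_pair and train_deep_ann are assumptions of this sketch.

```python
import numpy as np

def train_pair(z_in, w, alpha, F=np.tanh, n_iter=200):
    """USL on one paired layer: adjust the weights iteratively, then return the
    frozen weights and the converged states handed to the next pair (sketch)."""
    for _ in range(n_iter):
        y_in = np.sum(w * z_in)             # equation (1)
        y_o = F(y_in)                       # squashed output
        y_ave = np.mean(z_in)               # equation (2), taken over the pair's inputs
        w = w + alpha * (y_ave - y_o * w)   # weight adjustment kappa = alpha (y_ave - y_o w)
    z_out = F(w * z_in)                     # assumed hand-off of converged states
    return w, z_out

def train_deep_ann(x, pair_weights, alpha, F=np.tanh):
    """Cascade the USL over the stack of paired layers; each pair is trained
    independently and its converged weights are stored frozen."""
    z, frozen = x, []
    for w in pair_weights:
        w_new, z = train_pair(z, w, alpha, F)
        frozen.append(w_new)
    return frozen, z                        # z from the last pair is the predicted output map

# Example: three paired layers (a 6-layer network) with N = 7 units per layer.
rng = np.random.default_rng(1)
pair_weights = [0.1 * rng.random(7) for _ in range(3)]
frozen, output_map = train_deep_ann(rng.random(7), pair_weights, alpha=0.1620)
```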
The proposed deep ANN conceived for ML in big data environments of AI has the following
uniquely specified novel features: (i) The algorithm prescribes at each pair of layers a learning coefficient inversely proportional to the maximum of the computed Hessian eigenvalues evaluated for
the ensemble data sets processed across the paired set of layers in training and prediction
phases. This prescription enables a desirable fast convergence of learning involved; and, (ii)
necessary sigmoidal squashing is adopted while performing the USL, which conforms to
Langevin-Neelakanta machine concepts. The choices as above rely on the motivational
considerations towards fast convergence of the test deep ANN even under big data ambient,
yielding the result on correct classification of labeled entity present in the input data analyzed.
Figure 4: Learning curves depicting the observed error e in yin versus the number of iterations in a USL loop of Figure 3, obtained (normalized) with the ensemble of data SET P in Table 2 for two cases, using an optimum value of α = 0.1620 (as deduced by the proposed method): (A) with Q = 0.5 and (B) with Q = 1.0
Presented in Figures 4 and 5 are examples of learning-curves obtained for the exemplar
ensemble of input data (of SET P) of the prediction phase. The learning curves presented are
pertinent to two cases: (i) with the learning coefficient, α = 0.1620, inversely proportional to the maximum value of the computed Hessian eigenvalues evaluated for the ensemble data sets; and, (ii) with α equal to an arbitrary value, 0.001. As seen in Figures 4 and 5, the convergence is faster with the learning rate based on α = (1/λmax) = 0.1620. Also, the convergence is influenced
by the value of Q adopted in the USL loop. Further, the chosen value of Q decides the accuracy
of predicted output with respect to the labeled value implied at the input as in Table 3.
Figure 5: Learning curves depicting the observed (normalized) error e in yin versus the number of iterations in a USL loop of Figure 3, obtained with the ensemble of data SET P in Table 2 for two cases, using an arbitrary value of α = 0.001: (X) with Q = 0.5 and (Y) with Q = 1.0
CONCLUDING REMARKS
The present study demonstrates a promising application of deep ANN for classifying and
predicting a labeled pattern present as an admixture of details in a big data environment. A
multilayered deep ANN is described thereof with an architecture of successively placed set of
paired layers, each containing N neuronal units that are massively interconnected. Each pair of
layers is independently trained via unsupervised learning schedule. The underlying training
and prediction phases are implemented with specifically defined algorithms on sigmoidal
squashing involved in defining the weight adjustment parameter and learning rate pursued
independently in the USL loop of each pair of layers.
To illustrate the efficacy of the proposal, simulated details on training a test architecture and
evaluating the prediction phase data are furnished. Results on fast prediction of labeled data
(that could be obscurely present in the stochastic mix of unlabeled and labeled patterns in the
input) concur with the novel heuristics and suggested algorithms proposed. In essence, the
following can be indicated as salient features of the novel deep learning ANN supported on
Langevin-Neelakanta machine:
Machine learning in a big data environment objectively requires a deep learning ANN architecture to handle enormous data and achieve fast convergence. The computational
power of traditional recurrent neural networks depends ultimately on the complexity of
massive interconnections of the network. Relevant implications of information processing are
decided by Kolmogorov complexity associated with the size of the network architecture and
extents of weights present. Any neural net learning algorithm aims at finding simple nets that assimilate the training (test) data robustly and lead to better generalization on test data in
the prediction phase [10][11]. This complexity issue grows significantly with increased number
of hidden layers, manifesting as a computational bottleneck in handling large data and involving
excessive processing time towards convergence. This issue is obviously serious in big data
environment; as such, novelty in conceiving better deep ANN architecture is a motivated goal
towards ML efforts in AI contexts. Hence, the present study on the deep ANN has indicated the feasibility of accommodating a large number of cascaded sets of independent paired layers, with each pair trained via USL towards self-organization. In the relevant effort, the algorithmic suites pursued enable fast convergence with a uniquely specified learning rate, as well as better (more accurate) results on prediction, by resorting to the Langevin-Neelakanta machine concept with the associated squashing function in the USL loop. The architecture of using a cascaded set of individualized paired layers is akin to that of Fukushima's cognitron and neocognitron [4][5], but modified to match the motivated context of the present study.
Simulation results presented establish the desirable efficacy of the proposed concept.
References
1. Neelakanta, P. S., De Groff, D., Neural Network Modeling: Statistical Mechanics and Cybernetic Perspectives,
1994, CRC Press, Boca Raton, Fl: USA.
2. Neelakanta, P. S., (Editor), Information-Theoretic Aspects of Neural Networks, 1999, CRC Press, Boca Raton,
Fl: USA.
3. Neelakanta, P. S., Sudhakar, R., and De Groff, D., Langevin machine: A neural network based on stochastically justifiable sigmoidal function, Biological Cybernetics, 1991, 65, p. 331-338.
4. Fukushima, K. Neocognitron: A self-organizing neural network model for a mechanism of pattern
recognition unaffected by shift in position. Biological Cybernetics, 1980, 36, p.193-202.
5. Fukushima, K. Cognitron: A self-organizing multilayered neural network. Biological Cybernetics, 1975, 20,
p.121-136.
6. Becker, S., Plumbley, M. Unsupervised neural network learning procedures for feature extraction and classification. Applied Intelligence, 1996, 6, p. 185-203.
7. Wittek, P., Unsupervised learning in quantum machine learning, Elsevier Inc/Morgan Kaufmann, 2014,
Cambridge, MA: USA.
8. De Groff, D., Neelakanta, P. S., Faster convergent neural networks, International Journal of Computers and
Technology, 2018, 17(1), p.7126-7132.
9. Magoulas, G. D., Vrahatis, M. N., Androulakis, G. S. Improving the convergence of the backpropagation
algorithm using learning rate adaptation methods, Neural Computation, 1999, 11(7), p. 1769-1796.
10. Schmidhuber, J. Discovering neural nets with low Kolmogorov complexity and high generalization
capability. Neural Networks, 1997, 10(5), p. 857-873.
11. Balcázar, J. L. Computational power of neural networks: A characterization in terms of Kolmogorov
complexity. IEEE Transactions on Information Theory, 1997, 43(4), p.1175-1183.