Global, Local
and Personalized Modeling and Pattern Discovery in
Bioinformatics:
An Integrated
Approach
Nikola Kasabov
Knowledge Engineering and Discovery Research Institute,
KEDRI (www.kedri.info),
Auckland University of Technology,
Auckland, Private Bag 920010, New Zealand, nkasabov@aut.ac.nz
Abstract
The paper is a comparative study
of major modeling and pattern discovery approaches
applicable to the
area of Bioinformatics and the area of decision support
systems in general. These approaches include inductive versus transductive reasoning and global, local, and personalized modeling; their potential is illustrated on a case study of gene expression and clinical data related to cancer outcome prognosis. While inductive modeling is used to develop a model (function) from data covering the whole problem space and then to recall it on new data, transductive modeling is concerned with the creation of a single model for every new input vector, based on some of the closest vectors from the existing problem space. The paper uses several techniques to illustrate these approaches – multiple linear regression, Bayesian inference, support vector machines, evolving connectionist systems (ECOS) and weighted kNN – each providing different accuracy on a specific problem and facilitating the discovery of different patterns and rules from data.
Keywords: transductive reasoning; personalized modeling; knowledge discovery; local modeling; evolving connectionist systems; Bioinformatics; gene expression data; medical decision support systems; personalized probabilities; cancer prognosis.
1. Bioinformatics – An Area of Exponentially Increasing Data Volume and Emergence of Knowledge
With the completion of the draft sequence of the human genome and the genomes of other species (more are to be sequenced during this century), the task now is to process this vast amount of ever-growing dynamic information and to create intelligent systems for data analysis and knowledge discovery, from cells to whole organisms and species [1, 2].
The central dogma of molecular biology is that the DNA (deoxyribonucleic acid) present in the nucleus of each cell of an organism is transcribed into RNA, which is translated into proteins [3]. Genes are complex molecular structures that cause dynamic transformation of one substance into another during the whole life of an individual, as well as the life of the human population over many generations [4]. Even the static information about a particular gene is very difficult to understand (see the GenBank database, www.ncbi.nlm.nih.gov/genbank).
When genes are “in action”, the dynamics of the processes in which a single gene is involved are thousands of times more complex, as the gene interacts with many other genes and proteins and is influenced by many environmental and developmental factors [5].
Modelling these interactions and extracting meaningful patterns (knowledge) is a major goal of Bioinformatics. Bioinformatics is concerned with the application and development of methods of the information sciences for the collection, storage, analysis, modelling and knowledge discovery from biological and medical data.
The whole process of the expression
of genes and the production of proteins, and back
to the genes, evolves
over time. Proteins have 3D structures that evolve
over time, governed by physical and chemical laws. Some proteins bind to the DNA and make some genes express, while suppressing the expression of other genes. The genes in an individual may mutate, changing their code slightly, and may therefore be expressed differently at a later time. Genes represent both static and dynamic information that is difficult to capture as patterns [6, 7].
Gene and protein expression values can be measured with microarray equipment [8], thus making this information available for medical decision making, such as prognosis and diagnosis, and for drug design. Many challenging problems in Bioinformatics need to be addressed and new knowledge about them revealed; to name only some of them:
- Recognizing patterns from
sequences of DNA, e.g. promoter recognition[9];
- Recognizing patterns in RNA data (e.g. splice junctions
between introns and exons; micro RNA structures; non-coding
regions analysis)
- Profiling gene microarray expression
data from RNA in different types of tissue (cancer
vs normal), different
types of cells, to identify profiles of diseases[10-15];
- Predicting protein structures;
- Modeling metabolism in cells[16, 17];
- Modeling entire cells[16];
- Modeling brain development and brain diseases[7,
18, 19];
- Creating complex medical decision support systems
that deal with a large set of variables that include
both gene and clinical variables to obtain the right
diagnosis and prognosis for a patient[20].
A main approach to understanding gene interaction, and life science in general, and to solving the above problems is mathematical and computational modeling [21]. The more new information is made available about DNA, gene expression, protein creation, metabolic pathways, etc., the more accurate the corresponding information models will become. They should adapt to any new information in a continuous way. The process of discovering biological patterns and knowledge is itself an evolving one.
The main contribution of the paper
is the comparative study of different modeling approaches
to solving problems
in Bioinformatics with the emphasis not only on the
accuracy of the model, but on the type of patterns – knowledge,
that these models facilitate to discover from data.
Section 2 introduces briefly three generic types of modeling approaches applicable to the problems listed above, namely global, local and personalised modeling. Section 3 introduces one particular modeling technique, evolving connectionist systems (ECOS). These approaches are applied in section 4 on a case study problem of modeling and profile discovery from gene expression and clinical data related to cancer outcome prognosis. Section 5 addresses model optimisation with the use of evolutionary computation, section 6 discusses some modeling issues of gene regulatory networks, and section 7 presents further research directions in the area of Bioinformatics and in the area of medical decision support systems in general. The main conclusion
is that for detailed research on a complex problem, different levels of knowledge need to be discovered: at the global, local and personalised levels. Knowing the
potentials of different modeling approaches, each of
them can be applied to available data on a problem
and results can facilitate the discovery process of
complex patterns and rules of life.
2. Inductive versus Transductive Reasoning. Global, Local and Personalised Modelling
2.1. Inductive versus Transductive Reasoning
The inductive reasoning approach, widely used in all fields of science, is concerned with the creation of a model (a function) from all available data, representing the entire problem space; the model is then applied to new data (deduction). Transductive inference, introduced by Vapnik [22], is defined in contrast as a method used
to estimate the value of a potential model (function)
only for a single point of space (that is, a new data
vector) by utilizing additional information related
to that vector. While the inductive approach is useful
when a global model of the problem is needed in an
approximate form, the transductive approach is more
appropriate for applications where the focus is not
on the model, but rather on every individual case.
This is very much related to clinical and medical applications
where the focus needs to be centered on individual
patient’s conditions.
The transductive approach is related to the common
sense principle [23] which states that to solve a given
problem one should avoid solving a more general problem
as an intermediate step. The reasoning behind this
principle is that, in order to solve a more general
problem, resources are wasted or compromised which
is unnecessary for solving the individual problem at
hand (that is, function estimation only on given points).
This common sense principle reduces the more general
problem of inferring a functional dependency on the
whole input space (inductive approach) to the problem
of estimating the values of a function only at given
points (transductive approach).
In the past years, transductive
reasoning has been implemented for a variety of classification
tasks such
as text classification [24, 25], heart disease diagnostics
[26], synthetic data classification using graph based
approach [27], digit and speech recognition [28],
promoter recognition in bioinformatics [29], image
recognition [30] and image classification [31], micro
array gene expression classification [32, 33] and biometric
tasks such as face surveillance [34]. This reasoning
method is also used in prediction tasks such as predicting
if a given drug binds to a target site [35] and evaluating
the prediction reliability in regression [23] and providing
additional measures to determine reliability of predictions
made in medical diagnosis [36]. Among the research utilising the transductive principle, transductive support vector machines [25] and semi-supervised support vector machines [37] stand out as often-cited work [38].
In transductive reasoning, for every new input vector
xi that needs to be processed for a prognostic/classification
task, the Ni nearest neighbors, which form a data subset
Di, are derived from an existing dataset D and a new
model Mi is dynamically created from these samples
to approximate the function in the locality of point
xi only. The system is then used to calculate the output
value yi for this input vector xi.
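The per-vector procedure just described can be sketched in a few lines (a minimal illustration, not the paper's implementation; the local model Mi here is simply the mean output over the neighbourhood, and all names are ours):

```python
import numpy as np

def transductive_predict(x_new, X, y, n_neighbors=5):
    """Build a throw-away local model Mi from the Ni nearest
    neighbours of one query vector and return its prediction."""
    # distances from the query to every stored sample in D
    dists = np.linalg.norm(X - x_new, axis=1)
    # indices of the Ni nearest samples: the subset Di
    idx = np.argsort(dists)[:n_neighbors]
    # the local "model" here is just the mean output over Di;
    # any learner (regression, SVM, ECF) could be trained on Di instead
    return float(y[idx].mean())

# toy data set D: the output copies the first input coordinate
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
y = np.array([0.0, 0.1, 1.0, 0.9])
print(transductive_predict(np.array([0.05, 0.0]), X, y, n_neighbors=2))
```

A new local model is built, used once for this query, and then discarded; the next query gets its own neighbourhood and its own model.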
This approach has been implemented with radial basis functions [39] in medical decision support systems and in time-series prediction problems, where individual models are created for each input data vector. It gives good accuracy for individual models and has promising applications, especially in medical decision support systems. The transductive approach has also been applied using support vector machines as the base model in the area of bioinformatics [29, 32], and the results indicate that transductive inference performs better than inductive inference models, mainly because it exploits the structural information of unlabeled data. However, there are a few open questions that need to be addressed when implementing transductive modeling, e.g.: How many neighboring samples K are needed? What type of distance measure should be used when choosing the neighbors? What model should be applied to the neighboring samples? These issues are addressed in the sections below.
2.2. Global, local and personalised modelling
The three main approaches investigated in the paper are:
- Global modeling – a model is created from data covering the whole problem space and is represented as a single function, e.g. a regression formula;
- Local modeling – a set of local models is created from data, each representing a sub-space (cluster) of the problem space, e.g. a set of rules;
- Individualised (personalised) modeling – a model is created only for a single point (vector, patient record) of the problem space using transductive reasoning.
To illustrate the concepts of global,
local and personalised modelling, here we use a case
study problem and a publicly
available data set from Bioinformatics – the DLBCL lymphoma data set for predicting survival outcome over a 5-year period. This data set contains 58 vectors: 30 cured DLBCL lymphoma cases and 28 fatal ones [12, 40]. There are 6,817 gene expression variables. Clinical data is available for 56 of the patients, represented as the IPI (International Prognostic Index), an integrated number representing the overall effect of several clinical variables [12, 40]. The task is, based on the existing data, to: (1) create a prognostic system that predicts the survival outcome of a new patient; (2) extract profiles that can be used to provide an explanation for the prognosis; and (3) find markers (genes) that can be used for the design of new drugs to cure the disease or for an early diagnosis.
Using a global linear regression
method on the 11 DLBCL prognostic genes [12, 40]
(denoted as X1, X2,…,X11)
for the 58 vectors, normalised in the range [0,1],
we derive the following classification model:
Y = 0.36 + 0.53 X1 - 0.12 X2 - 0.41 X3 - 0.44 X4 + 0.34 X5 + 0.32 X6 - 0.07 X7 + 0.5 X8 - 0.5 X9 + 0.18 X10 + 0.3 X11   (1)
Formula (1) constitutes a global model, i.e. it is to be used to evaluate the output for any input vector in the 11-dimensional space, regardless of where it is located. It indicates to a certain degree the importance of the genes (e.g. gene X7 has the lowest importance), but it does not give any information about the relationship between the genes in different sub-spaces of the problem space and, more importantly, the relevance of each of these genes to the prediction of survival for an individual. The model, being global (i.e. spanning the whole space), gives the “big picture”, but is difficult to adapt to new data. Linear and logistic regression methods have been widely used for gene expression modelling [41, 42] and for modelling gene regulatory networks [5, 17].
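A global linear model of this kind can be fitted by ordinary least squares. The sketch below uses synthetic data rather than the DLBCL values, so the recovered coefficients are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(58, 11))      # 58 samples, 11 normalised gene variables
true_w = rng.uniform(-0.5, 0.5, size=11)  # hidden "true" coefficients (invented)
y = 0.36 + X @ true_w                     # noise-free synthetic outcome

# ordinary least squares: prepend a column of ones for the intercept
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("intercept:", round(coef[0], 2))    # recovers 0.36 on this noise-free data
```

On real, noisy expression data the fitted coefficients would only approximate the underlying dependencies, which is exactly why their interpretation as gene importance must be made with care.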
Another statistical machine learning method that is widely used for the creation of classification models is the support vector machine (SVM) [22]. An SVM is a kind of global model, but instead of a single formula it consists of a set of vectors that lie on the border area between the samples belonging to different classes (the support vectors), described through kernel functions. SVM models are very good classifiers, but they are difficult to adapt and the knowledge extracted from them is very limited. SVM models have been used in many research papers [12, 22].
In contrast to the global models,
local models are created to evaluate the output function
for only a
sub-space of the problem space. Multiple local models
(e.g. one for each cluster of data) can constitute
together the complete model of the problem over the
whole problem space. Local models are often based on
clustering techniques. A cluster is a group of similar
data samples, where similarity is measured predominantly
as Euclidean distance in an orthogonal problem space.
Clustering techniques include k-means [43], Self-Organising Maps (SOM) [41, 44], fuzzy clustering [45-47], hierarchical clustering [48] and simulated annealing [49]. In fuzzy
clustering one sample may belong to several clusters
to a certain membership degree, the sum of which is
1. Generally speaking, local models are easier to adapt
to new data and can provide a better explanation for
individual cases. The ECF model, described in the next
sub-section, is a representative of multiple local
models based on clustering.
A “personalised” model is created “on
the fly” for every new input vector and this
individual model is based on the closest data samples
to the new sample taken from a data set. A simple example
of personalised modelling technique is the K-NN (nearest
neighbour) method, where for every new sample, the
nearest K samples are derived from a data set using
a distance measure, usually Euclidean distance, and
a voting scheme is applied to define the class label
for the new sample [22, 43].
In the K-NN method, the output value yi for a new
vector xi is calculated as the average of the output
values of the k nearest samples from the data set Di.
In the weighted K-NN method (WKNN) the output yi is
calculated based on the distance of the K-NN samples
to xi:
yi = ( Σj wj yj ) / ( Σj wj ),  j = 1, …, Ni   (1)
where: yj is the output value for the sample xj from
Di and wj are their weights measured as distance from
the new input vector:
wj = ( min(d) + max(d) - dj ) / max(d)   (2)
In Eq. (2) the vector d = [d1, d2, …, dNi] contains the distances between the new input vector xi and its Ni nearest neighbours (xj, yj), for j = 1 to Ni; max(d) and min(d) are the maximum and minimum values in d, respectively. The weights wj take values between min(d)/max(d) and 1: the sample with the minimum distance to the new input vector has a weight of 1, while the sample at the maximum distance has a weight of min(d)/max(d).
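Putting the WKNN output formula and the weight formula (Eq. 2) together, a prediction can be computed as follows (a minimal sketch on toy data; function and variable names are ours):

```python
import numpy as np

def wknn(x_new, X, y, k=3):
    """Weighted k-nearest-neighbour output for one new vector."""
    d = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(d)[:k]
    dk = d[idx]
    if dk.max() == 0:                     # query coincides with its neighbours
        return float(y[idx].mean())
    # Eq. (2): weight 1 for the closest neighbour,
    # min(d)/max(d) for the farthest of the k neighbours
    w = (dk.min() + dk.max() - dk) / dk.max()
    return float(np.sum(w * y[idx]) / np.sum(w))

# toy two-class data; with 0/1 labels the output reads as a
# personalised probability of class 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(wknn(np.array([0.9, 0.9]), X, y, k=3))   # ≈ 0.88
```

With a threshold Pthr = 0.5, the vector in this example would be classified in class 2.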
If WKNN is used to solve a classification problem in which two classes are represented by output labels 0 (class 1) and 1 (class 2), the output for a new input vector xi calculated with eq. (1) has the meaning of a personalised probability that the new vector xi belongs to class 2. In order to finally classify xi into one of the two classes, a probability threshold Pthr has to be selected, so that if yi >= Pthr the sample xi is classified in class 2. For different values of the threshold Pthr, the classification error may be different.
Personalised probability is calculated
with the use of transductive reasoning and is different
from local
probability (probability of class 2 samples in
a local region – cluster) and is also different
from the global probability measure (the usual way
to deal with probabilities) N2/N, where N2 is the number
of samples in the whole problem space that belong to
class 2 and N is the total number of all samples.
Using global probability measures to evaluate the probability that a single input vector x belongs to a class A (the Bayesian probability inference approach) requires that some prior probabilities are available; these are not easy to obtain and are often too uncertain. The Bayesian posterior probability p(A|x) that a new input vector x belongs to class A is calculated with the formula:
p(A|x) = p(x|A) p(A) / p(x)   (3)
where p(A) and p(x) are prior probabilities, p(x|A) is the conditional probability of x given class A, and p(A|x) is the posterior probability.
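As a numerical illustration of the Bayesian posterior computation (all probability values below are invented for the example, with p(x) obtained via the law of total probability):

```python
# invented prior and conditional probabilities for the illustration
p_A = 0.3             # prior probability of class A
p_x_given_A = 0.8     # probability of observing x within class A
p_x_given_notA = 0.2  # probability of observing x outside class A

# p(x) via the law of total probability
p_x = p_x_given_A * p_A + p_x_given_notA * (1 - p_A)

# posterior: p(A|x) = p(x|A) * p(A) / p(x)
p_A_given_x = p_x_given_A * p_A / p_x
print(round(p_A_given_x, 3))   # 0.632
```

The point of the comparison in the text is that each number above must be supplied in advance, whereas the personalised (transductive) probability is computed directly from the neighbourhood of the new vector.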
Calculating personalised probability
(in a transductive way) does not require any prior
information.
3. Evolving Connectionist Systems (ECOS) for Local Modelling and Cluster-based Rule Discovery
3.1. The ECOS architecture
Some traditional neural network models are seen as “black boxes” and are not very useful for the discovery of new patterns from data [50]. A new type of neural networks, evolving connectionist systems (ECOS), was introduced in [51]. ECOS allow for structural adaptation; fast, incremental, on-line learning; and rule extraction and rule adaptation. One of their simplest implementations is the evolving classification function ECF [51, 52] (see fig. 1).
[Figure 1]
The ECOS from fig.1 consists of five layers of neurons
and four layers of connections. The first layer of
neurons receives the input information. The second
layer (optional) calculates the fuzzy membership degrees
to which the input values belong to predefined fuzzy
membership functions, e.g. Low, Medium, or High. The
membership functions can be kept fixed, or can change
during training. The third layer of neurons represents associations between the input and the output variables – the rules. The fourth layer (optional) calculates the degree to which output membership functions are matched by the rule node activation, and the fifth layer performs defuzzification and calculates values for the output variables.
3.2. The ECOS learning algorithms
ECOS in general are connectionist
systems that evolve their structure and functionality
in a continuous,
self-organised, on-line, adaptive, interactive way
from incoming information. They can learn from data
in a supervised or unsupervised way. Learning
is based on clustering of input vectors and function
estimation for the clusters in the output space. Prototype
rules can be extracted to represent the clusters and
the functions associated with them. The ECOS models
allow for an incremental change of the number and types
of inputs, outputs, nodes and connections. The algorithm that evolves a simple classification system, called ECF (Evolving Classification Function), from an incoming stream of data is shown in fig. 2. The internal nodes in the ECF structure capture clusters of input data and are called rule nodes.
[Figure 2]
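A greatly simplified sketch of an ECF-style classifier is given below. It keeps only the core idea of evolving rule nodes (centre, radius, class) in one pass over the data, omitting the fuzzy membership layers and the radius-shrinking and aggregation steps of the published algorithm; the class and method names are ours, with r_min and r_max loosely following the paper's Rmin and Rmax:

```python
import numpy as np

class SimpleECF:
    """Rule nodes are (centre, radius, class); one-pass evolving learning.
    A much-simplified sketch, not the full published ECF algorithm."""

    def __init__(self, r_min=0.1, r_max=0.5):
        self.r_min, self.r_max = r_min, r_max
        self.centres, self.radii, self.classes = [], [], []

    def partial_fit(self, x, label):
        # find the nearest existing rule node of the same class
        best, best_d = None, np.inf
        for i, c in enumerate(self.centres):
            d = np.linalg.norm(x - c)
            if self.classes[i] == label and d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.r_max:
            # the sample is covered: widen the node's influence field if needed
            self.radii[best] = max(self.radii[best], min(best_d, self.r_max))
        else:
            # otherwise evolve a new rule node around the sample
            self.centres.append(np.asarray(x, dtype=float))
            self.radii.append(self.r_min)
            self.classes.append(label)

    def predict(self, x):
        # class of the nearest rule node
        d = [np.linalg.norm(x - c) for c in self.centres]
        return self.classes[int(np.argmin(d))]

m = SimpleECF()
for x, c in [([0.0, 0.0], 0), ([0.05, 0.0], 0), ([1.0, 1.0], 1)]:
    m.partial_fit(np.array(x), c)
print(len(m.centres), m.predict(np.array([0.9, 0.9])))   # 2 rule nodes; predicts class 1
```

Note how the second class-0 sample is absorbed by the first rule node rather than creating a new one; each rule node can later be read off as an IF-THEN cluster rule.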
Different types of rules are facilitated by different ECOS architectures: Zadeh-Mamdani rules in the evolving fuzzy neural networks EFuNN [53, 54] (see fig. 1), or Takagi-Sugeno rules in the dynamic neuro-fuzzy inference systems DENFIS [55]. An ECOS structure grows and “shrinks” in a continuous way from input data streams. Feed-forward and feedback connections are both used in the architecture. The ECOS are not limited in the number and types of inputs, outputs, nodes and connections. Several machine learning methods are facilitated in different types of ECOS and have already been applied to Bioinformatics problems from section 1 [51].
4. A Comparative Study of Global, Local and Personalised
Modelling on the Case Study of Gene Expression and
Clinical Information
4.1. Problem definition and data
sets
A gene expression profile is defined here as a pattern
of expression of a number of significant genes for
a group (cluster) of samples of a particular output
class or category. A gene expression profile is represented
here as an IF-THEN inference rule:
IF <A pattern of gene expression values of selected
genes is observed> THEN <There is a likelihood
for a certain diagnostic or prognostic outcome>.
Having profiles/rules for a particular disease makes it possible to set up early diagnostic tests, so that a sample can be taken from a patient, the data related to the sample processed, and the result mapped onto the existing profiles. Based on the similarity between the new data and the existing profiles, the new data vector can be classified as belonging to the “good outcome” or the “poor outcome” group with a certain confidence, and a good explanation can be provided for the final decision, as the matched local rules/profiles will be the closest to the person’s individual profile [56].
Contemporary technologies, such as gene microarrays,
allow for the measurement of the level of expression
of up to 30,000 genes in RNA sequences that is indicative
of how much protein will be produced by each of these
genes in the cell[57]. The goal of the microarray gene
expression data analysis is to identify a gene or a group of genes that are differentially expressed in one state of a cell or tissue (e.g. cancer) versus another state (e.g. normal) [58]. Generally, it is difficult
to find consistent patterns of gene expression for
a class of tissues.
Gene expression data is often accompanied by clinical
data variables. The issue of gene and clinical variables
integration for the discovery of combined patterns
is addressed here as well.
4.2. Experimental results with the use of different
modeling techniques
The two main reasoning approaches, inductive and transductive, are used here to develop global, local and personalized models on the same data, in order to compare the approaches on two main criteria: (1) accuracy of the model; and (2) type of patterns discovered
from data. The following classification techniques
are used: multiple linear regression (MLR); SVM; ECF;
WKNN.
Each of the models is validated through the same leave-one-out cross-validation method [22]. The accuracy of the different models is presented in Table 1. It can be seen that transductive reasoning and personalized modeling are sensitive to the selection of the K value.
Its optimization is discussed in the next section.
The best accuracy is manifested by the local ECF model,
trained on a combined feature vector of 11 gene expression
variables and the clinical variable IPI. Its prognostic
accuracy is 88% (83% for class 1- cured, and 92% for
class 2- fatal). This compares favorably with the 75%
accuracy of the SVM model used in[12].
[Table 1]
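The leave-one-out validation used for these comparisons can be sketched generically (shown here with a nearest-neighbour base classifier on toy data; the accuracy printed has no relation to Table 1):

```python
import numpy as np

def loocv_accuracy(X, y, classify):
    """Train on all samples but one, test on the held-out one, repeat."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i        # everything except sample i
        correct += classify(X[mask], y[mask], X[i]) == y[i]
    return correct / len(X)

def nearest_neighbour(X_train, y_train, x):
    # base classifier: label of the closest training sample
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]])
y = np.array([0, 0, 1, 1])
print(loocv_accuracy(X, y, nearest_neighbour))   # 1.0
```

Any of the compared classifiers (MLR, SVM, ECF, WKNN) can be substituted for the `classify` argument, which is what makes the comparison in Table 1 fair: every model sees exactly the same train/test splits.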
In addition, local rules that represent the cluster gene profiles of the survival group versus the fatal group of patients were extracted, as shown graphically in fig. 3. These profiles show that there is no single variable that clearly discriminates the two classes – it is a combination of variables that discriminates different sub-groups within a class and between classes.
[Figure 3]
The local profiles can be aggregated into global class profiles by averaging the variable values across all local profiles that represent one class – fig. 4. Global profiles may not be very informative if the data samples are dispersed in the problem space and the samples of each class are spread out.
[Figure 4]
5. Model Optimisation with the Use of Evolutionary Computation
5.1. Evolutionary computation
Using the same modelling technique, but with different parameter values and different input variables, may lead to different results and different information
extracted from the same initial data set. One way to
optimise these parameters and obtain an optimal model
according to certain criteria (e.g. classification
accuracy) is through evolutionary computation techniques[59,
60]. One of them, genetic algorithms (GA) [60], is an optimisation technique that generates a population of individual solutions (models) for a problem, e.g. classification systems, and trains these systems on data, so that after training the best systems (e.g. those with the highest accuracy, i.e. fitness) can be selected and the operations of “crossover” and “mutation” applied to them to obtain the next generation of models [60].
The process continues until a satisfactory model is
obtained. Applications of GA for gene expression data
modelling and GRN modelling are presented in [61, 62].
A problem with evolutionary computation techniques is that no optimal solution is guaranteed, as they are heuristic search techniques over a solution space. This is in contrast to exhaustive search, which guarantees an optimal solution but may take an unacceptably long time to be practically applicable.
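The generate-evaluate-select-crossover-mutate loop described above can be sketched for feature selection. As a stand-in for a model's cross-validation accuracy, the toy fitness below simply measures agreement with a known "useful feature" mask; all names and values are invented:

```python
import random

random.seed(1)
N_FEATURES = 12                 # e.g. 11 genes + the IPI clinical variable
TRUE_MASK = [1] * 6 + [0] * 6   # toy ground truth: first 6 features useful

def fitness(mask):
    # stand-in for model accuracy: agreement with TRUE_MASK
    return sum(m == t for m, t in zip(mask, TRUE_MASK)) / N_FEATURES

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)   # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [1 - m if random.random() < rate else m for m in mask]

# initial population of 20 random feature masks
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(20)]
for generation in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                      # selection: keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(round(fitness(best), 2))
```

In the experiments reported below, the chromosome would additionally encode model parameters such as Rmax, Rmin and the number of membership functions, and the fitness would be the model's cross-validation accuracy.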
5.2. Experimental results – Optimisation of local models – ECF
In the models explored in the previous section, neither the model parameters (such as Rmax, Rmin, m and the number of membership functions in an ECF model; K in WKNN; etc.) nor the set of input variables (features) were optimised to produce the best accuracy. Out of the 11 genes and the IPI clinical feature, there may be only a subset that would produce a better result (if the remaining ones were noisy features).
In the experiment shown in fig. 5, both the ECF parameters and the features are optimised with the use of a GA run over 20 generations, each containing 20 ECF models with different parameter values, with the fitness criterion being the highest overall accuracy for a smaller number of features. The optimal ECF parameters
are given in the figure and the best model has an overall
accuracy of 90.66%, which is higher than any of the
non-optimised models from Table 1.
[Figure 5]
5.3. Experimental results – Optimisation of Transductive, Personalised Models
We noticed from Table 1 that the accuracy of the transductive, personalised models depends on the choice of some parameters: K, the distance measure, and the model parameters (e.g. the ECF parameters). Optimising these parameters during model development “on the fly” is experimented with here, and the results are presented in fig. 6. For every one of the 56 samples in the DLBCL Lymphoma data set (one IPI variable and 11 gene variables), optimised values of the number of neighbouring samples K (fig. 6a) and the type of distance measure (fig. 6b) are defined with the use of a GA optimisation procedure. The fitness function is the cross-validation accuracy in a leave-one-out method over the K samples in the neighbourhood.
So here not only is a personalised model derived for every new data sample, but an optimal one is created through a GA optimisation procedure.
[Figure 6]
6. Gene Regulatory Network Modelling and Discovery
In a living cell, genes interact in a complex, dynamic way, and this interaction is crucial for cell behavior. The interaction can be represented in an approximate way as a gene regulatory network (GRN) [5]. An example is shown in fig. 7.
GRN models can be derived from time course gene expression
data of many genes measured over a period of time.
Some of these genes have similar expressions to each
other as shown in fig. 8.
Genes that share similar functions usually show similar
gene expression profiles and cluster together. In a
GRN clusters can be used and represented by nodes instead
of genes or proteins. A GRN model can be used to predict
the expression of genes and proteins in a future time
and to predict the development of a cell or an organism.
The process of deriving GRN from data is called reverse
engineering [5]. Many methods of computational intelligence and machine learning have been used for this problem, including correlation and regression analysis, Boolean networks, graph theory, differential equations, evolutionary computation and neural networks.
In [63] local modeling with ECOS (EFuNN and DENFIS)
was introduced on a small data set of Leukemia cell
line U937 data to extract GRN and to represent it as
a set of rules associating the expression of the genes
at time t, with the level of their expression
in the next time moment (t + dt).
An ECOS is incrementally evolved
from a series of gene expression vectors X(t0), X(t1),
X(t2), …,
representing the expression values of all, or some
of the genes or their clusters. Consecutive vectors
X(t) and X(t+k) are used as input and output vectors
respectively in an ECOS model, as shown in fig.1. After
training of an ECOS on the data, rules are extracted,
e.g.:
IF x1(t) is High (0.87) and
x2(t) is Low (0.9)
THEN x3 (t+k) is High (0.6) and x5(t+k) is Low. (4)
Each rule represents a transition between a current
and a next state of the system variables - genes. All
rules together form a representation of the GRN.
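The construction of training pairs from a time course, as described above, can be sketched as follows (names are ours):

```python
import numpy as np

def make_state_transition_pairs(series, lag=1):
    """Turn a time course of expression vectors X(t0), X(t1), ...
    into (X(t), X(t+lag)) input/output pairs for training a GRN model."""
    inputs = series[:-lag]    # current states X(t)
    outputs = series[lag:]    # next states X(t+lag)
    return inputs, outputs

# toy time course: 5 time points, 3 genes (values invented)
series = np.array([[0.1, 0.9, 0.2],
                   [0.2, 0.8, 0.3],
                   [0.4, 0.6, 0.5],
                   [0.7, 0.3, 0.6],
                   [0.9, 0.1, 0.8]])
X_in, X_out = make_state_transition_pairs(series, lag=1)
print(X_in.shape, X_out.shape)   # (4, 3) (4, 3)
```

An ECOS trained on such pairs learns the state-transition function of the network, and the extracted rules then describe which current expression levels imply which next-step levels.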
By modifying a threshold for rule extraction, one
can extract in an incremental way stronger, or weaker
patterns of relationships between the variables [56].
Using the DENFIS ECOS [55] other types of variable
relationship rules in a GRN can be extracted, e.g.:
IF x1(t) is (0.63 0.70 0.76) and x2(t)
is (0.71 0.77 0.84) and
x3(t) is (0.71 0.77 0.84) and x4(t) is (0.59 0.66 0.72)
THEN x5(t+k) = 1.84 -1.26x1(t) - 1.22x2(t)
+ 0.58x3(t) - 0.3 x4(t), (5)
where the cluster for which the value of the gene
variable x5 is defined in the rule above, is a fuzzy
cluster represented through triangular membership functions
defined as triplets of values for the left-, centre-,
and right points of the triangle on a normalisation
range of [0,1]. The fuzzy representation allows for
dealing with imprecise data. The rules extracted from
the ECOS form a representation of the GRN. Rules may
change with the addition of new data, thus making it
possible to identify stable versus dynamic parts of
the GRNs.
7. Conclusions and Future Directions
The problems in Bioinformatics
are too complex to be adequately modeled with the
use of a single approach.
The paper compared the main existing approaches to
modeling and pattern discovery from biological data
on the case study of cancer prognostic data consisting
of gene expression and clinical variables. The approaches
discussed are: inductive and transductive reasoning;
global, local and personalized modeling. As a general
conclusion, for a detailed study on a given problem
and for the discovery of patterns that characterise different aspects of the processes, all these approaches need to be applied and the results interpreted in an integrated way.
New methods are needed in the future
for the integration of biological data – both
molecular and clinical; for a personalised drug design
and personalised medicine;
for building embedded systems and implementing them
into biological environments; for computational modelling
of proteins and gene regulatory networks; and for many
other challenging problems in Bioinformatics.
Acknowledgement
The work is funded by the NERF – FRST
grant AUTX0201 at the Auckland University of Technology,
New Zealand (www.aut.co.nz). The data analysis in the
paper was conducted with the use of two software environments
- NeuCom (www.theneucom.com, or www.kedri.info/) and
SIFTWARE (available from Pacific Edge Biotechnology Ltd, www.peblnz.com). I would like to thank my students
and associates Nisha Mohan, Dougal Greer, Peter Hwang,
Dr Qun Song for the implementation of some of the code
of the experimental software.
REFERENCES
1. Dow, J., G. Lindsay, and
J. Morrison, Biochemistry Molecules, Cells and the
Body. 1995, Boston, MA: Addison-Wesley.
592.
2. Baldi, P. and S. Brunak,
Bioinformatics. A Machine Learning Approach. 2nd
ed. 2001, Cambridge,
MA: MIT Press. 351.
3. Crick, F., Central dogma
of molecular biology. Nature, 1970. 227: p. 561-563.
4. Snustad, D.P. and M.J.
Simmons, The Principles of Genetics. 2003: Wiley.
5. D'Haeseleer, P., S. Liang,
and R. Somogyi, Genetic network inference: from co-expression
clustering
to reverse engineering. Bioinformatics, 2000. 16(8):
p. 707-726.
6. Collado-Vides, J. and R.
Hofestadt, eds. Gene Regulation and Metabolism. Post-Genomic
Computational
Approaches. 2002, MIT Press: Cambridge, MA. 310.
7. Marnellos, G. and E.D.
Mjolsness, Gene network models and neural development,
in Modeling Neural Development,
A. vanOoyen, Editor. 2003, MIT Press: Cambridge, MA.
p. 27-48.
8. Quackenbush, J., Microarray
data normalization and transformation. Nature Genetics,
2002. 32: p. 496-501.
9. Bajic, V., et al., Computer
model for recognition of functional transcription
start sites in RNA polymerase
II promoters of vertebrates. J. Molecular Graphics
and Modelling, 2003(21): p. 323-332.
10. Ramaswamy, S., et al.,
Multiclass cancer diagnosis using tumor gene expression
signatures. Proceedings
of the National Academy of Sciences of the United States
of America, 2001. 98(26): p. 15149.
11. Perou, C., et al., Molecular
portraits of human breast tumours. Nature, 2000.
406.
12. Shipp, M.A., et al., Diffuse
large B-cell lymphoma outcome prediction by gene-expression
profiling
and supervised machine learning. Nature Medicine, 2002.
8(1): p. 68-74.
13. Singh, D., et al., Gene
expression correlates of clinical prostate cancer
behavior. Cancer Cell,
2002. 1: p. 203-209.
14. van de Vijver, M.J., et
al., A Gene-Expression Signature as a Predictor of
Survival in Breast Cancer.
N Engl J Med, 2002. 347(25): p. 1999-2009.
15. Veer, L.J.v.t., et al.,
Gene expression profiling predicts clinical outcome
of breast cancer. Nature,
2002. 415(6871): p. 530.
16. Vides, J., B. Magasanik,
and T. Smith, Integrated approaches to molecular
biology. 1996: The MIT Press.
17. Bower, J. and H. Bolouri,
eds. Computational Modelling of Genetic and Biochemical
Networks. 2001,
Cambridge, MA: The MIT Press.
18. LeCun, Y., J.S. Denker,
and S.A. Solla, Optimal brain damage, in Advances in Neural
Information Processing
Systems, D.S. Touretzky, Editor. 1990, Morgan Kaufmann:
San Francisco, CA. p. 598-605.
19. Kasabov, N. and L. Benuskova,
Computational neurogenetics. Journal of Computational
and Theoretical
Nanoscience, 2004. 1(1): p. in press.
20. Kasabov, N., et al., Medical Decision Support
Systems Utilizing Gene Expression and Clinical Information
And Methods for Use, in PCT/US03/25563, USA. 2003,
Pacific Edge Biotechnology Pte Ltd: USA.
21. Sobral, B., Bioinformatics
and the future role of computing in biology, in From
Jay Lush to Genomics:
Visions for animal breeding and genetics. 1999.
22. Vapnik, V.N., Statistical
Learning Theory. 1998: Wiley Inter-Science. 736.
23. Bosnic, Z., et al., Evaluation
of prediction reliability in regression using the
transduction principle.
EUROCON 2003. Computer as a Tool. The IEEE Region 8,
2003. 2: p. 99 - 103.
24. Chen, Y., G. Wang, and
S. Dong, Learning with progressive transductive support
vector machine.
Pattern Recognition Letters, 2003. 24(12): p. 1845
- 1855.
25. Joachims, T. Transductive
Inference for Text Classification using Support Vector
Machines. in Proceedings
of the Sixteenth International Conference on Machine
Learning. 1999: Morgan Kaufmann Publishers Inc. San
Francisco, CA, USA.
26. Wu, D., et al. Large Margin
Trees for Induction and Transduction. in Proceedings
for 16th International
conference of machine learning. 1999. Bled, Slovenia:
Morgan Kaufmann 1999.
27. Li, C.H. and P.C. Yuen.
Transductive learning: Learning Iris data with two
labeled data. in ICANN
2001. 2001. Berlin, Heidelberg: Springer-Verlag.
28. Joachims, T. Transductive
Learning via Spectral Graph Partitioning. in Proceedings
of the Twentieth
International Conference on Machine Learning, ICML-2003.
2003. Washington DC.
29. Kasabov, N. and S. Pang,
Transductive Support Vector Machines and Applications
in Bioinformatics
for Promoter Recognition. Neural Information Processing
- Letters and Reviews, 2004. 3(2): p. 31-38.
30. Li, J. and C.-S. Chua.
Transductive inference for color-based particle filter
tracking. in Proceedings
of International Conference on Image Processing, 2003.
2003. Nanyang Technol. Univ., Singapore.
31. Proedrou, K., et al. Transductive
confidence machine for pattern recognition. in Proceedings
of
the 13th European Conference on Machine Learning. 2002:
Springer-Verlag.
32. Pang, S. and N. Kasabov.
Inductive vs Transductive Inference, Global vs Local
Models: SVM, TSVM, and SVMT
for Gene Expression Classification Problems. in International
Joint Conference on Neural Networks, IJCNN 2004. 2004.
Budapest: IEEE Press.
33. Wolf, L. and S. Mukherjee,
Transductive learning via Model selection. 2004,
The center for Biological
and Computational Learning, Massachusetts Institute
of Technology: Cambridge, MA.
34. Li, F. and H. Wechsler,
Watch List Face Surveillance Using Transductive Inference.
Lecture Notes in Computer
Science, 2004. 3072: p. 23-29.
35. Weston, J., et al., Feature
selection and transduction for prediction of molecular
bioactivity
for drug design. Bioinformatics, 2003. 19(6): p. 764-771.
36. Kukar, M., Transductive
reliability estimation for medical diagnosis. Artificial
Intelligence in Medicine,
2003. 29: p. 81 - 106.
37. Bennett, K.P. and A. Demiriz.
Semi-supervised support vector machines. in Proceedings
of the 1998
conference on Advances in neural information processing
systems II. 1998: MIT Press, Cambridge, MA, USA.
38. Liu, H. and S.-T. Huang,
Evolutionary semi-supervised fuzzy clustering. Pattern
Recognition Letters, 2003.
24: p. 3105-3113.
39. Song, Q. and N. Kasabov,
TWRBF - Transductive RBF Neural Network with Weighted
Data Normalization.
Lecture Notes in Computer Science, 2004. 3316: p. 633-640.
40. Shipp, M.A., et al., Supplementary
Information for Diffuse large B-cell lymphoma outcome
prediction
by gene-expression profiling and supervised machine
learning. Nature Medicine, 2002. 8(1): p. 68-74.
41. DeRisi, J., et al., Use
of a cDNA microarray to analyse gene expression patterns
in human cancer.
Nature Genetics, 1996. 14(4): p. 457-60.
42. Furey, T.S., et al., Support
vector machine classification and validation of cancer
tissue samples
using microarray expression data. Bioinformatics, 2000.
16(10): p. 906-914.
43. Mitchell, T.M., Machine
Learning. 1997: McGraw-Hill.
44. Kohonen, T., Self-Organizing
Maps. Second ed. 1997: Springer-Verlag.
45. Bezdek, J.C., Pattern
Recognition with Fuzzy Objective Function Algorithms.
1981, New York: Plenum
Press.
46. Futschik, M.E. and N.K.
Kasabov. Fuzzy clustering of gene expression data.
in Fuzzy Systems, 2002. FUZZ-IEEE'02.
Proceedings of the 2002 IEEE International Conference
on. 2002.
47. Dembele, D. and P. Kastner,
Fuzzy C-means method for clustering microarray data.
Bioinformatics.,
2003. 19(8): p. 973-80.
48. Alon, U., et al., Broad
patterns of gene expression revealed by clustering
analysis of tumor
and normal colon tissues probed by oligonucleotide
arrays. PNAS, 1999. 96(12): p. 6745-6750.
49. Lukashin, A.V. and R.
Fuchs, Analysis of temporal gene expression profiles:
clustering by simulated
annealing and determining the optimal number of clusters.
Bioinformatics, 2001. 17: p. 405-414.
50. Arbib, M., ed. The Handbook
of Brain Theory and Neural Networks. 2nd ed. 2003,
MIT Press: Cambridge,
MA.
51. Kasabov, N., Evolving
Connectionist Systems. Methods and Applications in
Bioinformatics, Brain Study
and Intelligent Machines. 2002, London: Springer-Verlag.
52. Kasabov, N. and Q. Song.
GA-parameter optimisation of evolving connectionist
systems for classification
and a case study from bioinformatics. in ICONIP'2002
- International Conference on Neuro-Information Processing,
Singapore. 2002: IEEE Press.
53. Kasabov, N., Evolving
fuzzy neural networks for on-line supervised/unsupervised,
knowledge-based
learning. IEEE Trans. SMC - part B, Cybernetics, 2001.
31(6): p. 902-918.
54. Kasabov, N., Adaptive
Learning method and system, in University of Otago.
2000: New Zealand.
55. Kasabov, N. and Q. Song,
DENFIS: Dynamic, evolving neural-fuzzy inference
systems and its application
for time-series prediction. IEEE Trans. on Fuzzy Systems,
2002. 10(2): p. 144-154.
56. Kasabov, N., et al., Medical
Applications of Adaptive Learning Systems, in PCT
NZ03/00045. 2002,
Pacific Edge Biotechnology Pte Ltd: New Zealand.
57. Gollub, J., et al., The
Stanford Microarray Database: data access and quality
assessment tools.
Nucl. Acids. Res., 2003. 31(1): p. 94-96.
58. Golub, T.R., et al.,
Molecular classification of cancer: class discovery
and class prediction by
gene expression monitoring. Science, 1999. 286(5439):
p. 531-537.
59. Holland, J.H., Adaptation
in natural and artificial systems. 1975: The University
of Michigan
Press, Ann Arbor, MI.
60. Goldberg, D.E., Genetic
Algorithms in Search, Optimisation and Machine Learning.
1989, Reading: Addison-Wesley.
61. Fogel, G. and D. Corne,
Evolutionary Computation for Bioinformatics. 2003:
Morgan Kaufmann Publ.
62. Ando, S., E. Sakamoto,
and H. Iba. Evolutionary Modelling and Inference
of Genetic Networks. in the
6th Joint Conference on Information Sciences. 2002.
63. Kasabov, N. and D. Dimitrov.
A method for gene regulatory network modelling with
the use of evolving
connectionist systems. in ICONIP'2002 - International
Conference on Neuro-Information Processing. 2002. Singapore:
IEEE Press.