Global, Local
and Personalized Modeling and Pattern Discovery in
Bioinformatics:
An Integrated
Approach
Nikola Kasabov
Knowledge Engineering and Discovery Research Institute,
KEDRI (www.kedri.info),
Auckland University of Technology,
Auckland, Private Bag 920010, New Zealand, nkasabov@aut.ac.nz
Abstract
The paper is a comparative study
of major modeling and pattern discovery approaches
applicable to the
area of Bioinformatics and the area of decision support
systems in general. These approaches include inductive versus transductive reasoning and global, local, and personalized modeling; their potential is illustrated on a case study of gene expression and clinical data related to cancer outcome prognosis. While inductive modeling is used to develop a model (function) from data covering the whole problem space and then to recall it on new data, transductive modeling is concerned with the creation of a single model for every new input vector, based on some of the closest vectors from the existing problem space. The paper uses several techniques to illustrate these approaches – multiple linear regression, Bayesian inference, support vector machines, evolving connectionist systems (ECOS) and weighted kNN – each providing different accuracy on a specific problem and facilitating the discovery of different patterns and rules from data.
Keywords: transductive reasoning; personalized modeling; knowledge discovery; local modeling; evolving connectionist systems; Bioinformatics; gene expression data; medical decision support systems; personalized probabilities; cancer prognosis.
1. Bioinformatics – An Area of Exponentially Increasing Data Volume and Emergence of Knowledge
With the completion of the draft sequence of the human genome and the genomes of other species (more are to be sequenced during this century), the task now is to process this vast amount of ever-growing dynamic information and to create intelligent systems for data analysis and knowledge discovery, from cells to whole organisms and species [1, 2].
The central dogma of molecular biology is that the DNA (deoxyribonucleic acid) present in the nucleus of each cell of an organism is transcribed into RNA, which is translated into proteins [3]. Genes are complex molecular structures that cause dynamic transformation of one substance into another during the whole life of an individual, as well as the life of the human population over many generations [4]. Even the static information about a particular gene is very difficult to understand (see the GenBank database, www.ncbi.nlm.nih.gov/genbank).
When genes are “in action”, the dynamics of the processes in which a single gene is involved are thousands of times more complex, as the gene interacts with many other genes and proteins and is influenced by many environmental and developmental factors [5].
Modelling these interactions and extracting meaningful patterns (knowledge) is a major goal of Bioinformatics. Bioinformatics is concerned with the application and development of methods of the information sciences for the collection, storage, analysis, modelling and knowledge discovery from biological and medical data.
The whole process of the expression
of genes and the production of proteins, and back
to the genes, evolves
over time. Proteins have 3D structures that evolve
over time, governed by physical and chemical laws. Some proteins bind to the DNA and make some genes express, while suppressing the expression of other genes. The genes in an individual may mutate, changing their code slightly, and may therefore be expressed differently at a later time. Genes represent both static and dynamic information that is difficult to capture as patterns [6, 7].
Gene and protein expression values can be measured with microarray equipment [8], thus making this information available for medical decision making, such as prognosis and diagnosis, and for drug design. Many challenging problems in Bioinformatics need to be addressed and new knowledge about them revealed; to name only some of them:
- Recognizing patterns from
sequences of DNA, e.g. promoter recognition[9];
- Recognizing patterns in RNA data (e.g. splice junctions
between introns and exons; micro RNA structures; non-coding
regions analysis)
- Profiling gene microarray expression
data from RNA in different types of tissue (cancer
vs normal), different
types of cells, to identify profiles of diseases[10-15];
- Predicting protein structures;
- Modeling metabolism in cells[16, 17];
- Modeling entire cells[16];
- Modeling brain development and brain diseases[7,
18, 19];
- Creating complex medical decision support systems
that deal with a large set of variables that include
both gene and clinical variables to obtain the right
diagnosis and prognosis for a patient[20].
A main approach to understanding gene interaction, and life science in general, and to solving the above problems is mathematical and computational modeling [21]. The more new information is made available about DNA, gene expression, protein creation, metabolic pathways, etc., the more accurate the corresponding information models will become. They should adapt to any new information in a continuous way. The process of discovering biological patterns and knowledge is itself an evolving one.
The main contribution of the paper
is the comparative study of different modeling approaches
to solving problems
in Bioinformatics with the emphasis not only on the
accuracy of the model, but on the type of patterns – knowledge,
that these models facilitate to discover from data.
Section 2 introduces briefly three generic types of modeling approaches applicable to the problems listed above, namely global, local and personalised modeling. Section 3 introduces one particular modeling technique, evolving connectionist systems (ECOS). These approaches are applied in section 4 on a case study problem of modeling and profile discovery from gene expression and clinical data related to cancer outcome prognosis. Section 5 addresses model optimisation with the use of evolutionary computation, section 6 discusses some modeling issues of gene regulatory networks, and section 7 presents further research directions in the area of Bioinformatics and in the area of medical decision support systems in general. The main conclusion
is that for detailed research on a complex problem, different levels of knowledge need to be discovered: at the global, local and personalised levels. Knowing the
potentials of different modeling approaches, each of
them can be applied to available data on a problem
and results can facilitate the discovery process of
complex patterns and rules of life.
2. Inductive versus Transductive Reasoning. Global, Local and Personalised Modelling
2.1. Inductive versus Transductive Reasoning
The inductive reasoning approach, widely used in all fields of science, is concerned with the creation of a model (a function) from all available data, representing the entire problem space; the model is then applied to new data (deduction). Transductive inference, introduced by Vapnik [22], is defined in contrast as a method used
to estimate the value of a potential model (function)
only for a single point of space (that is, a new data
vector) by utilizing additional information related
to that vector. While the inductive approach is useful
when a global model of the problem is needed in an
approximate form, the transductive approach is more
appropriate for applications where the focus is not
on the model, but rather on every individual case.
This is very much related to clinical and medical applications
where the focus needs to be centered on individual
patient’s conditions.
The transductive approach is related to the common
sense principle [23] which states that to solve a given
problem one should avoid solving a more general problem
as an intermediate step. The reasoning behind this
principle is that, in order to solve a more general
problem, resources are wasted or compromised which
is unnecessary for solving the individual problem at
hand (that is, function estimation only on given points).
This common sense principle reduces the more general
problem of inferring a functional dependency on the
whole input space (inductive approach) to the problem
of estimating the values of a function only at given
points (transductive approach).
In the past years, transductive
reasoning has been implemented for a variety of classification
tasks such
as text classification [24, 25], heart disease diagnostics
[26], synthetic data classification using graph based
approach [27], digit and speech recognition [28],
promoter recognition in bioinformatics [29], image
recognition [30] and image classification [31], micro
array gene expression classification [32, 33] and biometric
tasks such as face surveillance [34]. This reasoning
method is also used in prediction tasks such as predicting
if a given drug binds to a target site [35] and evaluating
the prediction reliability in regression [23] and providing
additional measures to determine reliability of predictions
made in medical diagnosis [36]. Among the research utilising the transductive principle, transductive support vector machines [25] and semi-supervised support vector machines [37] stand out as often-cited work [38].
In transductive reasoning, for every new input vector
xi that needs to be processed for a prognostic/classification
task, the Ni nearest neighbors, which form a data subset
Di, are derived from an existing dataset D and a new
model Mi is dynamically created from these samples
to approximate the function in the locality of point
xi only. The system is then used to calculate the output
value yi for this input vector xi.
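The per-vector procedure just described can be sketched in a few lines (a minimal illustration, not the paper's implementation; the local model Mi here is simply the mean output over the neighbourhood, and all names are ours):

```python
import numpy as np

def transductive_predict(x_new, X, y, n_neighbors=5):
    """Build a throw-away local model Mi from the Ni nearest
    neighbours of one query vector and return its prediction."""
    # distances from the query to every stored sample in D
    dists = np.linalg.norm(X - x_new, axis=1)
    # indices of the Ni nearest samples: the subset Di
    idx = np.argsort(dists)[:n_neighbors]
    # the local "model" here is just the mean output over Di;
    # any learner (regression, SVM, ECF) could be trained on Di instead
    return float(y[idx].mean())

# toy data set D: the output copies the first input coordinate
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.0]])
y = np.array([0.0, 0.1, 1.0, 0.9])
print(transductive_predict(np.array([0.05, 0.0]), X, y, n_neighbors=2))
```

A new local model is built, used once for this query, and then discarded; the next query gets its own neighbourhood and its own model.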
This approach has been implemented with radial basis functions [39] in medical decision support systems and in time-series prediction problems, where individual models are created for each input data vector. It gives good accuracy for individual models and has promising applications, especially in medical decision support systems. The transductive approach has also been applied using support vector machines as the base model in the area of bioinformatics [29, 32], and the results indicate that transductive inference performs better than inductive inference models, mainly because it exploits the structural information of unlabeled data. However, there are a few open questions that need to be addressed when implementing transductive modeling, e.g.: How many neighboring samples K are needed? What type of distance measure should be used when choosing the neighbors? What model should be applied to the neighboring samples? These issues are addressed in the sections below.
2.2. Global, local and personalised modelling
The three main approaches investigated in the paper are:
- Global modeling – a model is created from data covering the whole problem space and is represented as a single function, e.g. a regression formula;
- Local modeling – a set of local models is created from data, each representing a sub-space (cluster) of the problem space, e.g. a set of rules;
- Individualised (personalised) modeling – a model is created only for a single point (vector, patient record) of the problem space using transductive reasoning.
To illustrate the concepts of global,
local and personalised modelling, here we use a case
study problem and a publicly
available data set from Bioinformatics – the DLBCL lymphoma data set for predicting survival outcome over a 5-year period. This data set contains 58 vectors: 30 cured DLBCL lymphoma cases and 28 fatal ones [12, 40]. There are 6,817 gene expression variables. Clinical data is available for 56 of the patients, represented as the IPI (International Prognostic Index), an integrated number representing the overall effect of several clinical variables [12, 40]. The task is, based on the existing data, to: (1) create a prognostic system that predicts the survival outcome of a new patient; (2) extract profiles that can be used to provide an explanation for the prognosis; and (3) find markers (genes) that can be used for the design of new drugs to cure the disease or for an early diagnosis.
Using a global linear regression
method on the 11 DLBCL prognostic genes [12, 40]
(denoted as X1, X2,…,X11)
for the 58 vectors, normalised in the range [0,1],
we derive the following classification model:
Y = 0.36 + 0.53 X1 - 0.12 X2 - 0.41 X3 - 0.44 X4 + 0.34 X5 + 0.32 X6 - 0.07 X7 + 0.5 X8 - 0.5 X9 + 0.18 X10 + 0.3 X11   (1)
Formula (1) constitutes a global model, i.e. it is to be used to evaluate the output for any input vector in the 11-dimensional space, regardless of where it is located. It indicates to a certain degree the importance of the genes (e.g. gene X7 has the lowest importance), but it does not give any information about the relationship between the genes in different sub-spaces of the problem space and, more importantly, the relevance of each of these genes to the prediction of survival for an individual. The model, being global (i.e. spanning the whole space), gives the “big picture”, but is difficult to adapt to new data. Linear and logistic regression methods have been widely used for gene expression modelling [41, 42] and for modelling gene regulatory networks [5, 17].
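A global linear model of this kind can be fitted by ordinary least squares. The sketch below uses synthetic data rather than the DLBCL values, so the recovered coefficients are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(58, 11))      # 58 samples, 11 normalised gene variables
true_w = rng.uniform(-0.5, 0.5, size=11)  # hidden "true" coefficients (invented)
y = 0.36 + X @ true_w                     # noise-free synthetic outcome

# ordinary least squares: prepend a column of ones for the intercept
A = np.hstack([np.ones((X.shape[0], 1)), X])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)

print("intercept:", round(coef[0], 2))    # recovers 0.36 on this noise-free data
```

On real, noisy expression data the fitted coefficients would only approximate the underlying dependencies, which is exactly why their interpretation as gene importance must be made with care.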
Another statistical machine learning method that is widely used for the creation of classification models is the support vector machine (SVM) [22]. An SVM is a kind of global model, but instead of a single formula it consists of a set of vectors that lie on the border area between the samples belonging to different classes (the support vectors), described through kernel functions. SVM models are very good classifiers, but they are difficult to adapt and the knowledge extracted from them is very limited. SVM models have been used in many research papers [12, 22].
In contrast to the global models,
local models are created to evaluate the output function
for only a
sub-space of the problem space. Multiple local models
(e.g. one for each cluster of data) can constitute
together the complete model of the problem over the
whole problem space. Local models are often based on
clustering techniques. A cluster is a group of similar
data samples, where similarity is measured predominantly
as Euclidean distance in an orthogonal problem space.
Clustering techniques include k-means [43], Self-Organising Maps (SOM) [41, 44], fuzzy clustering [45-47], hierarchical clustering [48] and simulated annealing [49]. In fuzzy
clustering one sample may belong to several clusters
to a certain membership degree, the sum of which is
1. Generally speaking, local models are easier to adapt
to new data and can provide a better explanation for
individual cases. The ECF model, described in the next
sub-section, is a representative of multiple local
models based on clustering.
A “personalised” model is created “on
the fly” for every new input vector and this
individual model is based on the closest data samples
to the new sample taken from a data set. A simple example
of personalised modelling technique is the K-NN (nearest
neighbour) method, where for every new sample, the
nearest K samples are derived from a data set using
a distance measure, usually Euclidean distance, and
a voting scheme is applied to define the class label
for the new sample [22, 43].
In the K-NN method, the output value yi for a new
vector xi is calculated as the average of the output
values of the k nearest samples from the data set Di.
In the weighted K-NN method (WKNN) the output yi is
calculated based on the distance of the K-NN samples
to xi:
yi = ( Σj wj yj ) / ( Σj wj ),  j = 1, …, Ni   (1)
where: yj is the output value for the sample xj from
Di and wj are their weights measured as distance from
the new input vector:
wj = ( min(d) + max(d) - dj ) / max(d)   (2)
In Eq. (2) the vector d = [d1, d2, …, dNi] contains the distances between the new input vector xi and its Ni nearest neighbours (xj, yj), for j = 1 to Ni; max(d) and min(d) are the maximum and minimum values in d, respectively. The weights wj take values between min(d)/max(d) and 1: the sample with the minimum distance to the new input vector has a weight of 1, while the sample at the maximum distance has a weight of min(d)/max(d).
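Putting the WKNN output formula and the weight formula (Eq. 2) together, a prediction can be computed as follows (a minimal sketch on toy data; function and variable names are ours):

```python
import numpy as np

def wknn(x_new, X, y, k=3):
    """Weighted k-nearest-neighbour output for one new vector."""
    d = np.linalg.norm(X - x_new, axis=1)
    idx = np.argsort(d)[:k]
    dk = d[idx]
    if dk.max() == 0:                     # query coincides with its neighbours
        return float(y[idx].mean())
    # Eq. (2): weight 1 for the closest neighbour,
    # min(d)/max(d) for the farthest of the k neighbours
    w = (dk.min() + dk.max() - dk) / dk.max()
    return float(np.sum(w * y[idx]) / np.sum(w))

# toy two-class data; with 0/1 labels the output reads as a
# personalised probability of class 2
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 0.0, 1.0, 1.0])
print(wknn(np.array([0.9, 0.9]), X, y, k=3))   # ≈ 0.88
```

With a threshold Pthr = 0.5, the vector in this example would be classified in class 2.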
If WKNN is used to solve a classification problem in which two classes are represented by output labels 0 (class 1) and 1 (class 2), the output for a new input vector xi calculated with eq. (1) has the meaning of a personalised probability that the new vector xi belongs to class 2. In order to finally classify xi into one of the two classes, a probability threshold Pthr has to be selected, so that if yi >= Pthr the sample xi is classified in class 2. For different values of the threshold Pthr, the classification error may be different.
Personalised probability is calculated
with the use of transductive reasoning and is different
from local
probability (probability of class 2 samples in
a local region – cluster) and is also different
from the global probability measure (the usual way
to deal with probabilities) N2/N, where N2 is the number
of samples in the whole problem space that belong to
class 2 and N is the total number of all samples.
Using global probability measures to evaluate the probability that a single input vector x belongs to a class A (the Bayesian probability inference approach) requires that some prior probabilities are available; these are not easy to obtain and are often too uncertain. The Bayesian posterior probability p(A|x) that a new input vector x belongs to class A is calculated with the formula:
p(A|x) = p(x|A) p(A) / p(x)   (3)
where p(A) and p(x) are prior probabilities, p(x|A) is the conditional probability of x given class A, and p(A|x) is the posterior probability.
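As a numerical illustration of the Bayesian posterior computation (all probability values below are invented for the example, with p(x) obtained via the law of total probability):

```python
# invented prior and conditional probabilities for the illustration
p_A = 0.3             # prior probability of class A
p_x_given_A = 0.8     # probability of observing x within class A
p_x_given_notA = 0.2  # probability of observing x outside class A

# p(x) via the law of total probability
p_x = p_x_given_A * p_A + p_x_given_notA * (1 - p_A)

# posterior: p(A|x) = p(x|A) * p(A) / p(x)
p_A_given_x = p_x_given_A * p_A / p_x
print(round(p_A_given_x, 3))   # 0.632
```

The point of the comparison in the text is that each number above must be supplied in advance, whereas the personalised (transductive) probability is computed directly from the neighbourhood of the new vector.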
Calculating personalised probability
(in a transductive way) does not require any prior
information.
3. Evolving Connectionist Systems (ECOS) for Local Modelling and Cluster-based Rule Discovery
3.1. The ECOS architecture
Some traditional neural network models are seen as “black boxes” and are not very useful for the discovery of new patterns from data [50]. A new type of neural networks, evolving connectionist systems (ECOS), was introduced in [51]. ECOS allow for structural adaptation; fast, incremental, on-line learning; and rule extraction and rule adaptation. One of their simplest implementations is the evolving classification function ECF [51, 52] (see fig. 1).
[Figure 1]
The ECOS from fig.1 consists of five layers of neurons
and four layers of connections. The first layer of
neurons receives the input information. The second
layer (optional) calculates the fuzzy membership degrees
to which the input values belong to predefined fuzzy
membership functions, e.g. Low, Medium, or High. The
membership functions can be kept fixed, or can change
during training. The third layer of neurons represents associations between the input and the output variables – the rules. The fourth layer (optional) calculates the degree to which output membership functions are matched by the rule node activation, and the fifth layer performs defuzzification and calculates values for the output variables.
3.2. The ECOS learning algorithms
ECOS in general are connectionist
systems that evolve their structure and functionality
in a continuous,
self-organised, on-line, adaptive, interactive way
from incoming information. They can learn from data
in a supervised or unsupervised way. Learning
is based on clustering of input vectors and function
estimation for the clusters in the output space. Prototype
rules can be extracted to represent the clusters and
the functions associated with them. The ECOS models
allow for an incremental change of the number and types
of inputs, outputs, nodes and connections. The algorithm that evolves a simple classification system, called ECF (Evolving Classification Function), from an incoming stream of data is shown in fig. 2. The internal nodes in the ECF structure capture clusters of input data and are called rule nodes.
[Figure 2]
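A greatly simplified sketch of an ECF-style classifier is given below. It keeps only the core idea of evolving rule nodes (centre, radius, class) in one pass over the data, omitting the fuzzy membership layers and the radius-shrinking and aggregation steps of the published algorithm; the class and method names are ours, with r_min and r_max loosely following the paper's Rmin and Rmax:

```python
import numpy as np

class SimpleECF:
    """Rule nodes are (centre, radius, class); one-pass evolving learning.
    A much-simplified sketch, not the full published ECF algorithm."""

    def __init__(self, r_min=0.1, r_max=0.5):
        self.r_min, self.r_max = r_min, r_max
        self.centres, self.radii, self.classes = [], [], []

    def partial_fit(self, x, label):
        # find the nearest existing rule node of the same class
        best, best_d = None, np.inf
        for i, c in enumerate(self.centres):
            d = np.linalg.norm(x - c)
            if self.classes[i] == label and d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= self.r_max:
            # the sample is covered: widen the node's influence field if needed
            self.radii[best] = max(self.radii[best], min(best_d, self.r_max))
        else:
            # otherwise evolve a new rule node around the sample
            self.centres.append(np.asarray(x, dtype=float))
            self.radii.append(self.r_min)
            self.classes.append(label)

    def predict(self, x):
        # class of the nearest rule node
        d = [np.linalg.norm(x - c) for c in self.centres]
        return self.classes[int(np.argmin(d))]

m = SimpleECF()
for x, c in [([0.0, 0.0], 0), ([0.05, 0.0], 0), ([1.0, 1.0], 1)]:
    m.partial_fit(np.array(x), c)
print(len(m.centres), m.predict(np.array([0.9, 0.9])))   # 2 rule nodes; predicts class 1
```

Note how the second class-0 sample is absorbed by the first rule node rather than creating a new one; each rule node can later be read off as an IF-THEN cluster rule.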
Different types of rules are facilitated by different ECOS architectures: Zadeh-Mamdani rules in the evolving fuzzy neural networks EFuNN [53, 54] (see fig. 1), or Takagi-Sugeno rules in the dynamic neuro-fuzzy inference systems DENFIS [55]. An ECOS structure grows and “shrinks” in a continuous way from input data streams. Feed-forward and feedback connections are both used in the architecture. The ECOS are not limited in the number and types of inputs, outputs, nodes and connections. Several machine learning methods are facilitated in different types of ECOS and have already been applied to Bioinformatics problems from section 1 [51].
4. A Comparative Study of Global, Local and Personalised
Modelling on the Case Study of Gene Expression and
Clinical Information
4.1. Problem definition and data
sets
A gene expression profile is defined here as a pattern
of expression of a number of significant genes for
a group (cluster) of samples of a particular output
class or category. A gene expression profile is represented
here as an IF-THEN inference rule:
IF <A pattern of gene expression values of selected
genes is observed> THEN <There is a likelihood
for a certain diagnostic or prognostic outcome>.
Having profiles/rules for a particular disease makes it possible to set up early diagnostic tests, so that a sample can be taken from a patient, the data related to the sample processed, and the result mapped onto the existing profiles. Based on the similarity between the new data and the existing profiles, the new data vector can be classified as belonging to the “good outcome” or the “poor outcome” group with a certain confidence, and a good explanation can be provided for the final decision, as the matched local rules/profiles will be the closest to the person’s individual profile [56].
Contemporary technologies, such as gene microarrays,
allow for the measurement of the level of expression
of up to 30,000 genes in RNA sequences that is indicative
of how much protein will be produced by each of these
genes in the cell[57]. The goal of the microarray gene
expression data analysis is to identify a gene or a group of genes that are differentially expressed in one state of a cell or tissue (e.g. cancer) versus another state (e.g. normal) [58]. Generally, it is difficult
to find consistent patterns of gene expression for
a class of tissues.
Gene expression data is often accompanied by clinical
data variables. The issue of gene and clinical variables
integration for the discovery of combined patterns
is addressed here as well.
4.2. Experimental results with the use of different
modeling techniques
The two main reasoning approaches, inductive and transductive, are used here to develop global, local and personalized models on the same data, in order to compare the approaches on two main criteria: (1) accuracy of the model; and (2) type of patterns discovered
from data. The following classification techniques
are used: multiple linear regression (MLR); SVM; ECF;
WKNN.
Each of the models is validated through the same leave-one-out cross-validation method [22]. The accuracy of the different models is presented in Table 1. It can be seen that transductive reasoning and personalized modeling are sensitive to the selection of the K value.
Its optimization is discussed in the next section.
The best accuracy is manifested by the local ECF model,
trained on a combined feature vector of 11 gene expression
variables and the clinical variable IPI. Its prognostic
accuracy is 88% (83% for class 1- cured, and 92% for
class 2- fatal). This compares favorably with the 75%
accuracy of the SVM model used in[12].
[Table 1]
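The leave-one-out validation used for these comparisons can be sketched generically (shown here with a nearest-neighbour base classifier on toy data; the accuracy printed has no relation to Table 1):

```python
import numpy as np

def loocv_accuracy(X, y, classify):
    """Train on all samples but one, test on the held-out one, repeat."""
    correct = 0
    for i in range(len(X)):
        mask = np.arange(len(X)) != i        # everything except sample i
        correct += classify(X[mask], y[mask], X[i]) == y[i]
    return correct / len(X)

def nearest_neighbour(X_train, y_train, x):
    # base classifier: label of the closest training sample
    return y_train[np.argmin(np.linalg.norm(X_train - x, axis=1))]

X = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 0.9]])
y = np.array([0, 0, 1, 1])
print(loocv_accuracy(X, y, nearest_neighbour))   # 1.0
```

Any of the compared classifiers (MLR, SVM, ECF, WKNN) can be substituted for the `classify` argument, which is what makes the comparison in Table 1 fair: every model sees exactly the same train/test splits.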
In addition, local rules that represent the cluster gene profiles of the survival group versus the fatal group of patients were extracted, as shown graphically in fig. 3. These profiles show that there is no single variable that clearly discriminates the two classes – it is a combination of variables that discriminates different sub-groups within a class and between classes.
[Figure 3]
The local profiles can be aggregated into global class profiles by averaging the variable values across all local profiles that represent one class – fig. 4. Global profiles may not be very informative if the data samples are dispersed in the problem space and the samples of each class are spread out.
[Figure 4]
5. Model Optimisation with the Use of Evolutionary Computation
5.1. Evolutionary computation
Using the same modelling technique, but with different parameter values and different input variables, may lead to different results and different information
extracted from the same initial data set. One way to
optimise these parameters and obtain an optimal model
according to certain criteria (e.g. classification
accuracy) is through evolutionary computation techniques[59,
60]. One of them, genetic algorithms (GA) [60], is an optimisation technique that generates a population of individual solutions (models) for a problem, e.g. classification systems, and trains these systems on data, so that after training the best systems (e.g. those with the highest accuracy, i.e. fitness) can be selected and the operations of “crossover” and “mutation” applied to them to obtain the next generation of models [60].
The process continues until a satisfactory model is
obtained. Applications of GA for gene expression data
modelling and GRN modelling are presented in [61, 62].
A problem with evolutionary computation techniques is that no optimal solution is guaranteed, as they are heuristic search techniques over a solution space. This is in contrast to exhaustive search, which guarantees an optimal solution but may take an unacceptably long time to be practically applicable.
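The generate-evaluate-select-crossover-mutate loop described above can be sketched for feature selection. As a stand-in for a model's cross-validation accuracy, the toy fitness below simply measures agreement with a known "useful feature" mask; all names and values are invented:

```python
import random

random.seed(1)
N_FEATURES = 12                 # e.g. 11 genes + the IPI clinical variable
TRUE_MASK = [1] * 6 + [0] * 6   # toy ground truth: first 6 features useful

def fitness(mask):
    # stand-in for model accuracy: agreement with TRUE_MASK
    return sum(m == t for m, t in zip(mask, TRUE_MASK)) / N_FEATURES

def crossover(a, b):
    cut = random.randrange(1, N_FEATURES)   # one-point crossover
    return a[:cut] + b[cut:]

def mutate(mask, rate=0.05):
    return [1 - m if random.random() < rate else m for m in mask]

# initial population of 20 random feature masks
pop = [[random.randint(0, 1) for _ in range(N_FEATURES)] for _ in range(20)]
for generation in range(20):
    pop.sort(key=fitness, reverse=True)
    parents = pop[:10]                      # selection: keep the fittest half
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(10)]
    pop = parents + children

best = max(pop, key=fitness)
print(round(fitness(best), 2))
```

In the experiments reported below, the chromosome would additionally encode model parameters such as Rmax, Rmin and the number of membership functions, and the fitness would be the model's cross-validation accuracy.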
5.2. Experimental results – Optimisation of local models – ECF
In the models explored in the previous section, neither the model parameters (such as Rmax, Rmin, m and the number of membership functions in an ECF model; K in WKNN; etc.) nor the set of input variables (features) were optimised to produce the best accuracy. Out of the 11 genes and the IPI clinical feature, there may be only a subset that would produce a better result (if the remaining ones were noisy features).
In the experiment shown in fig. 5, both the ECF parameters and the features are optimised with the use of a GA run over 20 generations, each containing 20 ECF models with different parameter values, with the fitness criterion being the highest overall accuracy for a smaller number of features. The optimal ECF parameters
are given in the figure and the best model has an overall
accuracy of 90.66%, which is higher than any of the
non-optimised models from Table 1.
[Figure 5]
5.3. Experimental results – Optimisation of Transductive, Personalised Models
We noticed from Table 1 that the accuracy of the transductive, personalised models depends on the choice of some parameters: K, the distance measure, and the model parameters (e.g. the ECF parameters). Optimising these parameters during model development “on the fly” is experimented with here, and the results are presented in fig. 6. For every one of the 56 samples in the DLBCL Lymphoma data set (one IPI variable and 11 gene variables), optimised values of the number of neighbouring samples K (fig. 6a) and the type of distance measure (fig. 6b) are defined with the use of a GA optimisation procedure. The fitness function is the cross-validation accuracy in a leave-one-out method over the K samples in the neighbourhood.
So here not only is a personalised model derived for every new data sample, but an optimal one is created through a GA optimisation procedure.
[Figure 6]
6. Gene Regulatory Network Modelling and Discovery
In a living cell, genes interact in a complex, dynamic way, and this interaction is crucial for cell behavior. The interaction can be represented in an approximate way as a gene regulatory network (GRN) [5]. An example is shown in fig. 7.
GRN models can be derived from time course gene expression
data of many genes measured over a period of time.
Some of these genes have similar expressions to each
other as shown in fig. 8.
Genes that share similar functions usually show similar
gene expression profiles and cluster together. In a
GRN clusters can be used and represented by nodes instead
of genes or proteins. A GRN model can be used to predict
the expression of genes and proteins in a future time
and to predict the development of a cell or an organism.
The process of deriving GRN from data is called reverse
engineering [5]. Many methods of computational intelligence and machine learning have been used for this problem, including correlation and regression analysis, Boolean networks, graph theory, differential equations, evolutionary computation and neural networks.
In [63] local modeling with ECOS (EFuNN and DENFIS)
was introduced on a small data set of Leukemia cell
line U937 data to extract GRN and to represent it as
a set of rules associating the expression of the genes
at time t, with the level of their expression
in the next time moment (t + dt).
An ECOS is incrementally evolved
from a series of gene expression vectors X(t0), X(t1),
X(t2), …,
representing the expression values of all, or some
of the genes or their clusters. Consecutive vectors
X(t) and X(t+k) are used as input and output vectors
respectively in an ECOS model, as shown in fig.1. After
training of an ECOS on the data, rules are extracted,
e.g.:
IF x1(t) is High (0.87) and
x2(t) is Low (0.9)
THEN x3 (t+k) is High (0.6) and x5(t+k) is Low. (4)
Each rule represents a transition between a current
and a next state of the system variables - genes. All
rules together form a representation of the GRN.
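The construction of training pairs from a time course, as described above, can be sketched as follows (names are ours):

```python
import numpy as np

def make_state_transition_pairs(series, lag=1):
    """Turn a time course of expression vectors X(t0), X(t1), ...
    into (X(t), X(t+lag)) input/output pairs for training a GRN model."""
    inputs = series[:-lag]    # current states X(t)
    outputs = series[lag:]    # next states X(t+lag)
    return inputs, outputs

# toy time course: 5 time points, 3 genes (values invented)
series = np.array([[0.1, 0.9, 0.2],
                   [0.2, 0.8, 0.3],
                   [0.4, 0.6, 0.5],
                   [0.7, 0.3, 0.6],
                   [0.9, 0.1, 0.8]])
X_in, X_out = make_state_transition_pairs(series, lag=1)
print(X_in.shape, X_out.shape)   # (4, 3) (4, 3)
```

An ECOS trained on such pairs learns the state-transition function of the network, and the extracted rules then describe which current expression levels imply which next-step levels.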
By modifying a threshold for rule extraction, one
can extract in an incremental way stronger, or weaker
patterns of relationships between the variables [56].
Using the DENFIS ECOS [55] other types of variable
relationship rules in a GRN can be extracted, e.g.:
IF x1(t) is (0.63 0.70 0.76) and x2(t)
is (0.71 0.77 0.84) and
x3(t) is (0.71 0.77 0.84) and x4(t) is (0.59 0.66 0.72)
THEN x5(t+k) = 1.84 -1.26x1(t) - 1.22x2(t)
+ 0.58x3(t) - 0.3 x4(t), (5)
where the cluster for which the value of the gene
variable x5 is defined in the rule above, is a fuzzy
cluster represented through triangular membership functions
defined as triplets of values for the left-, centre-,
and right points of the triangle on a normalisation
range of [0,1]. The fuzzy representation allows for
dealing with imprecise data. The rules extracted from
the ECOS form a representation of the GRN. Rules may
change with the addition of new data, thus making it
possible to identify stable versus dynamic parts of
the GRNs.
7. Conclusions and Future Directions
The problems in Bioinformatics
are too complex to be adequately modeled with the
use of a single approach.
The paper compared the main existing approaches to
modeling and pattern discovery from biological data
on the case study of cancer prognostic data consisting
of gene expression and clinical variables. The approaches
discussed are: inductive and transductive reasoning;
global, local and personalized modeling. As a general
conclusion, for a detailed study on a given problem
and for the discovery of patterns that characterise different aspects of the processes, all these approaches need to be applied and the results interpreted in an integrated way.
New methods are needed in the future
for the integration of biological data – both
molecular and clinical; for a personalised drug design
and personalised medicine;
for building embedded systems and implementing them
into biological environments; for computational modelling
of proteins and gene regulatory networks; and for many
other challenging problems in Bioinformatics.
Acknowledgement
The work is funded by the NERF – FRST
grant AUTX0201 at the Auckland University of Technology,
New Zealand (www.aut.co.nz). The data analysis in the
paper was conducted with the use of two software environments
- NeuCom (www.theneucom.com, or www.kedri.info/) and
SIFTWARE (available from Pacific Edge Biotechnology Ltd, www.peblnz.com). I would like to thank my students
and associates Nisha Mohan, Dougal Greer, Peter Hwang,
Dr Qun Song for the implementation of some of the code
of the experimental software.
REFERENCES
1. Dow, J., G. Lindsay, and
J. Morrison, Biochemistry Molecules, Cells and the
Body. 1995, Boston, MA: Addison-Wesley.
592.
2. Baldi, P. and S. Brunak,
Bioinformatics. A Machine Learning Approach. 2nd
ed. 2001, Cambridge,
MA: MIT Press. 351.
3. Crick, F., Central dogma
of molecular biology. Nature, 1970. 227: p. 561-563.
4. Snustad, D.P. and M.J.
Simmons, The Principles of Genetics. 2003: Wiley.
5. D'Haeseleer, P., S. Liang,
and R. Somogyi, Genetic network inference: from co-expression
clustering
to reverse engineering. Bioinformatics, 2000. 16(8):
p. 707-726.
6. Collado-Vides, J. and R.
Hofestadt, eds. Gene Regulation and Metabolism. Post-Genomic
Computational
Approaches. 2002, MIT Press: Cambridge, MA. 310.
7. Marnellos, G. and E.D.
Mjolsness, Gene network models and neural development,
in Modeling Neural Development,
A. vanOoyen, Editor. 2003, MIT Press: Cambridge, MA.
p. 27-48.
8. Quackenbush, J., Microarray
data normalization and transformation. Nature Genetics,
2002. 32: p. 496-501.
9. Bajic, V., et al., Computer
model for recognition of functional transcription
start sites in RNA polymerase
II promoters of vertebrates. J. Molecular Graphics
and Modelling, 2003(21): p. 323-332.
10. Ramaswamy, S., et al.,
Multiclass cancer diagnosis using tumor gene expression
signatures. Proceedings
of the National Academy of Sciences of the United States
of America, 2001. 98(26): p. 15149.
11. Perou, C., et al., Molecular
portraits of human breast tumours. Nature, 2000.
406.
12. Shipp, M.A., et al., Diffuse
large B-cell lymphoma outcome prediction by gene-expression
profiling
and supervised machine learning. Nature Medicine, 2002.
8(1): p. 68-74.
13. Singh, D., et al., Gene
expression correlates of clinical prostate cancer
behavior. Cancer Cell,
2002. 1: p. 203-209.
14. van de Vijver, M.J., et
al., A Gene-Expression Signature as a Predictor of
Survival in Breast Cancer.
N Engl J Med, 2002. 347(25): p. 1999-2009.
15. Veer, L.J.v.t., et al.,
Gene expression profiling predicts clinical outcome
of breast cancer. Nature,
2002. 415(6871): p. 530.
16. Vides, J., B. Magasanik,
and T. Smith, Integrated approaches to molecular
biology. 1996: The MIT Press.
17. Bower, J. and H. Bolouri,
eds. Computational Modelling of Genetic and Biochemical
Networks. 2001,
Cambridge, MA: The MIT Press.
18. LeCun, Y., J.S. Denker,
and S.A. Solla, Optimal brain damage, in Advances in Neural
Information Processing
Systems, D.S. Touretzky, Editor. 1990, Morgan Kaufmann:
San Francisco, CA. p. 598-605.
19. Kasabov, N. and L. Benuskova,
Computational neurogenetics. Journal of Computational
and Theoretical
Nanoscience, 2004. 1(1): p. in press.
20. Kasabov, N., et al., Medical Decision Support
Systems Utilizing Gene Expression and Clinical Information
And Methods for Use, in PCT/US03/25563, USA. 2003,
Pacific Edge Biotechnology Pte Ltd: USA.
21. Sobral, B., Bioinformatics
and the future role of computing in biology, in From
Jay Lush to Genomics:
Visions for animal breeding and genetics. 1999.
22. Vapnik, V.N., Statistical
Learning Theory. 1998: Wiley Inter-Science. 736.
23. Bosnic, Z., et al., Evaluation
of prediction reliability in regression using the
transduction principle.
EUROCON 2003. Computer as a Tool. The IEEE Region 8,
2003. 2: p. 99 - 103.
24. Chen, Y., G. Wang, and
S. Dong, Learning with progressive transductive support
vector machine.
Pattern Recognition Letters, 2003. 24(12): p. 1845
- 1855.
25. Joachims, T. Transductive
Inference for Text Classification using Support Vector
Machines. in Proceedings
of the Sixteenth International Conference on Machine
Learning. 1999: Morgan Kaufmann Publishers Inc. San
Francisco, CA, USA.
26. Wu, D., et al. Large Margin
Trees for Induction and Transduction. in Proceedings
for 16th International
conference of machine learning. 1999. Bled, Slovenia:
Morgan Kaufmann 1999.
27. Li, C.H. and P.C. Yuen.
Transductive learning: Learning Iris data with two
labeled data. in ICANN
2001. 2001. Berlin, Heidelberg: Springer-Verlag.
28. Joachims, T. Transductive
Learning via Spectral Graph Partitioning. in Proceedings
of the Twentieth
International Conference on Machine Learning, ICML-2003.
2003. Washington DC.
29. Kasabov, N. and S. Pang,
Transductive Support Vector Machines and Applications
in Bioinformatics
for Promoter Recognition. Neural Information Processing
- Letters and Reviews, 2004. 3(2): p. 31-38.
30. Li, J. and C.-S. Chua.
Transductive inference for color-based particle filter
tracking. in Proceedings
of International Conference on Image Processing, 2003.
2003. Nanyang Technol. Univ., Singapore.
31. Proedrou, K., et al. Transductive
confidence machine for pattern recognition. in Proceedings
of
the 13th European Conference on Machine Learning. 2002:
Springer-Verlag.
32. Pang, S. and N. Kasabov.
Inductive vs Transductive Inference, Global vs Local
Models: SVM, TSVM, and SVMT
for Gene Expression Classification Problems. in International
Joint Conference on Neural Networks, IJCNN 2004. 2004.
Budapest: IEEE Press.
33. Wolf, L. and S. Mukherjee,
Transductive learning via Model selection. 2004,
The center for Biological
and Computational Learning, Massachusetts Institute
of Technology: Cambridge, MA.
34. Li, F. and H. Wechsler,
Watch List Face Surveillance Using Transductive Inference.
Lecture Notes in Computer
Science, 2004. 3072: p. 23-29.
35. Weston, J., et al., Feature
selection and transduction for prediction of molecular
bioactivity
for drug design. Bioinformatics, 2003. 19(6): p. 764-771.
36. Kukar, M., Transductive
reliability estimation for medical diagnosis. Artificial
Intelligence in Medicine,
2003. 29: p. 81 - 106.
37. Bennett, K.P. and A. Demiriz.
Semi-supervised support vector machines. in Proceedings
of the 1998
conference on Advances in neural information processing
systems II. 1998: MIT Press, Cambridge, MA, USA.
38. Liu, H. and S.-T. Huang,
Evolutionary semi-supervised fuzzy clustering. Pattern
Recognition Letters, 2003.
24: p. 3105-3113.
39. Song, Q. and N. Kasabov,
TWRBF - Transductive RBF Neural Network with Weighted
Data Normalization.
Lecture Notes in Computer Science, 2004. 3316: p. 633-640.
40. Shipp, M.A., et al., Supplementary
Information for Diffuse large B-cell lymphoma outcome
prediction
by gene-expression profiling and supervised machine
learning. Nature Medicine, 2002. 8(1): p. 68-74.
41. DeRisi, J., et al., Use
of a cDNA microarray to analyse gene expression patterns
in human cancer.
Nature Genetics, 1996. 14(4): p. 457-60.
42. Furey, T.S., et al., Support
vector machine classification and validation of cancer
tissue samples
using microarray expression data. Bioinformatics, 2000.
16(10): p. 906-914.
43. Mitchell, T.M., Machine
Learning. 1997: McGraw-Hill.
44. Kohonen, T., Self-Organizing
Maps. Second ed. 1997: Springer-Verlag.
45. Bezdek, J.C., Pattern
Recognition with Fuzzy Objective Function Algorithms.
1981, New York: Plenum
Press.
46. Futschik, M.E. and N.K.
Kasabov. Fuzzy clustering of gene expression data.
in Fuzzy Systems, 2002. FUZZ-IEEE'02.
Proceedings of the 2002 IEEE International Conference
on. 2002.
47. Dembele, D. and P. Kastner,
Fuzzy C-means method for clustering microarray data.
Bioinformatics.,
2003. 19(8): p. 973-80.
48. Alon, U., et al., Broad
patterns of gene expression revealed by clustering
analysis of tumor
and normal colon tissues probed by oligonucleotide
arrays. PNAS, 1999. 96(12): p. 6745-6750.
49. Lukashin, A.V. and R.
Fuchs, Analysis of temporal gene expression profiles:
clustering by simulated
annealing and determining the optimal number of clusters.
Bioinformatics, 2001. 17: p. 405-414.
50. Arbib, M., ed. The Handbook
of Brain Theory and Neural Networks. 2nd ed. 2003,
MIT Press: Cambridge,
MA.
51. Kasabov, N., Evolving
Connectionist Systems. Methods and Applications in
Bioinformatics, Brain Study
and Intelligent Machines. 2002, London: Springer-Verlag.
52. Kasabov, N. and Q. Song.
GA-parameter optimisation of evolving connectionist
systems for classification
and a case study from bioinformatics. in ICONIP'2002
- International Conference on Neuro-Information Processing,
Singapore. 2002: IEEE Press.
53. Kasabov, N., Evolving
fuzzy neural networks for on-line supervised/unsupervised,
knowledge-based
learning. IEEE Trans. SMC - part B, Cybernetics, 2001.
31(6): p. 902-918.
54. Kasabov, N., Adaptive
Learning method and system, in University of Otago.
2000: New Zealand.
55. Kasabov, N. and Q. Song,
DENFIS: Dynamic, evolving neural-fuzzy inference
systems and its application
for time-series prediction. IEEE Trans. on Fuzzy Systems,
2002. 10(2): p. 144-154.
56. Kasabov, N., et al., Medical
Applications of Adaptive Learning Systems, in PCT
NZ03/00045. 2002,
Pacific Edge Biotechnology Pte Ltd: New Zealand.
57. Gollub, J., et al., The
Stanford Microarray Database: data access and quality
assessment tools.
Nucl. Acids. Res., 2003. 31(1): p. 94-96.
58. Golub, T.R., et al.,
Molecular classification of cancer: class discovery
and class prediction by
gene expression monitoring. Science, 1999. 286(5439):
p. 531-537.
59. Holland, J.H., Adaptation
in natural and artificial systems. 1975: The University
of Michigan
Press, Ann Arbor, MI.
60. Goldberg, D.E., Genetic
Algorithms in Search, Optimisation and Machine Learning.
1989, Reading: Addison-Wesley.
61. Fogel, G. and D. Corne,
Evolutionary Computation for Bioinformatics. 2003:
Morgan Kaufmann Publ.
62. Ando, S., E. Sakamoto,
and H. Iba. Evolutionary Modelling and Inference
of Genetic Networks. in the
6th Joint Conference on Information Sciences. 2002.
63. Kasabov, N. and D. Dimitrov.
A method for gene regulatory network modelling with
the use of evolving
connectionist systems. in ICONIP'2002 - International
Conference on Neuro-Information Processing. 2002. Singapore:
IEEE Press.