Glossary

Accuracy
Measures the overall classification performance. For example, in a sample set consisting of X samples of phenotype 1 and Y samples of phenotype 2, if the classification accurately predicts A samples of phenotype 1 and B samples of phenotype 2, accuracy is defined as (A+B)/(X+Y). See also Sensitivity and Specificity.

Algorithm
A set of mathematical rules for solving complex problems with the aid of computer technology. Correlogic develops algorithms as computational tools to understand complex biological data.

Bayesian Net
Bayesian nets consist of a collection of Bayesian classifiers connected in a manner resembling a neural net. A Bayesian net uses adjusted probabilities to arrive at an answer, whereas a neural net uses non-linearly transformed dot products.

Bioinformatics
The scientific discipline that encompasses all aspects of biological information acquisition, processing, storage, distribution, analysis, and interpretation. It combines the tools of mathematics, computer science, and biology with the aim of understanding the biological significance of a variety of data.

Biomarker
A specific biochemical in the body that has a particular molecular feature making it useful for measuring the progress of disease or the effects of treatment.

Centroid
The center of a cluster.

Cluster Homogeneity
In a perfect classification, each cluster would be composed of only a single phenotype. In practice no classification is perfect, and each cluster is a mix of phenotypes, usually with one phenotype predominating. The cluster homogeneity is the percentage of the dominant phenotype in a cluster.

Decision Boundary
The decision boundary defines the edge of a cluster. If the cluster is spherical, the decision boundary is the set of points at a fixed distance (the radius) from the centroid.

Feature
The name given to the index value of a datastream.
For example, in a mass spectrometer, the features are the m/z values; in NMR, the features are the chemical shift values; in an expression array, the features are the gene names.

Genetic Algorithm (GA)
A genetic algorithm (GA for short) is a technique used to search through large, complex data sets to rapidly identify near-optimal solutions. GAs are most effective in high-dimensional space, where linear statistical analyses lose their power.

Genomics
The study of the human genome, the entire genetic composition of each individual. This discipline and its sibling, “Proteomics,” are the cutting edge of the biotechnology revolution, allowing scientists to delve further than ever before into the nature and origin of human physiology and disease.

GC-FAIMS
A separation technique that uses a serial combination of gas chromatography and High-Field Asymmetric Waveform Ion Mobility Spectrometry.

GC-MS
A separation technique that uses a serial combination of gas chromatography and mass spectrometry.

Heuristics
A learning method employing experimentation, evaluation, and trial-and-error methods to learn, discover, understand, or solve problems. A heuristic is a rule. The KDE is a rule-finding algorithm.

KDE™
The Knowledge Discovery Engine®: Correlogic’s patented algorithm for efficient pattern discovery in highly complex systems.

kNN
K-nearest neighbor analysis. A kNN model is a map of a corpus of known data vectors. When an unknown data vector is presented to the model, a score is produced based on the k known vectors nearest to the unknown vector. For example, if k = 7 and the seven vectors nearest to an unknown vector consist of three in state 0 and four in state 1, the score returned is 4/7, or 0.5714.

LC-MS
A separation technique that uses a serial combination of liquid chromatography and mass spectrometry.

Lead Cluster Map
A fast clustering technology employed during a KDE modeling process.

Logical Chromosome
A unique combination of features.
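The kNN score described in this glossary can be illustrated with a short Python sketch. It is not Correlogic's implementation: the function name `knn_score`, the use of Euclidean distance, and the sample vectors are all assumptions made for illustration. The score is taken to be the fraction of the k nearest known vectors that are in state 1, which matches the 4/7 example in the kNN entry.

```python
import math

def knn_score(known, unknown, k=7):
    """Fraction of the k nearest known vectors that are in state 1.

    known   -- list of (vector, state) pairs, with state 0 or 1
    unknown -- the vector to score
    Euclidean distance is an assumption; the glossary does not name a metric.
    """
    if k <= 0 or k > len(known):
        raise ValueError("k must be between 1 and the number of known vectors")
    # Rank the known vectors by distance to the unknown vector.
    ranked = sorted(known, key=lambda item: math.dist(item[0], unknown))
    nearest = ranked[:k]
    # States are 0 or 1, so their sum counts the state-1 neighbors.
    return sum(state for _, state in nearest) / k

# Hypothetical data: seven vectors close to the query, two far away.
known = [
    ((0.0, 0.1), 0), ((0.1, 0.0), 0), ((0.1, 0.1), 0),
    ((0.2, 0.1), 1), ((0.1, 0.2), 1), ((0.2, 0.2), 1), ((0.0, 0.2), 1),
    ((5.0, 5.0), 0), ((6.0, 5.0), 0),
]
score = knn_score(known, (0.1, 0.1), k=7)
print(round(score, 4))  # 4 of the 7 nearest are state 1 -> 0.5714
```

With k = 7 the two distant state-0 vectors are ignored, so the score depends only on the three state-0 and four state-1 neighbors.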
Mass Spectrometer
An analytical instrument that ionizes samples and separates them based on their mass-to-charge ratio.

Metabolomics
An extension of proteomics, in that proteins catalyze biochemical reactions that either produce or consume small molecules, or metabolites. Disease processes affect metabolites in ways characteristic of the disease, e.g., hepatitis or myocardial disease. A global understanding of metabolite dynamics could lead to better diagnosis and treatment of disease.

Model
A KDE model is a collection of clusters and their decision radii in N dimensions, where N is the number of features in the chromosome forming the map.

Neural Net
An artificial neural net is a supervised non-linear modeling algorithm based on a theoretical conception of biological memory and learning. Artificial neural nets are considered to be universal function approximators.

Normalize
A method of scaling data to a common dynamic range.

NMR
Nuclear Magnetic Resonance. An NMR spectrum provides specific qualitative information regarding a chemical or mixture of chemicals.

Phenotype
The classification state given to a sample. A phenotype is a reflection of the expression of one or more genes.

PCA
Principal Component Analysis. A statistical method that is often applied to non-linear, complex data.

Protein Separation and Sequencing Equipment
The type of specialized equipment that generates protein data for analysis by the algorithms forming the basis of Correlogic’s proprietary software.

Proteomics
The study of proteins and their interactions. Just as the goal of genomics was structural and functional knowledge of the entire set of human genes, the goal of proteomics is the identification and characterization of all human proteins. However, the number of proteins may be indeterminate. Proteins undergo post-synthesis modifications before they become functional.
The nature of these modifications and their products increases the complexity of proteomics far beyond that encountered in genomics.

Proteome Quest
A software realization of the KDE algorithm used to analyze data streams generated from protein profiles.

PSA Test
Acronym for Prostate-Specific Antigen test, the most widely used test for the detection of prostate cancer.

Robustness
Classification robustness is a measure of how accurate a classification scheme generated on one data set will be on a completely independent data set. Classifications that retain accuracy when challenged by additional independent data are considered robust. Classifications that lose accuracy when challenged by additional independent data sets lack robustness.

Self-Organizing Map (SOM)
An unsupervised learning method that uses data in a training data set to define a two-dimensional surface where dense data is spread out to reveal hidden detail and sparse data still retains its identity.

Sensitivity and Specificity
Measures of classification performance. In a binary classification these values measure the accuracy of prediction of each phenotype. For example, in a sample set consisting of X samples of phenotype 1 and Y samples of phenotype 2, if the classification accurately predicts A samples of phenotype 1 and B samples of phenotype 2, sensitivity may be defined as A/X and specificity as B/Y. See also Accuracy.
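The formulas given under Accuracy and under Sensitivity and Specificity can be checked with a small Python sketch. The function name `classification_metrics` and the sample counts are hypothetical; the formulas themselves are the ones stated in the glossary, with phenotype 1 treated as the class whose detection rate is the sensitivity.

```python
def classification_metrics(x, y, a, b):
    """Compute accuracy, sensitivity, and specificity for a binary classification.

    x, y -- number of samples of phenotype 1 and phenotype 2
    a, b -- number of phenotype 1 and phenotype 2 samples predicted correctly
    """
    if x <= 0 or y <= 0:
        raise ValueError("each phenotype needs at least one sample")
    if not (0 <= a <= x and 0 <= b <= y):
        raise ValueError("correct predictions cannot exceed the sample counts")
    return {
        "accuracy": (a + b) / (x + y),  # (A+B)/(X+Y)
        "sensitivity": a / x,           # A/X
        "specificity": b / y,           # B/Y
    }

# Hypothetical counts: 50 samples of each phenotype, 45 and 40 predicted correctly.
metrics = classification_metrics(x=50, y=50, a=45, b=40)
print(metrics)  # accuracy 0.85, sensitivity 0.9, specificity 0.8
```

Note that accuracy alone can hide an imbalance between the two phenotypes, which is why the glossary lists sensitivity and specificity as separate measures.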