Michael E. Cotterell Mustafa V. Nural http://cobweb.cs.uga.edu/~jam/scalation.html John A. Miller This ontology aims to provide a comprehensive yet practical representation of a data analytics framework. 1.0 Analytics Ontology Indicates whether the input matrix (predictors) of this model is ill-conditioned. If the log of the condition number of the matrix is greater than or equal to 18, it is determined to be ill-conditioned. Analysis of Covariance ANCOVA A model may be considered as an ANCOVA model if, in addition to conforming to GLM requirements, it has some categorical variables (factors). Analysis of Variance ANOVA An ANCOVA model may be considered as an ANOVA model if it has only categorical variables (factors). A Bernoulli distribution is a probability distribution that models the Bernoulli trials process. A sequence of Bernoulli trials satisfies the following assumptions: i) Each trial has two possible outcomes, in the language of reliability called success and failure; ii) The trials are independent; iii) On each trial, the probability of success is p and the probability of failure is 1−p where p is an element of [0,1]. Bernoulli Distribution http://www.math.uah.edu/stat/bernoulli/Introduction.html Dichotomous A variable with a binary variable type generally denotes that the domain of discourse of the variable is dichotomous. This means that the variable can only take on one of two values. Binary Binomial Distribution Polychotomous A variable with a categorical variable type generally denotes that the domain of discourse of the variable is polychotomous. This means that the variable can take on one of a finite number of values. Categorical A variable with a continuous variable type generally denotes that the domain of discourse for the variable is the set of real numbers. 
Continuous true Dependent Model A variable with a discrete variable type generally denotes that the domain of discourse of the variable includes only elements that are individually separate and distinct. Discrete A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference. Distribution An exponential distribution is a probability distribution that models the time between subsequent events based on a rate parameter. Exponential Distribution http://www.math.uah.edu/stat/poisson/Exponential.html The exponential family of distributions is the set of probability distributions whose probability distribution functions (PDFs) and cumulative distribution functions (CDFs) can be written such that their terms are moved to the exponent (often called the canonical form). Fisher, R. A. (1934). Two new properties of mathematical likelihood. Proceedings of the Royal Society A, 144, 285-307. Exponential Family 0 Exponential Regression 0 A GZLM model may be considered as an Exponential Regression Model if the residual distribution of the model is an Exponential Distribution and the response variable is a Non-Negative Continuous variable. A function is a relation that uniquely associates members of one set with members of another set. Function http://mathworld.wolfram.com/Function.html true Generalized Estimating Equations GEE 0 General Linear Model A GLM model makes two major assumptions: 1) The response variable is a Continuous variable. 2) The residuals are Normally distributed. GLM 0 A GZLM model may be considered as a GLM model if its residuals are Normally Distributed and the response variable is a Continuous variable. true Generalized Linear Mixed Models GLMM Gaussian Mixture Models 16 false Generalized Linear Model GZLM The gamma distribution family is a subset of the exponential family of distributions whose members have a shape parameter and a rate parameter. 
Gamma Distribution http://www.math.uah.edu/stat/special/Gamma.html Gamma Regression false Independent Model false A model may be considered an independent model if the model does not have repeated observations (i.e., dependency within response). A variable with an integer variable type generally denotes that the domain of discourse for the variable is the set of integer numbers. Integer The inverse function is a function that reverses another function. Inverse Function Inverse Gaussian Regression false 16 0 Log Linear Regression 0 Logistic Regression 0 A GZLM model may be considered as a Logistic Regression Model if the residual distribution of the model is a Bernoulli Distribution and the response variable is a binary variable. All individuals that have at least one variable are considered a Model. A model could then be inferred to have more specific model types depending on how it conforms to the equivalence axioms stated in the hierarchy. The Model class and its subclasses define different model types that can be used for analyzing data. There are many ways of specifying the class hierarchy for a collection of model types; however, we have given priority to correspondence with implementations of these types (e.g., ScalaTion). This becomes important when running models generated from the abstract models represented in the ontology. Model Multinomial Distribution 0 Multinomial Logistic Regression MLR Multiple Regression Multiple Linear Regression A GLM model may be considered a Multiple Linear Regression if at least some of its variables are Continuous variables. Naïve Bayes Naive Bayes Negative Binomial Distribution true 0 Negative Binomial Regression A variable with a continuous non-negative variable type generally denotes that the domain of discourse for the variable is the set of all non-negative real numbers. 
Non-Negative Continuous Natural A variable with an integer non-negative variable type generally denotes that the domain of discourse for the variable is the set of all non-negative integer numbers. Non-Negative Integer Normal Distribution A variable with an ordinal variable type generally denotes that the domain of discourse of the variable is polychotomous and ordered. This means that the variable can take on one of a finite number of values and that the set has an ordering. Ordinal Ordinal Logistic Regression true Perceptron Poisson Distribution 0 Poisson Regression 0 A GZLM model may be considered as a Poisson Regression model if the residual distribution of the model is a Poisson Distribution and the response variable is a Non-Negative Integer. Polynomial Regression The reciprocal function, given a number x, returns 1/x (i.e., the reciprocal). Reciprocal Function Response Surface Analysis 1 SLR Simple Linear Regression 1 An MLR model may be considered a Simple Linear Regression model if it has only one predictor variable. The square root function, given a number x, returns a value y such that y * y = x. In most cases, this function produces a positive value. If the input to the function is a negative number, then the function produces a complex number. Square Root Function Time Series Model 0 Transformed Multiple Regression Transformed Multiple Linear Regression Trigonometric Regression A variable is an element, feature, or factor that is liable to vary or change. Variable The type of a variable usually denotes or restricts its corresponding domain of discourse. Variable Type 100.0 Weighted Least Squares Regression true Zero-Inflated Negative Binomial Regression true Zero-Inflated Poisson Regression false false false false false false false Identity Function The identity function, given a number x, returns x (the identity). Natural Log Function The natural logarithm function, given a number x, produces a result y such that x = exp(y). 
Log Function The logit function is the inverse of the logistic transform. Logit Function
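
The function definitions above (identity, natural log, logit, reciprocal, square root) can be illustrated with a minimal Python sketch. The names `identity`, `natural_log`, `logit`, `logistic`, `reciprocal`, `square_root`, and the `FUNCTIONS` table are hypothetical and chosen only for illustration; they are not taken from the ontology or from ScalaTion. Each entry pairs a function with an inverse, following the ontology's description of an inverse function as one that reverses another function.

```python
import cmath
import math


def identity(x):
    """Identity function: given x, returns x."""
    return x


def natural_log(x):
    """Natural logarithm: returns y such that x = exp(y)."""
    return math.log(x)


def logistic(y):
    """Logistic transform: maps a real number y into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-y))


def logit(p):
    """Logit function: the inverse of the logistic transform, for 0 < p < 1."""
    return math.log(p / (1.0 - p))


def reciprocal(x):
    """Reciprocal function: given x, returns 1/x."""
    return 1.0 / x


def square_root(x):
    """Square root: returns y such that y * y = x.

    A negative input yields a complex number, as the ontology's
    description states.
    """
    return math.sqrt(x) if x >= 0 else cmath.sqrt(x)


# Hypothetical lookup table: label -> (function, inverse function).
FUNCTIONS = {
    "Identity Function": (identity, identity),
    "Natural Log Function": (natural_log, math.exp),
    "Logit Function": (logit, logistic),
    "Reciprocal Function": (reciprocal, reciprocal),
    "Square Root Function": (square_root, lambda y: y * y),
}
```

As a usage example, `f, f_inv = FUNCTIONS["Logit Function"]` followed by `f_inv(f(0.25))` should recover 0.25 up to floating-point rounding. Keying the table by the ontology's labels is one possible design; an implementation generated from the ontology might instead key by class IRI.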