Michael E. Cotterell
Mustafa V. Nural
http://cobweb.cs.uga.edu/~jam/scalation.html
John A. Miller
This ontology aims to provide a comprehensive yet practical representation of a data analytics framework.
1.0
Analytics Ontology
Indicates whether the input matrix (predictors) of this model is ill-conditioned. If the logarithm of the matrix's condition number is greater than or equal to 18, the matrix is considered ill-conditioned.
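The threshold rule above can be sketched as a small predicate. This is only an illustration: the function name `is_ill_conditioned` is hypothetical, the condition number is assumed to be computed elsewhere, and the natural logarithm is an assumption because the comment does not name the base.

```python
import math

# Threshold taken from the ontology comment: log(condition number) >= 18.
LOG_COND_THRESHOLD = 18.0


def is_ill_conditioned(condition_number: float,
                       threshold: float = LOG_COND_THRESHOLD) -> bool:
    """Return True when log(condition_number) >= threshold.

    Assumption: natural logarithm (the source does not state the base).
    """
    if condition_number < 1.0:
        # Condition numbers based on an induced matrix norm are never below 1.
        raise ValueError("condition number must be >= 1")
    return math.log(condition_number) >= threshold
```

Under the natural-log assumption the cutoff corresponds to a condition number of roughly 6.6e7, so a value of 1e8 is flagged while 1e7 is not.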
Analysis of Covariance
ANCOVA
A model may be considered an ANCOVA model if, in addition to conforming to GLM requirements, it has some categorical variables (factors).
Analysis of Variance
ANOVA
An ANCOVA model may be considered as an ANOVA model if it only has categorical variables (factors).
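A minimal sketch of the two rules above, assuming the model is already known to conform to GLM requirements and that each predictor carries a variable-type label. The function name `glm_factor_subtypes` and the string labels are hypothetical, chosen only for illustration.

```python
def glm_factor_subtypes(predictor_types):
    """Return the factor-based subtypes a GLM-conforming model qualifies for.

    predictor_types: list of variable-type labels, e.g. "Categorical"
    or "Continuous" (hypothetical labels for this sketch).
    """
    is_categorical = [t == "Categorical" for t in predictor_types]
    subtypes = []
    if any(is_categorical):
        # At least one factor: ANCOVA.
        subtypes.append("ANCOVA")
        if all(is_categorical):
            # Only factors: the ANCOVA model is also an ANOVA model.
            subtypes.append("ANOVA")
    return subtypes
```

A model with one factor and one continuous covariate would be labeled ANCOVA only, while a model whose predictors are all factors would receive both labels.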
A Bernoulli distribution is a probability distribution that models the Bernoulli trials process. A sequence of Bernoulli trials satisfies the following assumptions: i) Each trial has two possible outcomes, in the language of reliability called success and failure; ii) The trials are independent;
iii) On each trial, the probability of success is p and the probability of failure is 1−p where p is an element of [0,1].
Bernoulli Distribution
http://www.math.uah.edu/stat/bernoulli/Introduction.html
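The three assumptions of the Bernoulli trials process can be illustrated with a short simulation. This is a sketch only; the names `bernoulli_pmf` and `bernoulli_trials` are illustrative and not part of the ontology, and success is encoded as 1 and failure as 0 by assumption.

```python
import random


def bernoulli_pmf(k, p):
    """Probability of outcome k (1 = success, 0 = failure) for success probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k == 1:
        return p
    if k == 0:
        return 1.0 - p
    return 0.0  # any other outcome is impossible


def bernoulli_trials(n, p, seed=None):
    """Simulate n independent trials, each succeeding with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    rng = random.Random(seed)
    # Each draw is independent of the others and uses the same p.
    return [1 if rng.random() < p else 0 for _ in range(n)]
```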
Dichotomous
A variable with a binary variable type generally denotes that the domain of discourse of the variable is dichotomous. This means that the variable can only take on one of two values.
Binary
Binomial Distribution
Polychotomous
A variable with a categorical variable type generally denotes that the domain of discourse of the variable is polychotomous. This means that the variable can take on one of a finite number of values.
Categorical
A variable with a continuous variable type generally denotes that the domain of discourse for the variable is the set of real numbers.
Continuous
true
Dependent Model
A variable with a discrete variable type generally denotes that the domain of discourse of the variable includes only elements that are individually separate and distinct.
Discrete
A probability distribution assigns a probability to each measurable subset of the possible outcomes of a random experiment, survey, or procedure of statistical inference.
Distribution
An exponential distribution is a probability distribution that models the time between subsequent events based on a rate parameter.
Exponential Distribution
http://www.math.uah.edu/stat/poisson/Exponential.html
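As a sketch of how the rate parameter shapes the distribution of waiting times, the density and cumulative distribution functions can be written directly. The function names and the parameter name `rate` are illustrative assumptions.

```python
import math


def exponential_pdf(t, rate):
    """Density of the waiting time t between events occurring at the given rate."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return rate * math.exp(-rate * t) if t >= 0 else 0.0


def exponential_cdf(t, rate):
    """Probability that the next event occurs within time t."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0
```

For example, with a rate of 2 events per unit time, half of the waiting times fall below ln(2)/2, which is about 0.35 time units.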
The exponential family of distributions is the set of probability distributions whose probability distribution functions (PDFs) and cumulative distribution functions (CDFs) can be written such that their terms are moved to the exponent (often called the canonical form).
FISHER, R. A. (1934). Two new properties of mathematical likelihood. Proceedings of the Royal Society A, 144, 285-307.
Exponential Family
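To make the idea of a canonical form concrete, the sketch below rewrites the Bernoulli probability function as exp(x * eta - A(eta)), with natural parameter eta = log(p / (1 - p)) and log-partition A(eta) = log(1 + exp(eta)). This is one common textbook parameterization, shown here as an illustration rather than as the ontology's own definition; the function names are hypothetical.

```python
import math


def bernoulli_direct(x, p):
    """Bernoulli probability written directly: p for x = 1, 1 - p for x = 0."""
    return p if x == 1 else 1.0 - p


def bernoulli_canonical(x, p):
    """Same probability with every term moved into the exponent.

    Requires 0 < p < 1 so that the natural parameter is finite.
    """
    eta = math.log(p / (1.0 - p))              # natural parameter
    log_partition = math.log1p(math.exp(eta))  # A(eta) = log(1 + e^eta)
    return math.exp(x * eta - log_partition)
```

Both functions agree up to floating-point rounding for any p strictly between 0 and 1.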
0
Exponential Regression
0
A GZLM model may be considered an Exponential Regression model if the residual distribution of the model is an Exponential Distribution and the response variable is a Non-Negative Continuous variable.
A function is a relation that uniquely associates members of one set with members of another set.
Function
http://mathworld.wolfram.com/Function.html
true
Generalized Estimating Equations
GEE
0
General Linear Model
A GLM model makes two major assumptions:
1) The response variable is a Continuous variable.
2) The residuals are Normally distributed.
GLM
0
A GZLM model may be considered as a GLM model if its residuals are Normally Distributed and the response variable is a Continuous variable.
true
Generalized Linear Mixed Models
GLMM
Gaussian Mixture Models
16
false
Generalized Linear Model
GZLM
The gamma distribution family is a subset of the exponential family of distributions that have a shape parameter and a rate parameter.
Gamma Distribution
http://www.math.uah.edu/stat/special/Gamma.html
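A minimal sketch of the gamma density in the shape and rate parameterization mentioned above. The function name `gamma_pdf` is illustrative; the density is evaluated on the log scale, an implementation choice made here to avoid overflow for large shape values.

```python
import math


def gamma_pdf(x, shape, rate):
    """Gamma density with the given shape and rate parameters."""
    if shape <= 0 or rate <= 0:
        raise ValueError("shape and rate must be positive")
    if x < 0:
        return 0.0
    if x == 0:
        # Boundary behavior depends on the shape parameter.
        if shape < 1:
            return math.inf
        return rate if shape == 1 else 0.0
    log_density = (shape * math.log(rate)
                   + (shape - 1.0) * math.log(x)
                   - rate * x
                   - math.lgamma(shape))
    return math.exp(log_density)
```

With shape equal to 1 the density reduces to that of an exponential distribution with the same rate.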
Gamma Regression
false
Independent Model
false
A model may be considered an independent model if the model does not have repeated observations (i.e., dependency within response).
A variable with an integer variable type generally denotes that the domain of discourse for the variable is the set of integer numbers.
Integer
The inverse function is a function that reverses another function.
Inverse Function
Inverse Gaussian Regression
false
16
0
Log Linear Regression
0
Logistic Regression
0
A GZLM model may be considered a Logistic Regression model if the residual distribution of the model is a Bernoulli Distribution and the response variable is a Binary variable.
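The GZLM specialization rules stated so far (Exponential Regression, GLM, Logistic Regression) share one pattern: a residual distribution paired with a response variable type. A minimal lookup sketch, assuming string labels for both; the function name and the exact-match strategy are illustrative and do not reflect how an OWL reasoner would infer these types.

```python
# (residual distribution, response variable type) -> inferred model type
GZLM_RULES = {
    ("Exponential Distribution", "Non-Negative Continuous"): "Exponential Regression",
    ("Normal Distribution", "Continuous"): "GLM",
    ("Bernoulli Distribution", "Binary"): "Logistic Regression",
}


def specialize_gzlm(residual_distribution, response_type):
    """Return the more specific model type for a GZLM model, or None if no rule matches."""
    return GZLM_RULES.get((residual_distribution, response_type))
```

Because the lookup matches labels exactly, it ignores any subtype relationships between variable types; a reasoner working over the ontology would take those into account.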
All individuals that have at least one variable are considered a Model. A model then could be inferred to have more specific model types depending on how it conforms to the equivalence axioms stated in the hierarchy.
The Model class and its subclasses define different model types that can be used for analyzing data. There are many ways of specifying the class hierarchy for a collection of model types; however, we have given priority to correspondence with implementations of these types (e.g., ScalaTion). This becomes important when running models generated from the abstract models represented in the ontology.
Model
Multinomial Distribution
0
Multinomial Logistic Regression
MLR
Multiple Regression
Multiple Linear Regression
A GLM model may be considered a Multiple Linear Regression if at least some of its variables are Continuous variables.
Naïve Bayes
Naive Bayes
Negative Binomial Distribution
true
0
Negative Binomial Regression
A variable with a continuous non-negative variable type generally denotes that the domain of discourse for the variable is the set of all non-negative real numbers.
Non-Negative Continuous
Natural
A variable with an integer non-negative variable type generally denotes that the domain of discourse for the variable is the set of all non-negative integer numbers.
Non-Negative Integer
Normal Distribution
A variable with an ordinal variable type generally denotes that the domain of discourse of the variable is polychotomous and ordered. This means that the variable can take on one of a finite number of values and that the set has an ordering.
Ordinal
Ordinal Logistic Regression
true
Perceptron
Poisson Distribution
0
Poisson Regression
0
A GZLM model may be considered a Poisson Regression model if the residual distribution of the model is a Poisson Distribution and the response variable is a Non-Negative Integer variable.
Polynomial Regression
The reciprocal function, given a number x, returns 1/x (i.e., the reciprocal).
Reciprocal Function
Response Surface Analysis
1
SLR
Simple Linear Regression
1
An MLR model may be considered a Simple Linear Regression model if it has only one predictor variable.
The square root function, given a number x, returns a value y such that y * y = x. For a non-negative real input, the function returns the non-negative (principal) root. If the input to the function is a negative number, then the function produces a complex number.
Square Root Function
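The behavior described for the square root function can be sketched with Python's standard `math` and `cmath` modules. The wrapper name `square_root` is illustrative.

```python
import cmath
import math


def square_root(x):
    """Principal square root: a float for x >= 0, a complex number for x < 0."""
    if x >= 0:
        return math.sqrt(x)
    # Negative input: the result is purely imaginary.
    return cmath.sqrt(x)
```

For instance, `square_root(9.0)` gives `3.0`, while `square_root(-4.0)` gives `2j`.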
Time Series Model
0
Transformed Multiple Regression
Transformed Multiple Linear Regression
Trigonometric Regression
A variable is an element, feature, or factor that is liable to vary or change.
Variable
The type of a variable usually denotes or restricts its corresponding domain of discourse.
Variable Type
100.0
Weighted Least Squares Regression
true
Zero-Inflated Negative Binomial Regression
true
Zero-Inflated Poisson Regression
false
false
false
false
false
false
false
Identity Function
The identity function, given a number x, returns x (the identity).
Natural Log Function
The natural logarithm function, given a number x, produces a result y such that x = exp(y).
Log Function
The logit function is the inverse of the logistic transform.
Logit Function
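The relationship between the logit function and the logistic transform can be checked numerically. A minimal sketch; the function names are illustrative, and the logistic function is split into two branches only to avoid overflow for large negative inputs.

```python
import math


def logistic(z):
    """Logistic transform: maps any real z to a probability in (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # safe: z < 0, so exp(z) cannot overflow
    return e / (1.0 + e)


def logit(p):
    """Logit: the log-odds of p, defined for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.log(p / (1.0 - p))
```

Applying one function after the other returns the original value up to rounding, which is what it means for the logit to be the inverse of the logistic transform.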