VARIABLE TYPES
 Factor = Categorical (Qualitative) Independent Variable.
 Factor = IV (input)
 Response = DV (output)
 Independent variables are also called “regressor,” “controlled variable,” “manipulated variable,” “explanatory variable,” “exposure variable,” and/or “input variable.”
 Dependent variables are also called “response variable,” “regressand,” “measured variable,” “observed variable,” and “responding variable.”
FACTORIAL DESIGN
 Factorial design is a study design used to examine how two or more categorical IVs/predictors (factors) predict or explain an outcome.
 (N-way) ANOVA is a statistical test to find the significance of main effects and interactions.
 Strength – can look at the effect of each factor separately and also in combination with other factors.
 Weakness – gets complicated and hard to interpret with more than two factors.
 Levels – subdivisions of each IV. For example: 3 polymer levels of High, Medium and Low.
 Conditions – all levels of each IV are combined with all levels of the other IVs to produce all possible conditions.
 Factorial Notations
 The number of numbers refers to the total number of factors in the design (e.g. 2×2 = 2 factors, 2×2×2 = 3 factors).
 The number values refer to the number of levels of each factor (e.g. 2×2 = 2 factors, each at 2 levels).
 All levels of each independent variable are combined with all levels of the other independent variables to produce all possible conditions (e.g. 3×4 = 2 factors, one with 3 levels and one with 4 levels to produce a total of 12 possible conditions).
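As an illustrative sketch (the factor names and levels below are made up, not from any specific study), the “all possible conditions” idea is just the Cartesian product of the factor levels:

```python
# Generating all conditions of a hypothetical 3x4 factorial design.
from itertools import product

polymer = ["High", "Medium", "Low"]   # factor 1: 3 levels
drug = ["A", "B", "C", "D"]           # factor 2: 4 levels

# Every level of one factor crossed with every level of the other.
conditions = list(product(polymer, drug))

print(len(conditions))   # 3 x 4 = 12 possible conditions
print(conditions[0])     # ('High', 'A')
```

The same call extends directly to three or more factors (e.g. `product(polymer, drug, surfactant)` for a three-way design).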
 Main effect – effect of one factor (IV) on the response (DV), ignoring any other IVs.
 Repeated-Measures Factorial Design – each participant undergoes every condition within the experiment (reduces participant number and within-subject variance, but high possibility of carryover effects).
 Randomly Assigned Factorial Design – each participant is randomly assigned to just one condition.
 Mixed Factorial Design – repeated measures on one IV, while participants are randomly assigned on the other IV.
 Non-Manipulated Variable – some pre-existing condition of the participants that the experimenter cannot change (e.g. gender, race, height).
 Correlational Factorial Design – two or more predictors (factors) that are not manipulated in the study.
 Quasi-Experimental Factorial Design – two or more quasi-IVs, meaning that the IVs are manipulated but participants are not randomly assigned to IV conditions.
 Experimental Factorial Design – two or more IVs that are manipulated and in which participants are randomly assigned to IV levels.
 Hybrid Factorial Design – at least one experimental IV and at least one quasi-IV.
 Cell – a single combination of one level of one factor with one level of another factor.
ANOVA Vs REGRESSION
 ANOVA and regression (or multivariate regression) are really the same models, but regression uses numeric/continuous IVs instead of the categorical (factor) IVs used in ANOVA/MANOVA.
 ANOVA/MANOVA both come in “N-way” varieties.
 One-way ANOVA – measures the effect of one independent variable (e.g. effect of polymer type on EE of NPs).
 Two-way and three-way ANOVA, also known as factorial ANOVA, measure the effect of 2 factors (polymer, drug) and 3 factors (polymer, drug, surfactant) on EE of NPs, respectively.
 Univariate analysis is a descriptive analysis of one variable.
 Oneway ANOVA is a “bivariate” analysis.
Bivariate Analysis
 Bivariate analysis involves the analysis of two variables (often denoted X and Y). It explores the relationship between the two variables: whether an association exists and how strong it is, or whether there are differences between the two variables and how significant those differences are. There are three types of bivariate analysis.
 (1) Numerical & Numerical Bivariate analysis
 Scatter Plot, Linear Correlation (r).
 Linear correlation quantifies the strength of a linear relationship between two numerical variables. When there is no correlation between two variables, there is no tendency for the values of one quantity to increase or decrease with the values of the second quantity.
 (2) Categorical & Categorical Bivariate analysis–
 Stacked Column Chart, Chi-square Test.
 The chi-square test can determine the association between categorical variables. It is based on the difference between expected frequencies (e) and observed frequencies (n) in one or more categories of the frequency table.
 The chi-square distribution returns a probability for the computed chi-square and the degrees of freedom. A probability of zero shows a complete dependency between the two categorical variables, and a probability of one means that the two categorical variables are completely independent.
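As a minimal hand-computed sketch (the 2×2 table below is made-up data), the chi-square statistic is just the sum over cells of (observed − expected)² / expected:

```python
# Chi-square statistic by hand for a hypothetical 2x2 frequency table.
observed = [[30, 20],   # e.g. group 1: category A, category B counts
            [20, 30]]   # e.g. group 2

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand = sum(row_totals)

# Expected frequency per cell = row total * column total / grand total.
expected = [[r * c / grand for c in col_totals] for r in row_totals]

chi2 = sum((o - e) ** 2 / e
           for orow, erow in zip(observed, expected)
           for o, e in zip(orow, erow))

df = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, df)   # 4.0, 1
```

With the computed chi-square and df in hand, the probability would then be read from a chi-square table or a statistics library.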
 (3) Numerical & Categorical Bivariate analysis–
 Line Chart with Error Bars, Z-test and t-test, ANOVA
 The ANOVA test assesses whether the averages of a numerical dependent variable (2 or more DVs in the case of MANOVA) for more than two groups (categorical IV) are statistically different from each other.
ANOVA Vs MANOVA
 The difference between ANOVA and a “Multivariate Analysis of Variance” (MANOVA) is the “M”, which stands for multivariate.
 Unlike ANOVA, MANOVA compares two or more continuous response (or dependent) variables.
 Like ANOVA, MANOVA has both a oneway flavor and an Nway flavor.
 The number of factors (categorical independent variables) involved distinguishes a one-way MANOVA from a two-way MANOVA.
 Measuring the effect of a single factor (polymer) on both particle size and EE of NPs is an example of a one-way MANOVA.
 Measuring the effect of 2 factors (polymer, drug) on both particle size and EE of NPs is an example of a two-way MANOVA.
ANOVA Vs ANCOVA Vs MANCOVA
 The difference between ANOVA and ANCOVA is the letter “C”, which stands for ‘covariance’.
 Like ANOVA, “Analysis of Covariance” (ANCOVA) has a single continuous/numerical response variable (DV).
 Unlike ANOVA, ANCOVA compares a response variable by both a factor (qualitative, categorical IV) and a continuous/numerical IV (e.g. comparing test scores by both ‘level of education’ and ‘number of hours spent studying’).
 The term for the continuous/numerical independent variable used in ANCOVA is “covariate”.
 Unlike ANCOVA, MANCOVA compares two or more continuous response (or dependent) variables.
ANOVA
ANOVA uses the F-test to determine whether the variability between group means is larger than the variability of the observations within the groups.
F value = variance of the group means (Mean Square Between) / mean of the within-group variances (Mean Square Error)
If that ratio is sufficiently large, you can conclude that not all the means are equal.
 Note that the F-critical value can be obtained from a computer (or a table) before the experiment is run, as long as we know how many subjects will be studied and how many levels the explanatory variable has.
 Then, when the experiment is run, we can calculate the observed F-statistic and compare it to F-critical.
 If the observed F-statistic is smaller than the critical value, we retain the null hypothesis because the p-value must be bigger than alpha; if the observed F-statistic is equal to or bigger than the F-critical value, we reject the null hypothesis because the p-value must be equal to or smaller than alpha.
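As an illustrative sketch of the F value formula above, with made-up data for three hypothetical groups (the numbers are not from the text), the statistic is Mean Square Between divided by Mean Square Error:

```python
# One-way ANOVA F statistic by hand: k = 3 groups, 4 observations each.
from statistics import mean

groups = [[4.0, 5.0, 6.0, 5.0],
          [7.0, 8.0, 9.0, 8.0],
          [10.0, 11.0, 12.0, 11.0]]

k = len(groups)
n_total = sum(len(g) for g in groups)
grand_mean = mean(x for g in groups for x in g)

# Between-group sum of squares: n_i * (group mean - grand mean)^2.
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (k - 1)          # df_between = k - 1

# Within-group sum of squares: (x - group mean)^2 over all observations.
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
ms_within = ss_within / (n_total - k)      # df_within = N - k

f_stat = ms_between / ms_within
print(f_stat)   # 54.0 for this data
```

Comparing `f_stat` to the F-critical value for (k − 1, N − k) degrees of freedom then gives the retain/reject decision; the critical value itself would come from an F table or a statistics library.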
F statistics
http://www.socr.ucla.edu/Applets.dir/F_Table.html
 An F statistic is a value you get when you run an ANOVA test or a regression analysis to find out whether the means of two (or more) populations are significantly different. It is similar to the t statistic from a t-test.
 The F-distribution is a non-negative distribution in the sense that F values, which are ratios of squared quantities, can never be negative numbers. While the t-test compares “means,” ANOVA compares the “variance” between the populations.
 t-test = used when the population means of only two groups are to be compared; ANOVA = preferred when the means of three or more groups are to be compared.
 A t-test will tell you if a single variable is statistically significant, and an F test will tell you if a group of variables is jointly significant (an interaction effect along with main effects is possible for “ANOVA with replication”).
The F test results have both an F value and an F critical value.
 The F critical value is also called the F statistic.
 The value you calculate from your data is called the F value (without the “critical” part).
In general, if the calculated F value is larger than the F critical value, we can reject the null hypothesis. If the null hypothesis is true, we expect F to have a value close to 1.0 most of the time.
However, we should also consider the p value .
The p-value is the evidence against a null hypothesis. The smaller the p-value, the stronger the evidence that you should reject the null hypothesis.
Read your p-value first. If the p-value is small (less than your alpha level), you can reject the null hypothesis. Only then should you consider the F-value. If you don’t reject the null, ignore the F-value.
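The read-the-p-value-first rule above can be sketched as a tiny helper (alpha = 0.05 is an assumed convention, not stated in the text):

```python
# Decision rule: compare the p-value to alpha; reject only when p <= alpha.
def decide(p_value, alpha=0.05):
    """Return the hypothesis-test decision for a given p-value and alpha."""
    return "reject null" if p_value <= alpha else "retain null"

print(decide(0.03))   # reject null
print(decide(0.20))   # retain null
```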
Chi-squared test
 It is a special type of test that deals with frequencies of data instead of means like some other tests (it looks for independence of events instead of a simple numerical difference).
 It is most useful for data that are non-parametric.
 Degrees of freedom is not just the sample size minus 1:
df = (# rows – 1) x (# columns – 1). (In the case of the scary movies example, we have 1 degree of freedom.)
Assumptions for a chi-square independence test. If these assumptions hold, the χ^{2} test statistic follows a χ^{2} distribution:

 Independent observations (this usually, but not always, holds).
 For a 2 by 2 table, all expected frequencies > 5.
For a larger table, all expected frequencies > 1 and no more than 20% of all cells may have expected frequencies < 5.
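The expected-frequency rule of thumb above can be sketched as a small check (the tables of expected counts below are made-up examples):

```python
# Check the chi-square expected-frequency assumptions for a table.
def check_expected(expected):
    """Return True if the expected-frequency rules of thumb are satisfied."""
    cells = [e for row in expected for e in row]
    rows, cols = len(expected), len(expected[0])
    if rows == 2 and cols == 2:
        return all(e > 5 for e in cells)        # 2x2 rule: all expected > 5
    small = sum(1 for e in cells if e < 5)      # larger tables: all > 1 and
    return all(e > 1 for e in cells) and small <= 0.2 * len(cells)  # <=20% below 5

print(check_expected([[12.5, 7.5], [6.0, 9.0]]))   # True: 2x2, all > 5
print(check_expected([[4.0, 30.0, 6.0],            # False: 2 of 9 cells < 5,
                      [8.0, 12.0, 9.0],            # which exceeds 20%
                      [3.0, 10.0, 7.0]]))
```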
Chisquared test of independence
 2 categorical variables from a single population.
 Data are collected randomly from a population to determine if there is a significant association between two categorical variables.
 For example, in a university, students might be classified by their gender (female or male) or by their primary major (mathematics, chemistry, history, etc.). We use a chi-square test for independence to determine whether gender is related to choice of study.
Chisquared test of homogeneity
 Only 1 categorical variable from 2 (or more) populations.
 Data are collected by sampling each subgroup separately to determine if the frequency counts differ significantly across the different populations.
 For example, in a survey of subject preferences, we might ask students for their favorite subject. We ask the same question of two different populations, such as females and males. We then use a chisquare test for homogeneity to determine whether female subject preferences differed significantly from male subject preferences.
Post hoc Tests (Multiple comparison analysis in ANOVA)
https://www.graphpad.com/support/faqid/1091/
 Tests conducted on subsets of data tested previously in another analysis are called post hoc tests.
 A post-hoc test is used in situations where you can decide which comparisons you want to make after looking at the data. You don’t need to plan ahead.
 A class of post hoc tests that provide this type of detailed information for ANOVA results are called “multiple comparison analysis” tests. A multiple comparison test applies whenever you make several comparisons at once.
 The most commonly used multiple comparison analysis statistics include the following tests: Tukey, Newman-Keuls, Scheffé, Bonferroni and Dunnett.
 These statistical tools each have specific uses, advantages and disadvantages.
 Some are best used for testing theory while others are useful in generating new theory.
We can make several types of multiple comparisons. There are several ways we can do this (STDB):

 All possible comparisons, including averages of groups. So you might compare the average of groups A and B with the average of groups C, D and E, or compare group A to the average of B–F (Scheffé’s test).
 All possible pairwise comparisons. Compare the mean of every group with the mean of every other group, such as the mean of group A with the mean of group B (Tukey or Newman-Keuls).
 All against a control. If group A is the control, you may only want to compare A with B, A with C, and A with D, but not compare B with C or C with D (Dunnett’s test).
 Only a few comparisons based on your scientific goals. So you might want to compare A with B and B with C, and that’s it (Bonferroni’s test).
 Planned comparison tests require that you focus in on a few scientifically sensible comparisons. You can’t decide which comparisons to do after looking at the data. The choice must be based on the scientific questions you are asking, and be chosen when you design the experiment.
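As an illustrative sketch of the difference between the all-pairwise and planned-comparison situations (group names and alpha are assumed for illustration), the Bonferroni idea is simply to spread the family-wise alpha across the m planned comparisons:

```python
# All-pairwise comparisons vs a few planned comparisons (Bonferroni idea).
from itertools import combinations

groups = ["A", "B", "C", "D", "E"]
alpha = 0.05

# All possible pairwise comparisons (what Tukey / Newman-Keuls would test):
all_pairs = list(combinations(groups, 2))
print(len(all_pairs))                      # 10 pairs for 5 groups

# A few planned comparisons (the Bonferroni situation): m = 2 comparisons,
# so each individual comparison is tested at alpha / m.
planned = [("A", "B"), ("B", "C")]
per_comparison_alpha = alpha / len(planned)
print(per_comparison_alpha)                # 0.025
```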

Orthogonal comparison – when you only make a few comparisons, the comparisons are called “orthogonal” when each comparison is among different groups. Comparing groups A and B is orthogonal to comparing groups C and D, because there is no information in the data from groups A and B that is relevant when comparing groups C and D. In contrast, comparing A and B is not orthogonal to comparing B and C.

When comparisons are orthogonal, the comparison can use ordinary t tests. You may still want to use the Bonferroni correction to adjust the significance level.
Covariance Vs Correlation Vs Regression
Covariance – indicates only the direction of the linear relationship between variables. Covariance values are not standardized and can range from negative infinity to positive infinity.
Correlation – determines the co-relationship or association of two variables (the extent to which two variables tend to change together). It describes both the strength and the direction of the relationship. Correlation coefficient values are standardized and can range from –1 to +1.
Regression – describes the numeric relation between an independent variable and a dependent variable. Regression indicates the impact of a unit change in the known variable (x) on the estimated variable (y).
Correlation is used to represent the linear relationship between two variables. Regression, on the contrary, is used to fit the best line and estimate one variable on the basis of another variable.
Regression, Multiple Regression
Simple linear regression plots one independent variable X against one dependent variable Y. Technically, in regression analysis,
 the independent variable is called the predictor variable (‘x’)
 the dependent variable is called the criterion variable (‘y’)
Multiple regression– uses multiple independent (“x”) or predictor variables in the regression.
Regression analysis can result in linear or nonlinear graphs. A linear regression is one where the relationship between your independent and dependent variables can be described with a straight line. Nonlinear regression produces a curved line.
 Example – in one-variable linear regression, you would input one independent variable (e.g. “sales”) against one dependent variable (e.g. “profit”). But you might be interested in how different types of sales affect the regression. You could set your X_{1} as one type of sales, your X_{2} as another type of sales, and so on.
 Simple regression analysis uses a single x variable for each dependent “y” variable. For example: (x_{1}, Y_{1}).
 Multiple regression uses multiple “x” variables for each dependent “y” variable: ((x1)_{1}, (x2)_{1}, (x3)_{1}, Y_{1}).
 Simple regression: Y = b_{0} + b_{1}x.
 Multiple regression: Y = b_{0} + b_{1}x_{1} + b_{2}x_{2} + … + b_{n}x_{n}.
The output would include a summary, similar to a summary for simple linear regression, that includes:
 R (the multiple correlation coefficient),
 R squared (the coefficient of determination),
 adjusted Rsquared,
 The standard error of the estimate.
Pearson Vs Spearman correlation
 Minitab offers two different correlation analyses:
Pearson product-moment correlation– “linear/proportional relationship” between two continuous (interval/ratio) variables.
 For example, to evaluate whether increases in temperature at your production facility are associated with decreasing thickness of your chocolate coating.
Spearman rank-order correlation– evaluates the “monotonic relationship” between two ordinal/ranked variables (the variables tend to change together, but not necessarily at a constant rate). The Spearman correlation coefficient is based on the ranked values of each variable rather than the raw data.
 For example, to evaluate whether the order in which employees complete a test exercise is related to the number of months they have been employed.
Correlation Coefficient (R), R-squared, Adjusted R-squared
There are several types of correlation coefficient: Pearson’s correlation coefficient ‘r’ (the most common), Cramér’s V, etc.
R, or Pearson’s r, is a measure of the strength and direction of the linear relationship between two variables.
The absolute value of the correlation coefficient (the formulas return a value between –1 and 1) gives the relationship strength. The larger the number, the stronger the relationship. For example, |–.75| = .75, which is a stronger relationship than .65.
 +1 indicates a strong positive relationship
 –1 indicates a strong negative relationship
 A result of zero indicates no relationship at all.
Correlation can be rightfully explained for simple linear regression, because you only have one x and one y variable. For multiple linear regression, R is computed, but R squared is a better term. You can explain R squared for both simple linear regressions and for multiple linear regressions.
 Coefficient of determination, denoted R^{2} or r^{2}, is the proportion of the variance in the dependent variable that is predictable from the independent variable(s).

Coefficient of Determination (R squared) can never be negative, since it is a squared value (always between 0 and 1).
 R^{2} shows how well terms (data points) fit a curve or line.
 Adjusted R^{2} also indicates how well terms fit a curve or line, but adjusts for the number of terms in a model. If you add more and more useless variables to a model, adjusted R^{2} will decrease; if you add more useful variables, it will increase.
 Adjusted R^{2} will always be less than or equal to R^{2}.
Standard Error of a Sample
 The standard error(SE) is very similar to standard deviation.
 Both are measures of spread. The higher the number, the more spread out your data is.
 While the standard deviation measures the spread of the data themselves, the standard error measures the spread of a sample statistic (such as the sample mean) across repeated samples.
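A minimal numeric sketch of the distinction, on a made-up sample (the standard error of the mean is the sample standard deviation divided by the square root of n):

```python
# Standard deviation (spread of the data) vs standard error of the mean.
from math import sqrt
from statistics import stdev

sample = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]

sd = stdev(sample)            # sample standard deviation
se = sd / sqrt(len(sample))   # standard error of the sample mean

print(round(sd, 3), round(se, 3))
```

Because of the division by √n, the standard error shrinks as the sample grows, while the standard deviation does not (systematically).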
“One-Way” Vs “Two-Way”
 One-way has one independent variable or factor (with 3 or more levels; the number of observations need not be the same in each group). For example: brand of cereal.
 Two-way has two independent variables or factors (each can have multiple levels; the same number of observations in each group). For example: brand of cereal, calories.
“Groups” or “Levels”
A level of an independent variable means that the variable can be split up into separate parts.
For example, let’s say you were studying the effect of alcohol on performance in a driving simulator. Alcohol — the independent variable — could be composed of different parts: no alcohol, two drinks, four drinks. Each of those parts is called a level.
In the above example, levels for IV “brand of cereal” might be Lucky Charms, Raisin Bran, Cornflakes — a total of three levels. The levels for IV “Calories” might be: sweetened, unsweetened — a total of two levels.
Replication

 A two-way ANOVA without replication can compare a group of individuals performing more than one task (like a paired t-test for two tasks). For example, you could compare students’ scores across a battery of tests.
 A two-way ANOVA is usually done with replication (more than one observation for each combination of the nominal variables).
https://www.youtube.com/watch?v=2fytt7BZJMI
https://www.youtube.com/watch?v=Zb1wxUEbbJ4
REFERENCES
 Google.com
 Quora.com
 https://slideplayer.com/slide/5147312/
 https://medium.com/@rndayala/datalevelsofmeasurement4af33d9ab51a
 courses.engr.illinois.edu/cs498ka4/fa2018/CS498KA_Fall18_Lecture12.pdf
 https://www.statisticshowto.com/probabilityandstatistics/hypothesistesting/anova/
 https://www.statisticshowto.com/probabilityandstatistics/fstatisticvaluetest/
 https://www.statisticshowto.com/probabilityandstatistics/correlationcoefficientformula/
 http://www.realstatistics.com/twowayanova/twofactoranovawithoutreplication/
 https://www.youtube.com/watch?v=Zb1wxUEbbJ4
 https://www.ncbi.nlm.nih.gov/pubmed/22420233
 https://apiproject1022638073839.appspot.com/questions/whatisthedifferencebetweenthechisquaredtestforindependenceandthechi
 https://www.saedsayad.com/numerical_numerical.htm