COURSE CONTENT:
LECTURES
1. Populations and samples. Measurement scales. Categorical and numerical features.
2. Descriptive statistics. Numerical analysis of distributions: mean values and measures of dispersion. Graphical analysis of distributions: pie charts, bar charts, histograms, box-and-whisker plots.
3. Confidence intervals, hypothesis testing, p-values.
4. Student's t-test.
5. Non-parametric statistical tests. Analysis of contingency tables.
6. Simple and multiple linear regression.
7. Analysis of variance.
8. Linear models for classification.
9. Regularization methods for linear regression coefficients (ridge regression, LASSO, elastic net models).
10. Model assessment and selection: resampling methods (bootstrap, cross-validation) and variable selection methods.
11. Tree-based methods (regression and classification trees, bagging, boosting, random forests).
12. Support vector machines.
13. Hidden Markov models.
14. Principal component analysis (PCA).
15. Clustering methods (hierarchical clustering, k-means clustering).
PRACTICALS
1. Analysis of distributions of variables in the statistical environment R. Calculation of mean values and measures of dispersion using the functions included in the base package.
2. Analysis of distributions of variables in the statistical environment R. Functions for graphical analysis of samples.
3. Calculating confidence intervals. Hypothesis testing and errors in hypothesis testing (type I and type II errors). Power of a statistical test.
4. Comparison of different types of Student's t-test (with respect to dependence of observations between samples, sample size and sample variance) in the R statistical environment.
5. Implementation of Fisher's exact test, the chi-square test and the hypergeometric test in the R statistical environment.
6. Analysis of the linear regression model. Interactions. Qualitative predictors. Transformation of non-linear regression models. Residual analysis.
7. Implementation of analysis of variance in the R statistical environment.
8. Comparison of different linear models for classification (logistic regression, linear discriminant analysis, quadratic discriminant analysis) in the R statistical environment.
9. Comparison of ridge regression, LASSO and elastic net models in the statistical environment R (glmnet package). Choosing the optimal lambda parameter.
10. Implementation of resampling methods in the statistical environment R (k-fold cross-validation, leave-one-out cross-validation).
11. Using tree-based methods in the R statistical environment (tree, gbm and randomForest packages).
12. Support vector machines in statistical environment R (e1071 package). Application of cross-validation to select optimal model parameters.
13. Application of hidden Markov models to the analysis of biological sequences in the R statistical environment.
14. Principal component analysis in the R statistical environment (prcomp() function).
15. Application of clustering methods in the R statistical environment (hclust() and kmeans() functions). Choosing the optimal number of clusters.