RaukR 2018 Workshop - Machine Learning

0.1 What is machine learning?

Machine learning learns a function f that maps input X to output Y:

\[ Y = f(X) \]

Machine learning provides two major things:
- Prediction
- Feature selection
Machine learning methods are categorized into:
- Parametric
- Non-parametric
Unsupervised learning
- Data-driven approach
Supervised learning
- Hypothesis-driven approach
Which features are characteristic of the type of cells etc. you want to predict?

0.1.1 Main steps of machine learning

Clean the data: correct, normalize, standardize etc.
Identify features in the data (deep learning skips this step because it builds its own features)
The machine learning model is fitted on the training subset and evaluated on an independent subset
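
As an illustration of the cleaning step, the sketch below shows two common ways to rescale a single feature: standardization (z-scores) and min-max normalization. The function names and the example values are hypothetical and chosen only for this illustration. In practice the mean, standard deviation, minimum and maximum would usually be estimated on the training subset only and then reused on the other subsets.

```python
from statistics import mean, pstdev


def standardize(column):
    """Return z-scores: (value - mean) / standard deviation."""
    mu = mean(column)
    sigma = pstdev(column)
    if sigma == 0:
        # A constant feature carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(value - mu) / sigma for value in column]


def min_max_normalize(column):
    """Rescale values linearly to the interval [0, 1]."""
    lowest, highest = min(column), max(column)
    if highest == lowest:
        return [0.0 for _ in column]
    return [(value - lowest) / (highest - lowest) for value in column]


# Hypothetical measurements of one feature (e.g. expression of one gene)
expression = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

z_scores = standardize(expression)
scaled = min_max_normalize(expression)
print(z_scores)
print(scaled)
```

After standardization the feature has mean 0 and standard deviation 1, so features measured on different scales become comparable.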

0.1.2 How does machine learning work?

Five steps:
- Split the data set into training, validation and test subsets
  - Randomly assign 70 % to training and 30 % to test (approx.)
- Fit the model on the training subset
- Validate the model on the validation subset
- Repeat steps 1-3 a number of times
- Test the accuracy of the optimized model on the test subset
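
The five steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not a recommended pipeline: the data are simulated, the model is a simple k-nearest-neighbour regression written by hand, and the 80/20 split of the non-test data into training and validation subsets, the candidate values of k and the number of repeats are all arbitrary choices made for this example.

```python
import random


def knn_predict(train, x, k):
    """Average the y values of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def mse(train, held_out, k):
    """Mean squared error of a k-NN model fitted on `train`, scored on `held_out`."""
    errors = [(knn_predict(train, x, k) - y) ** 2 for x, y in held_out]
    return sum(errors) / len(errors)


rng = random.Random(0)

# Hypothetical data: y = 2x + 1 plus Gaussian noise
data = []
for _ in range(200):
    x = rng.uniform(0, 10)
    data.append((x, 2 * x + 1 + rng.gauss(0, 1)))

# Hold out roughly 30 % as the test subset; it is not touched until the end.
rng.shuffle(data)
n_test = int(0.3 * len(data))
test, development = data[:n_test], data[n_test:]

candidate_k = [1, 3, 5, 9]  # assumed candidate settings
n_repeats = 5               # assumed number of repeats
val_error = {k: 0.0 for k in candidate_k}

for _ in range(n_repeats):
    # Steps 1-3: re-split, fit on the training subset, validate.
    rng.shuffle(development)
    n_val = int(0.2 * len(development))
    validation, train = development[:n_val], development[n_val:]
    for k in candidate_k:
        val_error[k] += mse(train, validation, k) / n_repeats

# Pick the setting with the lowest average validation error,
# then score it once on the untouched test subset.
best_k = min(val_error, key=val_error.get)
test_error = mse(development, test, best_k)
print("best k:", best_k, "test MSE:", round(test_error, 3))
```

The test subset is used exactly once, after the model has been chosen, so the reported error is not biased by the selection process.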

0.1.3 What is a hyperparameter?

Design parameters of a machine learning method that are set before the learning process starts
- E.g. the number of covariates for which the main variable of interest x is adjusted
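
To make the distinction concrete, the sketch below fits a straight line by gradient descent. The learning rate and the number of epochs are hyperparameters: they are chosen before training starts. The slope and intercept are ordinary parameters: they are learned from the data. The function name `fit_line`, the data and the hyperparameter values are assumptions made for this example.

```python
def fit_line(xs, ys, learning_rate, n_epochs):
    """Fit y = slope * x + intercept by gradient descent on the mean squared error.

    learning_rate and n_epochs are hyperparameters (fixed before training);
    slope and intercept are parameters (updated during training).
    """
    slope, intercept = 0.0, 0.0
    n = len(xs)
    for _ in range(n_epochs):
        residuals = [slope * x + intercept - y for x, y in zip(xs, ys)]
        grad_slope = sum(2 * r * x for r, x in zip(residuals, xs)) / n
        grad_intercept = sum(2 * r for r in residuals) / n
        slope -= learning_rate * grad_slope
        intercept -= learning_rate * grad_intercept
    return slope, intercept


# Hypothetical noise-free data generated from y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_line(xs, ys, learning_rate=0.05, n_epochs=2000)
print(round(slope, 4), round(intercept, 4))
```

A different choice of hyperparameters changes how training behaves: too few epochs leave the line under-fitted, and a learning rate that is too large can make the updates diverge. Hyperparameters are therefore typically tuned on the validation subset.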

0.2 Random Forest

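Bases predictions on an ensemble of decision trees, where each node asks a TRUE/FALSE question about a feature
Makes a prediction by following the TRUE/FALSE answers from the root of each tree down to a leaf and combining the results of all trees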
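
The prediction step can be sketched as below. This is only an illustration: the three trees are written by hand, whereas a real random forest learns each tree from a bootstrap sample of the training data using random subsets of features. The feature names, thresholds and class labels are hypothetical.

```python
from collections import Counter

# A node is either a leaf (a class label string) or a tuple:
# (feature_name, threshold, subtree_if_true, subtree_if_false)


def predict_tree(node, sample):
    """Walk one tree from the root to a leaf, answering TRUE/FALSE at each node."""
    while not isinstance(node, str):
        feature, threshold, if_true, if_false = node
        node = if_true if sample[feature] > threshold else if_false
    return node


def predict_forest(trees, sample):
    """Return the class label predicted by the majority of trees."""
    votes = Counter(predict_tree(tree, sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical hand-built forest of three small trees
forest = [
    ("gene_a", 5.0, "T cell", ("gene_b", 2.0, "B cell", "T cell")),
    ("gene_b", 2.0, "B cell", "T cell"),
    ("gene_c", 1.0, "B cell", "T cell"),
]

sample = {"gene_a": 3.0, "gene_b": 4.0, "gene_c": 0.5}
prediction = predict_forest(forest, sample)
print(prediction)
```

For this sample two trees vote for one class and one tree votes for the other, so the majority decides. Combining many trees in this way makes the prediction less sensitive to the quirks of any single tree.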

0.3 What is Deep Learning?

Artificial neural networks with multiple layers
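
What "multiple layers" means can be sketched with a forward pass through a small network. Each layer computes a weighted sum of its inputs plus a bias and applies a non-linear activation; the output of one layer is the input of the next. The weights below are hypothetical fixed numbers chosen for illustration; in a real network they are learned from the training data.

```python
import math


def relu(z):
    """Rectified linear unit: pass positive values, clip negatives to zero."""
    return max(0.0, z)


def sigmoid(z):
    """Squash a real number into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W x + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, layers):
    """Feed the input through every layer in turn."""
    for weights, biases, activation in layers:
        x = dense(x, weights, biases, activation)
    return x


# Hypothetical network: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output
layers = [
    ([[0.5, -1.0], [1.0, 1.0], [-0.5, 0.25]], [0.0, -1.0, 0.5], relu),
    ([[1.0, -1.0, 0.5], [0.0, 1.0, 1.0]], [0.0, 0.0], relu),
    ([[1.0, -1.0]], [0.0], sigmoid),
]

output = forward([2.0, 1.0], layers)
print(output)
```

The hidden layers act as the features that the network builds for itself, which is why deep learning can skip manual feature identification.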