Introduction
|
The terms machine learning and artificial intelligence are now often used interchangeably. Machine learning usually means using data to achieve a goal, whereas statistics is the related field focussed on gaining understanding from data.
Google Colab allows you to run Python code online.
|
Data
|
Good data is the key to success in machine learning.
More data allows for more complicated machine learning.
You will (and should) spend most of your time checking and cleaning up data.
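As a rough illustration of what checking and cleaning can look like, here is a minimal sketch using pandas. The library choice, the toy table, its column names, and the 300 cm plausibility threshold are all assumptions made for this example, not part of the lesson.

```python
import pandas as pd

# Hypothetical toy table; the columns and values are made up for illustration.
df = pd.DataFrame({
    "height_cm": [170.0, 165.0, None, 165.0, 1800.0],
    "label": ["a", "b", "a", "b", "a"],
})

# Checking: look before you change anything.
print(df.isna().sum())        # number of missing values in each column
print(df.duplicated().sum())  # number of rows that exactly repeat an earlier row
print(df.describe())          # summary statistics help spot implausible values

# Cleaning: one possible set of choices, applied in order.
clean = (
    df.dropna()                  # drop rows with missing values
      .drop_duplicates()         # drop repeated rows
      .query("height_cm < 300")  # drop implausible heights (assumed threshold)
      .reset_index(drop=True)
)
print(clean)
```

Whether to drop, correct, or fill in a problematic row depends on the dataset and the goal, so treat the cleaning steps here as one option rather than a recipe.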
|
Classification
|
You should learn a pattern using the training data and then check whether the pattern holds with the testing data.
Scikit-learn has many different classifiers. You may need to test a few to see which is best for your dataset.
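The train-then-test workflow and the idea of trying several classifiers can be sketched as follows. The iris dataset, the three classifiers, the 25% test fraction, and the fixed random seed are assumptions chosen for illustration; your own data and candidate classifiers will differ.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Example dataset bundled with scikit-learn (an assumption for illustration).
X, y = load_iris(return_X_y=True)

# Hold back 25% of the rows for testing; the seed makes the split repeatable.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A few candidate classifiers to compare on the same split.
classifiers = {
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                 # learn the pattern from training data
    scores[name] = clf.score(X_test, y_test)  # accuracy on the held-out test data
    print(f"{name}: {scores[name]:.2f}")
```

Accuracy on a single split is only a first impression. With a small dataset, the ranking of classifiers can change with a different split.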
|
Evaluation
|
A confusion matrix shows the counts of the true positives, false positives, true negatives and false negatives that the classifier gives.
Various statistics can be calculated from these four numbers. The statistic to use depends on which errors you want to minimize.
Further reading: Points of Significance: Classification Evaluation
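To make the four counts concrete, here is a small sketch that builds a confusion matrix with scikit-learn and derives a few common statistics from it. The true and predicted labels are made up for illustration, and the choice of statistics shown is an assumption, not a recommendation.

```python
from sklearn.metrics import confusion_matrix

# Made-up labels for a binary problem: 1 = positive, 0 = negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels the matrix is [[TN, FP], [FN, TP]], so ravel() yields
# the counts in the order TN, FP, FN, TP.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # fraction of all predictions that are right
precision = tp / (tp + fp)                  # of predicted positives, how many are real
recall = tp / (tp + fn)                     # of real positives, how many were found
specificity = tn / (tn + fp)                # of real negatives, how many were found

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```

If false negatives are the costly error, recall is the number to watch; if false positives are costly, precision or specificity matters more.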
|