Machine Learning Concepts

Key Points

Introduction
  • Machine learning and artificial intelligence are now often used to mean similar things. Machine learning is usually the task of using data to achieve a goal, whereas statistics is the related field focussed on gaining understanding from data.

  • Google Colab allows you to run Python code online.

Data
  • Good data is the key to success in machine learning.

  • More data allows you to use more complicated machine learning models.

  • You will (and should) spend most of your time checking and cleaning up data.

Classification
  • A classifier should learn a pattern from the training data; you then check whether the pattern holds on the testing data.

  • Scikit-learn has many different classifiers. You may need to test a few to see which is best for your dataset.
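
The two points above can be sketched in a few lines. This is a minimal illustration, not a recommended recipe: the iris dataset, the 25% test split, and the three classifiers are assumptions chosen only because they ship with scikit-learn.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Example data (an assumption): the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)

# Hold back a quarter of the samples so the pattern can be checked on unseen data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# A few candidate classifiers (an arbitrary selection for illustration).
classifiers = {
    "k-nearest neighbours": KNeighborsClassifier(n_neighbors=5),
    "decision tree": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

scores = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # learn the pattern from the training data
    scores[name] = clf.score(X_test, y_test)   # accuracy on the testing data
    print(f"{name}: {scores[name]:.2f}")
```

The classifier that scores best here is only best for this dataset and this split; a different dataset may favour a different one.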

Evaluation
  • A confusion matrix shows the counts of the true positives, false positives, true negatives and false negatives that the classifier gives.

  • Various statistics can be calculated from these four numbers. The statistic to use depends on which errors you want to minimize.

  • Further reading: Points of Significance: Classification Evaluation
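
As a rough sketch of how statistics follow from the four confusion matrix counts, the function below computes some commonly used ones. The function name and the example counts are hypothetical, made up purely for illustration.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute common statistics from the four confusion matrix counts."""

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a class never occurs or is never predicted.
        return numerator / denominator if denominator else float("nan")

    return {
        # Fraction of all predictions that were correct.
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        # Of the predicted positives, how many were real? Penalises false positives.
        "precision": ratio(tp, tp + fp),
        # Of the real positives, how many were found? Penalises false negatives.
        "recall": ratio(tp, tp + fn),
        # Of the real negatives, how many were correctly rejected?
        "specificity": ratio(tn, tn + fp),
        # Harmonic mean of precision and recall.
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Hypothetical counts, not taken from any real classifier.
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

If false positives are the costly error, precision is the number to watch; if missed positives are worse, recall matters more.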