Data transformation is key to effective machine learning. This post explores essential techniques for handling numerical data: normalization (linear scaling, Z-score, log scaling, clipping) and binning, along with their pros and cons.
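As a quick taste, here's a minimal sketch of those transformations applied to a synthetic, skewed feature. It assumes NumPy; the feature name and the percentile/bin choices are illustrative, not prescriptions from the post.

```python
import numpy as np

rng = np.random.default_rng(0)
incomes = rng.lognormal(mean=10, sigma=1, size=1000)  # skewed numerical feature

# Linear (min-max) scaling to [0, 1]
linear_scaled = (incomes - incomes.min()) / (incomes.max() - incomes.min())

# Z-score normalization: mean 0, standard deviation 1
z_scored = (incomes - incomes.mean()) / incomes.std()

# Log scaling: helps with heavy-tailed distributions like income
log_scaled = np.log(incomes)

# Clipping: cap outliers at chosen bounds (here the 1st and 99th percentiles)
lo, hi = np.percentile(incomes, [1, 99])
clipped = np.clip(incomes, lo, hi)

# Binning: bucket values into quantile-based bins (4 bins here)
bin_edges = np.quantile(incomes, np.linspace(0, 1, 5))
binned = np.digitize(incomes, bin_edges[1:-1])  # bin index per value
```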
There are two common ways to encode categorical features: ordinal encoding and one-hot encoding. In this post, let's explore when you should use each.
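To make the contrast concrete, here's a minimal sketch using scikit-learn. The "size" and "color" features and their values are made up for illustration; the rule of thumb shown in the comments is the usual one (ordinal for ordered categories, one-hot for unordered ones).

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder, OneHotEncoder

# Ordinal encoding suits features with a natural order (e.g. sizes).
sizes = np.array([["small"], ["medium"], ["large"], ["medium"]])
ordinal = OrdinalEncoder(categories=[["small", "medium", "large"]])
print(ordinal.fit_transform(sizes).ravel())  # [0. 1. 2. 1.]

# One-hot encoding suits unordered categories (e.g. colors), since it
# avoids implying that one category is "greater" than another.
colors = np.array([["red"], ["green"], ["blue"], ["green"]])
one_hot = OneHotEncoder()
print(one_hot.fit_transform(colors).toarray())
# Columns are alphabetical (blue, green, red):
# [[0. 0. 1.]
#  [0. 1. 0.]
#  [1. 0. 0.]
#  [0. 1. 0.]]
```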
I explore four methods for classifying text using the bag-of-words approach. We'll walk through the code and math required, and I'll leave you with improvements that could make these techniques more accurate.
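For readers new to the idea, here's a minimal bag-of-words sketch assuming scikit-learn. The toy corpus and the choice of Naive Bayes as the classifier are purely illustrative, not the specific four methods covered in the post.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_texts = ["great movie, loved it", "terrible plot, awful acting",
               "loved the acting", "awful movie"]
train_labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative

# Bag-of-words: each document becomes a vector of word counts
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)

# Fit a simple classifier on the count vectors
clf = MultinomialNB()
clf.fit(X_train, train_labels)

X_test = vectorizer.transform(["loved the movie"])
print(clf.predict(X_test))  # expected: [1]
```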