Time-Based Splitting of Data

In machine learning, we split the given dataset into training and test data. We can either split randomly or use time-based splitting.

For time-based splitting, we need a timestamp as one of the attributes (features).

For example, an e-commerce website can have reviews for various products, and these reviews can also carry timestamps. In such scenarios it’s better to use a time-based strategy.

To do this, we first sort the reviews by timestamp and then do the split.

  1. Sort data by time.
  2. Split: the earliest 80% for training and the most recent 20% for testing.
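
The two steps above can be sketched in a few lines of Python. This is only a minimal illustration using the standard library. The function name `time_based_split`, the `timestamp` key, and the sample reviews are assumptions made up for this example and are not taken from any particular dataset or library.

```python
from datetime import datetime


def time_based_split(rows, train_fraction=0.8):
    """Sort rows by their 'timestamp' field, then split chronologically."""
    # Step 1: sort data by time (oldest first).
    ordered = sorted(rows, key=lambda row: row["timestamp"])
    # Step 2: the earliest part is for training, the rest for testing.
    cutoff = int(len(ordered) * train_fraction)
    return ordered[:cutoff], ordered[cutoff:]


# Hypothetical reviews, deliberately out of chronological order.
reviews = [
    {"timestamp": datetime(2020, 1, day), "text": f"review {day}"}
    for day in (5, 1, 9, 3, 7, 2, 10, 4, 8, 6)
]

train, test = time_based_split(reviews)
print(len(train), len(test))
```

With this kind of split, every review in the test set is at least as recent as every review in the training set, which is the property a random split does not give.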

This approach can give a more realistic measure of accuracy: the model is trained on older data and tested on more recent data, which is how it will be used when predicting on new data.

Author: Ankur

An Engineer by choice
