To understand selection bias, we must first understand how a machine learning algorithm works. Training a machine learning algorithm requires a lot of data. If we could comprehensively collect all the data and feed it into the algorithm, nothing would be better.
But that is not always possible because of time and money constraints, so we resort to sampling.
Sampling is the process of collecting a small portion of the data, which we assume has all the characteristics of the population we want to study. During sampling, discrepancies can creep in through the way that portion is chosen from the whole, so that the sample no longer represents the population. This is known as selection bias.
To put it simply, if you are asked to select 100 people from your town, which has a population of about 1 lakh (100,000), you will most likely recall the names of people you know. This is selection bias: it distorts the characteristics of the training data set used to train the machine learning algorithm, and hence the learning process is compromised.
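The town example can be sketched as a small simulation. This is only an illustration under stated assumptions: the town is synthetic, each resident is reduced to a single made-up attribute (age), and "people you know" is modelled, hypothetically, as people within a few years of your own age. The function names are invented for this sketch.

```python
import random
import statistics


def build_town(size=100_000, seed=0):
    """Synthetic town (assumption): each resident is represented only by an age."""
    rng = random.Random(seed)
    return [rng.randint(1, 90) for _ in range(size)]


def random_sample(town, k=100, seed=1):
    """Unbiased selection: every resident is equally likely to be picked."""
    rng = random.Random(seed)
    return rng.sample(town, k)


def acquaintance_sample(town, k=100, my_age=30, spread=5, seed=1):
    """Biased selection (assumption): only people close to my own age are candidates."""
    rng = random.Random(seed)
    known = [age for age in town if abs(age - my_age) <= spread]
    return rng.sample(known, k)


town = build_town()
population_mean = statistics.mean(town)
random_mean = statistics.mean(random_sample(town))
biased_mean = statistics.mean(acquaintance_sample(town))

print("population mean age:  ", round(population_mean, 1))
print("random sample mean:   ", round(random_mean, 1))
print("acquaintance sample:  ", round(biased_mean, 1))
```

The random sample should land close to the town's true average age, while the acquaintance sample stays near your own age no matter what the town actually looks like.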
The more representative the training data set, the higher the probability that the machine learning algorithm will predict correct results on new data sets.
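The effect on learning can be sketched in the same spirit. The example below is a toy, not a real machine learning pipeline: the data is synthetic (a feature x between 1 and 90 and a target that is assumed to grow as x squared plus noise), and the "algorithm" is a plain least-squares line written by hand. The same line is fitted once on a representative sample and once on a sample drawn only from a narrow band of x, then both are scored on the whole population.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(model, points):
    """Root mean squared prediction error of a fitted line on (x, y) points."""
    slope, intercept = model
    squared = [(slope * x + intercept - y) ** 2 for x, y in points]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(42)

# Synthetic population (assumption): y grows as x squared, plus Gaussian noise.
population = []
for _ in range(10_000):
    x = rng.randint(1, 90)
    population.append((x, x * x + rng.gauss(0, 50)))

# Representative training set: drawn uniformly from the whole population.
representative = rng.sample(population, 100)

# Biased training set: drawn only from a narrow band of x (selection bias).
narrow_band = [p for p in population if 25 <= p[0] <= 35]
biased = rng.sample(narrow_band, 100)

model_rep = fit_line([x for x, _ in representative], [y for _, y in representative])
model_biased = fit_line([x for x, _ in biased], [y for _, y in biased])

representative_rmse = rmse(model_rep, population)
biased_rmse = rmse(model_biased, population)

print("error when trained on representative sample:", round(representative_rmse))
print("error when trained on biased sample:        ", round(biased_rmse))
```

The model trained on the narrow band fits its own training points well but has never seen the rest of the range, so its error on the full population should come out noticeably larger.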
If you would like to know more about machine learning, you can mail smartsubu2020@gmail.com.