# Underfitting and Overfitting

In this article we will look at what model underfitting and overfitting mean in decision trees. We have already seen what decision trees are and what they are capable of in machine learning. Today, let us understand two of the most important terms associated with machine learning, and with decision trees in particular: underfitting and overfitting.

So let's get started. First of all, what is meant by fitting? To fit a model means to fit a curve to the data, and the curve can take many shapes. In two dimensions it is represented with a y-axis and an x-axis, and the goal is to plot a curve that follows the observed data points.

To do this, you take your dataset, which consists of a number of rows and columns, and partition it, for example into 67% and 33%. The 67% portion is the training set, which you use to build (train) your model, and the 33% portion is the test set, which you use to validate the developed model.
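As a rough illustration, a 67/33 partition like the one above can be sketched in plain Python. The function name and the 100-row placeholder dataset here are invented for illustration:

```python
import random

def split_dataset(rows, train_frac=0.67, seed=42):
    """Shuffle the rows, then cut them into a training set and a test set.

    train_frac=0.67 mirrors the 67% / 33% partition described above.
    """
    rng = random.Random(seed)          # fixed seed so the split is repeatable
    shuffled = rows[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

rows = list(range(100))                # stand-in for 100 dataset rows
train_set, test_set = split_dataset(rows)
print(len(train_set), len(test_set))   # 67 33
```

In practice a library routine such as scikit-learn's `train_test_split` does the same job; the idea is simply a shuffled partition of the rows.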

In this process you obtain two kinds of error, which you record and keep aside: a training error, measured on the training set, and a test error, measured on the test set. Underfitting means that both the training error and the test error are large. When both are unacceptably high, we call it underfitting, and this mainly happens when the developed classification model is made too simple.

Underfitting is mainly detected with the help of bias. Various techniques exist to determine underfitting; a common one is to plot the percentage error (from the training or test set) on the y-axis against the number of instances used for building your model on the x-axis. For an underfitting model, the error stays high even as more instances are added.
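The two errors described above are easy to compute once a model has made its predictions. The labels and predictions below are made up purely for illustration; the diagnosis rule is the one from the text: both errors high suggests underfitting, while a low training error with a high test error suggests overfitting.

```python
def error_rate(y_true, y_pred):
    """Percentage of misclassified samples, as a fraction in [0, 1]."""
    wrong = sum(t != p for t, p in zip(y_true, y_pred))
    return wrong / len(y_true)

# Hypothetical predictions from some trained classifier:
train_error = error_rate([0, 1, 1, 0, 1], [0, 1, 0, 0, 1])   # 1 of 5 wrong
test_error  = error_rate([1, 0, 1, 1],    [0, 0, 0, 1])      # 2 of 4 wrong

print(train_error, test_error)   # 0.2 0.5
```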

With such a simple model, some labeled instances fall on the wrong side of the curve, so the curve does not fit the data well. This is essentially a description of the goodness of fit, that is, how well the given observations or samples are fit by that particular model.
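To make the "model too simple" case concrete, here is a tiny sketch: a depth-1 decision tree (a stump) trying to learn an XOR-shaped pattern. The data and the stump are invented for illustration; the point is that the error stays high even on the training data itself, which is the hallmark of underfitting (high bias).

```python
# XOR-style data: the label depends on BOTH features together.
train_X = [(0, 0), (0, 1), (1, 0), (1, 1)]
train_y = [0, 1, 1, 0]

def stump_predict(x):
    """A one-split decision tree: it can only look at the first feature,
    which is too simple to capture the XOR pattern."""
    return 1 if x[0] > 0 else 0

wrong = sum(stump_predict(x) != y for x, y in zip(train_X, train_y))
print(wrong / len(train_y))   # 0.5 -- no better than guessing, even on its own training data
```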

That is the case of underfitting. Now, what is meant by overfitting? Here you build your model and the training goes very well: the training error is small, but the test error is large. When the training error is low but the test error is high, you have the problem of overfitting.

Overfitting is mainly detected with the help of variance. Again you can plot a curve with the percentage error on the y-axis and the number of instances on the x-axis. When classifying the instances, an overfitting model produces a curve that contorts itself to pass through every single training point; such a curve is not acceptable.
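The opposite extreme can be sketched just as simply: a "model" that memorizes every training point. Everything here (the points, labels, and lookup-table predictor) is invented for illustration, but it shows the overfitting signature described above: zero training error alongside a large test error.

```python
train_X = [(1, 2), (2, 1), (3, 3), (4, 5)]
train_y = [0, 0, 1, 1]

lookup = dict(zip(train_X, train_y))   # memorize the training set verbatim

def memorize_predict(x, default=0):
    """Exact-match memorization: perfect on seen points, clueless on new ones."""
    return lookup.get(x, default)

def error_rate(y_true, y_pred):
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

train_error = error_rate(train_y, [memorize_predict(x) for x in train_X])

test_X = [(1, 3), (5, 5), (2, 2), (4, 4)]   # unseen points
test_y = [0, 1, 0, 1]
test_error = error_rate(test_y, [memorize_predict(x) for x in test_X])

print(train_error, test_error)   # 0.0 0.5 -- perfect on training, poor on test
```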

That is not what we want. You have to fit the data, that is, the observations and instances, properly. A good or accurate fit is a curve that correctly classifies which instances fall on which side, without bending around individual points.

Both problems come down to poor performance of the developed model: if the model does not perform well in terms of its training and test errors, you may face the problem of overfitting as well as underfitting.