How to interpret a LinearRegression model - Data Science
For each one-unit increase in SOMETHING, we can expect the price to increase by XXX, holding all other features constant
All I needed to fill in the XXX above was LinearRegression().fit(X,y).coef_. You can get the coefficients of a trained LinearRegression model by following the steps below; each coefficient is the XXX for its feature.
Summary
- Objective: Interpret a LinearRegression model
- Assemble a model: Built from orange juice price, season and weight
- Interpretation: How do the features affect the price?
Objective
I want to show how to interpret a LinearRegression model. This model was built from the price, season and weight of orange juice. I set the price as the target and the other two as predictors, so I could see how season and weight determine the price of orange juice. Here, season means the season in which the orange juice was sold.
Assemble a model
I used orange_juice.csv, which contains the price and weight of each orange juice and the season in which it was sold.
How to obtain the dataset
You can create this dataset with the code below, or you can download the code and CSV from my GitHub.
orange_juice.csv
Result
Assemble the LinearRegression model with the code below.
These are the coefficients and their respective columns.
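As a minimal sketch of this step, the block below fits a LinearRegression on a synthetic stand-in for orange_juice.csv and pairs each coefficient with its column. It is not the original code or data: the intercept and the Spring and Winter effects are made-up assumptions, and only the 0.014 and 3.05 figures come from this article. Using pd.get_dummies for the encoding is also an assumption, based on the drop_first=True setting mentioned later.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for orange_juice.csv (NOT the article's data).
# Prices follow a known formula so the fitted coefficients are easy to check.
BASE_PRICE = 1.0        # hypothetical intercept
WEIGHT_EFFECT = 0.014   # price per gram, taken from the article
SEASON_EFFECT = {       # only the Summer value is from the article
    "Fall": 0.0,
    "Spring": 1.0,
    "Summer": 3.05,
    "Winter": 2.0,
}

rows = [
    {"weight": w, "season": s, "price": BASE_PRICE + WEIGHT_EFFECT * w + effect}
    for s, effect in SEASON_EFFECT.items()
    for w in (100, 200, 300, 500)
]
df = pd.DataFrame(rows)

# One-hot encode season. drop_first=True drops the first category
# (Fall, alphabetically), which makes Fall the baseline.
X = pd.get_dummies(
    df[["weight", "season"]], columns=["season"], drop_first=True, dtype=float
)
y = df["price"]

model = LinearRegression().fit(X, y)

# Pair each coefficient with the column it belongs to.
coefficients = pd.Series(model.coef_, index=X.columns)
print(coefficients.round(3))
```

Because the synthetic prices contain no noise, the fit recovers the weight and season_Summer effects that were put in. On real data the coefficients are estimates.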
Interpretation
As you can see in the result above, you get one coefficient for each column.
Let’s look at weight. The coefficient is 0.014. This means that for each one-unit increase in weight, the price of orange juice increases by 0.014.
More intuitively, this orange juice costs $0.014 per gram of weight: every additional 100g adds $1.40 to the price.
However, we have to add one condition, because the price is affected by season as well. So we say that, if the other features are held constant, the price rises by $0.014 per gram.
Say you found this orange juice in Summer: it was 100g and cost $1. If you find 200g of this orange juice in Summer, it would cost $2.40 ($1 plus 100 × $0.014). However, if you find 200g of the juice in Fall, the price would be different.
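The Summer example can be checked with a few lines of arithmetic. This is only a sketch: price_after_weight_change is a hypothetical helper name, and it assumes the season stays the same between the two purchases.

```python
WEIGHT_COEF = 0.014  # weight coefficient from the article (dollars per gram)


def price_after_weight_change(known_price, known_weight, new_weight,
                              weight_coef=WEIGHT_COEF):
    """Expected price at new_weight, given one known (price, weight) pair
    in the same season, so the season term cancels out."""
    return known_price + weight_coef * (new_weight - known_weight)


# 100g costs $1 in Summer; what would 200g cost in Summer?
summer_200g = price_after_weight_change(1.0, 100, 200)
print(round(summer_200g, 2))  # prints 2.4
```

The helper cannot say anything about Fall, because changing the season changes a second feature at the same time.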
Now let’s look at the season coefficients. We can see 3.05 for season_Summer. If we interpret it as we did for weight, it would read: for each one-unit increase in Summer, the price increases by $3.05. What does this mean? That if you buy in Summer, the price is $3.05 more than in any other season? This is not true, because then the price in all the other seasons would have to be the same, and it is not.
Now let’s look back at the table of results. Do you see that one season is missing? There is no season_Fall, because I set drop_first=True when I encoded season while assembling the model. This means season_Fall is the baseline for this interpretation, so we can interpret the coefficient like this:
The price of orange juice in Summer is $3.05 higher than its price in Fall
We need to add the same condition here as well. In other words,
The price of orange juice in Summer is $3.05 higher than its price in Fall, holding other features constant
The other feature here is weight. If the weight is the same, you pay $3.05 more in Summer than you would pay in Fall.
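To see why Fall acts as the baseline, it may help to look at the encoded rows themselves. The sketch below assumes the encoding was done with pd.get_dummies and that the four seasons are Fall, Spring, Summer and Winter; the article only names Summer and Fall explicitly.

```python
import pandas as pd

# Hypothetical season values, one row per season.
seasons = pd.DataFrame({"season": ["Fall", "Spring", "Summer", "Winter"]})

# drop_first=True removes the first category (Fall, alphabetically).
encoded = pd.get_dummies(seasons, columns=["season"], drop_first=True, dtype=int)
encoded.index = seasons["season"]
print(encoded)

# The Fall row is all zeros, so a Fall prediction uses no season coefficient.
# The Summer row has a single 1 in season_Summer, so its prediction adds
# exactly that one coefficient on top of the Fall prediction.
fall_row = encoded.loc["Fall"]
summer_row = encoded.loc["Summer"]
```

With this encoding, every season coefficient is a difference from Fall, not from "all other seasons".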
Now I can fill in the statement from the very beginning.
For each one-unit increase in weight, we can expect the price to increase by $0.014, holding all other features constant
Also
We can expect the price of the orange juice in Summer to be $3.05 higher than its price in Fall, holding all other features constant
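Both statements can be read off the fitted equation. As a sketch, write β₀ for the intercept and assume Spring and Winter are the other two encoded seasons. The article does not report the intercept or those two coefficients, so they stay symbolic here, with w for weight and s for any fixed season:

```latex
\begin{aligned}
\widehat{\text{price}} &= \beta_0 + 0.014\,\text{weight}
  + 3.05\,\mathbb{1}[\text{Summer}]
  + \beta_{\text{Spring}}\,\mathbb{1}[\text{Spring}]
  + \beta_{\text{Winter}}\,\mathbb{1}[\text{Winter}] \\
\widehat{\text{price}}(w + 1,\, s) - \widehat{\text{price}}(w,\, s) &= 0.014 \\
\widehat{\text{price}}(w,\, \text{Summer}) - \widehat{\text{price}}(w,\, \text{Fall}) &= 3.05
\end{aligned}
```

In each difference every term except one cancels, which is exactly what "holding all other features constant" means.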