Multiple Regression

After looking at how well our original model predicted the number of loves, let’s try to improve it so that it explains more of the variation in loves.

The original model used only the difference between the value price and the actual price, so I am going to add some additional predictors.

I decided to add rating (each product’s star rating) and number of reviews (how many reviews each product has) to try to improve my model. The results of my model are shown below.

As we can see, adding these two variables costs us the significance of the intercept and of the difference in value price. The rating and the number of reviews are both significant, as shown by the three stars (***) to the right of each p-value. The intercept and the slope for difference in value price have p-values much greater than .05, which means we cannot rule out that their true values are zero, so there is a real risk of mistakenly treating these estimates as meaningful. If all my coefficients were significant, my model equation would look like this.

Loves = 401.97 + 13.98*Diff in Val Price + 1458.12*Rating + 35.56*Number of Reviews
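
To make the equation concrete, here is a minimal Python sketch that plugs one product's values into it. The function name `predict_loves` and the example product are my own assumptions for illustration, and since not all of the coefficients are significant, the output should be read as a demonstration of the arithmetic rather than a trustworthy prediction.

```python
def predict_loves(diff_in_value_price, rating, number_of_reviews):
    """Plug one product's values into the fitted equation from the post."""
    return (401.97
            + 13.98 * diff_in_value_price
            + 1458.12 * rating
            + 35.56 * number_of_reviews)


# Hypothetical product (not from the data set):
# $10 difference in value price, 4-star rating, 50 reviews.
print(round(predict_loves(10, 4, 50), 2))
```

For that made-up product the four terms are 401.97, 139.80, 5832.48, and 1778.00, so most of the predicted loves come from the rating and review terms.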

The intercept would represent the average number of loves for products with a difference in value price of 0, a 0-star rating, and 0 reviews, and that average is about 402.

Rating has the largest slope: the number of loves increases by about 1,458 on average for each one-star increase in rating, holding the other variables constant. Ratings only run from 0 through 5, so the model should not be given a rating below 0 or above 5.

The number of loves also increases by about 14 on average for every dollar increase in the difference in value price. That is not much of an increase compared with the other variables.

The number of loves increases by about 36 on average for each additional review. This is a higher slope than the one for difference in value price, but still lower than the slope for rating.

Even though only two of my slopes are significant, my R-squared value has increased dramatically, from 0.01% to 55.78%. Adding rating and number of reviews therefore raised the R-squared by 55.77 percentage points. This R-squared value means that the difference in value price, rating, and number of reviews together explain 55.78% of the variation in the number of loves.
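
For anyone who wants to see where an R-squared value comes from, here is a short Python sketch of the usual definition: 1 minus the residual sum of squares divided by the total sum of squares. The `actual` and `predicted` lists are made-up numbers chosen only to keep the arithmetic easy to follow; they are not my product data.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot


# Made-up values purely for illustration.
actual = [100, 200, 300, 400]
predicted = [110, 190, 310, 390]
print(r_squared(actual, predicted))

# The improvement reported in the post, in percentage points.
gain = round(55.78 - 0.01, 2)
print(gain)
```

A model that always predicts the average scores 0 by this measure and a perfect model scores 1, which is why going from 0.01% to 55.78% is such a large jump.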

If all of my coefficients were significant I would say we were a lot closer to solving the problem, but since they are not, I can’t confidently make that statement. I would only trust the model equation and its coefficients if they were all significant.

Since half of the coefficients are not significant, I am going to remove the insignificant predictor, the difference in value price, to try to get a model that I can use.

Even with the insignificant predictor removed, I run into the same problem: my intercept is still not significant. The coefficients did not change much after removing the difference in price, but I am still not comfortable using this model. After trying multiple combinations of variables, I cannot seem to get a significant intercept. I think this is where human behavior comes into play.
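
The idea of refitting without a predictor and comparing coefficients can be sketched in plain Python. This is only an illustration under my own assumptions: `fit_ols` is a hypothetical helper that solves the least-squares normal equations, and the `products` and `loves` values are made up so that loves does not depend on the difference in price at all. It is not the software or the data behind the results in this post.

```python
def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    X = [[1.0] + [float(v) for v in r] for r in rows]
    k = len(X[0])
    # Build X'X and X'y.
    A = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    b = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    # Coefficients come back as [intercept, slope_1, slope_2, ...].
    return [b[i] / A[i][i] for i in range(k)]


# Made-up rows of (difference in price, rating, number of reviews).
products = [(5, 4.0, 10), (0, 3.5, 3), (12, 4.5, 40),
            (3, 2.0, 1), (8, 5.0, 25), (1, 3.0, 7)]
# Made-up loves that, by construction, ignore the difference in price.
loves = [100 + 50 * rating + 2 * reviews for _, rating, reviews in products]

full = fit_ols(products, loves)
reduced = fit_ols([(rating, reviews) for _, rating, reviews in products], loves)
print("full:   ", [round(c, 3) for c in full])
print("reduced:", [round(c, 3) for c in reduced])
```

In this toy setup the slope for difference in price comes out at essentially zero in the full fit, and the rating and review slopes are the same in both fits, which mirrors what I saw: dropping a predictor that adds nothing barely moves the remaining coefficients.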

Ideally I would not have dropped the difference in price, because that is the effect I am interested in, but I did so for the sake of trying to get all significant coefficients (even though it did not change the significance of my intercept).

Since this is all based on website data, and most of my variables reflect human behavior, the number of loves can be very hard to predict. That could be why my intercept and the slope for difference in price are not significant while my other two slopes are. The best I can do is keep trying to improve the model. Thanks for following along.
