Advanced Mathematical Statistics

Building on my previous findings, the R-squared value rose from 0.35 for the simple linear model to 0.43 for the quadratic model fitted to log-transformed data. This indicates that the quadratic model fits the data noticeably better than the simple linear model.
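As a rough illustration of this kind of comparison, here is a minimal sketch that fits a linear and a quadratic model to the same log-transformed target and reports R-squared for each. The data are synthetic stand-ins, not the real dataset, and the names `r_squared`, `inactive`, `obese`, and `log_diabetic` are my own assumptions for the example. Fitting both models to the same target is also an assumption made here so that the two R-squared values are directly comparable.

```python
import numpy as np

def r_squared(X, y):
    """Fit ordinary least squares with an intercept and return R-squared."""
    A = np.column_stack([np.ones(len(y)), X])      # add intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)   # least-squares coefficients
    residuals = y - A @ coef
    ss_res = np.sum(residuals ** 2)                # unexplained variation
    ss_tot = np.sum((y - y.mean()) ** 2)           # total variation
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
n = 300
inactive = rng.uniform(10, 35, n)   # hypothetical % INACTIVE values
obese = rng.uniform(15, 45, n)      # hypothetical % OBESE values
log_diabetic = (1.0 + 0.04 * inactive - 0.0005 * inactive ** 2
                + 0.02 * obese + rng.normal(0, 0.15, n))

# Linear design: the two predictors as they are.
X_linear = np.column_stack([inactive, obese])
# Quadratic design: add squared terms and the interaction term.
X_quadratic = np.column_stack(
    [inactive, obese, inactive ** 2, obese ** 2, inactive * obese]
)

r2_linear = r_squared(X_linear, log_diabetic)
r2_quadratic = r_squared(X_quadratic, log_diabetic)
print(f"linear R-squared:    {r2_linear:.3f}")
print(f"quadratic R-squared: {r2_quadratic:.3f}")
```

Because the quadratic design contains every column of the linear one, its in-sample R-squared can never be lower on the same target, which is one reason an improvement in R-squared alone does not settle the question.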

I drew scatterplots to explore the relationship between the original predictor variables (% INACTIVE and % OBESE) and the log-transformed target variable (% DIABETIC) in the dataset.

Each point in a scatterplot is one observation, with the value of the predictor variable (% INACTIVE or % OBESE) on one axis and the log-transformed value of the target variable (% DIABETIC) on the other.

The red and green lines overlaid on the scatterplots show the predictions made by the quadratic regression model. If the model fits well, these lines should follow the general trend of the data points.
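A plot of this kind could be produced along the following lines. This is only a sketch under stated assumptions: it fits a separate quadratic to each predictor with `np.polyfit`, which may differ from the model I actually fitted, the data are synthetic, and the helper names `quadratic_curve` and `plot_quadratic_fit` are hypothetical.

```python
import numpy as np

def quadratic_curve(x, y_log, num_points=200):
    """Fit y_log ~ a*x^2 + b*x + c and return a smooth curve over the x range."""
    coeffs = np.polyfit(x, y_log, deg=2)             # [a, b, c]
    x_grid = np.linspace(x.min(), x.max(), num_points)
    return x_grid, np.polyval(coeffs, x_grid)

def plot_quadratic_fit(x, y_log, label, color):
    """Scatter the observations and overlay the fitted quadratic curve."""
    # Imported here so quadratic_curve works even without matplotlib installed.
    import matplotlib.pyplot as plt
    x_grid, y_grid = quadratic_curve(x, y_log)
    fig, ax = plt.subplots()
    ax.scatter(x, y_log, s=10, alpha=0.5)            # one point per observation
    ax.plot(x_grid, y_grid, color=color)             # model predictions
    ax.set_xlabel(label)
    ax.set_ylabel("log(% DIABETIC)")
    return fig, ax

# Synthetic stand-in data (assumption: NOT the real dataset).
rng = np.random.default_rng(0)
inactive = rng.uniform(10, 35, 300)
log_diabetic = (1.0 + 0.05 * inactive - 0.0006 * inactive ** 2
                + rng.normal(0, 0.15, 300))

x_grid, y_grid = quadratic_curve(inactive, log_diabetic)
```

Calling `plot_quadratic_fit(inactive, log_diabetic, "% INACTIVE", "red")` and then the same function for % OBESE with `"green"` would give one graph per predictor.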

With a separate graph for each predictor variable, I can compare how each predictor relates to the log-transformed target variable and visually assess which one appears to have the stronger relationship with it.

Although the R-squared value has improved, the model still needs further evaluation. I will consider techniques such as cross-validation to assess the model’s performance on unseen data. I will also compare different polynomial degrees (e.g., cubic or higher) to see whether a higher-degree polynomial provides a better fit.
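One way this next step might look is sketched below: a hand-rolled 5-fold cross-validation that compares polynomial degrees 1 through 4 by held-out mean squared error. It uses a single predictor and synthetic data for brevity, and the function name `kfold_mse`, the choice of five folds, and the range of degrees are all assumptions for illustration rather than a fixed plan.

```python
import numpy as np

def kfold_mse(x, y, degree, k=5, seed=0):
    """Average held-out mean squared error of a polynomial fit over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))        # shuffle before splitting
    folds = np.array_split(indices, k)
    errors = []
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        # Fit on the training folds only.
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        # Score on the fold that was held out.
        pred = np.polyval(coeffs, x[test_idx])
        errors.append(np.mean((y[test_idx] - pred) ** 2))
    return float(np.mean(errors))

# Synthetic stand-in data (assumption: NOT the real dataset).
rng = np.random.default_rng(1)
x_raw = rng.uniform(10, 35, 300)
y = 1.0 + 0.05 * x_raw - 0.0006 * x_raw ** 2 + rng.normal(0, 0.15, 300)

# Standardize the predictor so higher-degree fits stay well conditioned.
x = (x_raw - x_raw.mean()) / x_raw.std()

cv_scores = {degree: kfold_mse(x, y, degree) for degree in range(1, 5)}
for degree, mse in cv_scores.items():
    print(f"degree {degree}: cross-validated MSE = {mse:.4f}")
```

Unlike in-sample R-squared, the held-out error does not automatically improve as the degree grows, so it gives a fairer basis for choosing between quadratic, cubic, and higher-degree fits.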
