Topics covered in the lecture:

- p-value: a measure of statistical significance. The key idea: assuming there is no actual difference or relationship (the “null hypothesis”), the p-value tells us how likely we would be to observe outcomes at least as extreme as the ones we have. P-values are therefore crucial for assessing whether the data provide strong evidence against the null hypothesis. A low p-value (conventionally below 0.05) indicates strong evidence against the null hypothesis. A high p-value indicates insufficient evidence to reject it, meaning the observed result could plausibly be due to chance rather than a real effect.

- Breusch-Pagan test: The Breusch-Pagan test helps detect heteroscedasticity. It evaluates whether the variance of the residuals is related to one or more independent variables in the regression model. A significant result (a low p-value) suggests that the data may be heteroscedastic, which calls for further investigation.
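To make the p-value idea concrete, here is a minimal sketch of a permutation test using only the Python standard library. It is an illustration, not the method used in the lecture: the function name `permutation_p_value` and the sample numbers are made up for this example. Under the null hypothesis the group labels carry no information, so the sketch shuffles the pooled values and counts how often a shuffled difference in means is at least as large as the observed one.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value for the difference in group means.

    Hypothetical helper written for this illustration only.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis, labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        # Small tolerance guards against floating-point rounding.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Made-up illustration data (not taken from the CDC dataset).
similar_a = [10.1, 9.8, 10.3, 10.0, 9.9, 10.2]
similar_b = [10.0, 10.2, 9.9, 10.1, 9.7, 10.3]
shifted_b = [12.1, 11.8, 12.4, 12.0, 11.9, 12.2]

p_similar = permutation_p_value(similar_a, similar_b)
p_shifted = permutation_p_value(similar_a, shifted_b)
print(f"similar groups: p = {p_similar:.3f}")
print(f"shifted groups: p = {p_shifted:.3f}")
```

For the two similar groups the p-value should come out high (no evidence against the null hypothesis), while for the clearly shifted group it should fall well below 0.05.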

CDC Dataset:

1. Multiple Regression:

Conducted a multiple regression analysis where the target variable was the percentage of diabetes, with the predictor variables being the percentages of inactivity and obesity.
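As a rough sketch of what such a fit involves, the example below runs an ordinary least squares regression with NumPy on synthetic data. The arrays `inactive`, `obese`, and `diabetic` are made-up stand-ins for the CDC columns, and the coefficients used to generate them are arbitrary. The actual analysis may well have used a different library.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 300

# Synthetic stand-ins for the CDC columns (all values are made up).
inactive = rng.uniform(10, 30, size=n)  # plays the role of '% INACTIVE'
obese = rng.uniform(15, 25, size=n)     # plays the role of '% OBESE'
diabetic = 1.0 + 0.25 * inactive + 0.15 * obese + rng.normal(0, 0.5, size=n)

# Design matrix with a leading column of ones for the intercept.
X = np.column_stack([np.ones(n), inactive, obese])

# Least-squares estimate of [intercept, slope_inactive, slope_obese].
coef, *_ = np.linalg.lstsq(X, diabetic, rcond=None)

fitted = X @ coef
residuals = diabetic - fitted

# R^2 = 1 - (residual sum of squares / total sum of squares).
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((diabetic - diabetic.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot

print("intercept, inactivity slope, obesity slope:", coef)
print("R^2:", r_squared)
```

Each slope is the estimated change in the target for a one-unit change in that predictor with the other predictor held fixed, which is what distinguishes multiple regression from fitting each predictor separately.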

Created several plots to visualize the results using the seaborn library:

A residual plot to help check the assumption of constant variance and identify any patterns or outliers in the residuals.

Regression plots to visualize the relationship between each individual predictor variable and the target variable. Note that each plot fits a simple one-predictor regression line, so the other predictor is not held constant.

Code snippet:

# For regression plots
import matplotlib.pyplot as plt
import seaborn as sns

# y is not defined in this snippet; it is assumed to be the target column
y = merged_df['% DIABETIC']

# One regression plot per predictor against the target
sns.regplot(x=merged_df['% INACTIVE'], y=y, scatter_kws={'alpha': 0.3}, color='blue', label='inactivity')
sns.regplot(x=merged_df['% OBESE'], y=y, scatter_kws={'alpha': 0.3}, color='red', label='obesity')
plt.xlabel('Predictor Variables')
plt.ylabel('Target Variable')
plt.title('Regression Plots')
plt.legend()
plt.show()

# For heatmap of the correlation matrix
corr_matrix = merged_df[['% DIABETIC', '% INACTIVE', '% OBESE']].corr()
sns.heatmap(corr_matrix, annot=True, cmap='coolwarm', linewidths=0.5)
plt.title('Correlation Matrix Heatmap')
plt.show()

A heatmap to visualize the correlation matrix of the target and predictor variables.
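The residual plot mentioned earlier is not part of the snippet, so here is a minimal sketch of one possible version using matplotlib and NumPy. The data is synthetic and the variable names are stand-ins for the CDC columns. Plotting residuals against fitted values is one common choice, and the original plot may have been built differently.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 200

# Synthetic stand-ins for the CDC columns (all values are made up).
inactive = rng.uniform(10, 30, size=n)
obese = rng.uniform(15, 25, size=n)
diabetic = 1.0 + 0.25 * inactive + 0.15 * obese + rng.normal(0, 0.5, size=n)

# Fit the two-predictor regression by least squares.
X = np.column_stack([np.ones(n), inactive, obese])
coef, *_ = np.linalg.lstsq(X, diabetic, rcond=None)
fitted = X @ coef
residuals = diabetic - fitted

# Residuals vs. fitted values, with a reference line at zero.
fig, ax = plt.subplots()
ax.scatter(fitted, residuals, alpha=0.3)
ax.axhline(0, color="red", linestyle="--")
ax.set_xlabel("Fitted values")
ax.set_ylabel("Residuals")
ax.set_title("Residuals vs. Fitted Values")
plt.show()
```

If the variance is constant, the points should form a roughly even band around the zero line. A funnel shape that widens or narrows along the x-axis is a visual hint of heteroscedasticity.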

2. Breusch-Pagan test: Performed the Breusch-Pagan test on the multiple regression model using Python.