Today in class:
t-test – a statistical analysis technique that compares the means of two groups to determine whether the difference between them is likely due to chance or reflects a real effect. It's a tool that helps establish whether a difference between two sets of data is meaningful and statistically significant.
Analyzed a dataset of (post-molt, pre-molt) pairs, where "post-molt" is the size of a crab's shell after molting and "pre-molt" is the size of its shell before molting.
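Since each crab contributes both a pre-molt and a post-molt measurement, a paired t-test is one natural fit. As a quick illustration (the sample values below are made up for demonstration, not the actual crab measurements), it might look like this in Python:

```python
import numpy as np
from scipy import stats

# Illustrative pre-/post-molt shell sizes; real values come from the dataset
pre_molt = np.array([113.6, 118.1, 142.3, 125.4, 119.5, 133.8])
post_molt = np.array([127.7, 133.2, 154.8, 138.4, 133.9, 146.4])

# Paired t-test: both measurements come from the same crabs
t_stat, p_value = stats.ttest_rel(post_molt, pre_molt)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
```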
Since there are situations where the assumptions of a t-test may not hold, particularly when dealing with highly non-normal data or other violations such as unequal variances, we can sometimes turn to Monte Carlo procedures for a more robust estimate of the p-value. Monte Carlo methods offer a powerful alternative in these cases.
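As a minimal sketch of one such procedure, a permutation-style Monte Carlo test shuffles the pooled observations many times and estimates the p-value as the fraction of shuffles whose difference in means is at least as extreme as the observed one (the function name, iteration count, and arrays below are illustrative, not from class):

```python
import numpy as np

rng = np.random.default_rng(0)

def monte_carlo_p_value(a, b, n_iter=10_000):
    """Estimate a two-sided p-value for the difference in means
    by randomly permuting group labels (a permutation test)."""
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(pooled[: len(a)].mean() - pooled[len(a):].mean())
        if diff >= observed:
            count += 1
    return count / n_iter

# Illustrative arrays standing in for the two samples being compared
a = np.array([127.7, 133.2, 154.8, 138.4, 133.9, 146.4])
b = np.array([113.6, 118.1, 142.3, 125.4, 119.5, 133.8])
print(monte_carlo_p_value(a, b))
```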
Following up on my previous post:
Since the log transformation of the dependent variable (%DIABETIC) did not produce any significant rise in the R-squared value, I tried applying a quadratic (polynomial) regression model to the log-transformed data.
- I defined the feature matrix X, containing the predictor variables % INACTIVE and % OBESE. The log-transformed target variable is stored in y.
- Created a ‘PolynomialFeatures’ object with a degree of 2, which indicates that we want to generate quadratic (second-degree) polynomial features.
- Used the fit_transform method to transform the original features in X to include quadratic terms. This creates a new feature matrix X_poly with the original features and their quadratic combinations.
- Created a linear regression model (LinearRegression) and fit it to the transformed feature matrix X_poly.
- Calculated the R-squared value, as sketched in the code below.
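A minimal sketch of these steps, using synthetic stand-in data since the actual dataset isn't reproduced here (the DataFrame and column names follow the post, but the values are made up):

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

# Synthetic stand-in for the county-level data used in the post
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "% INACTIVE": rng.uniform(15, 35, 200),
    "% OBESE": rng.uniform(20, 40, 200),
})
df["%DIABETIC"] = (
    5 + 0.1 * df["% INACTIVE"] + 0.005 * df["% OBESE"] ** 2
    + rng.normal(0, 0.5, 200)
)

X = df[["% INACTIVE", "% OBESE"]]  # predictor variables
y = np.log(df["%DIABETIC"])        # log-transformed target

poly = PolynomialFeatures(degree=2)  # quadratic (second-degree) features
X_poly = poly.fit_transform(X)       # original columns plus squares and cross term

model = LinearRegression().fit(X_poly, y)
print(f"R-squared: {model.score(X_poly, y):.2f}")
```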
R-squared value = 0.43
This improvement in R-squared indicates that the quadratic model explains more of the variance in the data and is likely capturing the underlying relationships more accurately.