September 22

I ran a correlation study on three datasets covering diabetes, obesity, and physical inactivity. The analysis showed a significant association among all three, with the FIPS code serving as the key that links them. Since a more thorough analysis needs all the data in one place, I combined the three datasets into a single Excel spreadsheet.
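To illustrate the kind of correlation check described above, here is a minimal sketch using pandas. The column names (`pct_diabetes`, `pct_obesity`, `pct_inactivity`) and all the values are made up for illustration; they are not taken from the CDC files.

```python
import pandas as pd

# Hypothetical county-level percentages keyed by FIPS code (made-up values).
df = pd.DataFrame({
    "FIPS": [1001, 1003, 1005, 1007, 1009],
    "pct_diabetes": [8.0, 9.5, 11.0, 10.2, 12.1],
    "pct_obesity": [30.1, 32.4, 36.0, 34.8, 37.5],
    "pct_inactivity": [22.0, 24.5, 28.3, 26.9, 29.8],
})

# Pairwise Pearson correlations between the three measures.
measures = ["pct_diabetes", "pct_obesity", "pct_inactivity"]
corr = df[measures].corr(method="pearson")
print(corr.round(3))
```

Values close to 1 or -1 in the off-diagonal cells would point to a strong linear association between the corresponding pair of measures.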


I wrote a short script to merge the three datasets on the FIPS code, and it found 356 data points shared by all three. I then cleaned up the Excel file. The merge had left duplicate columns for county, state, and year, so I removed the extra copies to make the data more readable. I also renamed some columns and adjusted the column widths so the data is easier to read and visualize.
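The merge and cleanup could look roughly like the sketch below. This is not my actual script, just one way to do it in pandas under some assumptions: each dataset has a `FIPS` column plus `COUNTY`, `STATE`, and `YEAR`, and all names and values here are placeholders.

```python
import pandas as pd

# Three tiny stand-in tables (made-up values); the real files are much larger.
diabetes = pd.DataFrame({
    "FIPS": [1001, 1003, 1005],
    "COUNTY": ["A", "B", "C"],
    "STATE": ["X", "X", "X"],
    "YEAR": [2018, 2018, 2018],
    "pct_diabetes": [8.0, 9.5, 11.0],
})
obesity = pd.DataFrame({
    "FIPS": [1001, 1005, 1007],
    "COUNTY": ["A", "C", "D"],
    "STATE": ["X", "X", "X"],
    "YEAR": [2018, 2018, 2018],
    "pct_obesity": [30.1, 36.0, 34.8],
})
inactivity = pd.DataFrame({
    "FIPS": [1001, 1003, 1005, 1007],
    "COUNTY": ["A", "B", "C", "D"],
    "STATE": ["X", "X", "X", "X"],
    "YEAR": [2018, 2018, 2018, 2018],
    "pct_inactivity": [22.0, 24.5, 28.3, 26.9],
})

# Drop the repeated descriptive columns before merging so that pandas does
# not create COUNTY_x / COUNTY_y style duplicates.
shared_cols = ["COUNTY", "STATE", "YEAR"]
merged = (
    diabetes
    .merge(obesity.drop(columns=shared_cols), on="FIPS", how="inner")
    .merge(inactivity.drop(columns=shared_cols), on="FIPS", how="inner")
)

# Rename columns to more readable labels.
merged = merged.rename(columns={
    "pct_diabetes": "% Diabetes",
    "pct_obesity": "% Obesity",
    "pct_inactivity": "% Inactivity",
})
print(len(merged), "rows shared by all three tables")
# Optional: merged.to_excel("merged.xlsx", index=False)  # needs openpyxl
```

An inner join keeps only the FIPS codes present in every table, which is why the merged sheet can be much smaller than any of the inputs.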


I also want to validate these findings and make sure the results hold up. To do that, I am looking into several tests, such as the t-test and the Breusch-Pagan test. I have preliminary t-test code, but I have not yet run it on the dataset.

September 20


I learned about the t-test in class and later went through several tutorials. I found that I can use it in my model to compare the means of % diabetes and % inactivity, and of % diabetes and % obesity, across counties. I plan to continue with the county-wise analysis I mentioned in my previous post, and I will post my results once I have implemented it successfully.
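As a rough idea of what the test could look like, here is a sketch using `scipy.stats.ttest_ind` with Welch's correction (`equal_var=False`). The two arrays are randomly generated stand-ins for the county percentages, so the numbers mean nothing by themselves.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for county-level percentages (not CDC data).
rng = np.random.default_rng(0)
pct_diabetes = rng.normal(loc=9.0, scale=1.5, size=200)
pct_inactivity = rng.normal(loc=25.0, scale=3.0, size=200)

# Welch's two-sample t-test: does not assume equal variances.
t_stat, p_value = stats.ttest_ind(pct_diabetes, pct_inactivity, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.3g}")
```

Because both measures come from the same counties, a paired test (`scipy.stats.ttest_rel`) may turn out to be the better fit once the real data is plugged in.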

I discussed the questions with the team, and we have finalized a few that we will work on.

September 18


In class I learned about linear regression with two predictor variables, interaction terms, and quadratic terms. I also learned about the Bayesian localized conditional autoregressive model. The Bayesian conditional autoregressive (CAR) model is a disease-mapping method commonly used to smooth the relative risk of a disease. I am thinking of using this model in the project to analyze the county-wise data.
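To make the regression part concrete, here is a small sketch of a model with two predictors, an interaction term, and quadratic terms, fitted by ordinary least squares with NumPy. The predictors `x1` and `x2` are synthetic stand-ins (say, % inactivity and % obesity), and the coefficients in `true_beta` are arbitrary values chosen for the demo.

```python
import numpy as np

# Synthetic predictors (stand-ins for % inactivity and % obesity).
rng = np.random.default_rng(1)
x1 = rng.uniform(15, 35, size=100)
x2 = rng.uniform(20, 45, size=100)

# Design matrix: intercept, x1, x2, interaction x1*x2, and squares.
X = np.column_stack([np.ones_like(x1), x1, x2, x1 * x2, x1**2, x2**2])

# Arbitrary demo coefficients; the response is noise-free so that the fit
# should recover them almost exactly.
true_beta = np.array([1.0, 0.2, 0.1, 0.01, 0.005, -0.002])
y = X @ true_beta

# Ordinary least squares fit.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(beta_hat, 4))
```

With real data the response would be noisy, so the estimates would only approximate the underlying relationship.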

I came across a few questions while going through the data sheet and will discuss them with the team.

September 15


I have been working on finding the correlation between the CDC datasets, and I have decided to compare them county by county. I tried applying the Breusch-Pagan test to the CDC datasets: fit a linear regression, compute the residuals, regress the squared residuals on the predictors, and then calculate the R-squared value of that second regression.

Later I discussed the p-value with my team to see whether they got the same or a similar value. The p-value I got is very small, so I reject the hypothesis of constant variance and conclude that the residuals are heteroscedastic.
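For reference, the steps of the test can be sketched by hand with NumPy and SciPy. This is the n times R-squared (studentized, Koenker) form of the statistic, run on synthetic data whose noise grows with the first predictor; the helper names `ols_fit` and `breusch_pagan` are my own and the data is not from the CDC files.

```python
import numpy as np
from scipy import stats

def ols_fit(X, y):
    """Return fitted values from an ordinary least squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta

def breusch_pagan(X, y):
    """n * R^2 form of the test; X must include an intercept column."""
    n, k = X.shape
    resid_sq = (y - ols_fit(X, y)) ** 2          # squared residuals
    fitted_aux = ols_fit(X, resid_sq)            # auxiliary regression
    ss_res = np.sum((resid_sq - fitted_aux) ** 2)
    ss_tot = np.sum((resid_sq - resid_sq.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    lm = n * r_squared                           # test statistic
    p_value = stats.chi2.sf(lm, df=k - 1)        # df = number of predictors
    return lm, p_value

# Synthetic data: the noise standard deviation is proportional to x1.
rng = np.random.default_rng(2)
n = 500
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 2.0 + 0.5 * x1 + 0.3 * x2 + rng.normal(0, 1, n) * x1

X = np.column_stack([np.ones(n), x1, x2])
lm, p_value = breusch_pagan(X, y)
print(f"LM = {lm:.2f}, p = {p_value:.3g}")
```

A small p-value here means the squared residuals are related to the predictors, which is evidence against constant variance. The `het_breuschpagan` function in `statsmodels.stats.diagnostic` offers a ready-made version of the test.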


September 13, 2023

Hi, I have gone through topics such as mean squared error, heteroskedasticity, the p-value, and the Breusch-Pagan test. I looked at the correlation among % diabetes, % obesity, and % inactivity, and after some analysis I decided to treat % diabetes as the dependent variable and the other two as independent variables, which will help me understand the residuals and the scatter plot.
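A minimal sketch of that setup, with % diabetes regressed on % inactivity and % obesity, could look like this. The data is synthetic and the coefficients used to generate it are arbitrary, so only the mechanics (fit, residuals, mean squared error, R-squared) carry over to the real datasets.

```python
import numpy as np

# Synthetic stand-ins for the county percentages (not CDC data).
rng = np.random.default_rng(3)
n = 300
pct_inactivity = rng.uniform(15, 35, n)
pct_obesity = rng.uniform(20, 45, n)
pct_diabetes = (1.0 + 0.15 * pct_inactivity + 0.10 * pct_obesity
                + rng.normal(0, 0.5, n))

# Diabetes is the dependent variable; the other two are predictors.
X = np.column_stack([np.ones(n), pct_inactivity, pct_obesity])
beta, *_ = np.linalg.lstsq(X, pct_diabetes, rcond=None)

fitted = X @ beta
residuals = pct_diabetes - fitted
mse = np.mean(residuals**2)                      # mean squared error
r_squared = 1.0 - residuals.var() / pct_diabetes.var()
print(f"coefficients = {np.round(beta, 3)}, MSE = {mse:.3f}, R^2 = {r_squared:.3f}")
```

Plotting `residuals` against `fitted` is the usual way to eyeball whether the spread of the errors changes across the range of predictions.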

I will go through the resources provided in the course materials to understand these topics better and apply them to the CDC datasets. Last week I worked with my own datasets and was able to plot a regression line. Now I am moving on to the CDC datasets and will run the Breusch-Pagan test on them.