With great pleasure, I offer this report, which provides a thoughtful and nuanced analysis of incidents involving police shootings. Our goal is to shed light on trends, demographic differences, and predictive insights by carefully examining the data and applying statistical tests and machine learning techniques. This report aims to contribute to a better understanding of this important issue through a committed effort to disentangle the complexities surrounding police shootings.
Week 9, Wednesday
In today’s blog, I have gone through the random forest technique for the analysis of datasets. The Random Forest ensemble learning algorithm is widely used in data analysis, machine learning, and statistics. It is especially effective for classification and regression tasks. An ensemble learning method combines multiple models’ predictions to improve overall accuracy and robustness.
Random Forest combines several ideas: decision trees, ensemble learning, and the voting and aggregation of individual predictions. Its key advantages are:
Reduced Overfitting: Overfitting is reduced as a result of the randomness introduced during the tree-building process, making the model more robust and generalizable.
Feature Importance: Random Forest gives a measure of the importance of each feature in making accurate predictions. This can help with feature selection.
High Accuracy: When compared to a single decision tree, Random Forest frequently produces high-accuracy models.
Handle Missing Values: Random Forest is robust to outliers and can handle missing values in the dataset.
I will implement this technique in my work to make the structure of the decision trees understandable and to draw insights from the police-shootings data.
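Below is a rough sketch of how this could look in Python with scikit-learn; the file name and the choice to work with only the “age”, “race”, and “armed” columns are assumptions for illustration, not the final analysis.

```python
# A rough sketch with scikit-learn; the file name and the use of only the
# 'age', 'race', and 'armed' columns are assumptions for illustration.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

df = pd.read_csv("fatal-police-shootings-data.csv")          # placeholder file name
df = df.dropna(subset=["age", "race", "armed"])
df = df[df["armed"].isin(["gun", "knife", "unarmed"])]       # keep three armed statuses

X = pd.get_dummies(df[["age", "race"]], columns=["race"])    # one-hot encode race
y = df["armed"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

forest = RandomForestClassifier(n_estimators=200, random_state=42)
forest.fit(X_train, y_train)

print("Test accuracy:", accuracy_score(y_test, forest.predict(X_test)))
# Feature importances, useful for feature selection as noted above
print(dict(zip(X.columns, forest.feature_importances_)))
```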
Week 9, Monday
Today I have learnt about the two common approaches to statistics:
- Definition of probability:
FS (Frequentist Statistics): Probability is defined as the long-run frequency of events in a repeated, hypothetical infinite sequence of trials. It is founded on the concept of objective randomness.
BS (Bayesian Statistics): Probability is viewed as a measure of belief or uncertainty. Using Bayes’ theorem, prior beliefs are incorporated and updated based on new evidence.
- Parameter estimation:
FS: The emphasis is on estimating unknown fixed parameters from observed data. Maximum likelihood estimation (MLE) is used for this estimation.
BS: Bayesian inference generates a probability distribution for parameters by incorporating prior knowledge and updating it with observed data to produce a posterior distribution.
- Hypothesis testing:
FS: Frequentist hypothesis testing entails making decisions about population parameters based on sample data. The level of significance is frequently determined using p-values.
BS: Bayesian hypothesis testing compares the probabilities of various hypotheses given the data. It makes decisions based on posterior probabilities and Bayes factors.
To account for this prior knowledge, I used a Bayesian t-test strategy, because I am confident that the difference in average ages is approximately 7 years and statistically significant. The results, however, revealed an intriguing discrepancy: the observed difference was located near the tail of the posterior distribution. This disparity bothered me, as it demonstrated how sensitive Bayesian analysis is to the prior specification.
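As a simplified illustration of the idea (not the exact analysis I ran), the sketch below applies a conjugate normal model to the mean age difference, with the prior centred at 7 years; the two group samples are placeholders.

```python
# A simplified illustration (not the exact analysis above): a conjugate normal model
# for the difference in mean ages, with the prior centred at 7 years.
# The two group samples are placeholders, not the real age data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(38, 13, 400)   # placeholder ages, group A
group_b = rng.normal(31, 12, 400)   # placeholder ages, group B

# Observed mean difference and its approximate standard error
diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / len(group_a) + group_b.var(ddof=1) / len(group_b))

# Prior: difference ~ Normal(7, 2^2); likelihood: diff | mu ~ Normal(mu, se^2)
prior_mean, prior_sd = 7.0, 2.0
post_var = 1.0 / (1.0 / prior_sd**2 + 1.0 / se**2)
post_mean = post_var * (prior_mean / prior_sd**2 + diff / se**2)
post_sd = np.sqrt(post_var)

print(f"Posterior for the mean difference: N({post_mean:.2f}, {post_sd:.2f}^2)")
# Where the observed difference sits in the posterior; a value near 0 or 1
# would mirror the tail position described above.
print("Posterior CDF at observed difference:", stats.norm.cdf(diff, post_mean, post_sd))
```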
I will be posting my results in the next blog.
Week 8, Friday
Welch’s ANOVA is an alternative to conventional ANOVA that does not require equal variances, and it can be more robust when the group variances differ.
Welch’s ANOVA is designed to be resilient when the assumption of equal variances is violated. It achieves this robustness by modifying the conventional F-statistic to account for unequal variances. The Welch F-statistic is computed as a ratio of mean squares in which each group is weighted by its own variance, with adjusted degrees of freedom.
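As a sketch, the Welch F-statistic and its adjusted degrees of freedom can be computed directly from this definition; the three groups below are placeholder samples rather than the actual age data.

```python
# A sketch of Welch's ANOVA computed directly from this formula; SciPy is used only
# for the F distribution, and the three groups are placeholder samples.
import numpy as np
from scipy import stats

def welch_anova(*groups):
    k = len(groups)
    n = np.array([len(g) for g in groups], dtype=float)
    m = np.array([np.mean(g) for g in groups])
    v = np.array([np.var(g, ddof=1) for g in groups])

    w = n / v                          # weight each group by n_i / s_i^2
    grand_mean = np.sum(w * m) / np.sum(w)

    numerator = np.sum(w * (m - grand_mean) ** 2) / (k - 1)
    tmp = np.sum((1 - w / np.sum(w)) ** 2 / (n - 1))
    denominator = 1 + 2 * (k - 2) / (k ** 2 - 1) * tmp

    f_stat = numerator / denominator
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * tmp)
    p_value = stats.f.sf(f_stat, df1, df2)
    return f_stat, df1, df2, p_value

rng = np.random.default_rng(1)
a = rng.normal(37, 14, 120)   # placeholder age samples with unequal variances
b = rng.normal(33, 10, 200)
c = rng.normal(30, 9, 80)
print(welch_anova(a, b, c))
```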
Levene’s test revealed significant differences in variances between ethnic groups in my earlier ANOVA analysis, casting doubt on the fundamental premise of equal variances in conventional ANOVA. I chose Welch’s ANOVA as a replacement because its design addresses unequal variances by adjusting the F-statistic and degrees of freedom for greater precision. The Shapiro-Wilk tests, however, revealed that the age distribution was not normal in the majority of racial groups. I recognise that large deviations from normality can affect the results, even though Welch’s ANOVA can handle minor deviations. The severity of these violations, as well as the specifics of my dataset, will determine whether I use Welch’s ANOVA or investigate other options. In cases of extreme non-normality or small sample sizes, I might consider non-parametric tests instead.
Week 8, Wednesday
ANOVA is a statistical method for analysing differences in group means in a sample. Can the data from the Washington Post assist me in determining whether there are any significant differences in the ages of people shot across different races?
Null Hypothesis (H0): There is no significant difference in age means across races.
(In the frequentist interpretation, a small p-value indicates that the observed data is unlikely to have occurred by chance, leading to the rejection of the null hypothesis. It is called a ‘frequentist’ theory because it views probabilities as the frequency of events occurring over repeated experiments. It contrasts with the Bayesian approach, in which probabilities can also represent degrees of uncertainty.)
Alternative Hypothesis (H1): There is a significant difference in age means across races.
Let’s do the Shapiro-Wilk test for normality and Levene’s test for Homogeneity of Variances.
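Here is a rough sketch of these checks in Python, assuming the dataset exposes “age” and “race” columns (the file name is a placeholder):

```python
# A sketch of these checks with SciPy, assuming the dataset has 'age' and 'race'
# columns (the file name is a placeholder).
import pandas as pd
from scipy import stats

df = pd.read_csv("fatal-police-shootings-data.csv").dropna(subset=["age", "race"])
groups = [g["age"].values for _, g in df.groupby("race")]

# Shapiro-Wilk normality test within each racial group
for race, g in df.groupby("race"):
    stat, p = stats.shapiro(g["age"])
    print(f"Shapiro-Wilk for {race}: W={stat:.3f}, p={p:.4f}")

# Levene's test for homogeneity of variances across groups
print("Levene:", stats.levene(*groups))

# Classical one-way ANOVA, valid only if the assumptions above hold
print("ANOVA:", stats.f_oneway(*groups))
```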
The lack of variance homogeneity and the absence of normality in any major group raise concerns about the robustness of the ANOVA results. Out of curiosity, I continued with ANOVA and obtained the following results.
Given these violations of the assumptions, I must interpret the results with caution. While they indicate significant differences, their reliability is questionable, even though I intuitively believe them after looking at the data for a long time.
I’m thinking about doing more analyses, like Welch’s ANOVA or non-parametric tests, to see if the results are consistent across methods.
Week 8, Monday
Heat Map
As part of my investigation, I focused on the “armed” and “race” columns when creating a Python heatmap for the police-shootings dataset. I visualised the distribution of racial groups by armed status using matplotlib, seaborn, and pandas, and refined the heatmap so that it only displayed the values “gun,” “knife,” and “unarmed.” The resulting red-coloured chart gave a clear picture of these specific armed statuses across racial groups. This improved my data visualisation skills by helping me understand how heatmaps can be tailored to extract relevant information from large, complex datasets.
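A sketch of how such a heatmap could be built (the file name and column names are assumptions about the CSV layout):

```python
# A sketch of the heatmap described above; the file name is a placeholder and the
# 'race' and 'armed' column names are assumptions about the CSV layout.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("fatal-police-shootings-data.csv")
subset = df[df["armed"].isin(["gun", "knife", "unarmed"])]   # the three statuses of interest

counts = pd.crosstab(subset["race"], subset["armed"])        # racial group vs armed status

sns.heatmap(counts, annot=True, fmt="d", cmap="Reds")        # red-coloured chart
plt.title("Armed status by racial group")
plt.tight_layout()
plt.show()
```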
Week 7, Wednesday
Hierarchical Clustering
Hierarchical Clustering produces clusters that are strikingly similar to K-means. In fact, the outcome can sometimes be identical to k-means clustering. However, the entire procedure is distinct. There are two types: agglomerative and divisive. Agglomerative is a bottom-up approach, while Divisive is the inverse. Today, I concentrated primarily on the Agglomerative approach.
Step 1: Make each data point into a single point cluster, resulting in N clusters.
Step 2: Combine the two closest data points into one cluster. This produces N-1 clusters.
Step 3: Combine the two closest clusters into one. This results in the formation of N-2 clusters.
Step 4: Repeat Step 3 until there is only one large cluster.
Step 5: Finish.
Closeness between clusters is measured differently from closeness between data points, which can be computed with metrics such as the Euclidean distance between the points.
I learned about dendrograms, which have a vertical axis showing the Euclidean distance (dissimilarity) at which clusters are merged and a horizontal axis showing the data points. The higher the horizontal lines, the more dissimilar the merged clusters. We can set a dissimilarity threshold: the number of vertical lines the threshold cuts in the dendrogram gives the number of clusters, and the largest clusters below the threshold are the ones we keep.
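Here is a minimal sketch of this workflow with SciPy, using placeholder two-dimensional data rather than the actual dataset:

```python
# A minimal sketch of agglomerative clustering plus a dendrogram and a dissimilarity
# threshold, on placeholder two-dimensional data rather than the actual dataset.
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(6, 1, (30, 2))])

Z = linkage(X, method="ward")            # bottom-up merging using Euclidean distance

dendrogram(Z)                            # line height = dissimilarity of the merged clusters
plt.axhline(y=10, linestyle="--")        # an example dissimilarity threshold
plt.show()

labels = fcluster(Z, t=10, criterion="distance")   # clusters formed below the threshold
print("Number of clusters:", len(np.unique(labels)))
```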
Week 7, Monday
Monte Carlo Simulation
Today I discovered what is known as a Monte Carlo Simulation. Monte Carlo Simulation is a mathematical approach used to approximate the potential outcomes of uncertain events, thereby improving decision-making processes.
The mechanism entails building a model that assesses the likelihood of various outcomes in a system with unpredictability due to the intervention of random variables. Using random sampling, the technique generates a large number of potential outcomes and computes the average.
A three-step process is used to start a Monte Carlo Simulation:
- Predictive Model Development: Define the dependent variable to be predicted and identify the independent variables.
- Probability Distribution of Independent Variables: Using historical data, define a range of plausible values for independent variables and assign weights to them.
- Iterative Simulation Runs: Run simulations iteratively by generating random values for independent variables until a representative sample with a large number of potential combinations is obtained.
The number of samples is directly related to the precision of the sampling range and the accuracy of the estimates. In essence, a greater number of samples results in a more refined sampling range, which improves estimation accuracy.
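Here is a toy sketch of the three steps with NumPy; the predictive model and the input distributions are purely illustrative:

```python
# A toy sketch of the three steps above; the model and the input distributions are
# purely illustrative stand-ins for a model fitted to historical data.
import numpy as np

rng = np.random.default_rng(3)
n_sims = 100_000

# Step 2: plausible distributions for the independent variables
x1 = rng.normal(10, 2, n_sims)
x2 = rng.uniform(0, 20, n_sims)
noise = rng.normal(0, 1, n_sims)

# Steps 1 and 3: apply the predictive model to every random draw
outcome = 2 * x1 + 0.5 * x2 + noise

print("Mean outcome:", outcome.mean())
print("90% interval:", np.percentile(outcome, [5, 95]))
```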
Week 6, Friday
I examined the data to determine the pattern of police shootings in various states across the United States. In addition, I am looking for a data-processing method that will help me handle missing values or outliers by imputing them with cluster-specific statistics. I primarily use clustering to uncover underlying structures in data; it aids data reduction, supports segmentation, classification, and anomaly detection, and finds applications in a wide range of domains.
I have come across two other clustering methods:
Mean Shift Clustering is a mode-seeking algorithm. Mean Shift finds cluster centres by repeatedly shifting each data point towards the mode of the density function. It is useful for identifying patterns in the data distribution.
Spectral clustering projects data points into a lower-dimensional space defined by the eigenvectors of a similarity matrix and clusters them there. It is useful for non-convex clusters.
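Here is a short sketch of both methods on placeholder two-dimensional data with scikit-learn:

```python
# A short sketch of both methods on placeholder two-dimensional data with scikit-learn.
import numpy as np
from sklearn.cluster import MeanShift, SpectralClustering

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])

ms_labels = MeanShift().fit_predict(X)                  # mode-seeking; no k required
sc_labels = SpectralClustering(n_clusters=2,
                               affinity="nearest_neighbors",
                               random_state=4).fit_predict(X)

print("Mean Shift clusters:", np.unique(ms_labels))
print("Spectral clusters:  ", np.unique(sc_labels))
```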
Week 6, Wednesday
Today I learned how and when to use DBSCAN (Density-Based Spatial Clustering of Applications with Noise). It is a clustering method that identifies clusters of data points in a space based on their density. Unlike k-means, it does not require us to specify the number of clusters in advance, and it can find clusters of arbitrary shapes.
According to DBSCAN, a cluster is a dense region of data points separated by sparser regions. It divides points into three types: core, border, and noise.
A core point has at least a minimum number of neighbouring points (min_samples) within a given distance, epsilon. A border point has fewer neighbours than min_samples yet lies in the neighbourhood of a core point. The remaining points are noise and do not belong to any cluster. DBSCAN first chooses a data point at random; if it is a core point, it starts a new cluster containing that point and all of the points reachable from it, which can be either core or border points. It then repeats the process for those neighbours, adding their reachable neighbours to the cluster, and continues until there are no more points to add. It then moves on to an unvisited point and repeats the process. One of its main advantages is that it is resistant to outliers.
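Here is a minimal DBSCAN sketch with scikit-learn on placeholder data; the eps and min_samples values are illustrative:

```python
# A minimal DBSCAN sketch on placeholder data; eps and min_samples are illustrative.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(5)
X = np.vstack([rng.normal(0, 0.3, (60, 2)),
               rng.normal(3, 0.3, (60, 2)),
               rng.uniform(-2, 5, (10, 2))])      # a few scattered points acting as noise

labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)

print("Clusters found:", set(labels) - {-1})      # label -1 marks noise points
print("Noise points:", int(np.sum(labels == -1)))
```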