Boston Analysis: Crime Incident Reports, July 2012 to August 2015

Dataset Overview

The dataset comprises various columns, each providing specific details about crime incidents. The key columns we’ll focus on are:

  • COMPNOS
  • NatureCode
  • INCIDENT_TYPE_DESCRIPTION
  • MAIN_CRIMECODE
  • REPTDISTRICT
  • REPORTINGAREA
  • FROMDATE
  • WEAPONTYPE
  • Shooting
  • DOMESTIC
  • SHIFT
  • Year
  • Month
  • DAY_WEEK
  • UCRPART
  • X
  • Y
  • STREETNAME
  • XSTREETNAME
  • Location

Let’s start our exploration!


Exploratory Data Analysis (EDA)

  1. Incident Types and Crime Codes:
    • What are the most common incident types?
    • Which crime codes are prevalent?
  2. Distribution of Crime by District:
    • How does the number of crimes vary across different districts?
    • Are there districts with higher or lower crime rates?
  3. Time-based Analysis:
    • How has the overall crime rate changed over the years?
    • Is there a monthly or weekly pattern in crime incidents?
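
As a rough sketch of how these questions could be approached with pandas, the snippet below counts incidents by type, by district, and by year. It runs on a tiny made-up sample that only borrows the dataset's column names; the rows and values are invented for illustration, and the real file would be loaded with pd.read_csv instead.

```python
import pandas as pd

# Made-up sample rows using the dataset's column names (not real records).
df = pd.DataFrame({
    "INCIDENT_TYPE_DESCRIPTION": ["LARCENY", "ASSAULT", "LARCENY",
                                  "VANDALISM", "LARCENY", "ASSAULT"],
    "REPTDISTRICT": ["B2", "D4", "D4", "B2", "D4", "A1"],
    "Year": [2012, 2013, 2013, 2014, 2014, 2014],
    "Month": [7, 1, 1, 6, 6, 12],
    "DAY_WEEK": ["Monday", "Friday", "Friday", "Sunday", "Friday", "Monday"],
})

# 1. Most common incident types (sorted from most to least frequent).
top_incident_types = df["INCIDENT_TYPE_DESCRIPTION"].value_counts()

# 2. Number of incidents per reporting district.
incidents_by_district = df["REPTDISTRICT"].value_counts()

# 3. Incident counts per year and per day of the week.
incidents_by_year = df.groupby("Year").size()
incidents_by_weekday = df["DAY_WEEK"].value_counts()

print(top_incident_types.head(10))
print(incidents_by_district)
print(incidents_by_year)
print(incidents_by_weekday)
```

Note that these are raw counts. Turning them into rates would need population figures per district, which the listed columns do not contain.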

Insights from EDA

  1. Incident Types and Crime Codes:
    • The dataset provides a diverse range of incident types, from thefts to assaults.
    • Certain crime codes might be more common, indicating specific types of criminal activity.
  2. Distribution of Crime by District:
    • Some districts may experience higher crime rates than others.
    • Understanding the variations can help allocate resources effectively.
  3. Time-based Analysis:
    • Over the years, there might be trends or fluctuations in crime rates.
    • Analyzing monthly and weekly patterns can reveal when certain types of crimes are more likely to occur.

Questions for Further Exploration

  1. Spatial Analysis:
    • Are there specific locations or streets with consistently high crime rates?
    • How do crime rates vary between main streets and cross streets?
  2. Weapon Involvement:
    • What types of weapons are most commonly involved in crimes?
    • Is there a correlation between weapon type and the severity of incidents?
  3. Domestic Incidents:
    • How prevalent are domestic incidents, and do they follow any patterns?
    • Are there specific shifts or days of the week when domestic incidents are more likely?
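
A minimal sketch of how the weapon and domestic questions might be tabulated is shown below. The sample rows are invented, and the category labels ("Yes"/"No" for DOMESTIC, the shift names, and the weapon types) are assumptions about how the columns are coded; they should be checked against the actual file.

```python
import pandas as pd

# Invented rows; the category labels are assumptions about the real coding.
df = pd.DataFrame({
    "DOMESTIC": ["Yes", "No", "No", "Yes", "No", "Yes"],
    "SHIFT": ["Last", "Day", "Day", "Last", "First", "Day"],
    "WEAPONTYPE": ["Unarmed", "Knife", "Unarmed", "Unarmed", "Firearm", "Other"],
})

# Overall share of incidents flagged as domestic.
domestic_share = (df["DOMESTIC"] == "Yes").mean()

# Share of domestic vs. non-domestic incidents within each shift
# (each row of the table sums to 1).
domestic_by_shift = pd.crosstab(df["SHIFT"], df["DOMESTIC"], normalize="index")

# How often each weapon type appears.
weapon_counts = df["WEAPONTYPE"].value_counts()

print(f"Domestic share: {domestic_share:.2f}")
print(domestic_by_shift)
print(weapon_counts)
```

The same crosstab call with DAY_WEEK in place of SHIFT would give the day-of-week breakdown.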

Having established the presence of significant age disparities, I proceeded with post-hoc analyses to pinpoint specific differences between racial groups. Employing Tukey’s Honestly Significant Difference (HSD) test, I identified the pairs of races with the most notable age differences. This step was vital for a nuanced understanding of the dynamics at play within the dataset.

The outcomes of this analysis offer crucial insights into the complexities of racial disparities in police shootings. Acknowledging the presence of age differences among various racial groups is pivotal for shaping informed discussions and policy interventions.

When I conducted the ANOVA test on the dataset, it helped me determine whether there were any significant differences among the means of the groups. ANOVA provided a broad answer, indicating that at least one group's mean differed from the others, without saying which one.

To delve deeper and identify the specific groups with different means, I employed Tukey’s Honestly Significant Difference (HSD) test. Whereas ANOVA is a single overall test, Tukey’s HSD is a post hoc test designed to follow a significant ANOVA result: it compares every pair of group means while controlling the family-wise error rate. It enabled me to pinpoint the exact groups that were driving the significant difference revealed by ANOVA.

In simpler terms, ANOVA acted as a preliminary indicator, suggesting the presence of differences, while Tukey’s HSD stepped in to provide detailed insights. It answered the crucial question: which particular groups within the dataset exhibited distinct means?
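
The two-step workflow can be sketched as follows. This is an illustration on synthetic ages for three hypothetical groups (group_a, group_b, group_c), not the data or code from my analysis, and it assumes SciPy 1.8 or later, where scipy.stats.tukey_hsd is available.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic ages for three hypothetical groups (not the real data).
group_a = rng.normal(30.0, 8.0, size=200)
group_b = rng.normal(36.0, 8.0, size=200)
group_c = rng.normal(36.5, 8.0, size=200)
labels = ["group_a", "group_b", "group_c"]

# Step 1: ANOVA asks whether any group mean differs at all.
anova = stats.f_oneway(group_a, group_b, group_c)
print(f"ANOVA: F = {anova.statistic:.2f}, p = {anova.pvalue:.3g}")

# Step 2: Tukey's HSD compares every pair of groups.
tukey = stats.tukey_hsd(group_a, group_b, group_c)
for i in range(len(labels)):
    for j in range(i + 1, len(labels)):
        # statistic[i, j] is mean(group i) - mean(group j).
        print(f"{labels[i]} vs {labels[j]}: "
              f"mean diff = {tukey.statistic[i, j]:.2f}, "
              f"p = {tukey.pvalue[i, j]:.4f}")
```

Pairs with a small p-value are the ones whose means differ; pairs with a large p-value cannot be distinguished at the chosen significance level.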

Analysis of Variance (ANOVA)

Moving beyond basic descriptions, I conducted an Analysis of Variance (ANOVA) test to investigate potential age differences among racial groups. This statistical tool allowed me to compare means across multiple groups simultaneously.

The results of the ANOVA test indicated that there is a significant difference in ages among racial groups. The F-statistic of 103.50 and a very low p-value (approximately 6.13e-44) suggest that the differences in ages between these groups are unlikely to have occurred by random chance.

In other words, I have evidence to reject the null hypothesis that all groups share the same mean age, which means that there are statistically significant differences in ages among the racial groups I tested. This finding matters for further analyses and discussions of age disparities across racial categories.
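
To make the F-statistic less of a black box, here is a small sketch that computes it by hand from the between-group and within-group sums of squares and checks the result against scipy.stats.f_oneway. The ages are synthetic and the three groups are hypothetical, so the numbers will not match the F = 103.50 reported above.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Synthetic ages for three hypothetical groups (not the real data).
groups = [rng.normal(loc, 10.0, size=150) for loc in (32.0, 35.0, 40.0)]

all_ages = np.concatenate(groups)
grand_mean = all_ages.mean()
k = len(groups)          # number of groups
n_total = all_ages.size  # total number of observations

# Variation of the group means around the grand mean.
ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
# Variation of observations around their own group mean.
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)

# F is the ratio of the two mean squares.
f_manual = (ss_between / (k - 1)) / (ss_within / (n_total - k))
# p-value: upper tail of the F distribution with (k - 1, n_total - k) df.
p_manual = stats.f.sf(f_manual, k - 1, n_total - k)

f_scipy, p_scipy = stats.f_oneway(*groups)
print(f"manual: F = {f_manual:.2f}, p = {p_manual:.3g}")
print(f"scipy:  F = {f_scipy:.2f}, p = {p_scipy:.3g}")
```

A large F means the group means are spread far apart relative to the scatter inside each group, which is what drives the p-value down.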

Having established the presence of significant age disparities, I will proceed with post-hoc analyses to pinpoint specific differences between racial groups.

The Mean Square Difference (MSD) I calculated from the probabilities of age and race in the dataset provided a quantitative measure of disparity in the experiences of Black and White individuals with respect to police shootings.

I performed a Monte Carlo simulation to test the MSD calculated from the dataset against what random chance alone would produce. First, I recorded the MSD value calculated from the observed data. Then, I ran 10,000 simulations, each time randomly shuffling the age and race data so that any real association between the two was broken. For each simulation, I calculated the probabilities for Black and White individuals in the same way as for the original data.

In each simulation, I computed the ratio of probabilities for Black and White races and then calculated the MSD using this ratio. This process allowed me to create a distribution of MSD values based on random chance.

After running the simulations, I compared the MSD value we calculated from our dataset with the distribution of MSD values from the random samples. The comparison was done by computing a p-value, which represents the probability of observing an MSD value as extreme as ours, or even more extreme, under the null hypothesis that there is no difference between races.
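
The procedure above can be sketched roughly as follows. Several details here are assumptions rather than my original code: the data are synthetic, the age bins are arbitrary, and the MSD is taken to be the mean squared deviation of the Black-to-White probability ratio from 1 across age bins. Shuffling the race labels against fixed ages is used as the way to break the link between age and race.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in data (NOT the real records): ages plus a B/W label.
ages = np.concatenate([rng.normal(32, 10, 300),
                       rng.normal(40, 12, 300)]).clip(15, 90)
race = np.array(["B"] * 300 + ["W"] * 300)

# Assumed age bins; bin_index holds the bin each age falls into (0..4).
bin_edges = np.array([15, 25, 35, 45, 55, 91])
bin_index = np.digitize(ages, bin_edges) - 1
n_bins = len(bin_edges) - 1


def msd(labels):
    """Assumed MSD: mean squared deviation of P(bin|B) / P(bin|W) from 1."""
    is_black = labels == "B"
    is_white = labels == "W"
    p_black = np.bincount(bin_index[is_black], minlength=n_bins) / is_black.sum()
    p_white = np.bincount(bin_index[is_white], minlength=n_bins) / is_white.sum()
    usable = p_white > 0  # skip bins where the ratio is undefined
    ratio = p_black[usable] / p_white[usable]
    return np.mean((ratio - 1.0) ** 2)


observed_msd = msd(race)

# Null distribution: shuffle the labels so age and race are unrelated.
n_simulations = 10_000
simulated_msd = np.empty(n_simulations)
for i in range(n_simulations):
    simulated_msd[i] = msd(rng.permutation(race))

# Share of shuffled datasets with an MSD at least as large as the observed one.
p_value = np.mean(simulated_msd >= observed_msd)
print(f"observed MSD = {observed_msd:.4f}, p-value = {p_value:.4f}")
```

A small p-value means that shuffled data almost never produce an MSD as large as the observed one, which is evidence against the null hypothesis of no difference between the two groups.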