import numpy as np
import pandas as pd
Adventures in Data: Homicide / Chapter 1: Data
Version requirements
The project requires Python > 3.7.
Introduction
Homicide is one of those things that’s so far removed from most of our lives in Western countries that it’s easy to sensationalize it to an absurd degree. Just look at all the detective shows (which I admit I enjoy) or serial killer docu-series available now. Nevertheless, the act of taking a human life is so grotesque that there has to be some rhyme or reason to it (or so one hopes).
The practical implication of a motive existing (again, hopefully) lies in actually solving the case and arresting the perpetrators, and this is what I was originally interested in.
I used two datasets of homicide case data. The first was compiled by a team at The Washington Post [1], covering more than 52,000 homicides in 50 of the largest cities in the United States, from January 1, 2007 to December 31, 2017. The accompanying story [2] focuses on the causes and impacts of unsolved homicides and the regions with low homicide arrest rates. The second is from the Murder Accountability Project [3], downloaded September 19, 2023, which covers over ten times as many homicides, from 1976 to the present.
The hope here was that combining the two datasets would lead to a clearer picture of homicide case closures and arrests than either dataset could provide individually. While preparing the data, however, I quickly encountered a challenge that piqued my interest, so I followed my curiosity and expanded this would-be small project into a series.
Data provenance
The first, more detailed dataset was compiled by a team at The Washington Post [1] using various sources, complete and partial. To avoid typing out the full name, I often refer to this dataset as simply the “Post” dataset throughout the series.
The second dataset was compiled by the Murder Accountability Project (MAP) [3], using data from official FBI records as well as other sources accessed using the Freedom of Information Act. The Murder Accountability Project claims to have the most complete repository on US homicide data available anywhere, so I have high hopes for its utility.
Preliminary imports
In this first chapter I’m only interested in exploring the “structural” characteristics of the two datasets, so we won’t need anything fancy. numpy will supply some NaN values we can use to replace null data, and pandas will of course be used to place the data into a DataFrame which can be manipulated and explored.
Import data
The data files were too large to upload to GitHub (I don’t have Large File Storage), so I hope that just specifying where I acquired the data (above) will suffice.
Originally the Washington Post data was downloaded as a CSV, but importing it as a CSV directly caused issues so I opened it in Excel and saved it in the XLSX format, which imports without issue.
Additionally, the dtype argument specifies that the data should be imported as the object type, which allows for all types of values in all columns.
The data columns will be formatted after some initial exploration.
post_df = pd.read_excel("./data/homicide-data.xlsx", dtype=object)
map_df = pd.read_csv("./data/SHR76_21.csv", dtype=object)
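One practical effect of importing everything as object is that digit-only strings keep their exact characters. The sketch below illustrates this with a made-up two-row CSV held in memory; the column names mirror the MAP file, but the rows and the variable names are assumptions for illustration only.

```python
import io

import pandas as pd

# Toy CSV standing in for the real file; the rows are invented.
csv_text = "ID,FileDate\nA1,040491\nA2,122007\n"

# Default type inference turns digit-only columns into integers.
inferred = pd.read_csv(io.StringIO(csv_text))
# dtype=object keeps every value as the string found in the file.
as_object = pd.read_csv(io.StringIO(csv_text), dtype=object)

print(inferred["FileDate"].tolist())   # integers, so the leading zero is lost
print(as_object["FileDate"].tolist())  # strings, so "040491" survives intact
```

This matters for any column whose width carries meaning, such as a fixed-format datestring.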
We will also enable an option to see all the columns of a DataFrame:
pd.set_option("display.max_columns", None)
Data quality checks and preprocessing
Washington Post data
Data value exploration
post_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52179 entries, 0 to 52178
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 uid 52179 non-null object
1 reported_date 52179 non-null object
2 victim_last 52178 non-null object
3 victim_first 52179 non-null object
4 victim_race 52179 non-null object
5 victim_age 52179 non-null object
6 victim_sex 52179 non-null object
7 city 52179 non-null object
8 state 52179 non-null object
9 lat 52119 non-null object
10 lon 52119 non-null object
11 disposition 52179 non-null object
dtypes: object(12)
memory usage: 4.8+ MB
Looks like there are just over 52,000 entries.
A good way to see what values exist in a dataset, especially one that’s been imported with the object dtype, is to examine random data samples. I prefer this to the usual method of calling the head() method on the DataFrame because it gets a range of values that may not be fully covered in the first few rows of the dataset.
post_df.sample(10)
| | uid | reported_date | victim_last | victim_first | victim_race | victim_age | victim_sex | city | state | lat | lon | disposition |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 25146 | Jac-000736 | 20130331 | HICKS | WILLIAM | Black | 32 | Male | Jacksonville | FL | 30.44433 | -81.658364 | Closed by arrest |
| 9025 | Chi-005085 | 20101111 | AUGUST | DENNIS C | Black | 25 | Male | Chicago | IL | 41.763618 | -87.56538 | Open/No arrest |
| 5768 | Bos-000367 | 20091223 | CORRAO | JULIENNE | White | 49 | Female | Boston | MA | 42.377336 | -71.036022 | Open/No arrest |
| 37938 | NY-000538 | 20170920 | Unknown | Unknown | Black | 58 | Male | New York | NY | 40.798964 | -73.950314 | Open/No arrest |
| 15885 | Dal-002402 | 20151222 | Unknown | Unknown | Unknown | Unknown | Unknown | Dallas | TX | 32.71088 | -96.69997 | Closed by arrest |
| 27040 | Las-000172 | 20080609 | RIMER | JASON | White | 4 | Male | Las Vegas | NV | 36.146227 | -115.066844 | Closed by arrest |
| 37758 | NY-000358 | 20170129 | Unknown | Unknown | Asian | 33 | Male | New York | NY | 40.74773 | -73.986839 | Closed by arrest |
| 51439 | Was-000630 | 20100528 | HAGOS | ASSEFAW | Black | 41 | Male | Washington | DC | 38.916622 | -77.024275 | Closed by arrest |
| 17006 | Det-000591 | 20150902 | SAVORS | ED | Black | 30 | Male | Detroit | MI | 42.37608 | -83.199761 | Closed by arrest |
| 3871 | Bal-700036 | 20170203 | CHADWICK | TONJA | Black | 20 | Female | Baltimore | MD | 39.329098 | -76.567193 | Closed by arrest |
Note: It’s important not to make the sample size too large or run the sample() method too many times if you intend to do any kind of hypothesis testing. Ideally, one would be able to access multiple datasets from the same source, perform data analysis on one and generate hypotheses to test, and use the others for testing those hypotheses, with one dataset per hypothesis, to avoid the multiple-comparisons problem. Fortunately, this project will not require any hypothesis testing, so we can get as familiar with the data as we want. We will split it into train and test sections in the future, however, to build and tune machine-learning models.
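If the same random rows need to be looked at again later, sample() accepts an optional random_state seed. A minimal sketch on a toy frame follows; the frame, the seed value 42, and the variable names are arbitrary choices for illustration, not part of the original notebook.

```python
import pandas as pd

# Toy frame standing in for post_df.
toy = pd.DataFrame({"uid": [f"X-{i:03d}" for i in range(100)]})

# The same seed yields the same rows on every call.
first = toy.sample(10, random_state=42)
second = toy.sample(10, random_state=42)

print(first.index.tolist())
print(first.equals(second))
```

Fixing the seed also makes it easier to limit how many distinct samples get inspected.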
Conveniently, it looks likely that null values are specified as Unknown all throughout the dataset.
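One way to check that impression, sketched here on a small made-up frame rather than the real data, is to count how often the literal string Unknown appears in each column. The column names match the Post data, but the values are invented.

```python
import pandas as pd

# Invented rows using the Post column names.
toy = pd.DataFrame(
    {
        "victim_race": ["Black", "Unknown", "White", "Unknown"],
        "victim_age": ["32", "Unknown", "49", "58"],
        "city": ["Chicago", "Dallas", "Boston", "New York"],
    }
)

# Comparing the whole frame to a string gives a boolean frame; summing counts the matches per column.
unknown_counts = (toy == "Unknown").sum()
print(unknown_counts)
```

Columns with a count of zero either have no missing values or encode them some other way, which is worth checking separately.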
Cleaning and transformation
This function was written iteratively; I didn’t know ahead of time which columns would transform easily into another dtype, so it took some trial and error.
Note that it adds some columns, to make analysis easier. For instance, the numeric date column is used to derive month, weekday, and delta_days (number of days between the incident report date and the latest date) columns.
We’ll have to import the datetime package to gain access to datetime processing and transformation functions, and then we can go about cleaning the data:
import datetime as dt

def transform_post_dataframe(data):
    # Avoid accidentally overriding the original
    transformed_df = data.copy()

    transformed_df["reported_date"] = pd.to_datetime(
        transformed_df["reported_date"].map(
            lambda d: f"{str(d)[:4]}-{str(d)[4:6]}-{str(d)[-2:]}"
        )
    )
    max_date = transformed_df["reported_date"].max()
    transformed_df["year"] = transformed_df["reported_date"].dt.year.astype(int)
    transformed_df["month"] = (
        transformed_df["reported_date"].dt.month_name().astype("category")
    )
    transformed_df["weekday"] = (
        transformed_df["reported_date"].dt.day_name().astype("category")
    )
    # How long has it been since the case was opened?
    transformed_df["delta_days"] = (
        (max_date - transformed_df["reported_date"]).dt.total_seconds() / (60 * 60 * 24)
    ).astype(int)

    transformed_df["victim_race"] = transformed_df["victim_race"].astype("category")
    transformed_df["victim_age"] = (
        transformed_df["victim_age"]
        .map(lambda s: np.nan if not str(s).isnumeric() else int(s))
        .astype(pd.Int8Dtype())
    )
    transformed_df["victim_sex"] = transformed_df["victim_sex"].astype("category")
    transformed_df["city"] = transformed_df["city"].astype("category")
    transformed_df["state"] = transformed_df["state"].astype("category")

    transformed_df["lat"] = transformed_df["lat"].astype(float)
    transformed_df["lon"] = transformed_df["lon"].astype(float)

    transformed_df["disposition"] = transformed_df["disposition"].astype("category")

    return transformed_df

post_data = transform_post_dataframe(post_df)
post_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 52179 entries, 0 to 52178
Data columns (total 16 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 uid 52179 non-null object
1 reported_date 52179 non-null datetime64[ns]
2 victim_last 52178 non-null object
3 victim_first 52179 non-null object
4 victim_race 52179 non-null category
5 victim_age 49180 non-null Int8
6 victim_sex 52179 non-null category
7 city 52179 non-null category
8 state 52179 non-null category
9 lat 52119 non-null float64
10 lon 52119 non-null float64
11 disposition 52179 non-null category
12 year 52179 non-null int32
13 month 52179 non-null category
14 weekday 52179 non-null category
15 delta_days 52179 non-null int32
dtypes: Int8(1), category(7), datetime64[ns](1), float64(2), int32(2), object(3)
memory usage: 3.2+ MB
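As an aside on the two trickiest conversions above (dates and ages), pandas also offers built-in parsers that could do the same job. The sketch below is an alternative under stated assumptions, not the function used in this chapter: it assumes every date is an eight-character YYYYMMDD string and that any non-numeric age should become missing. The toy values are taken from the sample rows shown earlier.

```python
import pandas as pd

# Toy values shaped like the Post columns.
dates = pd.Series(["20130331", "20101111", "20170203"], dtype=object)
ages = pd.Series(["32", "Unknown", "4"], dtype=object)

# An explicit format avoids rebuilding the string by slicing.
parsed_dates = pd.to_datetime(dates, format="%Y%m%d")
# errors="coerce" turns non-numeric entries into NaN before the nullable integer cast.
parsed_ages = pd.to_numeric(ages, errors="coerce").astype("Int8")

print(parsed_dates.dt.day_name().tolist())
print(parsed_ages.tolist())
```

The nullable Int8 dtype is what allows a missing age to sit alongside whole-number ages without forcing the column to float.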
post_data.hist(figsize=(16, 9))
[Figure: histograms of reported_date, victim_age, lat, lon, year, and delta_days]
All seems to be in working order. There are null values, but that’s not a problem since I’m only analyzing the data and not building any machine learning models (yet). I’ll locally save the cleaned data in a CSV file and move on to the Murder Accountability Project dataset:
post_data.to_csv("./data/post-data.csv")
Murder Accountability Project Data
Data value exploration
map_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 849144 entries, 0 to 849143
Data columns (total 31 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 849144 non-null object
1 CNTYFIPS 849144 non-null object
2 Ori 849144 non-null object
3 State 849144 non-null object
4 Agency 849144 non-null object
5 Agentype 849144 non-null object
6 Source 849144 non-null object
7 Solved 849144 non-null object
8 Year 849144 non-null object
9 StateName 849144 non-null object
10 Month 849144 non-null object
11 Incident 849144 non-null object
12 ActionType 849144 non-null object
13 Homicide 849144 non-null object
14 Situation 849144 non-null object
15 VicAge 849144 non-null object
16 VicSex 849144 non-null object
17 VicRace 849144 non-null object
18 VicEthnic 849144 non-null object
19 OffAge 849144 non-null object
20 OffSex 849144 non-null object
21 OffRace 849144 non-null object
22 OffEthnic 849144 non-null object
23 Weapon 849144 non-null object
24 Relationship 849144 non-null object
25 Circumstance 849144 non-null object
26 Subcircum 849144 non-null object
27 VicCount 849144 non-null object
28 OffCount 849144 non-null object
29 FileDate 849144 non-null object
30 MSA 849144 non-null object
dtypes: object(31)
memory usage: 200.8+ MB
map_df.sample(10)
| | ID | CNTYFIPS | Ori | State | Agency | Agentype | Source | Solved | Year | StateName | Month | Incident | ActionType | Homicide | Situation | VicAge | VicSex | VicRace | VicEthnic | OffAge | OffSex | OffRace | OffEthnic | Weapon | Relationship | Circumstance | Subcircum | VicCount | OffCount | FileDate | MSA |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 296896 | 199012006NY01401 | Erie, NY | NY01401 | New York | Buffalo | Municipal police | FBI | No | 1990 | | December | 6 | Adjustment | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | 80 | Male | Black | Unknown or not reported | 999 | Unknown | Unknown | Unknown or not reported | Blunt object - hammer, club, etc | Relationship not determined | Robbery | | 0 | 0 | 040491 | Buffalo-Niagara Falls, NY |
| 452681 | 199803001TX11602 | Hunt, TX | TX11602 | Texas | Commerce | Municipal police | FBI | Yes | 1998 | | March | 1 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 28 | Male | White | Hispanic Origin | 43 | Male | White | Hispanic Origin | Knife or cutting instrument | Acquaintance | Other arguments | | 0 | 0 | 080498 | Dallas-Fort Worth-Arlington, TX |
| 603644 | 200707010TXHPD00 | Harris, TX | TXHPD00 | Texas | Houston | Municipal police | FBI | Yes | 2007 | | July | 10 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 27 | Male | Black | Not of Hispanic Origin | 34 | Male | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Acquaintance | Other arguments | | 0 | 0 | 122007 | Houston-Sugar Land-Baytown, TX |
| 192867 | 198509001MA309SP | Middlesex, MA | MA309SP | Massachusetts | State Police: Middlesex County | Primary state LE | FBI | Yes | 1985 | | September | 1 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 30 | Female | White | Not of Hispanic Origin | 40 | Male | Black | Not of Hispanic Origin | Knife or cutting instrument | Acquaintance | Narcotic drug laws | | 0 | 0 | 022586 | Boston-Cambridge-Quincy, MA-NH |
| 1037 | 197601006MA01301 | Suffolk, MA | MA01301 | Massachusetts | Boston | Municipal police | FBI | Yes | 1976 | | January | 6 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 35 | Male | Black | Unknown or not reported | 26 | Male | Black | Unknown or not reported | Rifle | Acquaintance | Other arguments | | 0 | 0 | 030180 | Boston-Cambridge-Quincy, MA-NH |
| 102907 | 198102001CO02101 | El Paso, CO | CO02101 | Colorado | Colorado Springs | Municipal police | FBI | No | 1981 | | February | 1 | Normal update | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | 26 | Male | White | Unknown or not reported | 999 | Unknown | Unknown | Unknown or not reported | Knife or cutting instrument | Relationship not determined | Circumstances undetermined | | 0 | 0 | 073081 | Colorado Springs, CO |
| 810742 | 202004001SC04100 | Saluda, SC | SC04100 | South Carolina | Saluda County | Sheriff | FBI | Yes | 2020 | SOUTH | April | 1 | Normal update | Manslaughter by negligence | Single victim/multiple offenders | 28 | Male | White | Not of Hispanic Origin | 37 | Female | White | Not of Hispanic Origin | Narcotics or drugs, sleeping pills | Boyfriend | All other manslaughter by negligence | | 0 | 2 | 083121 | Columbia, SC |
| 474480 | 199908001VA12200 | Richmond, VA | VA12200 | Virginia | Richmond | Municipal police | FBI | Yes | 1999 | | August | 1 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 17 | Male | Black | Unknown or not reported | 17 | Male | Black | Unknown or not reported | Handgun - pistol, revolver, etc | Acquaintance | Other | | 0 | 0 | 111099 | Richmond, VA |
| 255209 | 198812001NC01400 | Caldwell, NC | NC01400 | North Carolina | Caldwell County | Sheriff | FBI | Yes | 1988 | | December | 1 | Normal update | Murder or Non-negligent manslaughter | Single victim/single offender | 25 | Male | White | Unknown or not reported | 18 | Male | Black | Unknown or not reported | Rifle | Acquaintance | Other arguments | | 0 | 0 | 032389 | Hickory-Lenoir-Morganton, NC |
| 827985 | 202101005MDBPD00 | Baltimore city, MD | MDBPD00 | Maryland | Baltimore | Municipal police | MAP | No | 2021 | MARYLA | January | 5 | Normal update | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | 21 | Male | Black | Not of Hispanic Origin | 999 | Unknown | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | | 0 | 0 | 122922 | Baltimore-Towson, MD |
The MAP dataset has more than ten times as many records as the Washington Post dataset, and different columns. It does not contain the victims’ names, though it does have information about the perpetrators, which the Post dataset does not.
Much like the Post dataset, the MAP dataset specifies unknown categorical values with the Unknown value. Unknown numerical data, however, are recorded differently.
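In the sample rows above, for instance, OffAge reads 999 wherever the offender is unknown. A minimal sketch of turning such sentinel values into proper missing values is shown below; the cutoff of 900 and the toy values are assumptions for illustration.

```python
import pandas as pd

# Toy offender ages as strings, with 999 standing in for "unknown".
off_age = pd.Series(["43", "999", "34", "17"], dtype=object)

numeric = pd.to_numeric(off_age, errors="coerce")
# mask() blanks out every value where the condition holds; the 900 cutoff is an assumption.
cleaned = numeric.mask(numeric > 900).astype("Int16")

print(cleaned.tolist())
```

Leaving a sentinel like 999 in place would silently drag any mean or histogram of ages upward, which is why it is worth replacing before analysis.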
Cleaning and transformation
Cleaning the MAP data was not as straightforward as cleaning the Post data, so I decided to show more of the process of building its transformation function.
First, I found that the lengths of the datestrings were inconsistent. Most had six characters, which was good since those records adhered to a consistent datestring format, but some had four or five, and the remainder were empty.
So, I went about replacing the short datestrings with longer ones so that I could convert the dates to the datetime type in one go later.
short_datestring_records = map_df[
    map_df["FileDate"].map(lambda d: 0 < len(d.strip()) < 6)
][["ID", "FileDate"]]
short_datestring_records
| | ID | FileDate |
|---|---|---|
| 827265 | 202101001CA03700 | 10121 |
| 827966 | 202101005CA03711 | 10121 |
| 827971 | 202101005FL05901 | 1321 |
| 828009 | 202101006CA03711 | 10121 |
| 828045 | 202101007CA03711 | 10121 |
| ... | ... | ... |
| 848175 | 202112451FL04800 | 12321 |
| 848177 | 202112452FL04800 | 12821 |
| 848189 | 202112497FL04804 | 12421 |
| 848201 | 202112556FL05000 | 12121 |
| 848222 | 202112660FL05801 | 12921 |
2560 rows × 2 columns
2,560 rows is less than 1% of the total number of records in the MAP dataset, which is not bad (this isn’t the total number of invalid dates; since I’m only extracting datestrings I can change, I ignored all datestrings that are empty). Still, ideally all dates would be recorded accurately and in the same format, so I still think it’s worth taking some time to clean them up.
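The length inconsistency described above can be tallied directly with the string accessor. This is a sketch on a handful of toy values (some copied from the table above, the blank one invented), not the output for the real column.

```python
import pandas as pd

# Toy FileDate values: two full-length, one five-character, one four-character, one blank.
file_dates = pd.Series(["040491", "10121", "1321", "      ", "122007"], dtype=object)

# Strip padding first so that blank entries count as length zero.
lengths = file_dates.str.strip().str.len()
print(lengths.value_counts().sort_index())
```

On the full column, the same two lines would show at a glance how many datestrings fall into each length bucket.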
First, note that the MAP data dictionary [4] specifies the datestrings as having the format “mmddyy”; that is, the first two characters represent the month, the middle two represent the day, and the last two represent the year. However, not all the datestrings have six characters, and in my effort to clean them up, I made two assumptions:
- That the last two characters always represent the year; and
- That the month always precedes the day.
The correct_datestrings() function below is pretty simple: if the datestring has four characters, it assumes that the first two represent the month and day, and the last two represent the year (e.g. “4620” is April 6, 2020).
If, on the other hand, the datestring has five characters, things are a bit more complicated. For instance, the datestring “12417” could be corrected to either “012417” (January 24, 2017) or “120417” (December 4, 2017).
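The ambiguity can be made concrete by padding a five-character datestring both ways and keeping whichever results parse as real dates. The helper below, candidate_dates, is a hypothetical name introduced for illustration; it assumes the “mmddyy” layout with exactly one leading zero missing.

```python
from datetime import datetime


def candidate_dates(date_string):
    """Return every valid date a five-character datestring could stand for."""
    padded_options = [
        "0" + date_string,                        # one-digit month, two-digit day
        date_string[:2] + "0" + date_string[2:],  # two-digit month, one-digit day
    ]
    candidates = []
    for option in padded_options:
        try:
            candidates.append(datetime.strptime(option, "%m%d%y").date())
        except ValueError:
            pass  # this padding does not form a real date
    return candidates


print(candidate_dates("12417"))  # two readings: January 24 and December 4, 2017
print(candidate_dates("50216"))  # one reading: May 2, 2016
```

Whenever the list has more than one entry, some outside information is needed to choose between the readings.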
I think the most accurate way to figure out the correct date is to compare the dates in the records above and below the record in question. However, even that isn’t foolproof: the dates represented by the datestrings I’m correcting are the record file dates, which are updated when a change (or Adjustment, as recorded in the ActionType column) is made to the record. So, if a record is preceded by a case which was filed on January 23, 2017 and succeeded by a case filed January 25, 2017, the record could still have a file date of December 4, 2017 if it was adjusted on that day.
So why not simply check the ActionType column of the record to see if it was adjusted? Well, an adjustment could also have been made the same day the record was filed, so deciding on an interpretation of a datestring based purely on that would result in an incorrect date.
Also, I’m not sure that only the Adjustment value of ActionType corresponds to a change in the record, because the data dictionary doesn’t tell us what counts as a “normal” update and what counts as an adjustment.
To solve the five-character-datestring problem I used a combination of checks, in a series of if-elif-else statements. Here’s the logic behind them:
First, take the datestring “50216”. Obviously, “50” can’t be the month number since there are only twelve of those, so the correct datestring must be “050216” — May 2, 2016. This is one strategy I used to handle five-character datestrings: if the number formed by the first two characters exceeds 12, stick a zero at the front of it.
Next, all records in the MAP dataset have an ID, in which is encoded the year and month the record was initially filed, which can be used as a reference. For instance, a case with an ID like “202105045…” (the 45th incident in May, 2021) can’t have the datestring “11221” be corrected to “011221” (January 12, 2021) since a record can’t have been filed … before it was filed. So, I know to correct “11221” to “110221” (November 2, 2021).
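The ID check can be sketched as follows. The helper parse_record_id is a hypothetical name, and the fixed widths (four digits of year, two of month, three of incident number) are an assumption inferred from the sample rows, where ID 199012006NY01401 sits next to Year 1990, Month December, and Incident 6. The agency suffix in the second ID is made up.

```python
def parse_record_id(record_id):
    """Split a MAP-style ID into (year, month, incident), assuming fixed widths."""
    year = int(record_id[:4])
    month = int(record_id[4:6])
    incident = int(record_id[6:9])
    return year, month, incident


print(parse_record_id("199012006NY01401"))

# "11221" could mean month 1 or month 11. Assuming the file date falls in the
# same year as the ID, only months on or after the ID month remain plausible.
_, id_month, _ = parse_record_id("202105045XX00000")  # made-up agency suffix
plausible_months = [m for m in (1, 11) if m >= id_month]
print(plausible_months)
```

When both candidate months survive this filter, the ID alone cannot settle the question.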
All other datestrings I replaced with a “ ”, since there is no purely logical way to verify which date they correspond to. For example, take ID “202101…” and datestring “10121”: it could mean January 1, 2021 or October 1, 2021, and both are consistent with a record created in January. Even the ActionType doesn’t tell me much, since the MAP data dictionary doesn’t specify what, exactly, counts as a “normal” update and what counts as an adjustment.
def correct_datestrings(data):
    transformed_df = data.copy()

    for i in transformed_df.index:
        date_string = transformed_df.loc[i, "FileDate"].strip()
        month_string = transformed_df.loc[i, "ID"].strip()[4:6]

        # Trivial case
        if len(date_string) < 4:
            transformed_df.loc[i, "FileDate"] = " "
        # Easiest nontrivial case
        elif len(date_string) == 4:
            transformed_df.loc[i, "FileDate"] = (
                "0" + date_string[0] + "0" + date_string[1:]
            )
        # The month obviously can't exceed 12, so there must be a "0" in the beginning; covers dates 30 and 31.
        elif int(date_string[:2]) > 12:
            transformed_df.loc[i, "FileDate"] = "0" + date_string
        # File month can't be less than record creation month
        elif int("0" + date_string[0]) < int(month_string):
            transformed_df.loc[i, "FileDate"] = date_string[0:2] + "0" + date_string[2:]
        else:
            transformed_df.loc[i, "FileDate"] = " "

    return transformed_df

corrected_datestrings = correct_datestrings(short_datestring_records)
corrected_datestrings[corrected_datestrings["FileDate"].map(lambda d: len(d) < 6)]
| | ID | FileDate |
|---|---|---|
| 827265 | 202101001CA03700 | |
| 827966 | 202101005CA03711 | |
| 828009 | 202101006CA03711 | |
| 828045 | 202101007CA03711 | |
| 828233 | 202101014CA00107 | |
| ... | ... | ... |
| 828998 | 202101920CA03711 | |
| 829000 | 202101986CA03801 | |
| 829001 | 202101987CA03801 | |
| 829002 | 202101988CA03801 | |
| 829003 | 202101989CA03801 | |
274 rows × 2 columns
Looks like that took care of almost all non-standard datestrings. Only 274 remain which are not six characters long. This is
274 / len(short_datestring_records)
0.10703125
or just over a tenth of all nonstandard datestrings which remain uncorrected. Before replacing the ones in the main DataFrame, though, let’s check the distribution of these dates and see that they make sense:
pd.to_datetime(
    corrected_datestrings[corrected_datestrings["FileDate"].map(lambda d: len(d) == 6)][
        "FileDate"
    ],
    format="%m%d%y",
).dt.month.value_counts().sort_index().plot.bar()
[Figure: bar chart of the number of corrected file dates per month]
Looks like homicide reports peak in summer months and are at a minimum in January and December, which makes sense: in addition to people being more confrontational in hot weather and just generally being out more, I suspect the holiday rush in January and December probably causes some administration delays.
I’m okay replacing the nonstandard datestrings in the main DataFrame with the corrected ones now. To do so, I’ll get a list of the indices of the datestrings I corrected:
indices_to_replace = corrected_datestrings.index
Now I can replace the datestrings in the main DataFrame with the new ones based on those indices:
# Assign with a single .loc call; chained indexing would write to a temporary copy
map_df.loc[indices_to_replace, "FileDate"] = corrected_datestrings.loc[
    indices_to_replace, "FileDate"
]
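When writing values back by index label, a single .loc call that names both the rows and the column is the form that reliably modifies the original frame. The sketch below uses two toy frames; the names main and corrected and their values are illustrative only.

```python
import pandas as pd

# Toy "main" frame and a smaller frame of corrections sharing some index labels.
main = pd.DataFrame({"FileDate": ["040491", "10121", "50216"]}, index=[10, 11, 12])
corrected = pd.DataFrame({"FileDate": [" ", "050216"]}, index=[11, 12])

# Row labels and column name go in the same .loc call, so the write lands in `main`.
main.loc[corrected.index, "FileDate"] = corrected["FileDate"]

print(main)
```

Checking a few of the replaced labels in the main frame afterwards is a cheap way to confirm that the assignment took effect.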
Let’s see how many nonstandard datestrings now remain in the main DataFrame:
map_df["FileDate"].map(lambda d: len(d.strip()) < 6).mean()
0.021325004946157543
Just over 2% of the records don’t have a valid FileDate, which I think is pretty good (for now).
Next, the State column lists the full name of the state rather than the two-letter abbreviation as the Post data does, so I created this dictionary to make conversion easier (again, this was an iterative process, and I modified code as I encountered and fixed errors and warnings):
state_abbreviations = {
"Alabama": "AL",
"Alaska": "AK",
"Arizona": "AZ",
"Arkansas": "AR",
"American Samoa": "AS",
"California": "CA",
"Colorado": "CO",
"Connecticut": "CT",
"Delaware": "DE",
"District of Columbia": "DC",
"Florida": "FL",
"Georgia": "GA",
"Guam": "GU",
"Hawaii": "HI",
"Idaho": "ID",
"Illinois": "IL",
"Indiana": "IN",
"Iowa": "IA",
"Kansas": "KS",
"Kentucky": "KY",
"Louisiana": "LA",
"Maine": "ME",
"Maryland": "MD",
"Massachusetts": "MA",
"Michigan": "MI",
"Minnesota": "MN",
"Mississippi": "MS",
"Missouri": "MO",
"Montana": "MT",
"Nebraska": "NE",
"Nevada": "NV",
"New Hampshire": "NH",
"New Jersey": "NJ",
"New Mexico": "NM",
"New York": "NY",
"North Carolina": "NC",
"North Dakota": "ND",
"Northern Mariana Islands": "MP",
"Ohio": "OH",
"Oklahoma": "OK",
"Oregon": "OR",
"Oregan": "OR", # There is an "Oregan" entry in the DataFrame
"Pennsylvania": "PA",
"Puerto Rico": "PR",
"Rhode Island": "RI",
"Rhodes Island": "RI", # There is a "Rhodes Island" entry in the DataFrame
"South Carolina": "SC",
"South Dakota": "SD",
"Tennessee": "TN",
"Texas": "TX",
"Trust Territories": "TT",
"Utah": "UT",
"Vermont": "VT",
"Virginia": "VA",
"Washington": "WA",
"West Virginia": "WV",
"Wisconsin": "WI",
"Wyoming": "WY",
}
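Misspellings such as “Oregan” can be surfaced before the conversion by comparing the distinct names in the column against the dictionary keys. The sketch below uses a deliberately tiny, hypothetical subset of the mapping and a toy Series rather than the real column.

```python
import pandas as pd

# A small stand-in for the full mapping, plus toy state names with two misspellings.
abbreviations = {"Oregon": "OR", "Rhode Island": "RI", "Texas": "TX"}
states = pd.Series(["Texas", "Oregan", "Rhode Island", "Rhodes Island"])

# Names present in the data but absent from the mapping.
unmapped = sorted(set(states.unique()) - set(abbreviations))
print(unmapped)

# Passing the dict straight to map() leaves unmapped names as NaN instead of raising.
print(states.map(abbreviations).tolist())
```

Each name reported as unmapped can then either be added to the dictionary as an alias or corrected at the source.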
That done, here is the final transformation/cleaning function. Note that I have renamed many of the columns to make them easier to work with.
def transform_map_dataframe(data):
    # Avoid accidentally overriding the original
    transformed_df = pd.DataFrame()

    transformed_df["ID"] = data["ID"]

    transformed_df["Agency Code"] = data["Ori"].astype("category")
    transformed_df["Agency Name"] = data["Agency"].astype("category")
    transformed_df["Agency Type"] = data["Agentype"].astype("category")

    transformed_df["City"] = (
        data["CNTYFIPS"].map(lambda s: s.split(",")[0]).astype("category")
    )
    transformed_df["MSA"] = (
        data["MSA"].map(lambda s: s.split(",")[0]).astype("category")
    )
    transformed_df["State"] = (
        data["State"].map(lambda s: state_abbreviations[s]).astype("category")
    )

    transformed_df["Date"] = pd.to_datetime(
        data["FileDate"].map(
            lambda d: dt.datetime.strptime(d, "%m%d%y")
            if len(d.strip()) == 6
            else np.datetime64("NaT")
        )
    )
    transformed_df["Year"] = transformed_df["Date"].dt.year.astype(
        np.int16, errors="ignore"
    )
    transformed_df["Month"] = transformed_df["Date"].dt.month_name().astype("category")
    transformed_df["Weekday"] = transformed_df["Date"].dt.day_name().astype("category")

    max_report_date = transformed_df["Date"].max()
    transformed_df["Delta Days"] = (
        (max_report_date - transformed_df["Date"]).dt.total_seconds() / (60 * 60 * 24)
    ).astype(pd.Int16Dtype())

    transformed_df["Occurrence Year"] = data["Year"].astype(int)
    transformed_df["Occurrence Month"] = data["Month"].astype("category")

    transformed_df["Incident"] = data["Incident"].astype(int)
    transformed_df["Crime Type"] = data["Homicide"].astype("category")
    transformed_df["Situation"] = data["Situation"].astype("category")

    transformed_df["Action"] = data["ActionType"].astype("category")

    transformed_df["Solved"] = (
        data["Solved"].map(lambda s: True if s == "Yes" else False).astype(bool)
    )

    transformed_df["Victim Sex"] = data["VicSex"].astype("category")
    # According to the MAP docs, an age value of 998 indicates an unknown age
    transformed_df["Victim Age"] = (
        data["VicAge"]
        .map(lambda s: np.nan if not s.isnumeric() or int(s) > 900 else int(s))
        .astype(pd.Int8Dtype())
    )
    transformed_df["Victim Race"] = data["VicRace"].astype("category")
    transformed_df["Victim Ethnicity"] = data["VicEthnic"].astype("category")

    transformed_df["Offender Sex"] = data["OffSex"].astype("category")
    # According to the MAP docs, an age value of 998 indicates an unknown age
    transformed_df["Offender Age"] = (
        data["OffAge"]
        .map(lambda s: np.nan if not s.isnumeric() or int(s) > 900 else int(s))
        .astype(pd.Int8Dtype())
    )
    transformed_df["Offender Race"] = data["OffRace"].astype("category")
    transformed_df["Offender Ethnicity"] = data["OffEthnic"].astype("category")

    transformed_df["Weapon"] = data["Weapon"].astype("category")
    transformed_df["Relationship"] = data["Relationship"].astype("category")

    transformed_df["Circumstance"] = data["Circumstance"].astype("category")
    transformed_df["Sub Circumstance"] = data["Subcircum"].astype("category")

    transformed_df["Victim Count"] = data["VicCount"].astype(int)
    transformed_df["Offender Count"] = data["OffCount"].astype(int)

    transformed_df["Source"] = data["Source"].astype("category")

    return transformed_df

map_data = transform_map_dataframe(map_df)
map_data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 849144 entries, 0 to 849143
Data columns (total 34 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 ID 849144 non-null object
1 Agency Code 849144 non-null category
2 Agency Name 849144 non-null category
3 Agency Type 849144 non-null category
4 City 849144 non-null category
5 MSA 849144 non-null category
6 State 849144 non-null category
7 Date 831036 non-null datetime64[ns]
8 Year 831036 non-null float64
9 Month 831036 non-null category
10 Weekday 831036 non-null category
11 Delta Days 831036 non-null Int16
12 Occurrence Year 849144 non-null int32
13 Occurrence Month 849144 non-null category
14 Incident 849144 non-null int32
15 Crime Type 849144 non-null category
16 Situation 849144 non-null category
17 Action 849144 non-null category
18 Solved 849144 non-null bool
19 Victim Sex 849144 non-null category
20 Victim Age 836422 non-null Int8
21 Victim Race 849144 non-null category
22 Victim Ethnicity 849144 non-null category
23 Offender Sex 849144 non-null category
24 Offender Age 565529 non-null Int8
25 Offender Race 849144 non-null category
26 Offender Ethnicity 849144 non-null category
27 Weapon 849144 non-null category
28 Relationship 849144 non-null category
29 Circumstance 849144 non-null category
30 Sub Circumstance 849144 non-null category
31 Victim Count 849144 non-null int32
32 Offender Count 849144 non-null int32
33 Source 849144 non-null category
dtypes: Int16(1), Int8(2), bool(1), category(23), datetime64[ns](1), float64(1), int32(4), object(1)
memory usage: 61.8+ MB
map_data.hist(figsize=(16, 9))
[Figure: histograms of Date, Year, Delta Days, Occurrence Year, Incident, Victim Age, Offender Age, Victim Count, and Offender Count]
Interestingly, the report dates start on March 1, 1980 despite homicide occurrences being recorded in the late 1970s:
map_data["Occurrence Year"].min()
1976
map_data["Date"].min()
Timestamp('1980-03-01 00:00:00')
Initially I thought this was because the FBI didn’t start collecting homicide records until that date, but I could find no evidence to support this idea anywhere; indeed, there appear to be no events of significance related to homicide incident recording that occur on or around March 1, 1980.
Moreover, the criminal incident records filed in 1980 correspond almost entirely to numerous homicides from 1976 to 1979, and hardly any from 1980:
map_data[map_data["Year"] == 1980][
    "Occurrence Year"
].value_counts().sort_index().plot.bar()
[Figure: bar chart of Occurrence Year for records filed in 1980]
My instinct was that this kind of makes sense, since an incident has to have occurred before a report can be filed, but this looks like a pretty big discrepancy. Among the records filed in 1980, each of the years 1976 to 1979 appears to account for over ten times as many homicides as 1980 itself. So, let’s see how much difference there is between occurrence year and reported year over the whole MAP dataset:
reported_occurrence_year_diffs = map_data["Year"] - map_data["Occurrence Year"]
reported_occurrence_year_diffs.describe()
count 831036.000000
mean 0.903335
std 0.933962
min -6.000000
25% 0.000000
50% 1.000000
75% 1.000000
max 25.000000
dtype: float64
Okay, that’s odd. Since a report cannot be filed before the incident it describes, the minimum difference should be 0, not -6. Let’s see which cases have a report year that falls before the occurrence year:
map_data[map_data["Year"] < map_data["Occurrence Year"]]
ID | Agency Code | Agency Name | Agency Type | City | MSA | State | Date | Year | Month | Weekday | Delta Days | Occurrence Year | Occurrence Month | Incident | Crime Type | Situation | Action | Solved | Victim Sex | Victim Age | Victim Race | Victim Ethnicity | Offender Sex | Offender Age | Offender Race | Offender Ethnicity | Weapon | Relationship | Circumstance | Sub Circumstance | Victim Count | Offender Count | Source | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
511997 | 200201108DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2002 | January | 108 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 15 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
511998 | 200201109DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2002 | January | 109 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 18 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
511999 | 200201110DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2002 | January | 110 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 18 | Black | Not of Hispanic Origin | Male | 28 | Black | Not of Hispanic Origin | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
512000 | 200201111DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2002 | January | 111 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 18 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 1 | MAP | |
512001 | 200201112DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2002 | January | 112 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | 23 | Black | Not of Hispanic Origin | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
611218 | 200712516DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2007 | December | 516 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 26 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Knife or cutting instrument | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
611219 | 200712517DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2007 | December | 517 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 33 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
611220 | 200712518DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2007 | December | 518 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 35 | White | Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Other or type unknown | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
611221 | 200712519DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2007 | December | 519 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Female | 51 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Knife or cutting instrument | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP | |
611222 | 200712520DCMPD00 | DCMPD00 | Washington | Municipal police | District of Columbia | Washington-Arlington-Alexandria | DC | 2001-03-01 | 2001.0 | March | Thursday | 7973 | 2007 | December | 520 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 53 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Firearm, type not stated | Relationship not determined | Circumstances undetermined | 0 | 0 | MAP |
1209 rows × 34 columns
Fascinating. Looks like the report filing date is March 1, 2001 for all of these:
map_data[map_data["Year"] < map_data["Occurrence Year"]]["Date"].value_counts()
Date
2001-03-01 1209
Name: count, dtype: int64
They are also all from Washington:
map_data[map_data["Year"] < map_data["Occurrence Year"]]["City"].unique()
['District of Columbia']
Categories (1798, object): ['Abbeville', 'Acadia', 'Accomack', 'Ada', ..., 'Yukon-Koyukuk', 'Yuma', 'Zapata', 'Zavala']
This is something I’d like to investigate further during the analysis phase; for now, it’s good enough to have all the data filled in and in the right format.
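As a starting point for that later investigation, here is a minimal sketch of how such records could be flagged and counted per city. It runs on a made-up toy frame rather than the real map_data, and the column name "Report Precedes Occurrence" is a hypothetical name introduced only for this illustration.

```python
import pandas as pd

# Toy stand-in for map_data; the values are invented for illustration only.
toy = pd.DataFrame(
    {
        "City": ["District of Columbia", "District of Columbia", "Baltimore"],
        "Year": [2001, 2001, 2017],
        "Occurrence Year": [2002, 2007, 2016],
    }
)

# Hypothetical flag column: True where the report year precedes the occurrence year.
toy["Report Precedes Occurrence"] = toy["Year"] < toy["Occurrence Year"]

# Count flagged rows per city to see whether the problem is localized.
flagged_by_city = toy.groupby("City")["Report Precedes Occurrence"].sum()
print(flagged_by_city)
```

Keeping the flag as a column, instead of filtering the rows out, leaves the decision about what to do with them for the analysis phase.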
There’s one more thing I want to look at: the Offender Age column.
In the initial version of this chapter, I used an outdated MAP dataset that went up to 2014. In that dataset, the number 0 appeared to be used as a placeholder for unknown or not reported offender age, in addition to the usual value of 998. This latest dataset, which covers incidents up to 2021, doesn’t appear to have that problem:
map_data["Offender Age"].hist(bins=99)
<Axes: >
Marvelous. The recorded offender ages are much more consistent in this dataset. Also, we can see spikes at age values that are multiples of 5, probably occurring because age estimates are rounded (e.g. “he was about 45 years old”).
Much like the victims’ ages, the perpetrators’ ages are right-skewed.
There is a slight bump at the value 99; according to the MAP data dictionary, a value of 99 represents the age being 99 or older. Normally this would be inconvenient, because it gives the data a categorical-like property, but in this case the number of data points with an Offender Age of 99 is so small on the large scale that the bump in the distribution’s tail can be put down to noise.
On the small scale, though, that “bump” becomes a skyscraper:
map_data[map_data["Offender Age"] >= 90]["Offender Age"].hist(bins=10)
<Axes: >
There is also a bump at the value 0, which, though not the three-times-as-long bar it was in the version of the dataset I used before, is still concerning. I would like to investigate further.
map_data[map_data["Offender Age"] < 10]["Offender Age"].hist(bins=10)
<Axes: >
There is an unusually large number of offenders recorded as being aged zero and one. Are we really to believe that a newborn is more likely to kill someone than a four-year-old? There is nothing in the MAP data dictionary [3] that suggests an answer, but this is something I’d like to explore further during the analysis phase.
A clustering algorithm might also help better understand the characteristics of cases which have Offender Age recorded as 0 or 1.
For now, there are few enough cases like this that statistically I can safely pass it off as noise; the overall distribution of Offender Age is still fairly clear.
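To put a number on “few enough”, one could measure the share of records sitting at the suspicious values. The sketch below does this on a small invented series. Treating 0, 1, and 99 as suspect is my assumption for illustration, not something the MAP data dictionary prescribes, and the masking step is only one option if those values were later treated as missing.

```python
import pandas as pd

# Invented ages using a nullable integer dtype, so missing values stay <NA>.
ages = pd.Series(
    [0, 0, 1, 4, 20, 25, 45, 99, pd.NA], dtype="Int16", name="Offender Age"
)

# Assumption: ages 0 and 1 are implausible, and 99 is top-coded ("99 or older").
suspect_values = [0, 1, 99]
is_suspect = ages.isin(suspect_values).astype(bool)

# Share of recorded (non-missing) ages that fall on a suspect value.
share_suspect = is_suspect.sum() / ages.notna().sum()
print(f"{share_suspect:.1%} of recorded ages are suspect")

# Optional: replace suspect values with <NA> instead of dropping the rows.
masked = ages.mask(is_suspect)
```

Masking keeps every case in the dataset while stopping the suspect values from skewing summary statistics of the age column.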
Finally, I saved the cleaned MAP data to a local CSV so I don’t have to repeat these steps every time:
map_data.to_csv("./data/map-data.csv")
Difficulty in combining the datasets
Both the Washington Post data and the Murder Accountability Project data have been analyzed quite thoroughly by others. So, I wanted to combine these datasets, using the specific information in the Post dataset to find the corresponding case in the MAP dataset and join the columns.
The trouble is that there is no guarantee the information is recorded the same (or at all) in the sources used to compile the Post and MAP datasets. For example, just because a characteristic (e.g. age, sex, etc.) is recorded in one dataset does not mean it will be recorded in the other.
Finally, despite the MAP dataset being rather comprehensive, there isn’t any guarantee that a case in the Post dataset even exists in the MAP dataset.
This means that there may be multiple or no MAP data points corresponding to each Post data point, and so I can’t just run a pandas.merge() on the two DataFrames and assume it will work.
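One way to see the “multiple or no matches” problem directly is a left join with pandas’ indicator option, followed by a count of MAP rows per Post case. This is a sketch on invented toy frames (post_toy and map_toy are hypothetical stand-ins for post_data and map_data), not the merge used later in this chapter.

```python
import pandas as pd

# Invented stand-ins for the Post and MAP data.
post_toy = pd.DataFrame(
    {"uid": ["A-1", "A-2", "A-3"], "city": ["X", "X", "Y"], "victim_age": [20, 30, 40]}
)
map_toy = pd.DataFrame(
    {"ID": ["m1", "m2", "m3"], "City": ["X", "X", "Z"], "Victim Age": [20, 20, 50]}
)

# A left join keeps every Post case; indicator=True adds a "_merge" column
# saying whether each row found a partner ("both") or not ("left_only").
merged = pd.merge(
    post_toy,
    map_toy,
    how="left",
    left_on=["city", "victim_age"],
    right_on=["City", "Victim Age"],
    indicator=True,
)

# Number of MAP candidates per Post case: 0 means no match, >1 means ambiguous.
matches_per_case = merged.groupby("uid")["ID"].count()
print(matches_per_case)
```

In this toy example, case A-1 picks up two MAP candidates while A-2 and A-3 pick up none, which are exactly the two failure modes described above.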
Let’s start by checking out the columns of each dataset and see which ones we can use to merge on:
print(post_data.columns)
print(map_data.columns)
Index(['uid', 'reported_date', 'victim_last', 'victim_first', 'victim_race',
'victim_age', 'victim_sex', 'city', 'state', 'lat', 'lon',
'disposition', 'year', 'month', 'weekday', 'delta_days'],
dtype='object')
Index(['ID', 'Agency Code', 'Agency Name', 'Agency Type', 'City', 'MSA',
'State', 'Date', 'Year', 'Month', 'Weekday', 'Delta Days',
'Occurrence Year', 'Occurrence Month', 'Incident', 'Crime Type',
'Situation', 'Action', 'Solved', 'Victim Sex', 'Victim Age',
'Victim Race', 'Victim Ethnicity', 'Offender Sex', 'Offender Age',
'Offender Race', 'Offender Ethnicity', 'Weapon', 'Relationship',
'Circumstance', 'Sub Circumstance', 'Victim Count', 'Offender Count',
'Source'],
dtype='object')
The city and state are the first filters I will apply to narrow things down.
After that, the case file date will narrow the time range.
Finally, information about the victim, including age, sex, and race should (under ideal circumstances, which never occur in reality because that would make life too easy) give me the exact case in the MAP data.
I suspect that data about the victim, as well as the year and month, are the most likely to be inconsistent in the two datasets, and geographic information most likely to be the same.
My reasoning is that information about the victim might not be known at the time the record was made by the agency from which the data was extracted in one dataset, but may have been known when the other dataset was compiled.
Additionally, the Post dataset and MAP dataset use dates differently: the Post dataset records the date the crime was reported, whereas the MAP dataset records the date of the homicide or recovery of the body. Those are not always the same.
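If the two date columns really do measure different events, an exact match on date will miss true pairs. One possible workaround, sketched here on invented data, is to join on the other keys first and then keep only candidate pairs whose dates fall within a window. The 30-day window (max_gap) and the toy frames are assumptions made for illustration.

```python
import pandas as pd

# Invented stand-ins for the Post and MAP data.
post_toy = pd.DataFrame(
    {
        "uid": ["A-1", "A-2"],
        "city": ["X", "X"],
        "victim_age": [20, 30],
        "reported_date": pd.to_datetime(["2015-07-17", "2016-01-10"]),
    }
)
map_toy = pd.DataFrame(
    {
        "ID": ["m1", "m2"],
        "City": ["X", "X"],
        "Victim Age": [20, 30],
        "Date": pd.to_datetime(["2015-07-15", "2014-03-01"]),
    }
)

# Hypothetical tolerance: how far apart the two dates may be.
max_gap = pd.Timedelta(days=30)

# Step 1: join on everything except the date to get candidate pairs.
candidates = pd.merge(
    post_toy, map_toy, left_on=["city", "victim_age"], right_on=["City", "Victim Age"]
)

# Step 2: keep only pairs whose dates are within the tolerance.
gap = (candidates["reported_date"] - candidates["Date"]).abs()
close_matches = candidates[gap <= max_gap]
print(close_matches[["uid", "ID", "reported_date", "Date"]])
```

A wider window recovers more true pairs but also admits more false ones, so the right tolerance would have to be chosen by looking at the real data.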
For now, let’s write a function that performs a simple inner-join on the two datasets based on all of the discussed factors, just to see what we get:
def simple_merge():
    map_cols = ["City", "State", "Date", "Victim Sex", "Victim Race", "Victim Age"]
    post_cols = [
        "city",
        "state",
        "reported_date",
        "victim_sex",
        "victim_race",
        "victim_age",
    ]
    filtered_df = pd.merge(post_data, map_data, left_on=post_cols, right_on=map_cols)

    return filtered_df

simple_merged_data = simple_merge()
simple_merged_data
uid | reported_date | victim_last | victim_first | victim_race | victim_age | victim_sex | city | state | lat | lon | disposition | year | month | weekday | delta_days | ID | Agency Code | Agency Name | Agency Type | City | MSA | State | Date | Year | Month | Weekday | Delta Days | Occurrence Year | Occurrence Month | Incident | Crime Type | Situation | Action | Solved | Victim Sex | Victim Age | Victim Race | Victim Ethnicity | Offender Sex | Offender Age | Offender Race | Offender Ethnicity | Weapon | Relationship | Circumstance | Sub Circumstance | Victim Count | Offender Count | Source | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Bal-700097 | 2017-04-21 | WILSON | DAVANTE | Black | 25 | Male | Baltimore | MD | 39.370844 | -76.591137 | Closed by arrest | 2017 | April | Friday | 254 | 201601004MD00301 | MD00301 | Baltimore County Police Department | County police | Baltimore | Baltimore-Towson | MD | 2017-04-21 | 2017.0 | April | Friday | 2078 | 2016 | January | 4 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Adjustment | True | Male | 25 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Acquaintance | Robbery | 0 | 1 | FBI | |
1 | Bal-700144 | 2017-05-30 | BROWN | EVERETTE | Black | 35 | Male | Baltimore | MD | 39.304259 | -76.598341 | Open/No arrest | 2017 | May | Tuesday | 215 | 201607006MD00301 | MD00301 | Baltimore County Police Department | County police | Baltimore | Baltimore-Towson | MD | 2017-05-30 | 2017.0 | May | Tuesday | 2039 | 2016 | July | 6 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Adjustment | True | Male | 35 | Black | Unknown or not reported | Male | 26 | Black | Unknown or not reported | Knife or cutting instrument | Acquaintance | Circumstances undetermined | 0 | 1 | FBI | |
2 | Los-000440 | 2011-07-05 | Unknown | Unknown | Black | 21 | Male | Los Angeles | CA | 33.968800 | -118.291400 | Open/No arrest | 2011 | July | Tuesday | 2371 | 201101001CA01915 | CA01915 | Compton | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2011-07-05 | 2011.0 | July | Tuesday | 4195 | 2011 | January | 1 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Juvenile gang killings | 0 | 0 | FBI | |
3 | Los-000496 | 2011-10-31 | Unknown | Unknown | Black | 29 | Male | Los Angeles | CA | 33.991600 | -118.267400 | Open/No arrest | 2011 | October | Monday | 2253 | 201105014CA01942 | CA01942 | Los Angeles | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2011-10-31 | 2011.0 | October | Monday | 4077 | 2011 | May | 14 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Adjustment | True | Male | 29 | Black | Not of Hispanic Origin | Male | 32 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Stranger | Juvenile gang killings | 0 | 1 | FBI | |
4 | Los-001293 | 2014-12-30 | Unknown | Unknown | Black | 20 | Male | Los Angeles | CA | 33.962700 | -118.311200 | Open/No arrest | 2014 | December | Tuesday | 1097 | 201404001CA01977 | CA01977 | Carson | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2014-12-30 | 2014.0 | December | Tuesday | 2921 | 2014 | April | 1 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 20 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Juvenile gang killings | 0 | 0 | FBI | |
5 | Los-001293 | 2014-12-30 | Unknown | Unknown | Black | 20 | Male | Los Angeles | CA | 33.962700 | -118.311200 | Open/No arrest | 2014 | December | Tuesday | 1097 | 201411009CA01942 | CA01942 | Los Angeles | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2014-12-30 | 2014.0 | December | Tuesday | 2921 | 2014 | November | 9 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 20 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Other - known to victim | Juvenile gang killings | 0 | 0 | FBI | |
6 | Los-001648 | 2015-11-18 | Unknown | Unknown | Black | 22 | Male | Los Angeles | CA | 33.957400 | -118.256400 | Open/No arrest | 2015 | November | Wednesday | 774 | 201509005CA01941 | CA01941 | Long Beach | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2015-11-18 | 2015.0 | November | Wednesday | 2598 | 2015 | September | 5 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 22 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Juvenile gang killings | 0 | 0 | FBI | |
7 | Los-001851 | 2016-09-20 | Unknown | Unknown | Black | 27 | Male | Los Angeles | CA | 33.975700 | -118.329400 | Open/No arrest | 2016 | September | Tuesday | 467 | 201608017CA01942 | CA01942 | Los Angeles | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2016-09-20 | 2016.0 | September | Tuesday | 2291 | 2016 | August | 17 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Adjustment | True | Male | 27 | Black | Not of Hispanic Origin | Male | 36 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Juvenile gang killings | 0 | 1 | FBI | |
8 | Phi-000061 | 2007-10-04 | PLUMMER | SOLOMON | Black | 36 | Male | Philadelphia | PA | 39.993153 | -75.184912 | Closed by arrest | 2007 | October | Thursday | 3741 | 200707027PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2007-10-04 | 2007.0 | October | Thursday | 5565 | 2007 | July | 27 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 36 | Black | Not of Hispanic Origin | Male | 26 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
9 | Phi-000188 | 2015-07-17 | EPPS | DARIUS | Black | 23 | Male | Philadelphia | PA | 39.944246 | -75.226633 | Closed by arrest | 2015 | July | Friday | 898 | 201504003PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-07-17 | 2015.0 | July | Friday | 2722 | 2015 | April | 3 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 23 | Black | Not of Hispanic Origin | Male | 44 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Narcotic drug laws | 0 | 0 | FBI | |
10 | Phi-000188 | 2015-07-17 | EPPS | DARIUS | Black | 23 | Male | Philadelphia | PA | 39.944246 | -75.226633 | Closed by arrest | 2015 | July | Friday | 898 | 201504005PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-07-17 | 2015.0 | July | Friday | 2722 | 2015 | April | 5 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 23 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
11 | Phi-000315 | 2013-01-03 | COLLICK | CHRISTOPHER | Black | 20 | Male | Philadelphia | PA | 39.977372 | -75.240524 | Closed by arrest | 2013 | January | Thursday | 1823 | 201210005PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2013-01-03 | 2013.0 | January | Thursday | 3647 | 2012 | October | 5 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 20 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Stranger | Other arguments | 0 | 0 | FBI | |
12 | Phi-000362 | 2014-06-10 | BERRY | RASHON | Black | 26 | Male | Philadelphia | PA | 39.967574 | -75.213340 | Closed by arrest | 2014 | June | Tuesday | 1300 | 201401010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2014-06-10 | 2014.0 | June | Tuesday | 3124 | 2014 | January | 10 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 26 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Other arguments | 0 | 0 | FBI | |
13 | Phi-000633 | 2012-10-01 | WESLEY | DONALD | Black | 19 | Male | Philadelphia | PA | 39.982382 | -75.182575 | Closed by arrest | 2012 | October | Monday | 1917 | 201207026PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2012-10-01 | 2012.0 | October | Monday | 3741 | 2012 | July | 26 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 19 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Knife or cutting instrument | Relationship not determined | Other arguments | 0 | 0 | FBI | |
14 | Phi-000846 | 2011-07-07 | GUY | ALONZO | Black | 30 | Male | Philadelphia | PA | 40.021300 | -75.155858 | Closed by arrest | 2011 | July | Thursday | 2369 | 201102005PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-07-07 | 2011.0 | July | Thursday | 4193 | 2011 | February | 5 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 30 | Black | Not of Hispanic Origin | Male | 25 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Narcotic drug laws | 0 | 0 | FBI | |
15 | Phi-000861 | 2011-09-29 | SMITH | MARQUISE | Black | 23 | Male | Philadelphia | PA | 39.972675 | -75.234330 | Closed by arrest | 2011 | September | Thursday | 2285 | 201107019PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-09-29 | 2011.0 | September | Thursday | 4109 | 2011 | July | 19 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 23 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
16 | Phi-000908 | 2011-06-29 | LAWSON | DWAYNE | Black | 20 | Male | Philadelphia | PA | 39.930713 | -75.178759 | Closed by arrest | 2011 | June | Wednesday | 2377 | 201101025PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-06-29 | 2011.0 | June | Wednesday | 4201 | 2011 | January | 25 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 20 | Black | Not of Hispanic Origin | Male | 42 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Other arguments | 0 | 0 | FBI | |
17 | Phi-000959 | 2011-03-03 | HICKS | JASON | Black | 40 | Male | Philadelphia | PA | 39.968747 | -75.221200 | Closed by arrest | 2011 | March | Thursday | 2495 | 201011010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-03-03 | 2011.0 | March | Thursday | 4319 | 2010 | November | 10 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 40 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Narcotic drug laws | 0 | 0 | FBI | |
18 | Phi-001026 | 2010-09-29 | STREET | KASEEM | Black | 24 | Male | Philadelphia | PA | 39.932188 | -75.184198 | Closed by arrest | 2010 | September | Wednesday | 2650 | 201006014PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2010-09-29 | 2010.0 | September | Wednesday | 4474 | 2010 | June | 14 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 24 | Black | Not of Hispanic Origin | Male | 17 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Narcotic drug laws | 0 | 0 | FBI | |
19 | Phi-001052 | 2008-06-30 | MULLINS | STEVEN | Black | 24 | Male | Philadelphia | PA | 39.989631 | -75.163233 | Closed by arrest | 2008 | June | Monday | 3471 | 200804005PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 5 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 24 | Black | Not of Hispanic Origin | Male | 31 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
20 | Phi-001052 | 2008-06-30 | MULLINS | STEVEN | Black | 24 | Male | Philadelphia | PA | 39.989631 | -75.163233 | Closed by arrest | 2008 | June | Monday | 3471 | 200804012PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 12 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 24 | Black | Not of Hispanic Origin | Male | 21 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
21 | Phi-001424 | 2009-01-06 | BRADFORD | KHALIF | Black | 21 | Male | Philadelphia | PA | 39.959047 | -75.211088 | Closed by arrest | 2009 | January | Tuesday | 3281 | 200810009PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-01-06 | 2009.0 | January | Tuesday | 5105 | 2008 | October | 9 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 21 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
22 | Phi-001424 | 2009-01-06 | BRADFORD | KHALIF | Black | 21 | Male | Philadelphia | PA | 39.959047 | -75.211088 | Closed by arrest | 2009 | January | Tuesday | 3281 | 200810012PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-01-06 | 2009.0 | January | Tuesday | 5105 | 2008 | October | 12 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 21 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Acquaintance | Other arguments | 0 | 0 | FBI | |
23 | Phi-001424 | 2009-01-06 | BRADFORD | KHALIF | Black | 21 | Male | Philadelphia | PA | 39.959047 | -75.211088 | Closed by arrest | 2009 | January | Tuesday | 3281 | 200810015PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-01-06 | 2009.0 | January | Tuesday | 5105 | 2008 | October | 15 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
24 | Phi-001543 | 2007-07-13 | HOLLOWAY | DAMIEN | Black | 27 | Male | Philadelphia | PA | 40.025431 | -75.042796 | Closed by arrest | 2007 | July | Friday | 3824 | 200702025PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2007-07-13 | 2007.0 | July | Friday | 5648 | 2007 | February | 25 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 27 | Black | Not of Hispanic Origin | Male | 22 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Stranger | Felon killed by police | Felon killed in commission of a crime | 0 | 1 | FBI |
25 | Phi-001810 | 2007-07-13 | RAGIN | KAHILL | Black | 27 | Male | Philadelphia | PA | 40.048824 | -75.152000 | Closed by arrest | 2007 | July | Friday | 3824 | 200702025PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2007-07-13 | 2007.0 | July | Friday | 5648 | 2007 | February | 25 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 27 | Black | Not of Hispanic Origin | Male | 22 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Stranger | Felon killed by police | Felon killed in commission of a crime | 0 | 1 | FBI |
26 | Phi-001685 | 2008-01-02 | COAXUM | EDWARD | Black | 18 | Male | Philadelphia | PA | 39.989347 | -75.167311 | Closed by arrest | 2008 | January | Wednesday | 3651 | 200709027PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-01-02 | 2008.0 | January | Wednesday | 5475 | 2007 | September | 27 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 18 | Black | Not of Hispanic Origin | Male | 19 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Robbery | 0 | 1 | FBI | |
27 | Phi-004043 | 2008-06-30 | WESTON | ARI | Black | 19 | Male | Philadelphia | PA | 40.034626 | -75.114981 | Open/No arrest | 2008 | June | Monday | 3471 | 200804010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 10 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
28 | Phi-004043 | 2008-06-30 | WESTON | ARI | Black | 19 | Male | Philadelphia | PA | 40.034626 | -75.114981 | Open/No arrest | 2008 | June | Monday | 3471 | 200804021PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 21 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | 26 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
29 | Phi-004043 | 2008-06-30 | WESTON | ARI | Black | 19 | Male | Philadelphia | PA | 40.034626 | -75.114981 | Open/No arrest | 2008 | June | Monday | 3471 | 200804022PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 22 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
30 | Phi-004148 | 2009-08-17 | MAKADJI | MAMADOU | Black | 22 | Male | Philadelphia | PA | 39.914546 | -75.237125 | Open/No arrest | 2009 | August | Monday | 3058 | 200905006PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-08-17 | 2009.0 | August | Monday | 4882 | 2009 | May | 6 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | 23 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 1 | FBI | |
31 | Phi-004148 | 2009-08-17 | MAKADJI | MAMADOU | Black | 22 | Male | Philadelphia | PA | 39.914546 | -75.237125 | Open/No arrest | 2009 | August | Monday | 3058 | 200905007PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-08-17 | 2009.0 | August | Monday | 4882 | 2009 | May | 7 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | 18 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Robbery | 0 | 0 | FBI | |
32 | Phi-004148 | 2009-08-17 | MAKADJI | MAMADOU | Black | 22 | Male | Philadelphia | PA | 39.914546 | -75.237125 | Open/No arrest | 2009 | August | Monday | 3058 | 200905017PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-08-17 | 2009.0 | August | Monday | 4882 | 2009 | May | 17 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 22 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
33 | Phi-004183 | 2009-12-01 | MICKENS | HAALA | Black | 22 | Female | Philadelphia | PA | 40.026860 | -75.171191 | Open/No arrest | 2009 | December | Tuesday | 2952 | 200909010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-12-01 | 2009.0 | December | Tuesday | 4776 | 2009 | September | 10 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Female | 22 | Black | Not of Hispanic Origin | Male | 23 | Black | Not of Hispanic Origin | Blunt object - hammer, club, etc | Girlfriend | Other arguments | 0 | 0 | FBI | |
34 | Phi-004363 | 2011-08-30 | JACKSON | KHALIL | Black | 22 | Male | Philadelphia | PA | 39.919114 | -75.231403 | Open/No arrest | 2011 | August | Tuesday | 2315 | 201106002PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-08-30 | 2011.0 | August | Tuesday | 4139 | 2011 | June | 2 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | 21 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
35 | Phi-004363 | 2011-08-30 | JACKSON | KHALIL | Black | 22 | Male | Philadelphia | PA | 39.919114 | -75.231403 | Open/No arrest | 2011 | August | Tuesday | 2315 | 201106003PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-08-30 | 2011.0 | August | Tuesday | 4139 | 2011 | June | 3 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
36 | Phi-004363 | 2011-08-30 | JACKSON | KHALIL | Black | 22 | Male | Philadelphia | PA | 39.919114 | -75.231403 | Open/No arrest | 2011 | August | Tuesday | 2315 | 201106022PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-08-30 | 2011.0 | August | Tuesday | 4139 | 2011 | June | 22 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
37 | Phi-004436 | 2012-03-01 | SIPES | SEAN | Black | 30 | Male | Philadelphia | PA | 39.987300 | -75.163270 | Open/No arrest | 2012 | March | Thursday | 2131 | 201112002PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2012-03-01 | 2012.0 | March | Thursday | 3955 | 2011 | December | 2 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 30 | Black | Not of Hispanic Origin | Male | 25 | White | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Stranger | Felon killed by police | Felon attacked police officer | 0 | 0 | FBI |
38 | Phi-004538 | 2013-01-03 | LATHEM | JAMES | Black | 39 | Male | Philadelphia | PA | 40.004943 | -75.160081 | Open/No arrest | 2013 | January | Thursday | 1823 | 201210010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2013-01-03 | 2013.0 | January | Thursday | 3647 | 2012 | October | 10 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 39 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Narcotic drug laws | 0 | 0 | FBI | |
39 | Phi-004550 | 2013-03-04 | BRYANT | SABIR | Black | 19 | Male | Philadelphia | PA | 39.907209 | -75.246154 | Open/No arrest | 2013 | March | Monday | 1763 | 201212017PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2013-03-04 | 2013.0 | March | Monday | 3587 | 2012 | December | 17 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
40 | Phi-004550 | 2013-03-04 | BRYANT | SABIR | Black | 19 | Male | Philadelphia | PA | 39.907209 | -75.246154 | Open/No arrest | 2013 | March | Monday | 1763 | 201212018PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2013-03-04 | 2013.0 | March | Monday | 3587 | 2012 | December | 18 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 19 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
41 | Phi-004767 | 2015-01-08 | DACRES | KAREEM | Black | 22 | Male | Philadelphia | PA | 40.012882 | -75.155447 | Open/No arrest | 2015 | January | Thursday | 1088 | 201409009PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-01-08 | 2015.0 | January | Thursday | 2912 | 2014 | September | 9 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 22 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
42 | Phi-004768 | 2015-01-08 | FLUELLEN | DARREN | Black | 26 | Male | Philadelphia | PA | 39.946337 | -75.236656 | Open/No arrest | 2015 | January | Thursday | 1088 | 201409010PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-01-08 | 2015.0 | January | Thursday | 2912 | 2014 | September | 10 | Murder or Non-negligent manslaughter | Multiple victims/single offender | Normal update | True | Male | 26 | Black | Not of Hispanic Origin | Male | 44 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 1 | 0 | FBI | |
43 | Phi-004768 | 2015-01-08 | FLUELLEN | DARREN | Black | 26 | Male | Philadelphia | PA | 39.946337 | -75.236656 | Open/No arrest | 2015 | January | Thursday | 1088 | 201409019PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-01-08 | 2015.0 | January | Thursday | 2912 | 2014 | September | 19 | Murder or Non-negligent manslaughter | Multiple victims/unknown offender(s) | Normal update | False | Male | 26 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 1 | 0 | FBI | |
44 | Phi-004889 | 2015-11-30 | CLAIBORNE | MALCOLM | Black | 25 | Male | Philadelphia | PA | 39.989992 | -75.117623 | Open/No arrest | 2015 | November | Monday | 762 | 201509025PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-11-30 | 2015.0 | November | Monday | 2586 | 2015 | September | 25 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 25 | Black | Not of Hispanic Origin | Male | 28 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
45 | Phi-004984 | 2016-07-15 | PETERSON | GERALD | Black | 19 | Male | Philadelphia | PA | 40.004553 | -75.180995 | Open/No arrest | 2016 | July | Friday | 534 | 201602003PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2016-07-15 | 2016.0 | July | Friday | 2358 | 2016 | February | 3 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 19 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
46 | Phi-004986 | 2016-07-18 | WILLS | BREON | Black | 20 | Male | Philadelphia | PA | 39.960402 | -75.223485 | Open/No arrest | 2016 | July | Monday | 531 | 201603012PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2016-07-18 | 2016.0 | July | Monday | 2355 | 2016 | March | 12 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Normal update | True | Male | 20 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 1 | FBI | |
47 | Phi-004997 | 2016-07-28 | BATTE | AARON | Black | 26 | Male | Philadelphia | PA | 40.012867 | -75.153986 | Open/No arrest | 2016 | July | Thursday | 521 | 201605013PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2016-07-28 | 2016.0 | July | Thursday | 2345 | 2016 | May | 13 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 26 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
48 | Phi-005018 | 2016-08-25 | DAVIS | TERRELL | Black | 21 | Male | Philadelphia | PA | 39.997470 | -75.115172 | Open/No arrest | 2016 | August | Thursday | 493 | 201606004PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2016-08-25 | 2016.0 | August | Thursday | 2317 | 2016 | June | 4 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Narcotic drug laws | 0 | 0 | FBI | |
49 | Ric-000251 | 2008-01-30 | SCROGGINS | LOUIS WILLIAM | Black | <NA> | Male | Richmond | VA | 37.531644 | -77.418803 | Closed by arrest | 2008 | January | Wednesday | 3623 | 200605002VA12200 | VA12200 | Richmond | Municipal police | Richmond | Richmond | VA | 2008-01-30 | 2008.0 | January | Wednesday | 5447 | 2006 | May | 2 | Murder or Non-negligent manslaughter | Multiple victims/unknown offender(s) | Normal update | False | Male | <NA> | Black | Unknown or not reported | Unknown | <NA> | Unknown | Unknown or not reported | Shotgun | Relationship not determined | Other | 1 | 0 | FBI | |
50 | SF-000083 | 2007-10-03 | SMITH | RAMADAN | Black | 29 | Male | San Francisco | CA | 37.784319 | -122.439755 | Open/No arrest | 2007 | October | Wednesday | 3742 | 200707003CA03801 | CA03801 | San Francisco | Municipal police | San Francisco | San Francisco-Oakland-Fremont | CA | 2007-10-03 | 2007.0 | October | Wednesday | 5566 | 2007 | July | 3 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 29 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Firearm, type not stated | Relationship not determined | Juvenile gang killings | 0 | 0 | FBI | |
51 | SF-000514 | 2015-03-09 | GAINES | DONZEL | Black | 19 | Male | San Francisco | CA | 37.712707 | -122.432579 | Closed by arrest | 2015 | March | Monday | 1028 | 201411001CA03801 | CA03801 | San Francisco | Municipal police | San Francisco | San Francisco-Oakland-Fremont | CA | 2015-03-09 | 2015.0 | March | Monday | 2852 | 2014 | November | 1 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 19 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
52 | StL-001294 | 2010-08-27 | MONEY | JERMAINE | Black | 25 | Male | St. Louis | MO | 38.620397 | -90.251929 | Closed by arrest | 2010 | August | Friday | 2683 | 201006001MO09561 | MO09561 | Pine Lawn | Municipal police | St. Louis | St. Louis | MO | 2010-08-27 | 2010.0 | August | Friday | 4507 | 2010 | June | 1 | Murder or Non-negligent manslaughter | Single victim/multiple offenders | Adjustment | True | Male | 25 | Black | Unknown or not reported | Male | <NA> | Black | Unknown or not reported | Firearm, type not stated | Relationship not determined | Burglary | 0 | 2 | FBI | |
53 | StL-001458 | 2011-09-16 | COLE | DIAMONIQUE | Black | 20 | Female | St. Louis | MO | 38.680476 | -90.219231 | Closed by arrest | 2011 | September | Friday | 2298 | 201105001MO09565 | MO09565 | St. Ann | Municipal police | St. Louis | St. Louis | MO | 2011-09-16 | 2011.0 | September | Friday | 4122 | 2011 | May | 1 | Murder or Non-negligent manslaughter | Single victim/single offender | Adjustment | True | Female | 20 | Black | Unknown or not reported | Male | 22 | Black | Unknown or not reported | Handgun - pistol, revolver, etc | Girlfriend | Other arguments | 0 | 0 | FBI | |
54 | StL-002124 | 2016-03-14 | MONTGOMERY | ANDRE | Black | 21 | Male | St. Louis | MO | 38.664420 | -90.224930 | Open/No arrest | 2016 | March | Monday | 657 | 201510001MO09500 | MO09500 | St. Louis County Police Department | County police | St. Louis | St. Louis | MO | 2016-03-14 | 2016.0 | March | Monday | 2481 | 2015 | October | 1 | Murder or Non-negligent manslaughter | Single victim/single offender | Adjustment | True | Male | 21 | Black | Not of Hispanic Origin | Male | 21 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Friend | Other arguments | 0 | 0 | FBI | |
55 | StL-002124 | 2016-03-14 | MONTGOMERY | ANDRE | Black | 21 | Male | St. Louis | MO | 38.664420 | -90.224930 | Open/No arrest | 2016 | March | Monday | 657 | 201510004MO09500 | MO09500 | St. Louis County Police Department | County police | St. Louis | St. Louis | MO | 2016-03-14 | 2016.0 | March | Monday | 2481 | 2015 | October | 4 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI |
It’s easy to see which column came from which DataFrame: the Post columns are lowercase with words separated by underscores (reported_date, victim_last), while the MAP columns are capitalized with words separated by spaces (Agency Code, Victim Age).
The merged dataset has only 56 rows, each pairing a Post record with a MAP record that shares its city, state, report filing date, and victim characteristics. That is about a thousandth of the total number of cases in the Post dataset.
Additionally, there are some duplicates, indicating that some cases in the Post dataset correspond to multiple cases in the MAP dataset:
simple_merged_data["uid"].duplicated().sum()
14
len(simple_merged_data["uid"].unique())
42
simple_merged_data[simple_merged_data["uid"].duplicated()]
uid | reported_date | victim_last | victim_first | victim_race | victim_age | victim_sex | city | state | lat | lon | disposition | year | month | weekday | delta_days | ID | Agency Code | Agency Name | Agency Type | City | MSA | State | Date | Year | Month | Weekday | Delta Days | Occurrence Year | Occurrence Month | Incident | Crime Type | Situation | Action | Solved | Victim Sex | Victim Age | Victim Race | Victim Ethnicity | Offender Sex | Offender Age | Offender Race | Offender Ethnicity | Weapon | Relationship | Circumstance | Sub Circumstance | Victim Count | Offender Count | Source | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
5 | Los-001293 | 2014-12-30 | Unknown | Unknown | Black | 20 | Male | Los Angeles | CA | 33.962700 | -118.311200 | Open/No arrest | 2014 | December | Tuesday | 1097 | 201411009CA01942 | CA01942 | Los Angeles | Municipal police | Los Angeles | Los Angeles-Long Beach | CA | 2014-12-30 | 2014.0 | December | Tuesday | 2921 | 2014 | November | 9 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 20 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Other - known to victim | Juvenile gang killings | 0 | 0 | FBI | |
10 | Phi-000188 | 2015-07-17 | EPPS | DARIUS | Black | 23 | Male | Philadelphia | PA | 39.944246 | -75.226633 | Closed by arrest | 2015 | July | Friday | 898 | 201504005PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-07-17 | 2015.0 | July | Friday | 2722 | 2015 | April | 5 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 23 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
20 | Phi-001052 | 2008-06-30 | MULLINS | STEVEN | Black | 24 | Male | Philadelphia | PA | 39.989631 | -75.163233 | Closed by arrest | 2008 | June | Monday | 3471 | 200804012PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 12 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 24 | Black | Not of Hispanic Origin | Male | 21 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
22 | Phi-001424 | 2009-01-06 | BRADFORD | KHALIF | Black | 21 | Male | Philadelphia | PA | 39.959047 | -75.211088 | Closed by arrest | 2009 | January | Tuesday | 3281 | 200810012PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-01-06 | 2009.0 | January | Tuesday | 5105 | 2008 | October | 12 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 21 | Black | Not of Hispanic Origin | Male | 20 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Acquaintance | Other arguments | 0 | 0 | FBI | |
23 | Phi-001424 | 2009-01-06 | BRADFORD | KHALIF | Black | 21 | Male | Philadelphia | PA | 39.959047 | -75.211088 | Closed by arrest | 2009 | January | Tuesday | 3281 | 200810015PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-01-06 | 2009.0 | January | Tuesday | 5105 | 2008 | October | 15 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
28 | Phi-004043 | 2008-06-30 | WESTON | ARI | Black | 19 | Male | Philadelphia | PA | 40.034626 | -75.114981 | Open/No arrest | 2008 | June | Monday | 3471 | 200804021PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 21 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | 26 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
29 | Phi-004043 | 2008-06-30 | WESTON | ARI | Black | 19 | Male | Philadelphia | PA | 40.034626 | -75.114981 | Open/No arrest | 2008 | June | Monday | 3471 | 200804022PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2008-06-30 | 2008.0 | June | Monday | 5295 | 2008 | April | 22 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 19 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
31 | Phi-004148 | 2009-08-17 | MAKADJI | MAMADOU | Black | 22 | Male | Philadelphia | PA | 39.914546 | -75.237125 | Open/No arrest | 2009 | August | Monday | 3058 | 200905007PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-08-17 | 2009.0 | August | Monday | 4882 | 2009 | May | 7 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | 18 | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Relationship not determined | Robbery | 0 | 0 | FBI | |
32 | Phi-004148 | 2009-08-17 | MAKADJI | MAMADOU | Black | 22 | Male | Philadelphia | PA | 39.914546 | -75.237125 | Open/No arrest | 2009 | August | Monday | 3058 | 200905017PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2009-08-17 | 2009.0 | August | Monday | 4882 | 2009 | May | 17 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 22 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
35 | Phi-004363 | 2011-08-30 | JACKSON | KHALIL | Black | 22 | Male | Philadelphia | PA | 39.919114 | -75.231403 | Open/No arrest | 2011 | August | Tuesday | 2315 | 201106003PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-08-30 | 2011.0 | August | Tuesday | 4139 | 2011 | June | 3 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
36 | Phi-004363 | 2011-08-30 | JACKSON | KHALIL | Black | 22 | Male | Philadelphia | PA | 39.919114 | -75.231403 | Open/No arrest | 2011 | August | Tuesday | 2315 | 201106022PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2011-08-30 | 2011.0 | August | Tuesday | 4139 | 2011 | June | 22 | Murder or Non-negligent manslaughter | Single victim/single offender | Normal update | True | Male | 22 | Black | Not of Hispanic Origin | Male | <NA> | Black | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Other - known to victim | Other arguments | 0 | 0 | FBI | |
40 | Phi-004550 | 2013-03-04 | BRYANT | SABIR | Black | 19 | Male | Philadelphia | PA | 39.907209 | -75.246154 | Open/No arrest | 2013 | March | Monday | 1763 | 201212018PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2013-03-04 | 2013.0 | March | Monday | 3587 | 2012 | December | 18 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Normal update | False | Male | 19 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI | |
43 | Phi-004768 | 2015-01-08 | FLUELLEN | DARREN | Black | 26 | Male | Philadelphia | PA | 39.946337 | -75.236656 | Open/No arrest | 2015 | January | Thursday | 1088 | 201409019PAPEP00 | PAPEP00 | Philadelphia | Municipal police | Philadelphia | Philadelphia-Camden-Wilmington | PA | 2015-01-08 | 2015.0 | January | Thursday | 2912 | 2014 | September | 19 | Murder or Non-negligent manslaughter | Multiple victims/unknown offender(s) | Normal update | False | Male | 26 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 1 | 0 | FBI | |
55 | StL-002124 | 2016-03-14 | MONTGOMERY | ANDRE | Black | 21 | Male | St. Louis | MO | 38.664420 | -90.224930 | Open/No arrest | 2016 | March | Monday | 657 | 201510004MO09500 | MO09500 | St. Louis County Police Department | County police | St. Louis | St. Louis | MO | 2016-03-14 | 2016.0 | March | Monday | 2481 | 2015 | October | 4 | Murder or Non-negligent manslaughter | Single victim/unknown offender(s) | Adjustment | False | Male | 21 | Black | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Circumstances undetermined | 0 | 0 | FBI |
Of the 56 rows in the merged dataset, 42 are unique cases in the Post dataset, and the remaining 14 rows are repeat matches. Given that we used every possible commonality to merge the two datasets, this means that numerous cases in the Post dataset correspond to more than one case in the MAP dataset. Interestingly, almost all the duplicates are cases in Philadelphia, and all are killings of young Black men with a handgun.
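To make the counting above concrete, here is a minimal sketch on a toy DataFrame. The name toy and its rows are made up for illustration and are not taken from the merged data. It shows that the row count equals the number of unique uid values plus the number of rows flagged by duplicated(), and it uses value_counts() to list which cases matched more than one MAP record.

```python
import pandas as pd

# Hypothetical stand-in for simple_merged_data:
# one row per (Post case, matching MAP record) pair.
toy = pd.DataFrame(
    {
        "uid": ["Phi-1", "Phi-1", "Phi-1", "StL-1", "StL-1", "SF-1"],
        "ID": ["a", "b", "c", "d", "e", "f"],
    }
)

n_rows = len(toy)
n_unique = toy["uid"].nunique()
# duplicated() flags every repeat after the first occurrence of each uid
n_repeats = int(toy["uid"].duplicated().sum())

# Number of MAP matches per Post case, keeping only cases with more than one
matches_per_case = toy["uid"].value_counts()
multi_match = matches_per_case[matches_per_case > 1]

print(n_rows, n_unique, n_repeats)
print(multi_match)
```

Running the same value_counts() step on the real merged data would show how many MAP records each Post case was paired with, which is easier to scan than the full table of duplicated rows.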
There is one more thing to note: in the previous version of this project, the outdated MAP dataset I used didn’t have a File Date column, so the simple_merge() function merged the datasets using the year and month of the report date in the Post dataset and those of the occurrence date in the MAP dataset. The resulting merged dataset had some 50,000 rows.
One case that existed in that dataset but does not occur in the merged dataset here (simple_merged_data) is the murder of Amanda Douglas in November 2014 [5], which has a Post dataset uid of Tul-000655. I want to see why no matches to this case’s parameters showed up in the merged dataset:
simple_merged_data[simple_merged_data["uid"] == "Tul-000655"]
uid | reported_date | victim_last | victim_first | victim_race | victim_age | victim_sex | city | state | lat | lon | disposition | year | month | weekday | delta_days | ID | Agency Code | Agency Name | Agency Type | City | MSA | State | Date | Year | Month | Weekday | Delta Days | Occurrence Year | Occurrence Month | Incident | Crime Type | Situation | Action | Solved | Victim Sex | Victim Age | Victim Race | Victim Ethnicity | Offender Sex | Offender Age | Offender Race | Offender Ethnicity | Weapon | Relationship | Circumstance | Sub Circumstance | Victim Count | Offender Count | Source |
---|
So, here’s the case record in the Post dataset:
post_data[post_data["uid"] == "Tul-000655"]
uid | reported_date | victim_last | victim_first | victim_race | victim_age | victim_sex | city | state | lat | lon | disposition | year | month | weekday | delta_days | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
50705 | Tul-000655 | 2014-11-26 | DOUGLAS | AMANDA | White | 26 | Female | Tulsa | OK | 36.100169 | -95.851061 | Closed by arrest | 2014 | November | Wednesday | 1131 |
And here are all possible matches in the MAP dataset, using what we know about the Amanda Douglas case except the report date:
conditions = (
    (map_data["Occurrence Year"] == 2014)
    & (map_data["Occurrence Month"] == "November")
    & (map_data["Victim Sex"] == "Female")
    & (map_data["Victim Race"] == "White")
    & (map_data["Victim Age"] == 26)
    & (map_data["State"] == "OK")
    & (map_data["City"] == "Tulsa")
)
map_data[conditions]
ID | Agency Code | Agency Name | Agency Type | City | MSA | State | Date | Year | Month | Weekday | Delta Days | Occurrence Year | Occurrence Month | Incident | Crime Type | Situation | Action | Solved | Victim Sex | Victim Age | Victim Race | Victim Ethnicity | Offender Sex | Offender Age | Offender Race | Offender Ethnicity | Weapon | Relationship | Circumstance | Sub Circumstance | Victim Count | Offender Count | Source | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
717312 | 201411102OK07205 | OK07205 | Tulsa | Municipal police | Tulsa | Tulsa | OK | 2015-02-02 | 2015.0 | February | Monday | 2887 | 2014 | November | 102 | Murder or Non-negligent manslaughter | Single victim/single offender | Adjustment | True | Female | 26 | White | Not of Hispanic Origin | Male | 3 | White | Not of Hispanic Origin | Handgun - pistol, revolver, etc | Mother | Circumstances undetermined | 0 | 0 | FBI | |
717315 | 201411103OK07205 | OK07205 | Tulsa | Municipal police | Tulsa | Tulsa | OK | 2015-02-02 | 2015.0 | February | Monday | 2887 | 2014 | November | 103 | Murder or Non-negligent manslaughter | Multiple victims/unknown offender(s) | Adjustment | False | Female | 26 | White | Not of Hispanic Origin | Unknown | <NA> | Unknown | Unknown or not reported | Handgun - pistol, revolver, etc | Relationship not determined | Other | 1 | 0 | FBI |
Much like in the previous version, there are two matches: the first is the accidental killing of a 26-year-old woman by her three-year-old son [6], and the second is the murder of Amanda Douglas. Now we can see why neither of these showed up in the merged dataset: the File Date is different, and corresponds to an Adjustment, according to the Action column. Interestingly, Douglas’ murder is still recorded as unsolved in the MAP dataset as of February 2, 2015, even though the man believed to have killed Douglas and another woman, Jennifer Sanders, was arrested in January 2015, according to the linked news story [5].
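The effect of the date mismatch can be reproduced on a toy example. The sketch below is not the simple_merge() function used earlier. The frames post and map_ and their column subsets are assumptions made for illustration, with values echoing the Douglas records shown above. A left merge with indicator=True marks whether each Post row found a MAP partner.

```python
import pandas as pd

# Toy stand-ins for post_data and map_data (hypothetical subsets of columns).
post = pd.DataFrame(
    {
        "uid": ["Tul-000655"],
        "reported_date": pd.to_datetime(["2014-11-26"]),
        "year": [2014],
        "month": ["November"],
        "victim_age": [26],
    }
)
map_ = pd.DataFrame(
    {
        "ID": ["201411103OK07205"],
        "Date": pd.to_datetime(["2015-02-02"]),  # filing date of the adjustment
        "Occurrence Year": [2014],
        "Occurrence Month": ["November"],
        "Victim Age": [26],
    }
)

# Strict merge: the Post report date must equal the MAP filing date.
strict = post.merge(
    map_,
    how="left",
    left_on=["reported_date", "victim_age"],
    right_on=["Date", "Victim Age"],
    indicator=True,
)

# Relaxed merge: match on occurrence year and month instead of the filing date.
relaxed = post.merge(
    map_,
    how="left",
    left_on=["year", "month", "victim_age"],
    right_on=["Occurrence Year", "Occurrence Month", "Victim Age"],
    indicator=True,
)

print(strict["_merge"].tolist())
print(relaxed["_merge"].tolist())
```

The strict merge leaves the Post row unmatched, while the relaxed one pairs it with the MAP record. The trade-off is the one seen in the previous version of this project: looser keys recover cases like this one, but they also admit false pairings, such as the other 26-year-old victim from the same month.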
Discussion
This is hardly an in-depth analysis, but looking into the Post and MAP datasets and seeing their overlap was surprisingly informative and raised a few questions.
Incompleteness
The discrepancy between the Post data and the MAP data highlights the fact that even “complete” records need to be regularly updated based on official records, for instance, using the FBI Crime Data Explorer (CDE) [7]. In a future project, I want to see if I can get data from the CDE to compare with the MAP dataset and/or update outdated information, like the fact that the record of the murder of Amanda Douglas was not updated as closed in the MAP dataset, despite the suspect having been arrested in January 2015.
In addition, Carl Bialik published an article [8] on FiveThirtyEight in January 2016 describing the inconsistent incident counting and reocrding practices among states and cities in the US, including how shootings are counted, and problems that can arise in the early days of the new year, when organizations report the previous year’s crime totals. This may account for some of the missing file dates in the MAP dataset.
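One rough way to probe that last guess would be to check whether missing file dates cluster in particular incident months. The sketch below runs on made-up rows; the `Month` column name is an assumption about how the incident month is labelled, and only `File Date` is a name taken from the MAP data.

```python
import pandas as pd

# Made-up records: incident month plus a file date that is sometimes missing.
records = pd.DataFrame({
    "Month": ["January", "January", "November", "December"],
    "File Date": pd.to_datetime(["2015-03-01", None, "2015-02-02", None]),
})

# Share of records with no file date, per incident month.
# isna() gives True/False, and the mean of booleans is the fraction of True.
missing_share = records["File Date"].isna().groupby(records["Month"]).mean()

print(missing_share.sort_values(ascending=False))
```

A noticeably higher share around the turn of the year would be consistent with the reporting problems Bialik describes, though it would not prove them.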
Similarly, a 2020 article [9] written by John Gramlich at the Pew Research Center says that not every agency in the United States contributes homicide data to the FBI, so even the FBI CDE is likely to have incomplete data. This could explain some of the discrepancy between the Post and MAP datasets: the Post data was accessed “by hand,” so to speak, so it may reference sources beyond FBI records, whereas the MAP dataset relies heavily on FBI records for its data.
This is not too relevant at this data cleaning stage, but it will become extremely important in later stages that involve analysis of the data.
It appears that the dataset compiled by the Washington Post team is an anomaly; indeed, the GitHub page which hosts the dataset [1] indicates that the data came from a variety of sources, digital and analog. Compiling it all into one dataset must have taken an enormous amount of effort. It does, however, make one wonder what stands in the way of building a robust, consistent system of homicide (or any other crime) data collection, both on national and international levels, and what the social impacts of such a system would be.
Canada
This project (and, indeed, the entire series of projects) focuses on homicide data from the United States rather than Canada, despite the fact that I am a Canadian citizen and not a citizen of the US; this is because no incident-based crime database exists in Canada.
From the Uniform Crime Reporting (UCR) Survey page [10]:
Statistics Canada is prohibited by law from releasing any data which would divulge information obtained under the Statistics Act that relates to any identifiable person, business or organization without the prior knowledge or the consent in writing of that person, business or organization. Various confidentiality rules are applied to all data that are released or published to prevent the publication or disclosure of any information deemed confidential. If necessary, data are suppressed to prevent direct or residual disclosure of identifiable data.
More specifically, CCJCSS does not release any tables or cross-tabulations that may identify a particular victim or suspect by the specifics of an offence.
In general, Canadian privacy laws (specified in the Privacy Act [11]) strongly restrict the dispersal of information about specific individuals or which can be linked to specific individuals; since criminal incident information can be linked to individuals by means of media coverage and internet searches, as with Amanda Douglas above, my (admittedly limited) understanding is that no such government-maintained criminal incident database can exist in Canada without breaking federal law.
A dataset like the one compiled by The Washington Post team, built from public records like news reports, would probably be fine. I could not find one for Canada, though, so I stuck with US homicide data.
Future work
As the title implies, this is only the first entry in a small series of projects dealing with homicide data. In future entries, I plan to:
- Perform formal analyses on the Washington Post and Murder Accountability Project datasets using statistical and machine learning methods;
- Evaluate the efficacy of machine learning in the prediction of unknown information, like the missing dates;
- Attempt building a local criminal incident data source using the FBI CDE API; and
- Explore the questions raised here that can’t be answered by analysis of the data.
Conclusion
This was originally meant to be a rudimentary data analysis project, but it turned out that there was so much to be learned just by comparing two datasets and following the thread of curiosity that it turned into an even more rudimentary project about data source and quality, which I think highlights quite well the fact that data, on its own, is not inherently useful. Indeed, one might go as far as to say that data is only as useful as one’s understanding of how it was collected.
References
I’m still working on a reference and citation format for this series that’s both understandable and easy to format, since it can be tedious to format, say, the negative first-line indent of the APA and MLA formats using Markdown (this document is actually a Jupyter Notebook). The projects I’d like to work on should use a reference format like IEEE or AIP, so those are probably what I’ll model the final format on.
- Rich, S., Mellnik, T., Kelly, K., and Lowery, W. How The Post mapped unsolved murders. (2018). Github repository. https://github.com/washingtonpost/data-homicides.
- Lowery, W., Kelly, K., Mellnik, T., and Rich, S. (2018, June 6). Murder with Impunity: Where killings go unsolved. https://www.washingtonpost.com/graphics/2018/investigations/where-murders-go-unsolved/.
- Data and Docs. (2019). Murder Accountability Project. https://www.murderdata.org/p/data-docs.html.
- Murder Accountability Project Data Dictionary for amplified Supplementary Homicide Report. (2019). Murder Accountability Project. https://www.dropbox.com/s/lo6tgo8nnbpqeru/MAPdefinitionsSHR.pdf.
- Police: Tulsa Man Charged With Double Murder Was Hired Killer. (2015, February 19). News On 6. https://www.newson6.com/story/5e3625222f69d76f62048443/police:-tulsa-man-charged-with-double-murder-was-hired-killer.
- Maune, T. (2014, November 24). Tulsa Police: 3-Year-Old Accidentally Shot, Killed Mother. News On 6. https://www.newson6.com/story/5e3629382f69d76f6204ae65/tulsa-police:-3yearold-accidentally-shot-killed-mother.
- Crime Data Explorer. FBI. https://cde.ucr.cjis.gov/LATEST/webapp/#/pages/home.
- Bialik, Carl. How To Make Sense Of Conflicting, Confusing And Misleading Crime Statistics. (2016). FiveThirtyEight. https://fivethirtyeight.com/features/how-to-make-sense-of-conflicting-confusing-and-misleading-crime-statistics/.
- Gramlich, John. What the data says (and doesn’t say) about crime in the United States. (2020). Pew Research Center. https://www.pewresearch.org/short-reads/2020/11/20/facts-about-crime-in-the-u-s/.
- Uniform Crime Reporting Survey (UCR). (2022). Statistics Canada. https://www23.statcan.gc.ca/imdb/p2SV.pl?Function=getSurvey&SDDS=3302#a2.
- Privacy Act, RSC 1985, c P-21.