Crime in Los Angeles from 2020-2023

What's crime like in Los Angeles, CA? I have no clue, but thankfully data.gov has our backs.

Introduction

Data.gov is an incredible resource for datasets published by federal, state, and local governments. One of its most viewed datasets currently comes from the City of Los Angeles and covers incidents of crime within the city from January 2020 to June 2023. Living just one county over, and not having much experience working with large datasets (this one clocks in at over 748,000 rows), I thought I’d do some data exploration.

This dataset contains 28 attributes. However, I’ll mainly be looking at seven:

DATE OCC – date the crime occurred
AREA NAME – area in which the crime occurred (patrol divisions)
Crm Cd Desc – description of the crime
Premis Desc – description of the premise on which the crime occurred
Weapon Desc – description of the weapon used, if any
LAT – latitude where crime occurred
LON – longitude where crime occurred
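
The post does not show how the data was loaded, so here is a minimal sketch of one way it might be done with pandas. The helper name `load_crime_data` and the tiny in-memory CSV are assumptions made for illustration; only the column names come from the list above.

```python
import io

import pandas as pd

# Column names taken from the attribute list above.
COLUMNS_OF_INTEREST = [
    'DATE OCC', 'AREA NAME', 'Crm Cd Desc', 'Premis Desc',
    'Weapon Desc', 'LAT', 'LON',
]

def load_crime_data(source):
    """Read the crime CSV from a path or file-like object (hypothetical helper)."""
    return pd.read_csv(source, usecols=COLUMNS_OF_INTEREST)

# Tiny made-up CSV so the sketch runs without the real download.
sample_csv = io.StringIO(
    'DR_NO,DATE OCC,AREA NAME,Crm Cd Desc,Premis Desc,Weapon Desc,LAT,LON\n'
    '1,01/01/2020 12:00:00 AM,Central,BURGLARY,STREET,,34.05,-118.24\n'
)
sample = load_crime_data(sample_csv)
print(sample.shape)
```

Restricting the read to these seven columns keeps memory use down; dropping the `usecols` argument would load all 28 attributes instead.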

Analysis

I’ll begin by looking at how many crimes occurred in each area of Los Angeles.

import pandas as pd
import matplotlib.pyplot as plt

# `data` is the DataFrame holding the full crime dataset.
# Which area has the most crime?
area_name_counts = data['AREA NAME'].value_counts()
area_name_counts = area_name_counts.sort_values(ascending=False)
area_name_counts.plot(kind='bar')
plt.xlabel('Area of Los Angeles')
plt.ylabel('Number of Crimes')
plt.title('Number of Crimes per Los Angeles Area\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

The Central, 77th Street, and Pacific patrol divisions record the most crimes, while Mission, Hollenbeck, and Foothill record the fewest.

We’ll move on next to the most common types of crimes committed. Here’s the top 25:

# Which type of crime is the most common?
crime_description_counts = data['Crm Cd Desc'].value_counts().head(25)
crime_description_counts = crime_description_counts.sort_values(ascending=False)
crime_description_counts.plot(kind='bar', figsize=(12, 8))
plt.xlabel('Crime Description')
plt.ylabel('Number of Instances')
plt.title('25 Most Common Crime Descriptions in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

This graph actually isn’t that informative though. Take the fourth (burglary from vehicle) and sixth (burglary) most common crimes. They’re essentially the same thing – theft – just of a slightly different flavor. I think it would be best to group similar crimes together to get a better picture of the most common types of crimes that occur in Los Angeles. For this project, I’m going to group them manually, but in the future I’d love to learn some basic natural language processing to automate this process.

import re

# Grouping the crimes into broader categories.
# Categories are checked in dictionary order and the first matching pattern wins.
grouping_crimes = {
    'Arson': [r'ARSON'],
    'Assault': [r'ASSAULT', r'BRANDISH', r'CRIMINAL THREATS', r'AGGRAVATED ASSAULT', r'SIMPLE ASSAULT',
                r'OTHER ASSAULT', r'THREATENING PHONE CALLS', r'THROWING', r'STALKING', r'PROWLER',
                r'CRIMINAL HOMICIDE', r'MANSLAUGHTER', r'BATTERY', r'LYNCHING'],
    'Theft': [r'THEFT', r'ROBBERY', r'STOLEN', r'BURGLARY', r'GRAND THEFT', r'PETTY THEFT', r'PICKPOCKET',
              r'PURSE SNATCHING', r'ATTEMPT STOLEN', r'SHOPLIFTING'],
    'Bunco': [r'BUNCO'],
    'Crimes against Children': [r'CHILD', r'CHLD', r'KIDNAPPING'],
    'Crimes against Animals': [r'CRUELTY TO ANIMALS', r'BEASTIALITY'],
    'Fraud': [r'COUNTERFEIT', r'CREDIT CARDS', r'DEFRAUDING'],
    'Firearm Discharge': [r'DISCHARGE FIREARMS', r'SHOTS FIRED'],
    'Drugs': [r'DRUGS'],
    'Driving offenses': [r'RECKELSS DRIVING', r'DRIVING WITHOUT OWNER CONSENT', r'RECKLESS DRIVING'],
    'Sex crimes': [r'HUMAN TRAFFICKING', r'INCEST', r'INDECENT EXPOSURE', r'LETTERS, LEWD', r'RAPE', r'SEX',
                   r'PEEPING TOM', r'SODOMY', r'LEWD CONDUCT', r'ORAL', r'PIMPING'],
    'Trespassing': [r'TRESPASSING'],
    'Vandalism': [r'VANDALISM'],
    'Violation of Court Orders': [r'VIOLATION OF COURT ORDER', r'VIOLATION OF RESTRAINING ORDER',
                                  r'VIOLATION OF TEMPORARY'],
    'Resisting arrest': [r'RESISTING ARREST', r'FAILURE TO YIELD']
    # 'Other': [r'BIGAMY', r'BOAT', r'BOMB', r'BRIBERY', r'CONSPIRACY', r'CONTEMPT', r'FALSE POLICE',
    #           r'DOCUMENT', r'EMBEZZLEMENT', r'EXTORTION', r'FAILURE TO DISPERSE', r'OTHER MESCELLANEOUS',
    #           r'OTHER MISCELLANEOUS CRIME', r'UNAUTHORIZED COMPUTER ACCESS', r'FALSE IMPRISONMENT',
    #           r'ILLEGAL DUMPING', r'DISRUPT SCHOOL', r'DISTURBING THE PEACE', r'DISHONEST EMPLOYEE']
}

def map_crime_category(description):
    """Return the first category whose pattern appears in the description."""
    description = description.upper()
    for category, patterns in grouping_crimes.items():
        for pattern in patterns:
            if re.search(pattern, description):
                return category
    # Anything that matches no pattern falls into the catch-all category.
    return 'Other'

data['category of crime'] = data['Crm Cd Desc'].map(map_crime_category)
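
One property of this kind of keyword mapping is that the first matching pattern wins, so the order of the dictionary affects the result. Here is a small self-contained sketch of that behavior; the two-category dictionary and the helper name `first_match_category` are toy assumptions for illustration, not the full mapping above.

```python
import re

# Toy mapping for illustration only; order matters because the first hit wins.
toy_groups = {
    'Assault': [r'ASSAULT', r'BATTERY'],
    'Crimes against Children': [r'CHILD'],
}

def first_match_category(description, groups):
    """Return the first category with a pattern found in the description."""
    description = description.upper()
    for category, patterns in groups.items():
        if any(re.search(pattern, description) for pattern in patterns):
            return category
    return 'Other'

# A description containing both keywords lands in whichever category is listed first.
both = 'CHILD ABUSE (PHYSICAL) - SIMPLE ASSAULT'
print(first_match_category(both, toy_groups))

# Reversing the dictionary order changes the answer for the same description.
reversed_groups = dict(reversed(list(toy_groups.items())))
print(first_match_category(both, reversed_groups))
```

If a more specific category should take priority, listing it earlier in the dictionary is one simple way to get that.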

With the crimes grouped into sixteen categories, let’s make another visual with the most common crime types.

# Plot crime categories
crime_categories_broad = data['category of crime'].value_counts()
crime_categories_broad = crime_categories_broad.sort_values(ascending=False)
crime_categories_broad.plot(kind='bar')
plt.xlabel('Crime Category')
plt.ylabel('Number of Instances')
plt.title('Number of Crimes by Category\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

That’s much better. Now it’s obvious that the most common crime type in Los Angeles is some type of theft. Assault comes next, though it trails by a wide margin, followed by vandalism.

I think it would be worthwhile to also categorize the crimes into violent and non-violent offenses. I’ll again do this manually, but natural language processing would be a big help to learn in the future.

# Number of violent vs. non-violent crimes.
# Any description matching one of these patterns counts as violent.
grouping_violent_crime = {
    'Violent': [r'ARSON', r'ASSAULT', r'BRANDISH', r'CRIMINAL THREATS', r'AGGRAVATED ASSAULT',
                r'SIMPLE ASSAULT', r'OTHER ASSAULT', r'THREATENING PHONE CALLS', r'THROWING', r'STALKING',
                r'PROWLER', r'DISCHARGE FIREARMS', r'SHOTS FIRED', r'CRIMINAL HOMICIDE', r'MANSLAUGHTER',
                r'BATTERY', r'LYNCHING', r'CHILD', r'CHLD', r'KIDNAPPING', r'HUMAN TRAFFICKING', r'INCEST',
                r'INDECENT EXPOSURE', r'LETTERS, LEWD', r'RAPE', r'SEX', r'PEEPING TOM', r'SODOMY',
                r'LEWD CONDUCT', r'ORAL', r'PIMPING', r'CRUELTY TO ANIMALS', r'BEASTIALITY']
}

def map_crime_type_category(description):
    description = description.upper()
    for category, patterns in grouping_violent_crime.items():
        for pattern in patterns:
            if re.search(pattern, description):
                return category
    return 'Non-Violent'

data['crime_type'] = data['Crm Cd Desc'].map(map_crime_type_category)

# Plot violent vs. non-violent counts
crime_type = data['crime_type'].value_counts()
crime_type = crime_type.sort_index()
crime_type.plot(kind='bar')
plt.xlabel('Type of Crime')
plt.ylabel('Number of Instances')
plt.title('Number of Non-Violent and Violent Crimes in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

Non-violent crimes far outpace violent ones.
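
To put a number on "far outpace", the counts can be turned into shares. This is a minimal sketch on a made-up `crime_type` column; the values are illustrative and are not results from the dataset.

```python
import pandas as pd

# Four invented rows standing in for the real `crime_type` column.
toy = pd.DataFrame({'crime_type': ['Non-Violent', 'Non-Violent', 'Non-Violent', 'Violent']})

# normalize=True returns fractions of the total instead of raw counts.
shares = toy['crime_type'].value_counts(normalize=True)
print(shares)
```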

I’m also curious about the breakdown of non-violent and violent crimes by area.

# Violent and non-violent crimes by area
crime_type_by_area = data.groupby(['AREA NAME', 'crime_type']).size().unstack(fill_value=0)
crime_type_by_area = crime_type_by_area.sort_index()
crime_type_by_area.plot(kind='line')
plt.xlabel('Area of Los Angeles')
plt.ylabel('Number of Instances')
plt.title('Number of Violent and Non-Violent Crimes by Area\nJan 2020-June 2023')
plt.legend(title='Crime Type')
plt.xticks(range(len(crime_type_by_area.index)), crime_type_by_area.index, fontsize=8, rotation=90)
plt.tight_layout()
plt.show()

The three most violent places in Los Angeles are 77th Street, Central, and Southeast.
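
Raw counts tend to favor the busiest divisions, so a complementary view is the share of each area's crimes that are violent. The sketch below uses `pd.crosstab` with `normalize='index'` on a small made-up frame; the rows are invented for illustration and say nothing about the real divisions.

```python
import pandas as pd

toy = pd.DataFrame({
    'AREA NAME': ['Central', 'Central', 'Central', 'Central', 'Pacific', 'Pacific'],
    'crime_type': ['Violent', 'Non-Violent', 'Non-Violent', 'Non-Violent',
                   'Violent', 'Non-Violent'],
})

# normalize='index' makes each row (area) sum to 1.
share_by_area = pd.crosstab(toy['AREA NAME'], toy['crime_type'], normalize='index')
violent_share = share_by_area['Violent'].sort_values(ascending=False)
print(violent_share)
```

In this toy data Central has more crimes overall, but Pacific has the higher violent share, which is the kind of difference a count-only chart would hide.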

Next, where did all of these crimes take place the most frequently? In a house? At work? On the street?

# What were the most common premise description locations?
premise_description_count = data['Premis Desc'].value_counts().head(30)
premise_description_count = premise_description_count.sort_values(ascending=False)
premise_description_count.plot(kind='bar', figsize=(12, 8))
plt.xlabel('Premise Location')
plt.ylabel('Number of Instances')
plt.title('30 Most Common Premise Locations for Crimes in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

I’d love to group these as well, but there are so many different locations that doing it manually would take a long time. When I learn some natural language processing, this will be the first thing I come back to!

What about the five most common weapons used?

# Most common weapons used?
weapon_used = data['Weapon Desc'].value_counts().head(5)
weapon_used = weapon_used.sort_values(ascending=False)
weapon_used.plot(kind='bar', figsize=(12, 8))
plt.xlabel('Weapon Involved')
plt.ylabel('Number of Instances')
plt.title('5 Most Common Weapons Used in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

Here we run into the same problem as the type of crime committed – a handgun and a semi-automatic pistol are both versions of a gun. I’ll group the weapon types into broader buckets and see what we get.

# Grouping the weapons into broader categories.
# Categories are checked in order and the first matching pattern wins.
grouping_weapons = {
    'Airsoft/BB Gun': [r'AIR PISTOL'],
    # Word boundaries keep 'MAC' from also matching MACE or MACHETE.
    'Gun': [r'ANTIQUE FIREARM', r'ASSAULT WEAPON', r'AUTOMATIC WEAPON', r'HAND GUN', r'HECKLER', r'M-14',
            r'M1-1', r'\bMAC\b', r'OTHER FIREARM', r'RELIC FIREARM', r'REVOLVER', r'RIFLE', r'SHOTGUN',
            r'SEMI-AUTOMATIC', r'SIMULATED GUN', r'PISTOL', r'STUN GUN', r'TOY GUN', r'UNK TYPE',
            r'UNKNOWN FIREARM', r'UZI'],
    'Knife': [r'BOWIE KNIFE', r'CLEAVER', r'DIRK', r'FOLDING KNIFE', r'KNIFE', r'OTHER CUTTING',
              r'SWITCH BLADE', r'SWORD', r'UNKNOWN TYPE CUTTING INSTRUMENT'],
    # 'OTHER KNIFE' is already caught by 'KNIFE' above; 'STICK' and 'TIRE IRON' moved here from 'Bomb Threat'.
    'Tools': [r'AXE', r'BELT', r'BLUNT', r'BOARD', r'BOW AND ARROW', r'BRASS KNUCKLES', r'CLUB', r'BAT',
              r'CONCRETE BLOCK', r'GLASS', r'HAMMER', r'ICE PICK', r'MACE', r'MACHETE', r'MARTIAL ARTS',
              r'OTHER KNIFE', r'PIPE', r'RAZOR', r'ROCK', r'ROPE', r'SCISSORS', r'SCREWDRIVER',
              r'STICK', r'TIRE IRON'],
    'Bomb Threat': [r'BOMB', r'EXPLOSIVE'],
    'Person': [r'STRONG-ARM', r'PHYSICAL PRESENCE'],
    'Fire': [r'FIRE'],
    'Vehicle': [r'VEHICLE'],
    'Verbal Threat': [r'VERBAL THREAT'],
    'Other': [r'BLACKJACK', r'CHEMICAL', r'DEMAND NOTE', r'DOG', r'FIXED OBJECT', r'SYRINGE', r'UNKNOWN WEAPON']
}

def map_weapon_category(description):
    # Missing values (NaN) mean no weapon was recorded for the incident.
    if not isinstance(description, str):
        return 'No Weapon Used'
    description = description.upper()
    for category, patterns in grouping_weapons.items():
        for pattern in patterns:
            if re.search(pattern, description):
                return category
    # A weapon was recorded but matched no pattern, so it is not "no weapon".
    return 'Other'

data['category of weapon'] = data['Weapon Desc'].map(map_weapon_category)
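
Short patterns deserve a second look, because `re.search` matches anywhere in the string. A bare `MAC`, for instance, also matches `MACHETE` and `MACE SPRAY`. Below is a minimal sketch of the issue and one possible fix using word boundaries; the weapon strings are examples chosen for illustration.

```python
import re

weapons = ['MAC-10 SEMIAUTOMATIC ASSAULT WEAPON', 'MACHETE', 'MACE SPRAY']

# A bare substring pattern matches all three descriptions.
loose = [w for w in weapons if re.search(r'MAC', w)]

# \b requires a word boundary, so 'MAC' must stand alone as a word.
strict = [w for w in weapons if re.search(r'\bMAC\b', w)]

print(loose)
print(strict)
```

The hyphen in `MAC-10` counts as a non-word character, so the boundary version still matches it while skipping the other two.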

# Plot weapon categories
weapon_category = data['category of weapon'].value_counts()
weapon_category = weapon_category.sort_values(ascending=False)
weapon_category.plot(kind='bar')
plt.xlabel('Weapon Category')
plt.ylabel('Number of Instances')
plt.title('Number of Crimes by Weapon Category in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

That’s much more informative. Most people simply don’t use a weapon. And this makes sense considering that we learned the most common crime type was some form of theft.

With a good general overview of the types of crimes committed in Los Angeles since 2020, let’s look at the time dimension. How has crime changed over the years in Los Angeles?

# Let's see how crime has changed over the years 2020-2023
data['date'] = pd.to_datetime(data['DATE OCC'], format='%m/%d/%Y %I:%M:%S %p')
data['year'] = data['date'].dt.year

crimes_per_year = data['year'].value_counts()
crimes_per_year = crimes_per_year.sort_index()
print(crimes_per_year)
crimes_per_year.plot(kind='bar')
plt.xlabel('Year')
plt.ylabel('Number of Crimes')
plt.title('Number of Crimes by Year in Los Angeles\nJan 2020-June 2023')
plt.xticks(fontsize=8)
plt.tight_layout()
plt.show()

Crime increased each year from 2020 to 2022. The 2023 count is lower, but only because the dataset covers just the first half of that year; at its current pace, 2023 could be expected to land somewhere between 2021 and 2022 levels.
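
The "pace" comparison amounts to scaling a partial-year count up to twelve months. Here is a rough sketch, assuming crime is spread evenly across the year (which ignores any seasonality); the function name `annualized_pace` and the example count are hypothetical.

```python
def annualized_pace(count_so_far, months_observed):
    """Scale a partial-year count to a 12-month estimate, assuming a constant monthly rate."""
    if not 0 < months_observed <= 12:
        raise ValueError('months_observed must be greater than 0 and at most 12')
    return count_so_far * 12 / months_observed

# Illustrative number only, not a figure from the dataset.
print(annualized_pace(1000, 6))
```

With six months of data (January through June), the estimate is simply double the count so far.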

I’m also curious whether this upward trend is simply a return to pre-COVID levels. With so many people at home due to COVID precautions in 2020, and those precautions now easing, there could simply be more opportunity to commit crimes outside the home. Unfortunately, we can’t tell from this dataset because it only goes back to 2020.

I’m also curious about the trends in violent and non-violent crimes over the years.

import matplotlib.ticker as ticker

# What about by violent/non-violent?
crime_type_by_year = data.groupby(['year', 'crime_type']).size().unstack(fill_value=0)
crime_type_by_year = crime_type_by_year.sort_index()
print(crime_type_by_year)
ax = crime_type_by_year.plot(kind='line')
# Show one tick per year instead of fractional years.
ax.xaxis.set_major_locator(ticker.MultipleLocator(base=1))
plt.xlabel('Year')
plt.ylabel('Number of Crimes')
plt.title('Number of Violent and Non-Violent Crimes by Year in Los Angeles\nJan 2020-June 2023')
plt.legend(title='Crime Type')
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()

The world may seem to have become more violent, but that is not the case in Los Angeles. The overall increase in crime is mostly due to a rise in non-violent crime. In fact, if the rates hold over the course of this year, 2023 will be the least violent year in this dataset.

Mapping the Crime Locations

Lastly, and mainly just for fun, we can plot every crime on a map using their latitudes and longitudes. Because this dataset is so large I’m only going to plot one out of every 1,000 crimes. Hopefully I’ll save my computer from spontaneously combusting.

import folium

# Plotting locations of crimes on a map.
map_center = [34.0522, -118.2437]  # Latitude and longitude of Los Angeles
map_zoom = 10
crime_map = folium.Map(location=map_center, zoom_start=map_zoom)

# Plot only every 1,000th row (same rows as index % 1000 == 0 with the default index).
for _, row in data.iloc[::1000].iterrows():
    crime_description = row['Crm Cd Desc']
    date = row['DATE OCC']  # date the crime occurred, not the date it was reported
    popup_content = f"Crime Description: {crime_description}<br>Date: {date}"
    folium.Marker(location=[row['LAT'], row['LON']], popup=popup_content).add_to(crime_map)

# Saving the map as .html
crime_map.save('crime_map.html')

You can also click on each pinned location to get the crime description and date/time it occurred.
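
Taking every 1,000th row depends on how the file happens to be ordered. A random sample is one alternative; this sketch uses `DataFrame.sample` with a fixed `random_state` so the same rows are picked on each run. The toy frame and the 10% fraction are assumptions for illustration.

```python
import pandas as pd

# Fifty invented coordinates standing in for the real LAT/LON columns.
toy = pd.DataFrame({'LAT': [34.0 + i / 1000 for i in range(50)],
                    'LON': [-118.2 - i / 1000 for i in range(50)]})

# Draw 10% of the rows; on the full dataset, frac=0.001 would be roughly one in 1,000.
subset = toy.sample(frac=0.1, random_state=0)
print(len(subset))
```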

Conclusion

I’ve only scratched the surface of this dataset. I’d love to come back and do some more digging, especially with natural language processing like I mentioned before, and maybe even some machine learning to build some models.

Overall, I gained some valuable experience working with a larger dataset, learning to map attributes to broader categories, and making data visualizations with matplotlib.

Learning Take-Aways

APIs will be important. While I downloaded this full dataset, it's also available via an API and is updated every week. It would be a great skillset to be able to access the data this way to create dashboards to show recent trends. I'll make sure to center my next project around learning this process.

Subject matter experts are a valuable resource. While I gained a lot of insight into crime in Los Angeles from this dataset, I'm still left wondering about the whys and hows. Like, why did crime increase from 2020 to 2022, and was this increase mostly due to more non-violent crimes? How was this related to COVID, or not? Maybe new policing policies? Data only goes so far until you need to talk to someone who knows about the general topic.