Measuring Distances in Legal Texts — A Work-In-Progress, Part 2

Justin Chae
3 min read · Jun 10, 2020

Why my first attempt at measuring distances between legal texts with cosine similarity is way off the mark. How I was wrong and what I’m thinking about next in natural language processing (“NLP”).

Photo by Jamie Street on Unsplash

I took a super simple approach to a very complex problem and thought I was quite clever. Oh, how wrong I was.

Background

In a previous post on math and law in Python, I explored a preliminary finding for a method to measure the distance between legal texts. For the long story, go here. For the short story, the project boils down to whether an algorithm can pick up the difference in meaning between “may” and “shall not.”

In these experiments, I have a known baseline: D.C. has a rule that affects its lawyers in a way seen in no other state. But getting an algorithm to detect that is tricky. In terms of raw text, a difference of a few words and letters is almost insignificant. In a legal context, however, the difference between permitting something with “may” and prohibiting it with “shall not” is substantial.
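To make the problem concrete, here is a minimal sketch of cosine similarity over simple word counts. The two sentences are made up for illustration and are not the text of any actual rule. The helper names (`bag_of_words`, `cosine_similarity`) are also hypothetical, and this is not necessarily the exact method used in the earlier post.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count each alphabetic token."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two word-count vectors."""
    # Only words present in both texts contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sentences with opposite legal meaning.
permissive = "A lawyer may share legal fees with a nonlawyer."
prohibitive = "A lawyer shall not share legal fees with a nonlawyer."

score = cosine_similarity(bag_of_words(permissive), bag_of_words(prohibitive))
print(f"cosine similarity: {score:.3f}")
```

Because the two sentences share almost every word, the score comes out close to 1 even though one sentence permits the conduct and the other forbids it. A word-count measure treats “may” and “shall not” as just a few tokens among many, which is exactly the weakness described above.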

What I Got Wrong

I created a first mini-experiment to see how the same algorithm works if I change “shall not” to “may” and determined the method I used returned the right answer for the…