
Measuring Distances in Legal Texts — A Work-In-Progress, Part 2

Jun 10, 2020

Why my first attempt at measuring distances between legal texts with cosine similarity is way off the mark. How I was wrong and what I’m thinking about next in natural language processing (“NLP”).

I took a super simple approach to a very complex problem and thought I was quite clever — oh how wrong I was.

Background

In a previous post on math and law in Python, I explored a preliminary finding for a method to measure the distance between legal texts. For the long story, go here. For the short story, the project boils down to whether an algorithm can pick up the difference in meaning between “may” and “shall not.”

In these experiments, I have a known baseline: D.C. has a rule that affects its lawyers in a way that no state's rule does. But getting an algorithm to detect that is tricky. In terms of raw text, a difference of a few words and letters is almost insignificant. In a legal context, however, the gap between permitting something with “may” and prohibiting it with “shall not” is substantial.
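To make the problem concrete, here is a minimal sketch of cosine similarity on plain word counts. It is not the exact code from the earlier post. The two sentences are invented for illustration and are not quoted from any actual rule, and the `tokenize` and `cosine_similarity` helpers are hypothetical names using only the standard library.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Dot product over the words the two texts share.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Invented example sentences (assumption: NOT real rule text).
permissive = "A lawyer may share legal fees with a nonlawyer."
prohibitive = "A lawyer shall not share legal fees with a nonlawyer."

score = cosine_similarity(permissive, prohibitive)
print(f"similarity: {score:.3f}, distance: {1 - score:.3f}")
```

On these two sentences the similarity comes out at roughly 0.87, so a word-count view treats a permission and a prohibition as close neighbors even though they mean opposite things legally. That is the mismatch described above, at least for this simplified bag-of-words setup.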

What I Got Wrong

I created a first mini-experiment to see how the same algorithm works if I change “shall not” to “may” and determined the method I used returned the right answer for the…

Written by Justin Chae
