Revisiting a Pioneering Legal AI Project
How We Modeled the Law Through Court Reasoning to Retrieve Relevant Laws
tl;dr: In 2019, at Stacks, we trained a first-of-its-kind specialized machine learning model to learn the law from how courts apply it under various conditions. Our patent application has since been widely cited in later work by other prominent players in the field, such as Thomson Reuters, RELX (LexisNexis), Microsoft, Baidu, Adobe, NEC, Huawei, and many more. This work rests on two complementary pillars: (1) the common law principle that courts set precedent and define what a law is, what it says, and what it means; and (2) the Distributional Hypothesis from Natural Language Processing (NLP) and Machine Learning (ML), which states, "you shall know a word by the company it keeps." By extracting the ways courts have interpreted and applied laws, we built a training dataset and trained a model to learn legal meaning from the court's perspective. When a user queries the trained model with a legal issue, the model identifies the specific legal patterns in that issue and returns the most relevant laws, again from the court's perspective.
The Business Need
In the United States and other common-law countries, prior judicial decisions set precedent for future cases. A central challenge for legal practitioners is finding the existing case law, rules, and statutes that apply to their circumstances. Given the sheer volume of case law, rules, and statutes, however, it is practically impossible for any one person to know all the laws or to read every published judicial decision to identify which laws were applied.
The Old Way
Traditionally, legal research relied on databases containing all prior cases, rules, and statutes. When a user searched for laws relevant to a situation, the system ran both semantic and lexical searches across the database and returned the most relevant entries, often after a reranking step. The results were case law, statutes, or rules whose text merely matched or resembled the user's query terms, rather than laws selected for how courts had actually applied them.
The Common Law Meets Machine Learning: The Two Pillars of Our Work
Pillar 1: The Common Law
In common law countries, such as the United States, England, Ireland, Australia, and Singapore, court rulings establish legal precedent. Moreover, even when dealing with rules, codes, and statutes, it is the court system that interprets and explains how these written laws are to be applied. In essence, courts do not merely apply the law; they define its meaning and scope through their decisions. To learn more about the common law, please read this.
Pillar 2: The Distributional Hypothesis
This well-known and powerful hypothesis in Natural Language Processing (NLP) and Machine Learning (ML) posits that the context in which a word appears reveals much about its meaning. It provides a data-driven approach to generating numerical and semantic representations of words based on the distribution of their contexts. You can read more about this hypothesis here.
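To make the hypothesis concrete, here is a minimal Python sketch, assuming a toy three-sentence corpus (invented for illustration) and plain whitespace tokenization. Each word is represented by counts of the words within two positions of it (the window size is an arbitrary choice), and words are compared by cosine similarity. It only illustrates the idea; it is not the technique used in this project.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words appearing within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Toy corpus, invented for illustration.
corpus = [
    "the officers advised the suspect of his rights",
    "the officers advised the defendant of his rights",
    "the court denied the motion to suppress",
]
vectors = cooccurrence_vectors(corpus)
print(cosine(vectors["suspect"], vectors["defendant"]))  # 1.0: identical contexts
print(cosine(vectors["suspect"], vectors["motion"]))     # 0.25: only "the" is shared
```

"suspect" and "defendant" never co-occur, yet they come out as maximally similar because they keep the same company, which is exactly the intuition the hypothesis captures.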
Our New Way: Where Two Pillars Come Together
Our approach brings both pillars into a unified framework:
- Collect all judicial opinions written by the courts.
- Identify every case law, rule, code, and statute cited within those opinions as the foundation of each ruling.
- For each cited law, extract the specific context in which it was applied within the ruling.
- Using these (context, law) pairs, train a machine learning model that learns the relationship between the context and the law, so that, given a new context, the model can predict which law applies.
This methodology elegantly merges the two pillars: it uses context to understand the meaning of laws (as in the Distributional Hypothesis) and leverages court opinions as authoritative interpretations of those laws (as in the Common Law principle).
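The citation-identification and context-extraction steps above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the production pipeline: the hypothetical `CITATION_RE` pattern recognizes only U.S. Reports citations of the form "Name v. Name, 384 U.S. 436 (1966)" (skipping a few common introductory signals such as "See"), the context is assumed to be the sentence containing the citation, and the `[CITATION]` mask token is our own placeholder. Real opinions use many more citation formats (statutes, regional reporters, short forms), which a real system would need to handle.

```python
import re

# Hypothetical pattern: only U.S. Reports citations such as "Miranda v. Arizona, 384 U.S. 436 (1966)".
CITATION_RE = re.compile(
    r"\b(?!(?:See|Cf|Compare|Accord|But|Contra)\b)"   # skip common citation signals
    r"[A-Z][\w.'&-]*(?: [A-Z][\w.'&-]*)* v\. "        # first party, e.g. "Miranda"
    r"[A-Z][\w.'&-]*(?: [A-Z][\w.'&-]*)*, "           # second party, e.g. "Arizona"
    r"\d+ U\.S\. \d+ \(\d{4}\)"                       # volume, reporter, page, year
)
MASK = "[CITATION]"
PLACEHOLDER_RE = re.compile(r"@@(\d+)@@")


def extract_pairs(opinion_text):
    """Return one (masked_context, cited_law) pair per citation; the context is the citing sentence."""
    citations = []

    def protect(match):
        # Swap each citation for a numbered placeholder so the periods inside it
        # ("v.", "U.S.") do not confuse the naive sentence splitter below.
        citations.append(match.group(0))
        return f"@@{len(citations) - 1}@@"

    protected = CITATION_RE.sub(protect, opinion_text)
    sentences = re.split(r"(?<=[.!?])\s+(?=[A-Z])", protected)

    pairs = []
    for sentence in sentences:
        for target in map(int, PLACEHOLDER_RE.findall(sentence)):
            # Mask the target citation; restore any other citations in the same sentence.
            context = PLACEHOLDER_RE.sub(
                lambda m: MASK if int(m.group(1)) == target else citations[int(m.group(1))],
                sentence,
            )
            pairs.append((context, citations[target]))
    return pairs


opinion = (
    "The district court also properly denied McElveen's motion to suppress statements made "
    "to police because McElveen had waived his rights under Miranda v. Arizona, 384 U.S. 436 "
    "(1966). At the start of the interview, the officers informed Henley of his rights under "
    "Miranda v. Arizona, 384 U.S. 436 (1966)."
)
for context, law in extract_pairs(opinion):
    print(f"{context!r} -> {law}")
```

Masking the citation before splitting sentences is a design choice: it keeps the abbreviations inside citations from being mistaken for sentence boundaries, and it produces the masked input directly.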
Example
Consider the landmark U.S. Supreme Court case Miranda v. Arizona, 384 U.S. 436 (1966). This decision established that statements made by a defendant during custodial interrogation are admissible at trial only if the prosecution can demonstrate that the defendant was first informed of their rights, including the right to remain silent and the right to consult with an attorney, and that the defendant not only understood these rights but also voluntarily waived them[1].
Below are a few examples of how courts across the United States have applied Miranda v. Arizona:
- "A defendant's statements during custodial interrogation are presumptively compelled in violation of the Fifth Amendment and are inadmissible unless the Government shows that law enforcement officers informed the defendant of his rights pursuant to Miranda v. Arizona, 384 U.S. 436 (1966), and obtained a waiver of those rights."
- "The district court also properly denied McElveen's motion to suppress statements made to police because McElveen had waived his rights under Miranda v. Arizona, 384 U.S. 436 (1966)."
- "Statements obtained from a defendant during custodial interrogation are admissible only if the Government shows that law enforcement officers adequately informed the defendant of his rights under Miranda v. Arizona, 384 U.S. 436 (1966), and obtained a waiver of those rights."
- "At the start of the interview, the officers informed Henley of his rights under Miranda v. Arizona, 384 U.S. 436 (1966)."
From these excerpts, we can build the following (input, output) pairs for the training dataset:
| Input context (x) | Output law (y) |
|---|---|
| A defendant's statements during custodial interrogation are presumptively compelled in violation of the Fifth Amendment and are inadmissible unless the Government shows that law enforcement officers informed the defendant of his rights pursuant to [CITATION], and obtained a waiver of those rights. | Miranda v. Arizona, 384 U.S. 436 (1966) |
| The district court also properly denied McElveen's motion to suppress statements made to police because McElveen had waived his rights under [CITATION]. | Miranda v. Arizona, 384 U.S. 436 (1966) |
| Statements obtained from a defendant during custodial interrogation are admissible only if the Government shows that law enforcement officers adequately informed the defendant of his rights under [CITATION], and obtained a waiver of those rights. | Miranda v. Arizona, 384 U.S. 436 (1966) |
| At the start of the interview, the officers informed Henley of his rights under [CITATION]. | Miranda v. Arizona, 384 U.S. 436 (1966) |
As shown, we provide the context in which the Miranda case law is applied, while masking the direct reference to it within that context. This masked context serves as the model's input, alongside similar contexts for all other laws. Through training, the model learns the contextual patterns that signal how different laws are applied, and learns to predict the relevant laws for those contexts. Once trained, the model can function as a powerful legal research tool: when a user describes their case, the model predicts the most relevant laws based on the perspective and reasoning patterns of the courts.
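As a stand-in for the trained model, the sketch below fits a multinomial Naive Bayes classifier with add-one smoothing on masked (context, law) pairs and ranks laws for a new query. Naive Bayes is our illustrative choice, not necessarily the technique used in this project. The `NaiveBayesLawRanker` class is hypothetical, the two Miranda contexts are masked excerpts quoted in this post, and the two Terry v. Ohio contexts are invented toy sentences, not quotes from opinions.

```python
import math
import re
from collections import Counter, defaultdict

MIRANDA = "Miranda v. Arizona, 384 U.S. 436 (1966)"
TERRY = "Terry v. Ohio, 392 U.S. 1 (1968)"

# Masked (context, law) pairs. Miranda contexts are quoted excerpts;
# Terry contexts are invented toy sentences for illustration only.
TRAINING_PAIRS = [
    ("At the start of the interview, the officers informed Henley of his rights under [CITATION].", MIRANDA),
    ("The district court also properly denied McElveen's motion to suppress statements made "
     "to police because McElveen had waived his rights under [CITATION].", MIRANDA),
    ("Officers may frisk a suspect for weapons when they reasonably believe the suspect "
     "is armed and dangerous under [CITATION].", TERRY),
    ("The pat-down was a lawful protective search for weapons under [CITATION].", TERRY),
]


def tokenize(text):
    """Lowercase bag-of-words tokens (letters only)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesLawRanker:
    """Hypothetical stand-in model: multinomial Naive Bayes from context words to cited laws."""

    def fit(self, pairs):
        self.law_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for context, law in pairs:
            tokens = tokenize(context)
            self.law_counts[law] += 1
            self.word_counts[law].update(tokens)
            self.vocab.update(tokens)
        self.total = sum(self.law_counts.values())
        return self

    def log_score(self, law, tokens):
        """log P(law) + sum of log P(token | law), with add-one (Laplace) smoothing."""
        counts = self.word_counts[law]
        denominator = sum(counts.values()) + len(self.vocab)
        score = math.log(self.law_counts[law] / self.total)
        for token in tokens:
            score += math.log((counts[token] + 1) / denominator)
        return score

    def top_k(self, query, k=3):
        """Return up to k laws, most probable first; words never seen in training are ignored."""
        tokens = [t for t in tokenize(query) if t in self.vocab]
        return sorted(self.law_counts, key=lambda law: self.log_score(law, tokens), reverse=True)[:k]


ranker = NaiveBayesLawRanker().fit(TRAINING_PAIRS)
print(ranker.top_k("The police questioned him without telling him his rights.", k=1))
print(ranker.top_k("Officers frisked a man they believed was armed.", k=1))
```

Note that neither query names a law: the ranking comes entirely from the contextual patterns learned from the courts' own language, which is the point of the approach. A real system would use far more data and a stronger model.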
- There is no longer a need to search through a database. Instead, the model directly identifies relevant laws based on the legal and factual patterns present in the user's query. The selection and interpretation of laws occur from the court's perspective. In effect, we are modeling the court's reasoning process.
- One potential limitation of this method is that it is confined to case law, rules, and statutes that courts have previously cited. If a particular law has never been applied or discussed in a judicial opinion, the model will lack the training data necessary to learn its application or scope.
- However, most fundamental and influential laws are applied frequently by the courts, meaning there is usually sufficient contextual data for the model to learn from and accurately represent important legal principles.
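The coverage limitation can be measured before training. The sketch below assumes a hypothetical `catalog` listing every law of interest and an arbitrary threshold of 20 contexts (not a value from the project), and buckets each law as learnable, sparse, or uncited; the uncited ones are exactly the laws the model cannot learn.

```python
from collections import Counter


def coverage_report(pairs, catalog, min_examples=20):
    """Bucket every law in `catalog` by how many training contexts courts have supplied for it.

    `min_examples` is an arbitrary illustrative threshold.
    """
    counts = Counter(law for _, law in pairs)  # laws never cited count as 0
    return {
        "learnable": sorted(law for law in catalog if counts[law] >= min_examples),
        "sparse": sorted(law for law in catalog if 0 < counts[law] < min_examples),
        "uncited": sorted(law for law in catalog if counts[law] == 0),
    }


# Hypothetical labels and counts, for illustration only.
pairs = [("example context", "Law A")] * 25 + [("example context", "Law B")] * 3
print(coverage_report(pairs, catalog=["Law A", "Law B", "Law C"]))
# {'learnable': ['Law A'], 'sparse': ['Law B'], 'uncited': ['Law C']}
```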
You can review the entire patent application here. In a future, more technical post, we will go into the machine learning details and the techniques we used.
[1] It is impractical to create and maintain a formal, comprehensive definition for every court decision, or to distill each statute into a simple definition stored in a lookup table that maps laws to keywords or issues. The legal landscape is vast and dynamic: there are too many laws, each can give rise to multiple rules, new laws are continually introduced, and, most importantly, the interpretation and precedent established by any given law can evolve over time.