This article reports the results of an effort to enable computers to segment U.S. adjudicatory decisions into sentences.
The project created a data set of 80 court decisions from four different domains. The findings indicate that legal decisions are more challenging for existing sentence boundary detection systems than non-legal texts are. Those systems rest on a number of assumptions that do not hold for legal texts, which impairs their performance. The project indicates that a general statistical sequence labeling model can learn the definition of a sentence in these decisions more efficiently. The project trained a number of conditional random field (CRF) models that outperform the traditional sentence boundary detection systems when applied to adjudicatory decisions. (publisher abstract modified)
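To make the sequence labeling framing concrete, the sketch below shows one way sentence boundary detection could be cast as per-token labeling, which is the kind of input a CRF consumes. It is not the project's implementation: the tokenizer regex, the B/I/E/O label scheme, the feature names, and the sample text are all assumptions chosen for illustration.

```python
import re
from typing import Dict, List, Tuple

Token = Tuple[str, int, int]  # (text, start offset, end offset)

# Hypothetical tokenizer: runs of word characters, or a single symbol.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text: str) -> List[Token]:
    """Split text into (token, start, end) triples."""
    return [(m.group(), m.start(), m.end()) for m in TOKEN_RE.finditer(text)]


def label_tokens(tokens: List[Token], spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token B (first in a sentence), E (last), I (inside), or O (outside)."""
    labels = []
    for _, start, end in tokens:
        label = "O"
        for s_start, s_end in spans:
            if s_start <= start and end <= s_end:
                if start == s_start:
                    label = "B"
                elif end == s_end:
                    label = "E"
                else:
                    label = "I"
                break
        labels.append(label)
    return labels


def token_features(tokens: List[Token], i: int) -> Dict[str, object]:
    """Illustrative features for token i; a sequence model would weigh these in context."""
    tok = tokens[i][0]
    return {
        "lower": tok.lower(),
        "starts_upper": tok[:1].isupper(),
        "is_digit": tok.isdigit(),
        "is_punct": not tok[:1].isalnum(),
        "prev": tokens[i - 1][0].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1][0].lower() if i < len(tokens) - 1 else "<EOS>",
    }


# Made-up sample: a citation with abbreviation periods, then a second sentence.
first = "See 38 U.S.C. § 1110."
second = "The appeal is denied."
text = first + " " + second
spans = [(0, len(first)), (len(first) + 1, len(text))]  # annotated sentence offsets

tokens = tokenize(text)
labels = label_tokens(tokens, spans)
features = [token_features(tokens, i) for i in range(len(tokens))]

for (tok, _, _), lab in zip(tokens, labels):
    print(f"{tok}\t{lab}")
```

In this made-up sample most period tokens fall inside a citation and only some of them close a sentence, so a splitter that treats every period followed by a capitalized word as a boundary would cut the citation apart. A sequence model trained on labeled tokens like these can instead learn from the surrounding context which periods are boundaries.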