NCJRS Virtual Library


Software System for Information Extraction in Criminal Justice Information Systems

NCJ Number
217681
Author(s)
Tianhao Wu; Stephen V. Zanias; William M. Pottenger
Date Published
2006
Length
178 pages
Annotation
This federally supported report provides an extensive description of, and background on, information extraction (IE) and reviews several commercial IE systems.
Abstract
The purpose of this project was to build an information extraction system that automatically extracts features from the textual data commonly used by law enforcement agencies. Such information is highly useful in criminal investigations but is often not stored in a database in relational form. The project’s technology automatically extracts this information from the source text and enters it into a fielded, relational database. The extracted information can then be readily retrieved and compared with other database records using modern computer-based information retrieval systems. The technique used significantly shortens the time needed to train an information extraction system of this nature. The approach makes the extracted features available to everyday search and retrieval applications such as suspect identification. The system will also provide input to advanced text mining algorithms for pattern detection; such algorithms can be used, for example, to map modus operandi to physical descriptions of criminal suspects. Based on information extraction technology, Lehigh has developed a software system named the BPD_IE System (Bethlehem Police Department Information Extraction System) that automatically extracts key items of information from narrative textual data and links unsolved criminal cases to solved cases, providing investigators with valuable leads. The technology automatically obtains modus operandi and physical descriptions from these textual documents. These data are then stored in fielded, relational databases that can be easily searched.
Figures, references, and appendixes A-B