Challenges in Machine Understanding of Legal Text

EasyChair Preprint no. 3506, version 2

10 pages
Date: May 30, 2020


Developing good models that represent legal text in a form suitable for machine understanding, and models that incorporate human legal expertise into automatic tools, still poses great difficulties. In this research, we tackled two specific tasks: (a) creating a structured body of court judgments by annotating them with key markup, legal citations and legal terms, and (b) classifying court judgments according to the specific legal points they address. We document the creation of a corpus of Malawi criminal judgments (MWCC) and highlight the opportunities and challenges in building machine understanding of this text. We developed a pipeline that takes scanned images of criminal court judgments and creates structured documents in TEI format, containing markup for the case name, case number, parties and coram, as well as annotations of references to laws and other court cases that can be hyperlinked. We discuss the possibility of using these annotations and the International Classification of Crime for Statistical Purposes (ICCS) to build an ontology for criminal cases that is useful for topic discovery and classification. The tools we used are Sketch Engine, spaCy, scikit-learn and Gensim.
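To make the annotation step more concrete, here is a minimal sketch. It is not the authors' pipeline. It assumes the judgment has already been OCR'd into plain paragraphs. It uses two hypothetical regular expressions, one for statutory references such as "section 209 of the Penal Code" and one for law-report citations of the form "[1999] 1 MLR 10". It wraps the case metadata and the citations in a minimal TEI skeleton. The element choices (`title`, `idno type="caseNumber"`, `ref type="law"|"case"`) and the helper names `tei`, `annotate` and `build_tei` are our own assumptions, not the MWCC schema. Parties are omitted for brevity, and the example data is invented.

```python
import re
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
# Global side effect: serialise TEI elements with TEI as the default namespace.
ET.register_namespace("", TEI_NS)

# Hypothetical citation patterns, for illustration only: one for statutory
# references ("section 209 of the Penal Code") and one for law-report style
# case citations ("[1999] 1 MLR 10"). Real judgments vary far more than this.
CITATION = re.compile(
    r"(?P<law>[Ss]ection\s+\d+[A-Z]?(?:\(\d+\))*\s+of\s+the\s+[A-Z][\w ]*?(?:Code|Act))"
    r"|(?P<case>\[\d{4}\](?:\s+\d+)?\s+MLR\s+\d+)"
)


def tei(tag: str) -> str:
    """Qualify a tag name with the TEI namespace (ElementTree's {uri}tag form)."""
    return f"{{{TEI_NS}}}{tag}"


def annotate(parent: ET.Element, text: str) -> None:
    """Write text into parent, wrapping each citation in <ref type="law"|"case">."""
    last = None  # most recent <ref>; text that follows it goes in its .tail
    pos = 0
    for m in CITATION.finditer(text):
        before = text[pos:m.start()]
        if last is None:
            parent.text = (parent.text or "") + before
        else:
            last.tail = before
        # m.lastgroup names the alternative that matched: "law" or "case".
        last = ET.SubElement(parent, tei("ref"), type=m.lastgroup)
        last.text = m.group()
        pos = m.end()
    rest = text[pos:]
    if last is None:
        parent.text = (parent.text or "") + rest
    else:
        last.tail = rest


def build_tei(case_name: str, case_number: str, coram: str, paragraphs) -> ET.Element:
    """Build a minimal TEI tree: case metadata in the header, annotated body text."""
    root = ET.Element(tei("TEI"))
    file_desc = ET.SubElement(ET.SubElement(root, tei("teiHeader")), tei("fileDesc"))
    title_stmt = ET.SubElement(file_desc, tei("titleStmt"))
    ET.SubElement(title_stmt, tei("title")).text = case_name
    pub_stmt = ET.SubElement(file_desc, tei("publicationStmt"))
    ET.SubElement(pub_stmt, tei("idno"), type="caseNumber").text = case_number
    source_desc = ET.SubElement(file_desc, tei("sourceDesc"))
    ET.SubElement(source_desc, tei("p")).text = f"Coram: {coram}"
    body = ET.SubElement(ET.SubElement(root, tei("text")), tei("body"))
    for para in paragraphs:
        annotate(ET.SubElement(body, tei("p")), para)
    return root


if __name__ == "__main__":
    # Invented example data; not taken from the MWCC corpus.
    doc = build_tei(
        case_name="Republic v Example",
        case_number="Criminal Case No. 1 of 2000",
        coram="Hypothetical J",
        paragraphs=["The accused was charged under section 209 of the Penal Code, "
                    "following Republic v Sample [1999] 1 MLR 10."],
    )
    print(ET.tostring(doc, encoding="unicode"))
```

Regular expressions are easy to audit but miss the many ways citations are actually written. A rule matcher or a trained entity recognizer (spaCy offers both) generalises better, at the cost of writing rules or annotating training data. Either way, each `<ref>` could later carry a link target so that the citation can be hyperlinked.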

Keyphrases: Annotations, Case Citations, Case Metadata, Classification of Crime, corpus, Criminal Judgments, ICCS, Law Citations, legal text, machine understanding, Malawi law, spaCy, topic classification, topic extraction

BibTeX entry
BibTeX does not have the right entry for preprints. This is a hack for producing the correct reference:
  author = {Amelia Taylor and Eva Mfutso-Bengo},
  title = {Challenges in Machine Understanding of Legal Text},
  howpublished = {EasyChair Preprint no. 3506},

  year = {EasyChair, 2020}}