Claims Dataset: Domestic Violence Is the Leading Cause of Injury to Women

Claims of the general form “domestic violence is the largest cause of injury to women” published in news sources, law journals, congressional documents, court opinions, and web pages, from 1986 to 2010.

The Domestic Violence Claims Dataset (DVCD) records instances of claims of the type “domestic violence is the leading cause of injury to women.” More technically, the DVCD is a dataset of instances of V, where V is a text string with the ordered form ({leading} or {largest} or {greatest}) and {cause} and {injury} and {women}, with the tokens occurring in near proximity (within ten words or the same sentence). The dataset includes V instances found in five communicative fields: news articles, law journals, web texts, Congressional records, and federal and state court opinions.
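
As a rough illustration of the pattern V, the following Python sketch tests whether a text contains the four terms in order and in near proximity. It is not the search procedure used to build the DVCD. The function name `matches_v` and the constant `WINDOW` are hypothetical, and the sketch makes two simplifying assumptions: "within ten words" is read as all four terms falling inside a ten-word window that starts at the superlative, and only exact word forms are matched (so "injuries" would not count). The "same sentence" alternative is not implemented.

```python
import re

# Terms of the pattern V, taken from the definition above.
SUPERLATIVES = {"leading", "largest", "greatest"}
FOLLOWING_TERMS = ["cause", "injury", "women"]

# Assumption: the whole match must fit in a ten-word window
# that begins at the superlative.
WINDOW = 10


def matches_v(text, window=WINDOW):
    """Return True if text contains an ordered, nearby occurrence of V."""
    # Lowercase and split into alphabetic words; punctuation is dropped.
    words = re.findall(r"[a-z]+", text.lower())
    for start, word in enumerate(words):
        if word not in SUPERLATIVES:
            continue
        # The words after the superlative that are still inside the window.
        remaining = iter(words[start + 1 : start + window])
        # "term in remaining" consumes the iterator up to the match,
        # so the terms must appear in the stated order.
        if all(term in remaining for term in FOLLOWING_TERMS):
            return True
    return False
```

For example, `matches_v("Domestic violence is the leading cause of injury to women.")` is true, while a sentence in which "women" precedes "leading" and does not recur is rejected.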

For fields other than web texts, the dataset includes counts of V instances by year from 1985 to the present. The news article V counts do not cover the whole universe of U.S. news publications, but they do cover a large sample of newspapers (including highly prominent newspapers) with relatively constant coverage across years, at least after the mid-1990s. The counts for law journals, Congressional records (transcripts of Congressional testimony and publications in the Congressional Record), and federal and state court opinions attempt to capture all such instances in the relevant field. Each V instance is recorded in the dataset as a separate record. For news articles, V instances are individually recorded only from 1996 to 2005.

The dataset of web texts (called web 2005) consists of V instances found with general web search tools in early 2006. Since web texts are not easily dated, the web V instances are not associated with a date of posting.

A specific V instance is identified in the DVCD by a communicative field (cfield) and a record number within that communicative field (cfrn). For example, “law journals.45” is the V instance numbered 45 in the law-journal field. Where possible, enough information is provided with each V instance to support a standard bibliographic citation.
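
An identifier of this form can be split back into its two parts. The sketch below is only an illustration: the names `InstanceId` and `parse_instance_id` are hypothetical, and it assumes the record number is always the run of digits after the last period, as in the example “law journals.45”.

```python
from typing import NamedTuple


class InstanceId(NamedTuple):
    cfield: str  # communicative field, e.g. "law journals"
    cfrn: int    # record number within that field


def parse_instance_id(identifier):
    """Split an identifier such as 'law journals.45' into (cfield, cfrn)."""
    # Split on the last period so that a field name containing
    # periods would still parse (an assumption, not a documented rule).
    cfield, sep, cfrn = identifier.strip().rpartition(".")
    if not sep or not cfield.strip() or not cfrn.isdigit():
        raise ValueError(f"not a cfield.cfrn identifier: {identifier!r}")
    return InstanceId(cfield.strip(), int(cfrn))
```

Here `parse_instance_id("law journals.45")` yields the field `"law journals"` and the record number `45`; a string with no period or no numeric suffix raises `ValueError`.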

Corrections, improvements, updates, and extensions of the DVCD are welcomed. Because the DVCD attempts to be a representative sample of different communicative fields, any additions must come from a systematic search within a well-defined corpus. Instances found outside a systematic search of a set of texts don’t serve the analytical purpose of the DVCD. Such instances are included, in the miscellaneous section only, if they are notable, e.g. historically early or particularly bizarre.

Dataset sheets:

  • summary stats: instances through time by communicative field (news sources, law journals, Congressional documents, courts) and instance variants (age specification, frightening comparators, etc.)
  • news summary: counts of instances by year
  • law journals summary: counts of instances by year
  • Congress summary: counts of instances by year
  • courts summary: counts of instances by year
  • news 1996-2005: details of each instance as structured data
  • law journals: details of each instance as structured data
  • Congress: details of each instance as structured data
  • courts: details of each instance as structured data
  • web: details of each instance as structured data
  • misc: details of each instance as structured data

