eDiscovery Trends: TREC Study Finds that Technology Assisted Review is More Cost Effective
July 16, 2012
As reported in Law Technology News (Technology-Assisted Review Boosted in TREC 2011 Results by Evan Koblentz), the Text Retrieval Conference (TREC) Legal Track, a government-sponsored project designed to assess the ability of information retrieval techniques to meet the needs of the legal profession, has released its 2011 study results (after several delays). The overview of the 2011 TREC Legal Track can be found here.
The report concludes the following: “From 2008 through 2011, the results show that the technology-assisted review efforts of several participants achieve recall scores that are about as high as might reasonably be measured using current evaluation methodologies. These efforts require human review of only a fraction of the entire collection, with the consequence that they are far more cost-effective than manual review.”
However, the report also notes that “There is still plenty of room for improvement in the efficiency and effectiveness of technology-assisted review efforts, and, in particular, the accuracy of intra-review recall estimation tools, so as to support a reasonable decision that 'enough is enough' and to declare the review complete. Commensurate with improvements in review efficiency and effectiveness is the need for improved external evaluation methodologies that address the limitations of those used in the TREC Legal Track and similar efforts.”
Other notable tidbits from the study and article:
- Ten organizations participated in the 2011 study, including universities from diverse locations such as Beijing and Melbourne and vendors including OpenText and Recommind;
- Participants were required to rank the entire corpus of 685,592 documents by their estimate of the probability of responsiveness to each of three topics, and also to provide a quantitative estimate of that probability;
- The document collection used was derived from the EDRM Enron Data Set;
- The learning task had three distinct topics, each representing a separate request for production. A total of 16,999 documents were selected – about 5,600 per topic – to form the “gold standard” against which participants’ rankings of the document collection were assessed;
- OpenText achieved the best recall relative to the number of documents reviewed on the first topic, the University of Waterloo led the second, and Recommind placed best in the third;
- One of the participants has been barred from future participation in TREC – “It is inappropriate – and forbidden by the TREC participation agreement – to claim that the results presented here show that one participant’s system or approach is generally better than another’s. It is also inappropriate to compare the results of TREC 2011 with the results of past TREC Legal Track exercises, as the test conditions as well as the particular techniques and tools employed by the participating teams are not directly comparable. One TREC 2011 Legal Track participant was barred from future participation in TREC for advertising such invalid comparisons”. According to the LTN article, the barred participant was Recommind.
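To make the recall metric discussed above concrete: recall is the fraction of all truly responsive (“gold standard”) documents that a review actually finds. In a technology-assisted review, documents are ranked by estimated probability of responsiveness and only the top of the ranking is reviewed by humans, which is why high recall at a small review cutoff translates into cost savings. Here is a minimal illustrative sketch (the document IDs and gold set are hypothetical toy data, not from the TREC study):

```python
def recall_at_cutoff(ranked_docs, gold_standard, cutoff):
    """Fraction of gold-standard (responsive) documents found when only
    the top `cutoff` documents of the ranking are human-reviewed."""
    reviewed = set(ranked_docs[:cutoff])
    found = len(reviewed & gold_standard)
    return found / len(gold_standard)

# Toy ranking: document IDs ordered from most to least likely responsive
ranked = ["d3", "d1", "d7", "d2", "d9", "d4", "d8", "d5", "d6", "d0"]
# Documents a hypothetical human review deemed responsive
gold = {"d3", "d7", "d2", "d5"}

# Reviewing only the top 4 of 10 documents already finds 3 of the 4
# responsive documents
print(recall_at_cutoff(ranked, gold, 4))  # → 0.75
```

The intuition matches the report’s conclusion: if a ranking is good, reviewing a small fraction of the collection captures most of the responsive material, at a far lower cost than reviewing everything.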
For more information, check out the links to the article and the study above. TREC previously announced that there will be no 2012 study and that it aims to obtain a new data set for 2013.
So, what do you think? Are you surprised by the results or are they expected? Please share any comments you might have or if you’d like to know more about a particular topic.
Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.