www.cloudninediscovery.com

About the Bloggers

Brad Jenkins

Brad Jenkins, President and CEO of CloudNine Discovery, has over 20 years of experience leading customer focused companies in the litigation support arena. Brad has authored many articles on litigation support issues, and has spoken before national audiences on document management practices and solutions.

Doug Austin

Doug Austin, Professional Services Manager for CloudNine Discovery, has over 20 years of experience providing legal technology consulting and technical project management services to numerous commercial and government clients. Doug has also authored several articles on eDiscovery best practices.

Jane Gennarelli

Jane Gennarelli is a principal of Magellan’s Law Corporation and has been assisting litigators in effectively handling discovery materials for over 30 years. She authored the company’s Best Practices in a Box™ content product and assists firms in applying technology to document handling tasks. She is a known expert and often does webinars and presentations for litigation support professionals around the country. Jane can be reached by email at jane@litigationbestpractices.com.

eDiscovery Searching: Proximity, Not Absence, Makes the Heart Grow Fonder

January 14, 2011

By Doug Austin

 

Recently, I assisted a large corporate client that ran several searches across the company’s enterprise-wide document management systems (DMS) for ESI potentially responsive to the litigation.  Some of the individual searches on these systems retrieved over 200,000 files by themselves!

Document management systems are great for what they are intended to do: store documents generated within the organization, track versions of those documents, and let individuals locate specific documents for reference or modification (among other things).  However, few of them are developed with litigation retrieval in mind.  Sure, they have search capabilities, but using them can be like using a sledgehammer to drive a thumbtack into the wall: the advanced features that increase search precision are often lacking.

Let’s say you’re at an oil company looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve a lot of files where “oil” and “rights” both appear but have nothing to do with each other.  A search for “oil” AND “rights” in an oil company’s DMS may retrieve every published and copyrighted document in the system that mentions the word “oil”.  Why?  Because almost every published and copyrighted document contains the phrase “All Rights Reserved”.

That’s an example of the type of issue we were encountering with some of those searches that yielded 200,000 files with hits.  And, that’s where proximity searching comes in.  Proximity searching is simply looking for two or more words that appear close to each other in the document (e.g., “oil within 5 words of rights”) – the search will only retrieve the file if those words are as close as specified to each other, in either order.  Proximity searching helped us reduce that collection to a more manageable number for review, even though the enterprise-wide document management system didn’t have a proximity search feature.
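To make the difference concrete, here is a minimal Python sketch of a proximity test next to a plain AND test. It is an illustration only, not how any particular review tool implements the feature: the function names, the simple tokenizer, and the two sample texts are made up for the example, and “within n words” is assumed to mean that the two word positions differ by at most n (tools differ on the exact definition).

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def and_search(text, term_a, term_b):
    """True if both terms appear anywhere in the text."""
    tokens = set(tokenize(text))
    return term_a.lower() in tokens and term_b.lower() in tokens

def within_n_words(text, term_a, term_b, n):
    """True if the terms occur at most n word positions apart, in either order.
    (Assumed definition; real tools may count the distance differently.)"""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical sample documents.
lease = "The lessee holds the rights to drill for oil on the tract."
report = ("Quarterly oil production rose in every region, driven by the new "
          "offshore platforms. Copyright 2010. All Rights Reserved.")

for name, text in (("lease", lease), ("report", report)):
    print(name,
          "AND:", and_search(text, "oil", "rights"),
          "WITHIN 5:", within_n_words(text, "oil", "rights", 5))
```

Both sample texts satisfy the AND search, but only the lease passes the proximity test, because in the report the only “rights” is the one in the copyright notice, far from “oil”.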

How?  We wound up taking a two-step approach to get the collection to a more likely responsive set.  First, we did the “AND” search in the DMS, understanding that we would retrieve a large number of files, and exported those results.  After indexing them with a first pass review tool that has more precise search alternatives (at Trial Solutions, we use FirstPass™, powered by Venio FPR™, for first pass review), we performed a second search on the set using proximity searching to limit the result set to only files where the terms were near each other.  Then, we tested the results and revised the search where necessary to retrieve a result set that maximized both recall and precision.
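The two-step workflow, including the test-and-revise loop, can be sketched in a few lines of Python. Everything here is hypothetical: the six-file collection, the reviewer judgments, and the helper names are invented for illustration, and the code says nothing about how FirstPass or any DMS works internally. The proximity window is again assumed to be a maximum difference in word positions.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def has_all(tokens, terms):
    """Step 1 stand-in: the broad AND search run inside the DMS."""
    return all(term in tokens for term in terms)

def near(tokens, term_a, term_b, n):
    """Step 2: proximity test, either order, at most n word positions apart."""
    pos_a = [i for i, t in enumerate(tokens) if t == term_a]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def precision_recall(retrieved, responsive):
    """Precision = hits / retrieved; recall = hits / responsive."""
    hits = len(retrieved & responsive)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(responsive) if responsive else 0.0
    return precision, recall

# Hypothetical collection: file id -> (text, did a reviewer judge it responsive?)
collection = {
    "a": ("Assignment of oil drilling rights for the north tract.", True),
    "b": ("The rights to drill for oil were transferred in March.", True),
    "c": ("Oil price outlook for 2011. All Rights Reserved.", False),
    "d": ("Oil spill response plan. Copyright 2010. All Rights Reserved.", False),
    "e": ("Cafeteria menu for the week.", False),
    "f": ("Rights under the lease cover exploration, drilling and production of oil.", True),
}
responsive = {fid for fid, (_, is_resp) in collection.items() if is_resp}

# Step 1: broad AND search, then "export" the hits for re-indexing.
exported = {fid: tokenize(text) for fid, (text, _) in collection.items()
            if has_all(tokenize(text), ["oil", "rights"])}

# Step 2: proximity search over the exported set, testing several windows.
results = {}
for n in (2, 5, 10):
    retrieved = {fid for fid, toks in exported.items()
                 if near(toks, "oil", "rights", n)}
    results[n] = precision_recall(retrieved, responsive)
    print(f"within {n}: {sorted(retrieved)} precision/recall = {results[n]}")
```

In this toy set, a tight window keeps precision high but drops responsive file "f", while a wide window recovers it at the cost of pulling the copyright-notice files back in. That trade-off is what the testing and revising step is meant to balance.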

The result?  We were able to reduce an initial result set of 200,000 files to just over 5,000 likely responsive files by applying the proximity search to the first result set.  And, we probably saved $50,000 to $100,000 in review costs on a single search.
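As a rough check on those figures, the arithmetic can be worked through in Python. The file counts and the savings range come from the paragraph above; the per-file cost is only what those numbers imply, not a quoted review rate, and “just over 5,000” is rounded to 5,000 for simplicity.

```python
initial_files = 200_000      # files returned by the AND search
remaining_files = 5_000      # "just over 5,000" after the proximity search (rounded)
savings_low, savings_high = 50_000, 100_000   # estimated review savings in dollars

files_removed = initial_files - remaining_files
reduction = files_removed / initial_files

# Implied review cost per culled file: a back-of-the-envelope inference only.
per_file_low = savings_low / files_removed
per_file_high = savings_high / files_removed

print(f"Files removed from review: {files_removed:,} ({reduction:.1%})")
print(f"Implied cost per file: ${per_file_low:.2f} to ${per_file_high:.2f}")
```

In other words, about 195,000 files (roughly 97.5% of the initial set) never had to be reviewed, which at the stated savings works out to somewhere around 26 to 51 cents per culled file.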

I also often use proximity searches as alternatives to phrase searches to broaden the recall of those searches to identify additional potentially responsive hits.  For example, a search for “Doug Austin” doesn’t retrieve “Austin, Doug” and a search for “Dye 127” doesn’t retrieve “Dye #127”.  One character difference is all it takes for a phrase search to miss a potentially responsive file.  With proximity searching, you can look for these terms close to each other and catch those variations.
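Here is a small Python sketch of that effect, using the two examples above. The sample sentence and function names are invented, the phrase search is modeled as a literal, case-insensitive substring match (an assumption; some tools normalize punctuation before matching), and the proximity test treats “within n words” as a maximum difference in word positions.

```python
import re

def tokenize(text):
    """Lowercase the text and keep only alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_search(text, phrase):
    """Literal, case-insensitive phrase match on the raw text (assumed behavior)."""
    return phrase.lower() in text.lower()

def within_n_words(text, term_a, term_b, n):
    """True if the terms occur at most n word positions apart, in either order."""
    tokens = tokenize(text)
    pos_a = [i for i, t in enumerate(tokens) if t == term_a.lower()]
    pos_b = [i for i, t in enumerate(tokens) if t == term_b.lower()]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

# Hypothetical document text containing both variations.
memo = "Memo from Austin, Doug regarding Dye #127 test results."

print(phrase_search(memo, "Doug Austin"))          # phrase search misses the reversed name
print(phrase_search(memo, "Dye 127"))              # and the "#" variation
print(within_n_words(memo, "Doug", "Austin", 2))   # proximity search catches both
print(within_n_words(memo, "Dye", "127", 2))
```

Because the tokenizer drops punctuation and the proximity test ignores word order, the comma in “Austin, Doug” and the “#” in “Dye #127” no longer cause a miss.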

So, what do you think?  Do you use proximity searching in your culling for review?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Comments

  • January 25, 2011 Doug Austin

    Boolean searches are searches that use AND, OR or NOT in their search syntax. For example, if you want to find all documents that contain the words bankruptcy AND trustee, that's a boolean search that will find any document containing BOTH words. If you want to find all documents that contain the words bankruptcy OR trustee, that's a boolean search that will find any document containing EITHER word. And, boolean searches can be "nested" using parentheses, for example: (letter OR memo) AND bankruptcy will find documents containing either letter or memo, but only if they also contain bankruptcy. Finally, if you want to find all documents that contain "bankruptcy letter", that's a phrase search (both words have to appear in exactly that order with nothing in between them). All popular retrieval applications will support boolean retrieval.

    An AND search returns a narrower set of hits than an OR search, but sometimes an AND search can still return a broader result set than desired (as is the case with oil AND rights above, which retrieved a number of documents where the two words had nothing to do with each other). That's where proximity searching comes in -- you can further narrow the results by not only requiring that both words appear in the document, but also that they appear near each other (e.g., bankruptcy WITHIN 5 WORDS OF trustee will retrieve documents with both words, but only if those words appear within 5 words of each other). Proximity searching is an advanced searching technique to conduct a more precise search than an AND search, with more recall than a phrase search (e.g., bankruptcy WITHIN 5 WORDS OF trustee will retrieve more documents than "bankruptcy trustee" because there are more variations that match, such as "trustee for this bankruptcy").

    From broadest to narrowest for search results, it goes like this:
    bankruptcy OR trustee -- broadest search as either word can appear
    bankruptcy AND trustee -- narrower search as both words must appear
    bankruptcy WITHIN X WORDS OF trustee -- narrower than AND as both words must be close to each other in the document
    "bankruptcy trustee" -- narrowest, as both words must appear in exactly that order with no words or characters between

    Hope that helps.
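The broadest-to-narrowest ordering described in the comment above can be checked on a toy collection. This Python sketch is illustrative only: the five documents and the helper names are made up, the phrase search is modeled as two adjacent word tokens (so punctuation between them is ignored, which is an assumption), and WITHIN 5 WORDS is taken to mean word positions at most 5 apart.

```python
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def or_search(tokens, a, b):
    return a in tokens or b in tokens

def and_search(tokens, a, b):
    return a in tokens and b in tokens

def proximity_search(tokens, a, b, n):
    """Both terms present, at most n word positions apart, in either order."""
    pos_a = [i for i, t in enumerate(tokens) if t == a]
    pos_b = [i for i, t in enumerate(tokens) if t == b]
    return any(abs(i - j) <= n for i in pos_a for j in pos_b)

def phrase_search(tokens, a, b):
    """Term a immediately followed by term b."""
    return any(x == a and y == b for x, y in zip(tokens, tokens[1:]))

# Hypothetical documents.
docs = [
    "The bankruptcy trustee filed a report.",
    "The trustee for this bankruptcy filed a report.",
    "The trustee resigned. Months later, after a long series of hearings, "
    "the bankruptcy was dismissed.",
    "The bankruptcy petition was filed in Houston.",
    "Lunch is at noon.",
]
tokenized = [tokenize(d) for d in docs]
a, b = "bankruptcy", "trustee"

counts = {
    "OR": sum(or_search(t, a, b) for t in tokenized),
    "AND": sum(and_search(t, a, b) for t in tokenized),
    "WITHIN 5": sum(proximity_search(t, a, b, 5) for t in tokenized),
    "PHRASE": sum(phrase_search(t, a, b) for t in tokenized),
}
print(counts)
```

Each search type in the list retrieves a subset of the one before it, so the hit counts shrink step by step from OR down to the phrase search.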

  • January 25, 2011 vaibhav

    This is indeed very good information, but here we are searching using boolean searches. Are these searches different from proximity searches? As I am new to e-discovery, please clarify that for me.


    Thank You
