www.cloudninediscovery.com

Proximity Searches Can Be the Right Balance of Recall and Precision – eDiscovery Best Practices

October 08, 2012

By Doug Austin

When performing keyword searching, the challenge is to balance recall (retrieving as many of the responsive documents as possible) and precision (not retrieving too many non-responsive documents along with them).  A search that has 100% precision will retrieve only responsive documents; however, that does not mean that all of the responsive documents have been retrieved.  A search that has 100% recall will retrieve all of the responsive documents in the collection; however, it may also retrieve a large number of non-responsive documents, which can drive up review costs.  So, how do you perform searches that effectively balance recall and precision?
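To make the two measures concrete, here is a minimal Python sketch that computes precision and recall for a single search, given the set of documents it retrieved and the set of documents that are actually responsive. The document IDs and the function name `precision_recall` are hypothetical, used only for illustration.

```python
def precision_recall(retrieved: set[str], responsive: set[str]) -> tuple[float, float]:
    """Return (precision, recall) for one search.

    precision = responsive docs retrieved / all docs retrieved
    recall    = responsive docs retrieved / all responsive docs in the collection
    """
    hits = retrieved & responsive
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(responsive) if responsive else 0.0
    return precision, recall

# Hypothetical collection: of documents D1..D10, only D1..D4 are responsive.
responsive = {"D1", "D2", "D3", "D4"}

narrow = {"D1", "D2"}                                      # only responsive hits, but misses D3 and D4
broad = {"D1", "D2", "D3", "D4", "D5", "D6", "D7", "D8"}   # finds everything, plus noise

print(precision_recall(narrow, responsive))  # (1.0, 0.5)
print(precision_recall(broad, responsive))   # (0.5, 1.0)
```

The narrow search reaches 100% precision but only 50% recall, while the broad search reaches 100% recall at the cost of halving precision, which is the trade-off that makes review costs climb.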

One way is through proximity searching.  Proximity searching is simply looking for two or more words that appear close to each other in the document.  It’s more precise than an AND search (e.g., termA AND termB), yet it has more recall than a phrase search (e.g., “termA termB”).  Let’s look at an example.

You’re working for an oil company and you’re looking for documents related to “oil rights” (such as “oil rights”, “oil drilling rights”, “oil production rights”, etc.).  You could perform phrase searches, but any variations that you didn’t think of would be missed (e.g., “rights to drill for oil”).  You could perform an AND search (i.e., “oil” AND “rights”), and that could very well retrieve all of the files related to “oil rights”, but it would also retrieve many files where “oil” and “rights” both appear but have nothing to do with each other.  For example, a search for “oil” AND “rights” across an oil company’s various data stores may retrieve many published and copyrighted documents that mention the word “oil” but have nothing to do with “oil rights”.  Why?  Because published and copyrighted documents commonly include the phrase “All Rights Reserved”, so those documents will be retrieved even though many of them are likely to be non-responsive.

A proximity search like “oil within 5 words of rights” will only retrieve a document if those words appear within the specified distance of each other, in either order.  Proximity searching helps reduce the result set to a more manageable number for review by eliminating the files that happen to mention “oil” and “rights” somewhere in the document, but not in context with each other.  Yet it still catches variations of phrases containing “oil” and “rights” that you may not think to search for (such as “rights to drill for oil”), as long as the two words fall within the specified distance.
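The differences between AND, phrase, and proximity searches can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any particular eDiscovery platform implements searching: text is lowercased and split into alphanumeric words, and “within N words” is taken to mean that the two terms’ word positions differ by at most N, in either order (real search tools vary in how they count distance, and some also offer order-sensitive operators). The sample documents and the helper function names are hypothetical.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())

def and_search(text: str, a: str, b: str) -> bool:
    """True if both terms appear anywhere in the document."""
    words = set(tokenize(text))
    return a.lower() in words and b.lower() in words

def phrase_search(text: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def proximity_search(text: str, a: str, b: str, within: int) -> bool:
    """True if a and b occur at most `within` word positions apart, in either order."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= within for i in pos_a for j in pos_b)

# Hypothetical sample documents for the "oil rights" example.
docs = {
    "notes": "Negotiations over oil rights continue.",
    "memo": "The lease grants oil drilling rights on the north parcel.",
    "letter": "We acquired the rights to drill for oil last spring.",
    "report": "Oil prices rose sharply this quarter. "
              "Copyright 2012 Example Corp. All Rights Reserved.",
}

for name, text in docs.items():
    print(name,
          and_search(text, "oil", "rights"),                  # "oil" AND "rights"
          phrase_search(text, "oil rights"),                  # "oil rights" as a phrase
          proximity_search(text, "oil", "rights", within=5))  # oil within 5 words of rights
# notes True True True
# memo True False True
# letter True False True
# report True False False
```

Only “notes” matches the exact phrase. The AND search retrieves every document, including the copyright boilerplate in “report”. The proximity search keeps both drilling-rights variations while dropping “report”, where “rights” sits 11 words away from “oil”.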

Proximity searches are great for searching people’s names, as well.  For example, a phrase search for “John Adams” won’t retrieve “Adams, John”, but a proximity search for “John within 3 words of Adams” will retrieve “John Adams”, “Adams, John”, and even “John Q. Adams”.
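A minimal Python sketch of the name example, assuming the text is lowercased and split into alphanumeric words (so the comma in “Adams, John” is ignored) and that “within 3 words” means the two word positions differ by at most 3, in either order. The helper names and the fourth entry, “John Smith met Sam Adams”, are hypothetical; that entry shows the proximity search skipping two different people whose names are 4 words apart.

```python
import re

def tokenize(text: str) -> list[str]:
    """Lowercase and keep alphanumeric runs, so "Adams, John" -> ["adams", "john"]."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_search(text: str, phrase: str) -> bool:
    """True if the words of `phrase` appear consecutively, in order."""
    words, target = tokenize(text), tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))

def proximity_search(text: str, a: str, b: str, within: int) -> bool:
    """True if a and b occur at most `within` word positions apart, in either order."""
    words = tokenize(text)
    pos_a = [i for i, w in enumerate(words) if w == a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == b.lower()]
    return any(abs(i - j) <= within for i in pos_a for j in pos_b)

entries = ["John Adams", "Adams, John", "John Q. Adams", "John Smith met Sam Adams"]
for entry in entries:
    print(f"{entry!r}: phrase={phrase_search(entry, 'John Adams')}, "
          f"proximity={proximity_search(entry, 'John', 'Adams', within=3)}")
# 'John Adams': phrase=True, proximity=True
# 'Adams, John': phrase=False, proximity=True
# 'John Q. Adams': phrase=False, proximity=True
# 'John Smith met Sam Adams': phrase=False, proximity=False
```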

When developing a search of two or more related words that effectively balances recall and precision, consider using a proximity search.  It just might be the right search for the situation.

So, what do you think?  Do you use proximity searching to make your searches more effective?  Please share any comments you might have, or let us know if you’d like to know more about a particular topic.

Disclaimer: The views represented herein are exclusively the views of the author, and do not necessarily represent the views held by CloudNine Discovery. eDiscoveryDaily is made available by CloudNine Discovery solely for educational purposes to provide general information about general eDiscovery principles and not to provide specific legal advice applicable to any particular circumstance. eDiscoveryDaily should not be used as a substitute for competent legal advice from a lawyer you have retained and who has agreed to represent you.