Answers to some of the most frequently asked questions about electronic discovery.
Can't find your answer? Send us your question at the bottom of this post.
Click on your area of interest below, or browse the entire post.
What is ESI?
ESI is an acronym for Electronically Stored Information. We’re just going to call it “data” for purposes of this FAQ (we’re not big fans of jargon).
What is metadata?
Metadata is "data about data." Metadata tells you who created a file, or who sent an email, and when they created or sent that document. It can also tell you where a document is stored on a user’s system or the geographic location where a picture was taken. It’s hugely helpful in telling your data’s story.
What is collection?
Collection is the process of gathering your data. This can be a really simple process (just taking the contents of a USB drive), or very complex (collecting deleted images off of hard drives, or collecting all of the data in the hands of one person, including smartphones, tablets, computer hard drives, data on email servers, social media data, and more).
What is the difference between a physical forensic image and a targeted collection (AKA logical forensic image)?
A physical forensic image is an exact copy of a hard drive. Every 1 and 0 of the source drive is copied. Imaging this way is most effective as it ensures no data will be "missed" when collecting from that device. Physical forensic images also contain deleted information.
A targeted collection (AKA a "logical forensic image") captures the entire content of a single volume (like your entire C:\ drive), a folder (like your "My Documents" folder), or an individual file (like "smoking Gun.PDF") on the device you are imaging. Unlike a physical image, a logical image does not offer the potential to recover deleted information.
What is processing?
Processing extracts the most useful parts of your data: text and metadata. After data is uploaded, servers churn through all of it to make it usable and searchable by you during review.
Can we avoid processing the documents?
No, unfortunately. Processing is an integral part of eDiscovery.
Why do you need to process documents in a particular time zone? Why is it recommended to process documents in UTC (GMT)?
Selecting a single time zone for processing all documents helps ensure consistency across documents. UTC (Coordinated Universal Time) is the time standard the world uses to regulate clocks, so it is the default choice. Review software will usually convert UTC to the reviewer's time zone by default during review. If the parties have not agreed on a time zone for processing, the most defensible approach is to process data in UTC.
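To make the consistency point concrete, here is a minimal sketch of normalizing timestamps to UTC and then shifting them for a reviewer. The function names and the fixed offsets are assumptions chosen for illustration; real processing software would use a full time zone database that also handles daylight saving time.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical fixed offsets for illustration only (no daylight saving rules).
EASTERN_STANDARD = timezone(timedelta(hours=-5))
PACIFIC_STANDARD = timezone(timedelta(hours=-8))


def to_utc(local_timestamp: str, source_zone: timezone) -> datetime:
    """Interpret a naive ISO timestamp in source_zone and convert it to UTC."""
    naive = datetime.fromisoformat(local_timestamp)
    return naive.replace(tzinfo=source_zone).astimezone(timezone.utc)


def for_reviewer(utc_value: datetime, reviewer_zone: timezone) -> datetime:
    """Shift a UTC value into the reviewer's zone for display."""
    return utc_value.astimezone(reviewer_zone)


# The same moment, recorded by two custodians in different time zones.
sent_east = to_utc("2021-03-01 09:30:00", EASTERN_STANDARD)
sent_west = to_utc("2021-03-01 06:30:00", PACIFIC_STANDARD)

print(sent_east.isoformat())                      # 2021-03-01T14:30:00+00:00
print(sent_east == sent_west)                     # True
print(for_reviewer(sent_east, PACIFIC_STANDARD))  # 2021-03-01 06:30:00-08:00
```

Once both timestamps are stored in UTC, they sort and compare correctly no matter where each custodian was located.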
What is a document family? Why should we keep document families together?
The most common example of a document family is an email and an attachment. The email and attachment are part of the same document family. Keeping the family together helps maintain essential context for individual documents. Another example of a document family is documents in a .zip compressed file.
What is email threading?
Email threading will place all emails in the same thread of replies and forwards into one group. This helps minimize unnecessarily reviewing the same document over and over again.
What is the difference between Optical Character Recognition (OCR) and extracted text? Can you OCR handwritten documents?
Extracted text is textual content that has been copied directly from the file. Since extracted text is an exact copy of the text, it has the highest accuracy possible. This is only possible when the file contains raw textual content. For example, a Microsoft Word file contains text which can be extracted. On the other hand, OCR text has been generated from image files which contain no raw textual content. Rather, a software program scans the image and algorithmically attempts to identify text in the image. OCR has variable accuracy. If the image is crisp and has high contrast, the OCR will be more successful. If the image lacks contrast, has visual aberrations, or contains blurry text, then the quality of the OCR will be low.
Since handwriting is highly variable, there are no reliable tools for OCRing handwriting.
What is data culling?
Data culling is the process of removing files that are not relevant to the case at hand. Data culling could remove files that fall outside the relevant date range for a case, or remove file types that are not relevant to a case. Data culling can greatly reduce the number of documents that need to be reviewed, and therefore save a lot of money.
Can you cull system files?
System files are culled through a process called De-NISTing. The National Institute of Standards and Technology maintains a database with the known hash values of system files. By checking each file's hash value against that database, we can remove system files.
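The hash check described above can be sketched as a simple set lookup. This is an illustration under stated assumptions: `KNOWN_SYSTEM_HASHES` is a hypothetical stand-in for the NIST list, `de_nist` is a made-up function name, and MD5 is used only as an example hash.

```python
import hashlib
from typing import Dict

# Hypothetical stand-in for the NIST database of known system-file hashes.
KNOWN_SYSTEM_HASHES = {
    hashlib.md5(b"operating system library bytes").hexdigest(),
}


def de_nist(files: Dict[str, bytes]) -> Dict[str, bytes]:
    """Keep only the files whose hash is NOT in the known system-file list."""
    return {
        name: data
        for name, data in files.items()
        if hashlib.md5(data).hexdigest() not in KNOWN_SYSTEM_HASHES
    }


collected = {
    "kernel_library.dll": b"operating system library bytes",
    "memo.docx": b"Draft memo about the merger",
}
remaining = de_nist(collected)
print(sorted(remaining))  # ['memo.docx']
```

Because the comparison uses file content rather than file names, a renamed system file would still be removed.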
What happens to encrypted or password-protected documents?
As a standard practice, password-protected files will not be processed without the password. If a password is supplied, then the document can be processed.
What is a direct hit? What is an indirect hit?
A direct hit is a document that has the search term within the document. An indirect hit is a document that has the search term within a family member, like an attachment in an email. That is, when a search term hits on an attachment to an email, the attachment would be a direct hit and the email would be an indirect hit.
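As a rough illustration of the distinction, the sketch below classifies documents into direct and indirect hits. The record layout (`id`, `family`, `text`) and the function name are assumptions made for this example, not the data model of any particular review tool.

```python
from typing import Dict, List, Set, Tuple


def classify_hits(documents: List[Dict[str, str]], term: str) -> Tuple[Set[str], Set[str]]:
    """Return (direct_hits, indirect_hits) as sets of document ids."""
    needle = term.lower()
    # Direct hits contain the term in their own text.
    direct = {d["id"] for d in documents if needle in d["text"].lower()}
    # Families that contain at least one direct hit.
    hit_families = {d["family"] for d in documents if d["id"] in direct}
    # Indirect hits are family members of a direct hit that did not hit themselves.
    indirect = {
        d["id"]
        for d in documents
        if d["family"] in hit_families and d["id"] not in direct
    }
    return direct, indirect


docs = [
    {"id": "EMAIL-1", "family": "F1", "text": "Please see attached."},
    {"id": "ATT-1", "family": "F1", "text": "Quarterly forecast numbers"},
    {"id": "EMAIL-2", "family": "F2", "text": "Lunch on Friday?"},
]
direct, indirect = classify_hits(docs, "forecast")
print(direct)    # {'ATT-1'}
print(indirect)  # {'EMAIL-1'}
```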
What is proximity searching?
Proximity searching is searching for words or phrases within a certain number of words of each other. For example "spot within 3 words of dog" would return a hit on "Spot the dog ran across the park." It would not return a hit on "Can you spot the mistake that the dog made?"
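The example above can be checked with a small sketch. It assumes that distance is measured as the difference in word positions. Search engines may count distance differently, so treat `within` as a hypothetical helper rather than the behavior of any specific product.

```python
import re


def within(text: str, first: str, second: str, max_distance: int) -> bool:
    """True if `first` and `second` occur within max_distance words of each other."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    return any(
        i != j and abs(i - j) <= max_distance
        for i in first_positions
        for j in second_positions
    )


print(within("Spot the dog ran across the park.", "spot", "dog", 3))           # True
print(within("Can you spot the mistake that the dog made?", "spot", "dog", 3))  # False
```

In the second sentence, "spot" and "dog" are five word positions apart, which is why the search misses it.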
What is fuzzy searching?
Fuzzy searching pulls up words that are one or more letters away from the word being searched. Let's say you're searching for the word "clarify". Fuzzy searching would also pull up words like "clarity", "charity", "clarifies", and "clarified". The fuzziness level determines whether your results include words that differ by one letter or by more.
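One common way to measure how many letters apart two words are is edit distance (the number of single-letter insertions, deletions, or substitutions). The sketch below uses that measure as an assumption. A given search engine may score fuzziness differently, and `fuzzy_matches` is a hypothetical helper.

```python
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: single-letter inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete a letter from a
                current[j - 1] + 1,      # insert a letter into a
                previous[j - 1] + cost,  # substitute (or keep) a letter
            ))
        previous = current
    return previous[-1]


def fuzzy_matches(term: str, vocabulary: List[str], fuzziness: int) -> List[str]:
    """Return vocabulary words within `fuzziness` edits of `term`."""
    return [w for w in vocabulary if edit_distance(term.lower(), w.lower()) <= fuzziness]


words = ["clarity", "charity", "clarifies", "banana"]
print(fuzzy_matches("clarify", words, 1))  # ['clarity']
print(fuzzy_matches("clarify", words, 2))  # ['clarity', 'charity']
print(fuzzy_matches("clarify", words, 3))  # ['clarity', 'charity', 'clarifies']
```

Raising the fuzziness level widens the net, which finds more variants but also more false positives.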
Within Relativity, what is a dtSearch index? How is it different from a keyword index?
A dtSearch index is a special text index that provides additional search capabilities beyond the keyword search. With a dtSearch index, you can search with both proximity searching and fuzzy searching.
What is a simple filter search?
Filters query the searchable set of documents in the active view to return your results. You can use filters on single fields or multiple fields to get the matching documents in your view. This is the simplest search option; it limits the documents/items that appear in the lists on Relativity tabs.
Why shouldn’t we use a multiple choice field for production designation, privilege fields, or confidentiality?
Multiple choice fields on production designation, privilege and confidentiality can result in contradictory coding choices. For example, if production designation allowed multiple choice, a document could accidentally be marked with "produce" and "do not produce". The same could apply to the other fields.
Why should we have a “Not privileged” or “Not confidential” designation? Why shouldn’t we leave those fields unmarked?
A "Not Privileged" designation is a good indicator that the document was reviewed for privilege. Without this designation, if a document's privilege designation is unmarked, then that document may not be privileged, or it might simply not have been reviewed for privilege yet. By utilizing a "Not Privileged" designation, that ambiguity is removed.
Why should we avoid using family or duplicate propagation?
Family and duplicate propagation can lead to unexpected results in a review. With family propagation turned on, tagging a parent email as responsive will also tag the rest of the family as responsive. If you then tag the attachment as non-responsive, the parent email will be re-tagged as non-responsive as well. This can create confusion in the workflow.
Why are there duplicates in the database after deduplication?
Duplicates will still exist due to the same document being attached to multiple emails. The document will not deduplicate out in order to keep the document family whole.
Why are there so many junk files?
“Junk files” are often images included as part of a user’s signature. However, not all images extracted out of an email will be junk, such as when someone references a screen capture in their email.
What is a document image?
A document image is the rendering of a document in a static, printable form.
What formats can we use?
Image productions are typically in single-bit TIFFs for black and white or .jpg for color. Image productions may also be produced in .pdf.
What is a TIFF image?
TIFF stands for Tagged Image File Format. It is essentially a picture of a document. A single TIFF file can consist of a single document, or multiple documents. It is the most common standard for productions of reviewed documents.
Why do we need to image documents?
Imaging documents and branding them with production numbers allows both parties to know exactly which page is being referenced and ensures that both parties are looking at the same image.
Why do we need load files? We are not producing metadata.
Load files will also contain document breaks and family groupings. These files can then load the documents into the review tool, maintaining the document relationships.
What is an MD5 hash value?
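An MD5 hash value is used to identify which documents in a matter are exact duplicates. It is a digital fingerprint computed from a file's contents: files with identical contents produce the same MD5 hash value, and even a small change to a file produces a different one.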
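Here is a minimal sketch of how hash values can group exact duplicates, using Python's standard `hashlib` module. The function name and sample file names are made up for illustration, and a real tool would read files from disk rather than from an in-memory dictionary.

```python
import hashlib
from typing import Dict, List


def group_exact_duplicates(files: Dict[str, bytes]) -> Dict[str, List[str]]:
    """Map each MD5 hash value to the names of the files that share it."""
    groups: Dict[str, List[str]] = {}
    for name, data in files.items():
        digest = hashlib.md5(data).hexdigest()
        groups.setdefault(digest, []).append(name)
    return groups


files = {
    "contract_v1.txt": b"The parties agree to the terms below.",
    "copy_of_contract.txt": b"The parties agree to the terms below.",
    "contract_v2.txt": b"The parties agree to the terms below!",
}
for digest, names in group_exact_duplicates(files).items():
    print(digest, names)
```

The first two files share a hash value because their contents are byte-for-byte identical, while the one-character change in the third file yields a different value.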
What are load files?
There are two relatively standard types of load files used in eDiscovery: a .DAT file and an .OPT file. There are other types of load files, but these tend to be the most common. A .DAT file is a text-based file which contains the data extracted from documents during the processing phase of the EDRM. An .OPT file is a file which contains a link to the TIFF images or PDFs that are produced by a party in a document production.
The content of a load file is customizable based on the needs of the matter. For example, in some cases the "Modified Date" of a document may be extremely important, so for that matter the "Modified Date" field should be included in the load file. The load file will contain the metadata that is important to your matter, so you should discuss which metadata fields are important early in the case, prior to reviewing documents.
Load files are generated at various phases of the EDRM. First, load files are typically created after processing the data a party has collected for a matter, which that party must review prior to producing documents to an opposing party. Once the document processing is finished for a particular piece of media, the data can be exported and uploaded to a review database like Relativity. Load files are also provided by parties to a legal matter to accompany the documents they produce to each other. Parties typically need to agree on which metadata fields will be produced in a document production, and these fields will be included in the load files that accompany the documents they produce.
What is a DAT file? What does it contain?
A DAT file is a document level load file. It contains document metadata, text file location, and/or native file location for each document.
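To show what document-level means in practice, here is a minimal sketch that reads a DAT file with Python's `csv` module. It assumes the delimiters commonly seen in Concordance-style DAT files (ASCII 20 between fields, the þ character around values). Your files may use different delimiters, and the field names below are hypothetical.

```python
import csv
from typing import Dict, List

FIELD_SEPARATOR = "\x14"   # ASCII 20, assumed field delimiter
TEXT_QUALIFIER = "\u00fe"  # the þ character, assumed quote character


def parse_dat(lines: List[str]) -> List[Dict[str, str]]:
    """Return one dictionary per document, keyed by the header row's field names."""
    reader = csv.DictReader(lines, delimiter=FIELD_SEPARATOR, quotechar=TEXT_QUALIFIER)
    return [dict(row) for row in reader]


def make_line(values: List[str]) -> str:
    """Build a sample DAT line (used here only to create demo data)."""
    return FIELD_SEPARATOR.join(f"{TEXT_QUALIFIER}{v}{TEXT_QUALIFIER}" for v in values)


sample = [
    make_line(["BEGDOC", "CUSTODIAN", "TEXTPATH"]),  # header row
    make_line(["ABC0001", "Jane Doe", r"TEXT\ABC0001.txt"]),
    make_line(["ABC0003", "John Roe", r"TEXT\ABC0003.txt"]),
]
records = parse_dat(sample)
print(records[0]["CUSTODIAN"])  # Jane Doe
print(records[1]["TEXTPATH"])   # TEXT\ABC0003.txt
```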
What is an OPT file? What does it contain?
An OPT file is a page-level load file. Each line points to a single page in a document with the first page being marked with a "Y".
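The page-level structure can be sketched as follows. This assumes the widely used Opticon layout of seven comma-separated fields per line (page identifier, volume, image path, document break, folder break, box break, page count). The sample lines and the function name are made up for illustration.

```python
from typing import Dict, List, Optional


def group_pages(opt_lines: List[str]) -> Dict[str, List[str]]:
    """Group page image paths by document, starting a new document at each "Y"."""
    documents: Dict[str, List[str]] = {}
    current: Optional[str] = None
    for line in opt_lines:
        fields = line.strip().split(",")
        page_id, image_path, doc_break = fields[0], fields[2], fields[3]
        if doc_break.upper() == "Y" or current is None:
            current = page_id  # the first page's identifier names the document
            documents[current] = []
        documents[current].append(image_path)
    return documents


sample = [
    r"ABC0001,VOL001,IMAGES\001\ABC0001.tif,Y,,,2",
    r"ABC0002,VOL001,IMAGES\001\ABC0002.tif,,,,",
    r"ABC0003,VOL001,IMAGES\001\ABC0003.tif,Y,,,1",
]
pages = group_pages(sample)
print(len(pages["ABC0001"]))  # 2
print(len(pages["ABC0003"]))  # 1
```

This is how a review tool can rebuild document boundaries from a flat list of page images.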
How do I send you documents without invalidating the metadata?
This can be a bit tricky for even experienced computer users. Typically, if metadata has the potential to be important in a matter, it is crucial to consult with a forensic professional to make sure that appropriate steps are being followed to prevent spoliation. It is always best to do things in a way that is defensible in court when preserving and collecting documents for a legal matter. Metadata can be treated as evidence by the court, so it is important to settle on a forensically sound collection method before you begin collecting documents.
If I have a user account already, will it cost another $100/month to access a second database?
User accounts can access multiple workspaces without any increase in the fee.
What are native files?
Native files are electronic documents in their original format. For example, a Microsoft Excel file is a native file. A non-native file is the image of that file (imagine a PDF version of a Microsoft Word document).
How do I download/print a document from Relativity?
Here is a short, interactive video that gives you hands-on experience with saving PDFs for printing.
How do I tag/code a document in Relativity?
Here is a short, interactive video that gives you hands-on experience with coding documents.
How do I create a Layout for coding documents?
Here is a step-by-step procedure to create coding layouts.