After the crime: content-based forensic triage in practice

The digitisation of all aspects of business and growing volumes of digital storage are causing the global digital forensics industry to expand rapidly.

The industry is only projected to grow further as data breaches and other cyber crimes necessitate digital forensic investigations and analysis as responses.

However, this Big Data era has pushed the industry to crisis point, as the growing list of devices that contain electronic storage, and thus potential evidence, is making the traditional methods for evaluating electronic evidence unsustainable.

Despite these changes, many investigators steadfastly stick to the traditional method of analysing each data repository individually using forensic tools, then manually correlating the evidence they have uncovered.

If a single home can contain dozens of devices, imagine how many would be involved in a case with multiple suspects and several locations, including homes and offices. Analysing each device one by one has become immensely time-consuming and inefficient in an age where everyone wants answers fast.

Data growth outstripping investigative capabilities

As a rule of thumb, the number of devices containing data involved in a typical investigation doubles every two years, and the volume of data grows even faster. As well as increasing in volume and variety, we are seeing digital evidence become more complex.

For investigators and regulators working in corporate environments, evidence can be stored in file shares, email databases, email archives, collaboration and document management systems, among others. These repositories have intricate ways of storing and embedding data multiple levels deep. They often use closed, proprietary formats that typically require a vendor-supplied software interface to read the information within them.

In my experience as an investigator, I have seen this data – ever growing, moving, changing and becoming more complex – stretch most investigators to capacity.

Content-based forensic triage – the more efficient method

In recent years, we have seen law enforcement and corporate investigators take a different approach. Content-based forensic triage involves collecting all available data in a single storage location, then using a combination of data management, analytical and forensic techniques to focus on the most critical evidence sources until the key facts emerge. In my opinion, it achieves results as good as or better than those of traditional forensic methods, but much faster and more efficiently.

The content-based forensic triage process is simple and logical. It begins with ingesting all data sources into a single repository, followed by a light metadata scan to tabulate information such as the sender, size, format and subject line of an email. Then, using techniques such as network diagrams and timelines, investigators can see connections and relationships between people and evidence.
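As a rough illustration of the light metadata scan and the relationship view described above, the following minimal Python sketch tabulates sender, recipients, subject and size for a few emails, then counts who wrote to whom. The sample messages and the function names scan_metadata and communication_edges are assumptions made for this example; it uses only Python's standard email module and is not a description of any particular forensic product.

```python
from collections import Counter
from email import message_from_string
from email.utils import getaddresses

# Hypothetical sample messages; a real case would read them from the ingested repository.
RAW_MESSAGES = [
    "From: Alice <alice@example.com>\nTo: bob@example.com\nSubject: Invoice 1\n\nPlease pay.",
    "From: alice@example.com\nTo: bob@example.com, carol@example.com\nSubject: Invoice 2\n\nReminder.",
    "From: bob@example.com\nTo: alice@example.com\nSubject: Re: Invoice 1\n\nDone.",
]


def scan_metadata(raw):
    """Return a light metadata record; the body text is not analysed."""
    msg = message_from_string(raw)
    from_addrs = getaddresses([msg.get("From", "")])
    sender = from_addrs[0][1].lower() if from_addrs else ""
    recipients = [addr.lower() for _, addr in getaddresses(msg.get_all("To", []))]
    return {
        "sender": sender,
        "recipients": recipients,
        "subject": msg.get("Subject", ""),
        "size": len(raw.encode("utf-8")),  # size of the raw message in bytes
    }


def communication_edges(records):
    """Count messages per (sender, recipient) pair: the edges of a network diagram."""
    counts = Counter()
    for rec in records:
        for rcpt in rec["recipients"]:
            counts[(rec["sender"], rcpt)] += 1
    return counts


records = [scan_metadata(raw) for raw in RAW_MESSAGES]
edges = communication_edges(records)

for (sender, rcpt), n in edges.most_common():
    print(f"{sender} -> {rcpt}: {n} message(s)")
```

The edge counts could feed a network diagram, and sorting the same records by date would give a simple timeline.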

Having identified the most likely evidence sources, investigators can then extract their full text and metadata. With advanced investigative tools, they can automatically extract and highlight intelligence items, including names, email addresses, IP addresses, credit card numbers, bank account numbers and amounts of money.
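To give a sense of how intelligence items can be pulled out of extracted text automatically, here is a minimal sketch using regular expressions. The patterns, the extract_items function and the sample sentence are illustrative assumptions and are deliberately simplified; production tools use far more robust recognisers. Candidate card numbers are checked with the standard Luhn checksum to cut down false positives.

```python
import re

# Simplified, illustrative patterns (assumptions for this sketch).
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ipv4": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "money": re.compile(r"[$£€]\s?\d[\d,]*(?:\.\d{2})?"),
    "card": re.compile(r"\b\d(?:[ -]?\d){12,18}\b"),  # 13 to 19 digits
}


def luhn_ok(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_items(text):
    """Return a dict mapping item type to the set of values found in text."""
    found = {kind: set() for kind in PATTERNS}
    for kind, pattern in PATTERNS.items():
        for match in pattern.findall(text):
            if kind == "ipv4" and any(int(part) > 255 for part in match.split(".")):
                continue  # not a valid IPv4 address
            if kind == "card":
                match = re.sub(r"[ -]", "", match)
                if not luhn_ok(match):
                    continue  # digit run that is not a plausible card number
            found[kind].add(match)
    return found


sample = ("Wire $12,500.00 to j.smith@example.org from 192.168.0.10; "
          "card 4111 1111 1111 1111 on file.")
items = extract_items(sample)
print(items)
```

The card number in the sample is a widely used test number, not a real account.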

Cross-referencing this intelligence across all available evidence can rapidly reveal relationships between people and entities, deliver points to prove and also offer broader intelligence. It brings to light connections that human investigators might miss.
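One simple way to picture this cross-referencing is an inverted index from each intelligence item to the evidence sources that contain it. The sketch below is a minimal example under that assumption; the source names, the items and the functions cross_reference and linked_sources are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical intelligence items already extracted from each evidence source.
ITEMS_BY_SOURCE = {
    "laptop-A": {"acct:12345678", "ops@example.com"},
    "phone-B": {"acct:12345678", "10.0.0.7"},
    "mailbox-C": {"ops@example.com", "acct:12345678"},
    "laptop-D": {"unrelated@example.net"},
}


def cross_reference(items_by_source):
    """Map each item to the sources containing it, keeping items seen in 2+ sources."""
    index = defaultdict(set)
    for source, items in items_by_source.items():
        for item in items:
            index[item].add(source)
    return {item: sources for item, sources in index.items() if len(sources) > 1}


def linked_sources(shared):
    """For each pair of sources, collect the items they have in common."""
    links = defaultdict(set)
    for item, sources in shared.items():
        for a, b in combinations(sorted(sources), 2):
            links[(a, b)].add(item)
    return dict(links)


shared = cross_reference(ITEMS_BY_SOURCE)
links = linked_sources(shared)

for pair, common in sorted(links.items()):
    print(pair, "share", sorted(common))
```

Sources that share several items, such as the same bank account and email address, are natural candidates for closer examination, while a source with nothing in common with the others can be deprioritised.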

Finally, the process allows investigators to examine only the most relevant data sources. In most cases, this process will already have located the critical facts of the case. If not, it will almost certainly have provided clues as to where such information is hidden. Investigators can then use their digital forensics skills to dig deep into the likeliest evidence sources. In this way, they avoid wasting countless hours forensically analysing irrelevant material.

Case study: multi-jurisdictional fraud investigation

When a government agency began investigating a company fraudulently selling aircraft that didn't exist, it recognised it would need a team of up to 20 investigators to examine the available data using traditional methods. The agency had seized approximately 40 devices including desktop and laptop computers and smartphones. Investigating each device sequentially would have made it impossible to locate links between different custodians and purchases.

Instead, the agency ingested all available data into a single storage location, then used our software to index and cross-reference it. The Nuix tools helped a single investigator quickly identify the most critical evidence, enabling the agency to bring charges.

In addition, by using the software's near-duplicate functions to find similar documents, the investigator brought to light a series of related companies – unknown to the agency – conducting fraudulent transactions for aircraft parts, boats and other high-value products.
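The article does not describe how the software's near-duplicate functions work internally. As a generic illustration of the idea only, the sketch below compares documents by their overlapping three-word sequences (shingles) and scores each pair with Jaccard similarity. The sample documents, the 0.4 threshold and the function names are assumptions chosen for this example.

```python
def shingles(text, k=3):
    """Return the set of k-word sequences in text (lower-cased)."""
    words = text.lower().split()
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection divided by size of the union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.4, k=3):
    """Return (name_a, name_b, score) for document pairs at or above the threshold."""
    sets = {name: shingles(text, k) for name, text in docs.items()}
    names = sorted(sets)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            score = jaccard(sets[a], sets[b])
            if score >= threshold:
                pairs.append((a, b, score))
    return pairs


# Hypothetical documents: two offers built from the same template, one unrelated memo.
DOCS = {
    "offer-1": "we offer one light aircraft for immediate sale at a fixed price",
    "offer-2": "we offer one motor yacht for immediate sale at a fixed price",
    "memo": "staff meeting moved to friday afternoon in the main office",
}

pairs = near_duplicates(DOCS)
for a, b, score in pairs:
    print(f"{a} ~ {b}: {score:.2f}")
```

Comparing every pair is fine for a handful of documents. At case scale, techniques such as MinHash with locality-sensitive hashing are commonly used to avoid the quadratic comparison.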

Nuix made it easy for the agency to transfer the intelligence from the original case to investigations of the related companies, which led to further charges. I find it unlikely that an investigator using traditional methods would have been able to uncover the parallel companies, when the investigation was only looking for one in the first instance.

Avoid time-consuming and unnecessary forensic analysis

Human sleuths, however brilliant, can't hope to consistently and accurately cross-reference and find correlations across millions of data points. It is easy to miss connections, particularly without an automated way to identify intelligence items.

Content-based forensic triage, however, allows investigators to access data stored in complex corporate repositories and cloud-based services, and then automatically cross-references the intelligence, revealing connections that may not have been immediately obvious.

Additionally, advanced techniques such as word clusters, which deliver more relevant results and fewer false positives than basic keyword searches, and visual analysis of the data make it much easier to detect trends and isolate outliers across massive volumes of evidence.

Finding the electronic smoking gun often requires searching and cross-referencing large amounts of data. Traditional investigation methods can result in investigators routinely turning away potential evidence, and can leave a breach in the organisation open longer than it needs to be.

The content-based forensic triage approach enables corporate investigators to find answers faster, ensuring that critical evidence is not overlooked and that the leak is found and plugged efficiently.

Dr James Kent is head of investigations at Nuix