Pattern-of-Life Analysis Using LLMs for Digital Investigation

Funding

Self-funded

Project code

CMP10221026

Department

School of Computing

Start dates

October, February and April

Application deadline

Applications accepted all year round

Applications are invited for a self-funded, 3-year full-time or 6-year part-time PhD project.

The PhD will be based in the School of Computing and will be supervised by Dr Aikaterini Kanta, Dr Mo Adda and Dr Bander Al-rimy.

The work on this project will:

Design and fine-tune domain-specific Large Language Models (LLMs) for digital forensic investigation.
Automate extraction of behavioural patterns from forensic disk images, correlating with external OSINT sources.
Develop a cross-modal fusion pipeline combining private (e.g., disk artifacts) and public (e.g., social media) data.
Enable timeline reconstruction and intent inference through AI-driven pattern recognition.
Enhance forensic reporting with explainable AI outputs tailored to legal and law enforcement needs.

Digital forensic analysts are increasingly overwhelmed by the scale and complexity of modern investigations. Disk images alone can contain millions of artefacts. When combined with the vast range of publicly available OSINT sources, the challenge becomes one of filtering signal from noise, then making sense of it all. Traditional tools are not built to handle this level of semantic analysis or cross-modal reasoning.

This project proposes to build a system that can conduct pattern-of-life analysis by integrating forensic disk image data with OSINT using bespoke fine-tuned Large Language Models. These models will be specifically trained for forensic tasks, capable of parsing semi-structured and unstructured data from diverse sources such as browser histories, chat logs, deleted files, social media activity, and forum posts.

The research will focus on three areas: model development, data fusion, and investigative application. A custom LLM will be trained on a curated corpus of forensic data and OSINT to recognise behavioural patterns such as emotional state indicators. A data processing pipeline will be developed to extract content from disk images using established forensic tools and link it to relevant external sources. This combined input will feed the model to generate behavioural timelines, highlight anomalies, and support hypotheses about user intent.
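As a rough illustration of the data fusion step, the sketch below merges timestamped events from two sources into one chronological timeline. It is only a minimal example under stated assumptions: the Event fields, the "disk" and "osint" labels, and the sample events are all hypothetical and are not part of the project specification.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from heapq import merge


@dataclass(frozen=True)
class Event:
    """One timestamped observation (hypothetical structure for illustration)."""
    timestamp: datetime  # timezone-aware, normalised to UTC
    source: str          # "disk" or "osint" (assumed labels)
    description: str


def build_timeline(disk_events, osint_events):
    """Merge two event lists into a single chronologically ordered list."""
    def by_time(event):
        return event.timestamp

    # Sort each source on its own, then merge the two sorted streams.
    return list(merge(sorted(disk_events, key=by_time),
                      sorted(osint_events, key=by_time),
                      key=by_time))


utc = timezone.utc
# Invented sample events, not real case data.
disk = [
    Event(datetime(2024, 1, 5, 9, 45, tzinfo=utc), "disk", "File deleted: notes.txt"),
    Event(datetime(2024, 1, 5, 9, 30, tzinfo=utc), "disk", "Browser visit to forum login page"),
]
osint = [
    Event(datetime(2024, 1, 5, 9, 40, tzinfo=utc), "osint", "Public forum post"),
]

timeline = build_timeline(disk, osint)
for event in timeline:
    print(event.timestamp.isoformat(), event.source, event.description)
```

Normalising every timestamp to UTC before merging matters in practice, because disk artefacts and online sources often record time in different zones.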

The system will be evaluated both technically and operationally. This includes measuring the precision and recall of extracted behavioural patterns, the coherence of generated summaries, and the utility of the output in investigative settings. Case studies using synthetic datasets and historical CTF images will be used to validate performance.
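To make the technical evaluation concrete, the sketch below computes precision and recall for a set of extracted behavioural patterns against a manually labelled reference set. The pattern labels are invented for illustration, and exact set matching is an assumption; a real evaluation may need a looser matching rule.

```python
def precision_recall(predicted, reference):
    """Return (precision, recall) for predicted labels against a reference set."""
    predicted, reference = set(predicted), set(reference)
    true_positives = len(predicted & reference)
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical pattern labels: what an analyst marked as present in a test image...
reference_patterns = {
    "late_night_browsing",
    "daily_forum_login",
    "weekly_file_purge",
    "usb_use_weekends",
}
# ...and what the system extracted from the same image.
predicted_patterns = {
    "late_night_browsing",
    "daily_forum_login",
    "vpn_use_mornings",
}

precision, recall = precision_recall(predicted_patterns, reference_patterns)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here two of the three extracted patterns are correct (precision about 0.67) and two of the four reference patterns are found (recall 0.50).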

Legal and ethical considerations will be a central focus. The project will investigate how to maintain evidentiary standards such as reproducibility and integrity while using probabilistic models. It will also assess the risks of false positives and biases in automated behavioural inference.

This research aims to shift the field from artefact-by-artefact analysis to holistic behavioural understanding, providing investigators with richer context and faster insights.

Fees and funding

Visit the research subject area page for fees and funding information for this project.

Funding availability: Self-funded PhD students only.

PhD full-time and part-time courses are eligible for the UK Government Doctoral Loan (UK and EU students only).

Bench fees

Some PhD projects may include additional fees – known as bench fees – for equipment and other consumables, and these will be added to your standard tuition fee. Speak to the supervisory team during your interview about any additional fees you may have to pay. Please note, bench fees are not eligible for discounts and are non-refundable.

Entry requirements

You'll need a good first degree from an internationally recognised university (minimum upper second class or equivalent, depending on your chosen course) or a Master’s degree in Computer Science, Cybersecurity or a related area. In exceptional cases, we may consider equivalent professional experience and/or qualifications. You'll also need English language proficiency at a minimum of IELTS band 6.5 with no component score below 6.0.

Applicants should have a strong foundation in Machine Learning and Cybersecurity, with solid Python programming skills. Familiarity with Large Language Models (LLMs), digital forensics, and OSINT analysis is highly desirable. Prior experience with forensic toolkits (e.g. Autopsy, The Sleuth Kit) or working with unstructured data (e.g. chat logs, browser histories, social media content) will be an advantage.

How to apply

We’d encourage you to contact Dr Aikaterini Kanta (katerina.kanta@port.ac.uk) to discuss your interest before you apply, quoting the project code.

When you are ready to apply, please follow the 'Apply now' link on the Computing PhD subject area page and select the link for the relevant intake. Make sure you submit a personal statement, proof of your degrees and grades, details of two referees, proof of your English language proficiency and an up-to-date CV. Our ‘How to Apply’ page offers further guidance on the PhD application process.

When applying please quote project code: CMP10221026
