Evaluation Benchmark Development for a Geotechnical Large Language Model

We are seeking a diligent placement student to support the evaluation component of a QAA-funded project developing a discipline-aware AI tool for engineering education. The role supports a collaboration between the Universities of Cardiff, Manchester, Surrey, and Glasgow examining how locally deployed, fine-tuned language models, grounded in authoritative geotechnical sources, can enhance teaching quality while preserving institutional data sovereignty. In practical terms, the finished tool will be a specialist AI assistant for geotechnical engineering, used by undergraduate students across the four partner universities as a reliable, source-grounded alternative to general-purpose chatbots such as ChatGPT.

Key Responsibilities

Benchmark Curation and Extension (50%)

Extend an existing 300-question geotechnical evaluation benchmark to cover additional topics contributed by the partner universities (Manchester, Surrey, and Glasgow), and curate the expanded question set
Apply and refine a five-tier question taxonomy covering basic recall, numerical reasoning, procedural judgement, interpretative judgement, and negative testing
Maintain clear provenance and licensing records for all source material used in benchmark construction
Assess inter-rater reliability for the human-scored component of the 60/20/20 hybrid rubric

Model Evaluation (35%)

Run the benchmark against base and fine-tuned candidate models using the project evaluation harness
Apply the 60/20/20 hybrid scoring rubric (human assessment, passage ranking, and semantic similarity) and analyse where the three metrics diverge
Identify topic areas where models perform poorly and characterise pedagogical failure modes, to inform how knowledge-graph development is prioritised

Documentation and Reporting (15%)

Produce a reusable benchmark release including the dataset, scoring rubric, and scoring scripts
Contribute to interim project outputs, progress reports, and dissemination materials

Required Skills and Attributes

Background in civil, geotechnical, or a closely related engineering discipline
Experience with programming, ideally in Python
A systematic approach to working with structured datasets
Clear, careful technical writing

Desirable Experience

Familiarity with generative AI or large language model tools
Prior exposure to Python, R, or MATLAB for data handling
Understanding of evaluation methodology or educational assessment design

Support and Development

The successful candidate will work under the direct supervision of Dr Evan Ricketts and Dr Fei Jin, with access to the wider project team across the four partner institutions. This position is an excellent opportunity for students interested in applied AI within civil engineering to gain hands-on research experience, contribute to a published evaluation benchmark, and build a portfolio in the emerging area of discipline-specific AI. Interns will be named in project outputs and acknowledged in any research publications arising from their contributions, with the opportunity of co-authorship for substantial individual contributions.


Intern Placement QAA project Geotechnical Evaluation Benchmark for LLM ER