Invited speaker — MLSS Melbourne 2026
Lecturing at the Machine Learning Summer School, Melbourne (by invitation from Maincode).
Senior ML / Research Engineer
LLMs · Multimodal AI · Computer Vision · Efficient Training

Senior ML engineer and published researcher specialising in large language models, multimodal AI, computer vision, machine unlearning, and efficient training systems. Co-founder of A2.AI (a2ai.com.au).
Recognised for shipping research-grade systems into production at scale, from TikTok's Trust & Safety MLLMs to VFX pipelines on Furiosa: A Mad Max Saga, Mortal Kombat II, Deadpool, Mickey 17, Sonic 3, Sinners, Michael, and A Complete Unknown.
Inventor on a US provisional patent (attention mechanism) and a granted UK design patent (AI-Assisted Rural & Indigenous Healthcare Robot). Co-investigator on an A$2.1M grant powering frontier-scale training on 256× NVIDIA H200 GPUs.
From lab to live systems: defence imagery and indie VFX, then TikTok-scale Trust & Safety MLLMs, through to frontier-scale H200 training.
AIML, University of Adelaide · CSIRO · A$2.1M Grant Co-Investigator · Adelaide, Australia
TikTok · Trust & Safety Research · Australia
Rising Sun Pictures · VFX ML Research · Adelaide, Australia
Adelaide Business School & UoA · Applied CV & NLP for Market Intelligence · Adelaide, Australia
DRDO & WESEE (Indian Navy) · Defence R&D · India
Australian Institute for Machine Learning, University of Adelaide
Collaborations: University of Oxford, University of Surrey, Monash University
The University of Adelaide
Adelaide, Australia
Rajasthan Technical University
India
Top-tier peer-reviewed venues across LLMs, multimodal AI, computer vision, and machine unlearning.
A. Garg, H. Saratchandran, S. Lucey
A. Yadav, A. Garg, T. D. Huy, L. Liu
A. Garg, S. Lucey, H. Saratchandran
R. Xu*, A. Garg* (co-first), H. Saratchandran, S. Lucey
A. Garg, H. Saratchandran, R. Garg, S. Lucey
A. Garg, C. Nguyen, R. Felix, Y. Liu, T.-T. Do, G. Carneiro
A. Garg, C. Nguyen, R. Felix, T.-T. Do, G. Carneiro
A. Garg, C. Nguyen, R. Felix, T.-T. Do, G. Carneiro
A. Garg, C. Nguyen, R. Felix, T.-T. Do, G. Carneiro
P. Shah, A. Garg, V. Gajjar
Full publication record, citations & h-index.
Continuously updated with new accepted papers and preprints.
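Generative-AI pipelines built at Rising Sun Pictures (deepfake, gaze, super-resolution, NeRF / Gaussian Splatting), integrated into shipping VFX workflows on Furiosa: A Mad Max Saga, Mortal Kombat II, Deadpool, Mickey 17, Sonic 3, Sinners, Michael, and A Complete Unknown.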
Recent talks, papers, awards, patents and grants.
Lecturing at the Machine Learning Summer School, Melbourne (by invitation from Maincode).
AI-Assisted Rural & Indigenous Healthcare Robot — Class 24, Medical Equipment. UK Intellectual Property Office.
Machine unlearning for stable vision-language alignment. First-author work with Saratchandran and Lucey.
Attention mechanism for neural networks enabling compute- and memory-efficient LLM training; productionised in internal pipelines.
Adaptive estimation of instance-dependent ID/OOD label noise for robust learning. arXiv:2501.13389.
Peer-agreement based sample selection for training with instance-dependent noisy labels.
Recognised among the top reviewers worldwide.
Lead investigator for frontier-scale foundation model training on 256× NVIDIA H200 GPUs.
Advising on responsible-AI, trustworthy LLM/MLLM systems, and national-scale safety research.
Senior ML engineer designing MLLM architectures for production safety models at scale.
Graphical-model-based noise-rate estimation; published at ECCV 2024 (Springer).
Public ML projects across LLMs, attention mechanisms, noisy-label learning, and PEFT.
Visual deep dives into the papers and ideas shaping modern deep learning. Each post is a fully-rendered HTML reading experience.
Three environment variables, no patching: point Claude Code at any Anthropic-compatible endpoint you host yourself. Route to a cluster (claude-glm) or an on-device model (claude-fable5) — plus how to name them in the /model picker and fix the connection failures you'll actually hit on self-hosted endpoints.
Mixture of Experts explained from scratch. Why the smartest AI models don't use their whole brain, what an 'expert' actually is, and how a tiny router decides who answers — an interactive workbook you can play with and break. No ML background needed.
An interactive, jargon-free explainer of the context window: what the model can actually see, why doubling the conversation quadruples the cost, and what 'lost in the middle' really means.
The model is only part of the story. An interactive tour of the six things that wrap a language model — turning a single chat turn into an agent that can read, search, run code, and act.
A field guide to every major attention mechanism, from the original Transformer to Flash Attention 4 — seven families, fifty mechanisms, one decade.
How a hierarchical, parameter-free trick lets you pre-train Transformers on million-token contexts — and still get a fully dense model at the end.
LIMA argued a thousand carefully chosen examples can rival a million sloppy ones. LESS gave us the math. A complete recipe for data-efficient fine-tuning.
Attention transfer as feature-map distillation, LoRA as residual correction, and a block-wise schedule that makes 405B-parameter linearization tractable.
What if partially-observable reinforcement learning is just next-token prediction wearing a different costume?
Compute & memory-efficient training of large language models.
Filed May 2026
Class 24, Medical Equipment · UK Intellectual Property Office
No. 6520933 · 29 April 2026
Recognised among the top reviewers worldwide.
Training large foundation models on 256× NVIDIA H200 GPUs.
By invitation from Maincode.
140,000+ visits across public ML repositories.