Talk announcement – “Breaking the Speed Barrier: LLM Inference Optimization” by Nikola Henezi

We’re happy to announce that Nikola Henezi will be presenting at DORS/CLUC this year. Nikola has spent the last 15 years at the intersection of functional programming and practical software engineering. With a background in mathematics and theoretical computer science, he focuses on solving complex challenges through elegant simplicity.

At QED Ltd., he builds sophisticated solutions without unnecessary complexity, driven by the belief that good engineering is about choosing the right tools and using them well. Prior to that, he founded Strongly Typed, where his team demonstrated how functional programming and strong type systems can dramatically reduce development time while improving code quality.

Recently, Nikola’s focus has shifted toward LLM inference optimization, particularly with the Llama family of models. His work explores the relationship between architectural decisions, quantization techniques, and the resulting impact on performance and quality.

In his talk, “Breaking the Speed Barrier: LLM Inference Optimization”, Nikola will share hands-on insights gained from working with Llama-based models and dive into practical strategies for overcoming inference bottlenecks. He will discuss how different optimization choices affect both performance and output fidelity, and how to construct robust, cost-effective inference pipelines that scale from high-speed single-user scenarios to high-throughput production systems. Whether you’re deploying language models in production or simply exploring their capabilities, this talk will offer valuable perspectives on making them run faster and smarter.