Vision-based hand pose estimation methods for Augmented Reality in industry: Crowdsourced evaluation on HoloLens 2

Abstract

Gestural input based on hand pose estimation is a common interaction method for augmented reality (AR). This interaction technique has gained popularity with the emergence of novel AR-supporting devices such as Microsoft HoloLens 2 (HL2) and advancements in the computer vision research underpinning hand-tracking and gesture recognition methods. In our work, we focus on challenging cases where the AR interface is provided by a state-of-the-art HL2 headset for unconstrained execution of tasks requiring simultaneous hand movement and tracking. When using this headset, AR users might bimanually interact with digital and physical objects that are visible in the user’s field of view (FoV) through the see-through visor. Because the headset's built-in capabilities are limited, we investigated a range of hand pose estimation functionalities from different domains. To ensure a fair comparison, we asked several participants to carry out tasks requiring interactions with real-world objects and recorded the performance of various hand-tracking solutions. Next, we evaluated the performance of these algorithms through crowdsourcing, which is often used to provide ground truth for machine learning training. Our results provide a guideline for AR developers in selecting appropriate hand-tracking solutions for a given deployment context.

BibTeX

				
@article{ZYWANOWSKI2025104328,
title = {Vision-based hand pose estimation methods for Augmented Reality in industry: Crowdsourced evaluation on HoloLens 2},
journal = {Computers in Industry},
volume = {171},
pages = {104328},
year = {2025},
issn = {0166-3615},
doi = {10.1016/j.compind.2025.104328},
url = {https://www.sciencedirect.com/science/article/pii/S0166361525000934},
author = {Kamil Żywanowski and Mikołaj Łysakowski and Michał R. Nowicki and Jason T. Jacques and Sławomir K. Tadeja and Thomas Bohné and Piotr Skrzypczyński},
keywords = {Augmented reality, Hand-tracking, Gesture recognition, Hand pose estimation, Crowdsourcing},
}

APA Reference

Żywanowski, K., Łysakowski, M., Nowicki, M. R., Jacques, J. T., Tadeja, S. K., Bohné, T., & Skrzypczyński, P. (2025). Vision-based hand pose estimation methods for Augmented Reality in industry: Crowdsourced evaluation on HoloLens 2. Computers in Industry, 171, 104328. https://doi.org/10.1016/j.compind.2025.104328

Cyber-Human Lab Contributors

Dr Sławomir Tadeja

Dr Sławomir K. Tadeja is a Postdoctoral Associate with the Department of Mechanical Engineering at the Massachusetts Institute of Technology (MIT). Here, he works...

Dr Thomas Bohné

Thomas Bohné is the founder and head of the Cyber-Human Lab at the University of Cambridge’s Department of Engineering. He is also leading research...