Multimodal Fusion of EMG and Vision for Human Grasp Intent Inference in Prosthetic Hand Control
Improving amputees’ quality of life with better functional prostheses
In 2005, an estimated 1.6 million people in the USA were living with the loss of one or more limbs, and this number is expected to double by 2050. Around 80% of people who have lost one or both hands currently use a cosmetic prosthesis, which does not restore the lost function needed for everyday tasks at home and at work.
To be used seamlessly in everyday life, a prosthesis needs to mimic the natural human behavior of pre-forming the hand's grasp type while reaching for an object. Creating a functional prosthesis that can do this reliably and intuitively, however, remains a challenge.
Artificial intelligence (AI)-driven functional prostheses aim to address this by allowing the wearer's intended arm and hand movements to be inferred in advance based on environmental and physiological signals.
Combining electromyography and eye tracking to improve functional prostheses
Existing AI-driven functional prostheses have focused on using a single source of data to guide the reaching and grasping actions of a prosthetic robot arm and hand. However, inferring the intended action from one source alone can be unreliable and susceptible to errors.
For example, recording electromyography (EMG) signals from muscles in the forearm offers an intuitive way for the amputee to control the robotic prosthesis. However, errors can creep in as the recording electrodes shift or skin impedance changes over time.
Recently, a research team led by Gunar Schirner at the Embedded Systems Laboratory at Northeastern University, Boston, MA, USA investigated whether combining data from two independent sources might help to improve the performance of functional prostheses.
They sought to develop a hybrid EMG–visual AI algorithm for robotic prostheses that would combine EMG recordings from muscles in the forearm with world-camera and eye-gaze data from a Pupil Core headset to predict the required grasp type for different objects (Figure 1).
Leveraging AI to predict grasp type
To train and test the hybrid AI algorithm, a group of able-bodied experimental subjects were asked to pick up a variety of objects using predefined grasp types, e.g. scissors with a "distal grip", or a board eraser with a "palmar grip" (Figure 2).
World-camera, eye-gaze, and EMG signals were recorded during the task and then segmented into defined reaching, grasping/moving, and resting phases (Figure 3).
The world-camera data were used to train a visual classifier to identify objects in the subject's field of view from 53 predefined possibilities, along with their corresponding grasp types. Eye-gaze location was then used to select between potential objects of interest in the visual field, based on the bounding boxes produced by the classifier. Meanwhile, the EMG classifier was trained to identify the most likely grasp type from the EMG signals recorded at the subject's forearm.
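The gaze-based selection step can be illustrated with a short sketch. This is a hypothetical reconstruction, not the study's actual code: the function names and detection format are assumptions, and the idea is simply to pick the detected object whose bounding box contains the current gaze point.

```python
# Hypothetical sketch of gaze-based object selection. The detection format
# (label, confidence, bounding box) is illustrative, not the study's actual
# data structure.

def select_object(detections, gaze_xy):
    """Return the detection whose bounding box contains the gaze point.

    detections: list of (label, confidence, (x_min, y_min, x_max, y_max))
    gaze_xy: (x, y) gaze location in world-camera pixel coordinates
    """
    gx, gy = gaze_xy
    candidates = [
        d for d in detections
        if d[2][0] <= gx <= d[2][2] and d[2][1] <= gy <= d[2][3]
    ]
    if not candidates:
        return None  # gaze falls outside every detected object
    # If boxes overlap, fall back to the most confident detection.
    return max(candidates, key=lambda d: d[1])

detections = [
    ("scissors", 0.91, (100, 80, 220, 160)),
    ("eraser", 0.84, (300, 200, 420, 280)),
]
print(select_object(detections, (350, 240))[0])  # -> eraser
```

In practice the gaze point and bounding boxes would need to share one coordinate frame, which the Pupil Core software provides by mapping gaze into world-camera pixel coordinates.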
Benchmarking the AI
The researchers wanted to investigate whether "fusing" the probabilities from the two classifiers would improve the accuracy and speed with which the correct grasp type could be determined. First, they examined the performance of each classifier alone by analyzing the accuracy of grasp-type estimation at each timepoint during the reaching, grasping/moving, and resting phases.
When the EMG classifier was applied to task data that had not been used for training, the results showed that it was more than 80% accurate at identifying the correct grasp types throughout most of the reaching and grasping phases, with the accuracy peaking around the onset of grasping (Figure 3, blue line).
In addition, this classifier was able to predict the correct grasp type well before the grasp onset, which would give a robotic hand in a functional prosthesis time to pre-form its grasp type.
Next, the researchers analyzed the accuracy of the visual classifier at identifying the correct object and corresponding grasp type across the different task phases. The classifier was around 85% accurate across most of the phases, although this accuracy dropped by about 20% during the grasping phase and the adjacent time periods, probably because the subject's arm and hand occluded the object (Figure 3, orange line).
Finally, the researchers calculated the accuracy with which the fused EMG and visual classifiers could predict the correct grasp type. The combined classifier was found to increase the average accuracy of prediction by around 15% compared with that of either classifier alone (Figure 3, green line).
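To make the fusion step concrete, here is a minimal sketch of combining per-class probabilities from the two classifiers. The study's exact fusion rule is not reproduced here; this assumes a simple weighted average over a shared set of grasp-type classes, purely for illustration.

```python
import numpy as np

# Illustrative probability fusion: combine the EMG and vision classifiers'
# per-class grasp-type probabilities into one estimate. The weighting scheme
# is an assumption, not the study's actual fusion rule.

def fuse(p_emg, p_vision, w_emg=0.5):
    """Weighted average of two probability vectors, renormalized."""
    p_emg = np.asarray(p_emg, dtype=float)
    p_vision = np.asarray(p_vision, dtype=float)
    fused = w_emg * p_emg + (1.0 - w_emg) * p_vision
    return fused / fused.sum()  # keep a valid probability distribution

grasp_types = ["palmar", "distal", "tripod"]
p_emg = [0.5, 0.3, 0.2]      # EMG weakly favors palmar
p_vision = [0.6, 0.1, 0.3]   # vision also favors palmar
fused = fuse(p_emg, p_vision)
print(grasp_types[int(np.argmax(fused))])  # -> palmar
```

A weighted scheme like this also suggests why the two sources complement each other: the weight could in principle shift toward vision early in the reach and toward EMG near grasp onset, matching when each classifier is most reliable.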
Importantly, the predictions of the visual and EMG classifiers complemented each other well, as the EMG classifier had the highest accuracy during the reaching and grasping phases, while the visual classifier showed the opposite pattern.
Towards more effective AI-driven functional prostheses
In their fascinating study, the research team successfully improved the accuracy of grasp-type prediction for functional prostheses by combining world-camera and eye-gaze data from a Pupil Core headset with forearm EMG recording. Crucially, the algorithms developed by the researchers were able to predict the correct grasp type well in advance of reaching an object.
We at Pupil Labs commend the research team on their success in developing effective methods for fusing multiple sources of information to accurately predict grasp type. We hope that their pioneering work will lead to the creation of better functional prostheses capable of seamless, human-like reaching and grasping, which can contribute to improving the quality of life of upper-limb amputees worldwide.
If you wish to include your published works or research projects in future digests, please reach out!