A robust multimodal fusion framework for command interpretation in human-robot cooperation