Using External Knowledge to Improve Zero-shot Action Recognition in Egocentric Videos

Zero-shot learning is a very promising research topic. For a vision-based action recognition system, for instance, zero-shot learning allows to recognise actions never seen during the training phase. Previous works in zero-shot action recognition have exploited in several ways the visual appearance of input videos to infer actions. Here, we propose to add external knowledge to improve the performance of purely vision-based systems. Specifically, we have explored three different sources of knowledge in the form of text corpora. Our resulting system follows the literature and disentangles actions into verbs and objects. In particular, we independently train two vision-based detectors: (i) a verb detector and (ii) an active object detector. During inference, we combine the probability distributions generated from those detectors to obtain a probability distribution of actions. Finally, the vision-based estimation is further combined with an action prior extracted from text corpora (external knowledge). We evaluate our approach on the EGTEA Gaze+ dataset, an Egocentric Action Recognition dataset, demonstrating that the use of external knowledge improves the recognition of actions never seen by the detectors.

Egileak (ixakideak):

Eneko Agirre

Gorka Azkune

Egileak:

Adrián Nuñez-Marcos, Gorka Azkune, Eneko Agirre, Diego López-de-Ipiña, Ignacio Arganda-Carreras

Fitxategi publikoak: