Unsupervised Large-Scale World Locations Dataset
Published in The 2nd Workshop on Visual Understanding by Learning From Web Data 2018 at CVPR
Carlos Roig, David Varas, Issey Masuda, Manuel Sarmiento, Genis Floriach, Joan Espadaler, Juan Carlos Riveiro and Elisenda Bou-Balust
Deep Learning systems require vast amounts of training data, making it unfeasible to rely on human-curated datasets. Moreover, access to large data collections (usually Internet-based and very noisy) is also needed. To address this challenge, this paper presents a) a novel approach to generating unsupervised large-scale classname-annotated landmark datasets and b) a system to reduce the noise in such datasets without supervision. To evaluate the robustness of the generated dataset, this paper compares it with the Google Landmark dataset on a landmark recognition task, showing similar results on the Oxford5K and Paris6K sets. The noise filtering system is also evaluated, demonstrating successful results. The combination of the unsupervised dataset generation and unsupervised noise filtering systems presented in this paper has the potential to drastically increase currently available landmark datasets and therefore their potential applications.
ViTS: Video Tagging System from Massive Web Multimedia Collections
Published at 5th Workshop on Web-scale Vision and Social Media (VSM) at ICCV 2017
Dèlia Fernández, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, Alejandro Woodward, David Rodríguez, Xavier Giró-i-Nieto, Juan Carlos Riveiro and Elisenda Bou
The popularization of multimedia content on the Web has given rise to the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections crawled from the Internet, and maintains a knowledge base that updates in real time without human supervision. As a result, each video is indexed with a rich set of labels and linked with other related content. ViTS is an industrial product under exploitation with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with those in the YouTube-8M dataset, and show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80.87%.
ViTS ICCV 2017 Data: vits_iccv2017_data.zip
What is going on in the world? A display platform for media understanding
Published at IEEE 1st International Conference on Multimedia Information Processing and Retrieval (IEEE MIPR 2018)
Dèlia Fernández, Joan Espadaler, David Varas, Issey Masuda, Jordi Ferreira, Aleix Colom, David Rodríguez, David Vegas, Miquel Montalvo, Xavier Giró-i-Nieto, Juan Carlos Riveiro and Elisenda Bou
News broadcasters and online publishers generate a large number of articles and videos every day describing events currently happening in the world. In this work, we present a system that automatically indexes videos from a library and links them to stories developing in the news. The user interface intuitively displays the links between videos and stories and allows navigation through related content via associated tags.
This interface is a powerful industrial tool for publishers to index, retrieve and visualize their video content. It helps them identify which topics require more attention, or retrieve related content that has already been published about these stories.