Vilynx Research

Unsupervised Multi-label Dataset Generation from Web Data

Published in the 3rd Workshop on Visual Understanding by Learning From Web Data at the Conference on Computer Vision and Pattern Recognition (CVPR) 2019

Authors:
Carlos Roig, David Varas, Issey Masuda, Juan Carlos Riveiro and Elisenda Bou-Balust

Abstract:
This paper presents a system for generating multi-label datasets from web data in an unsupervised manner. To achieve this objective, the work comprises two main contributions: a) the generation of a low-noise, unsupervised single-label dataset from web data, and b) the augmentation of labels in that dataset (from single-label to multi-label). The single-label dataset is generated with an unsupervised noise reduction phase (clustering and selection of clusters using anchors), obtaining 85% correctly labeled images. An unsupervised label augmentation process then assigns new labels to the images in the dataset using class activation maps and the uncertainty associated with each class. Applied to the dataset generated in this paper and to a public dataset (Places365), this process yields 9.5% and 27% extra labels respectively, demonstrating that the presented system can robustly enrich the initial dataset.
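To make the label augmentation step concrete, the sketch below illustrates the general idea with class activation maps (CAM): for an image that already carries one label, additional classes are proposed when their score is confident and their activation map is spatially localized. This is a minimal illustration under assumed thresholds and a sigmoid confidence score, not the paper's exact method; all names are hypothetical.

```python
import torch

def propose_extra_labels(features, fc_weights, original_label,
                         conf_threshold=0.5, area_ratio=0.3):
    """Sketch of CAM-based label augmentation (thresholds are assumptions).

    features:   (C, H, W) activations from the network's last conv layer
    fc_weights: (K, C) weights of the final linear classifier, K classes
    """
    # Global average pooling yields per-class scores, as in the CAM setup.
    pooled = features.mean(dim=(1, 2))                 # (C,)
    # Score classes independently with a sigmoid (an assumption; the paper
    # uses the uncertainty associated with each class instead).
    scores = torch.sigmoid(fc_weights @ pooled)        # (K,)

    # One class activation map per class: weighted sum of the feature maps.
    cams = torch.einsum('kc,chw->khw', fc_weights, features)

    extra = []
    for cls in range(len(scores)):
        if cls == original_label or scores[cls] < conf_threshold:
            continue
        cam = cams[cls]
        norm = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        # Propose the class only if its activation is localized in a
        # compact region instead of spreading over the whole image.
        if (norm > 0.5).float().mean() < area_ratio:
            extra.append(cls)
    return extra

# Hypothetical usage with random tensors standing in for a real network:
feats, w = torch.randn(256, 7, 7), torch.randn(10, 256)
print(propose_extra_labels(feats, w, original_label=3))
```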

Multi-modal Pyramid Feature Combination for Human Action Recognition

Published in the Workshop on Multi-modal Video Analysis and Moments in Time at the International Conference on Computer Vision (ICCV) 2019.

Authors:
Carlos Roig, Manuel Sarmiento, David Varas, Issey Masuda, Juan Carlos Riveiro and Elisenda Bou-Balust

Abstract:
Accurate human action recognition remains a challenging task in the field of computer vision. While many approaches focus on narrow image features, this work proposes a novel multi-modal method that combines task-specific features (action recognition, scene understanding, object detection and acoustic event detection) for human action recognition.
This work encompasses two contributions: 1) the introduction of a feature fusion block that uses a gating mechanism to perform attention over features from other domains, and 2) a pyramidal feature combination approach that hierarchically combines pairs of features from different tasks using the previous fusion block. The richer features generated by the pyramid are used for human action recognition.
This approach is validated on a subset of the Moments in Time dataset, resulting in an accuracy of 35.43%.
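As a rough illustration of the fusion block, the following PyTorch sketch shows gated attention between feature vectors from two tasks: one modality produces a gate that modulates the channels of the other before both are combined. The layer sizes and design details are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Minimal gated feature-fusion block (illustrative, assumed sizes)."""

    def __init__(self, dim_a, dim_b, dim_out):
        super().__init__()
        self.gate = nn.Linear(dim_b, dim_a)  # task B's features gate task A's
        self.proj = nn.Linear(dim_a + dim_b, dim_out)

    def forward(self, feat_a, feat_b):
        g = torch.sigmoid(self.gate(feat_b))   # attention over A's channels
        gated_a = g * feat_a                   # suppress irrelevant channels
        return self.proj(torch.cat([gated_a, feat_b], dim=-1))

# Hypothetical usage: fuse a batch of scene features into action features.
fusion = GatedFusion(dim_a=512, dim_b=512, dim_out=512)
out = fusion(torch.randn(8, 512), torch.randn(8, 512))  # (8, 512)
```

A pyramid would then apply such blocks hierarchically, e.g. fusing (action, scene) and (object, audio) pairs first and fusing the two resulting features at the next level.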

VLX-Stories: building an online Event Knowledge Base with Emerging Entity detection

Published in the 18th International Semantic Web Conference (ISWC), 2019

Authors:
Dèlia Fernàndez-Cañellas, Joan Espadaler, David Rodriguez, Blai Garolera, Gemma Canet, Aleix Colom, Joan Marco Rimmek, Xavier Giro-i-Nieto, Elisenda Bou, and Juan Carlos Riveiro

Abstract:
We present an online multilingual system for event detection and comprehension from media feeds. The system retrieves articles from news sites, aggregates them into events (event detection), and summarizes them by extracting semantic labels for their most relevant entities (event representation) in order to answer the journalism Ws: who, what, when and where. The generated events populate VLX-Stories, an event ontology, transforming unstructured text data into a structured knowledge base representation. Our system exploits an external entity Knowledge Graph (VKG) to help populate VLX-Stories. At the same time, this external knowledge graph can be extended with a Dynamic Entity Linking (DEL) module, which detects emerging entities (EE) in unstructured data. The system is currently deployed in production and used by media producers in the editorial process, providing real-time access to breaking news. Each month, VLX-Stories detects over 9,000 events from over 4,000 news feeds covering seven countries and three languages. At the same time, it detects over 1,300 emerging entities per month, which populate the VKG.
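The event detection stage can be pictured as grouping incoming articles by the entities they mention. The sketch below is a generic greedy aggregation using Jaccard similarity over entity sets with an assumed threshold; it illustrates the idea, not the deployed system.

```python
def group_articles_into_events(articles, threshold=0.5):
    """Greedy aggregation of articles into events by entity overlap.

    articles: iterable of (article_id, set_of_entity_ids) pairs.
    Returns a list of events, each holding member articles and entities.
    """
    events = []
    for art_id, entities in articles:
        best, best_sim = None, 0.0
        for event in events:
            # Jaccard similarity between the article's entities and the
            # entities accumulated by the event so far (assumed measure).
            union = len(entities | event["entities"])
            sim = len(entities & event["entities"]) / union if union else 0.0
            if sim > best_sim:
                best, best_sim = event, sim
        if best is not None and best_sim >= threshold:
            best["articles"].append(art_id)
            best["entities"] |= entities
        else:
            events.append({"articles": [art_id], "entities": set(entities)})
    return events

# Hypothetical usage with Wikidata-style entity ids:
print(group_articles_into_events([
    ("a1", {"Q90", "Q142"}),
    ("a2", {"Q90", "Q142", "Q8"}),
    ("a3", {"Q30"}),
]))
```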

VLX-Stories: a Semantically Linked Event Platform for Media Publishers

Published in the 18th International Semantic Web Conference (Demo Track) - ISWC 2019

Authors:
Dèlia Fernàndez-Cañellas, Joan Espadaler, Blai Garolera, David Rodriguez, Gemma Canet, Aleix Colom, Joan Marco Rimmek, Xavier Giro-i-Nieto, Elisenda Bou-Balust, and Juan Carlos Riveiro

Abstract:
In this article we present a web platform used by media producers to monitor world events detected by VLX-Stories. The event detection system retrieves multi-regional articles from news sites, aggregates them by topic, and summarizes them by disambiguating and structuring their most relevant entities in order to answer the journalism Ws: who, what, when and where. These events populate VLX-Stories, an event ontology, transforming unstructured text data into a structured knowledge base representation. The dashboard displays the detected events in a semantically linked space which allows navigation among trending news stories across countries, categories and time. Moreover, detected events are linked to customer contents, helping the editorial process by providing real-time access to breaking news related to those contents.

Linking Media: adopting Semantic Technologies for multimodal media connection

Published at the International Semantic Web Conference (Industry Track) - ISWC 2018

Authors:
Dèlia Fernández-Cañellas, Elisenda Bou-Balust, Xavier Giró-i-Nieto, Juan Carlos Riveiro, Joan Espadaler, David Rodriguez, Aleix Colom, Joan Marco Rimmek, David Varas, Issey Masuda and Carlos Roig

Abstract:
Media producers publish large amounts of multimedia content online: text, audio and video. To exploit all this information, we need methods to connect multimodal documents. Integrating and linking media documents requires understanding and extracting the semantics that describe their content in a universal representation. Labels could be used to describe document contents; however, most of the time this data is not labeled, or, when it is, it does not follow standards. Moreover, manually labeling data at this scale is unfeasible, so automatic tagging methods are needed.
Vilynx provides a media platform with semantic solutions that automatically indexes multimedia documents from a library and generates search and recommendation engines by linking those documents to other contents, trends and stories developing in the news. The user interface displays the links between media documents and stories in an intuitive manner and allows navigation through related content using associated semantic tags. This interface is a powerful industrial tool for publishers to index, retrieve and visualize their contents. It helps them identify which topics require more attention, or retrieve related content that has already been published about the stories. Moreover, recommendation and search tools are built on top of the detected semantic entities and integrated into customers' web pages.

Unsupervised Large-Scale World Locations Dataset

Published in the 2nd Workshop on Visual Understanding by Learning From Web Data at the Conference on Computer Vision and Pattern Recognition (CVPR) 2018.

Authors:
Carlos Roig, David Varas, Issey Masuda, Manuel Sarmiento, Genis Floriach, Joan Espadaler, Juan Carlos Riveiro and Elisenda Bou-Balust

Abstract:
Deep learning systems require vast amounts of data to be trained, which makes it unfeasible to rely on human-curated datasets and requires access to large data collections (usually Internet-based and very noisy). To address this challenge, this paper presents a) a novel approach to generate unsupervised, large-scale, classname-annotated landmark datasets, and b) a system to reduce the noise in such datasets without supervision. To evaluate the robustness of the generated dataset, this paper compares it with the Google Landmark dataset on a landmark recognition task, showing similar results on the Oxford5K and Paris6K sets. The noise filtering system is also evaluated, demonstrating successful results. The combination of the unsupervised dataset generation and unsupervised noise filtering systems presented in this paper has the potential to drastically increase currently available landmark datasets and therefore their potential applications.
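The noise reduction phase can be pictured as clustering the images retrieved for a class name and keeping only the clusters that agree with a few trusted anchor images. The following sketch assumes precomputed image embeddings, k-means clustering and cosine similarity to anchors; the paper's actual method and thresholds may differ.

```python
import numpy as np
from sklearn.cluster import KMeans

def filter_noisy_images(embeddings, anchor_embeddings,
                        n_clusters=5, sim_threshold=0.7):
    """Keep images from clusters whose centroid is close to an anchor.

    embeddings:        (N, D) image descriptors for one class
    anchor_embeddings: (A, D) descriptors of a few trusted anchor images
    Returns the indices of images kept as clean.
    """
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(embeddings)
    anchors = anchor_embeddings / np.linalg.norm(
        anchor_embeddings, axis=1, keepdims=True)
    kept = []
    for c in range(n_clusters):
        members = np.where(labels == c)[0]
        centroid = embeddings[members].mean(axis=0)
        centroid = centroid / np.linalg.norm(centroid)
        # A cluster survives if its centroid is cosine-similar to any anchor.
        if (anchors @ centroid).max() >= sim_threshold:
            kept.extend(members.tolist())
    return kept
```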

What is going on in the world? A display platform for media understanding

Published at the IEEE 1st International Conference on Multimedia Information Processing and Retrieval (IEEE MIPR 2018)

Authors:
Dèlia Fernández, Joan Espadaler, David Varas, Issey Masuda, Jordi Ferreira, Aleix Colom, David Rodríguez, David Vegas, Miquel Montalvo, Xavier Giró-i-Nieto, Juan Carlos Riveiro and Elisenda Bou

Abstract:
News broadcasters and online publishers generate a large number of articles and videos every day, describing events currently happening in the world.
In this work, we present a system that automatically indexes videos from a library and links them to stories developing in the news. The user interface displays the links between videos and stories in an intuitive manner and allows navigation through related content using associated tags.

This interface is a powerful industrial tool for publishers to index, retrieve and visualize their video content. It helps them identify which topics require more attention or retrieve related content that has already been published about the stories.

ViTS: Video Tagging System from Massive Web Multimedia Collections

Published at the 5th Workshop on Web-scale Vision and Social Media (VSM) at the International Conference on Computer Vision (ICCV) 2017

Authors:
Dèlia Fernández, David Varas, Joan Espadaler, Issey Masuda, Jordi Ferreira, Alejandro Woodward, David Rodríguez, Xavier Giró-i-Nieto, Juan Carlos Riveiro and Elisenda Bou

Abstract:
The popularization of multimedia content on the Web has raised the need to automatically understand, index and retrieve it. In this paper we present ViTS, an automatic Video Tagging System which learns from videos, their web context and comments shared on social networks. ViTS analyses massive multimedia collections by Internet crawling, and maintains a knowledge base that is updated in real time without human supervision. As a result, each video is indexed with a rich set of labels and linked with other related contents. ViTS is an industrial product in commercial use, with a vocabulary of over 2.5M concepts, capable of indexing more than 150k videos per month. We compare the quality and completeness of our tags with respect to those in the YouTube-8M dataset, and we show how ViTS enhances the semantic annotation of the videos with a larger number of labels (10.04 tags/video), with an accuracy of 80.87%.
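As a toy illustration of how tag-based indexing supports linking related contents, the sketch below builds an inverted index from labels to videos and ranks related videos by shared tags. This is a generic example, not the ViTS implementation; the scoring is an assumption.

```python
from collections import defaultdict

def build_index(video_tags):
    """video_tags: {video_id: set of labels}. Returns label -> video ids."""
    index = defaultdict(set)
    for vid, tags in video_tags.items():
        for tag in tags:
            index[tag].add(vid)
    return index

def related(video_id, video_tags, index, top_k=3):
    """Rank other videos by the number of shared labels (assumed scoring)."""
    scores = defaultdict(int)
    for tag in video_tags[video_id]:
        for other in index[tag]:
            if other != video_id:
                scores[other] += 1
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

videos = {
    "v1": {"elections", "france", "politics"},
    "v2": {"france", "sports"},
    "v3": {"elections", "politics"},
}
index = build_index(videos)
print(related("v1", videos, index))  # ['v3', 'v2']
```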
