Inter-media
Semantic
Extraction &
REasoning

 


First meeting, 27-28 September 2004

Photos


Thematic Sessions
 


 

Video Mining & Retrieval

Video analysis research at NII: towards mining video archives
Shin'ichi Satoh (NII, Japan)

This talk introduces our approaches to mining useful knowledge from large-scale video archives. We first briefly review our research in video analysis and multimedia information handling, including an efficient data structure for high-dimensional feature spaces, high-speed face detection, identical video segment detection, and face-name association in news videos. We then present our recent work on mining video archives, including threading analysis of news topics, a news video browser based on identical video segments, and news video abstraction combining thread structure and identical video relations. Through these examples, we would like to show our strengths in video analysis technologies, high-dimensional feature space handling, and video archive mining.

Download slides here:

Evaluation in content-based video retrieval: the TREC video program
Georges Quénot (CLIPS, France)

TREC (Text REtrieval Conference) is an annual series of conferences dedicated to the evaluation of IR (Information Retrieval) systems and methods. Since 2001, it has included a video track, which spun off as a separate workshop in 2003. The TREC video track and workshop conduct evaluations of video IR systems and components. The last two editions had tasks in shot segmentation, feature extraction, story segmentation and content-based search. The TREC video 2003 and 2004 corpus contains 180+ hours of MPEG-1 American TV news. The talk will give an overview of the TRECvid activity and of CLIPS's participation in it (see the official TRECvid web site).

Download slides here:

Human activity recognition for video understanding
Francois Bremond and Monique Thonnat (Orion, INRIA Sophia Antipolis, France)

This talk will present how to automatically extract semantic information from videos captured by fixed cameras. In particular, this information consists of the detection of the mobile objects present in the scene (persons or vehicles), their trajectories in 3D, their behaviours, and the recognition of complex scenarios. This objective is achieved by using both computer vision techniques (motion detection, tracking, 3D analysis) and artificial intelligence techniques (a priori knowledge representation of the 3D environment and of the scenario models, ontology-based knowledge acquisition and spatio-temporal reasoning). Results are shown on different real applications such as video surveillance of metro stations and bank monitoring. We are currently working on extending these techniques for medical purposes (e.g. surveillance of the elderly) in cooperation with NCKU in Taiwan. Within the ISERE project we could propose our video event ontology (work done in cooperation with American vision laboratories within ARDA workshops) as a basis for semantic multimedia video description.

Download slides here:

From Person Identification To Gesture Analysis
Philippe Joly (IRIT, France)

This talk will present two different lines of work carried out at IRIT in Toulouse for surveillance applications. The first one aims at automatically labelling characters appearing on the screen over time. The goal is to give the same label to the same character at each of its occurrences in a given document. The main feature used to achieve this goal is the colour of the costume. Some examples will be shown. The second part of the talk will present some work on gesture analysis. Based on a hierarchical model of the human body, the limbs of a human shape extracted from a video stream are localized and used to characterize some predefined gestures. Here again, some examples will be shown.

Download slides here:

Content-based Retrieval of Object Movies
Yi-Ping Hung (National Taiwan University)

I shall present an approach to retrieving object movies based on their content. Here, an object movie refers to a set of images captured from different perspectives around a 3D object. In our digital museum project, carried out together with the National Palace Museum and the National Museum of History, we have adopted object movies as the 3D representation for their photo-realistic viewing effect and their ease of acquisition. In order to retrieve the desired object movie from the database, we first map an object movie into a manifold in the feature space. Two different sets of feature descriptors, one dense and one condensed, are designed to sample the manifold. Based on these descriptors, we define the dissimilarity measure between the query and the target in the object movie database. The query we consider can be either a complete object movie or simply a subset of views. Finally, I'll also introduce some of our other projects on visual surveillance.
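
The abstract does not specify the dissimilarity measure, so the following is only a minimal sketch of one plausible form, assuming each view is already described by a numeric feature vector. The function name `view_set_dissimilarity` and the choice of an averaged nearest-view Euclidean distance are illustrative assumptions, not the speaker's method. It shows how a query made of a subset of views could be compared with a complete object movie.

```python
import math

def view_set_dissimilarity(query_views, target_views):
    """Average distance from each query view to its nearest target view.

    Hypothetical measure for illustration only: both arguments are
    non-empty lists of equal-length feature vectors (one per view).
    """
    if not query_views or not target_views:
        raise ValueError("both view sets must be non-empty")
    total = 0.0
    for q in query_views:
        # Nearest target view in feature space (Euclidean distance).
        total += min(math.dist(q, t) for t in target_views)
    return total / len(query_views)
```

With this form, a query whose views all appear in the target object movie has a dissimilarity of zero, which matches the idea that a subset of views should retrieve the movie it was taken from.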

Multimedia Fusion & Filtering

Experiential Sampling in Multimedia Systems
Mohan Kankanhalli (NUS, SoC, Singapore)

Multimedia systems must deal with multiple data streams. Each data stream usually contains a significant volume of redundant and noisy data. In many applications, it is essential to focus computing resources on a relevant subset of data streams at any given time instant and use it to build the model of the environment. We formulate this problem as an experiential sampling problem and propose an approach for utilizing computing resources efficiently on the most informative subset of data streams. We generalize the notion of static visual attention to multimedia data streams in a dynamical systems setting. The goal-driven generalized attention is maintained by a sampling representation that uses the current context and past experience for attention evolution. We will briefly talk about the theory as well as its applications and areas for future work.

This work has been done jointly with Jun Wang of the Delft University of Technology and Ramesh Jain of Georgia Tech.

Semantic Processing of News Video by Fusing Multi-Modal Information Sources and External Knowledge
Tat-Seng CHUA (School of Computing, National University of Singapore)

For many years, we have been working on different isolated islands of technologies in tackling the complex problems of multimedia information processing. Only recently have we begun to routinely use different media types, such as visual, audio, and text, to analyze video contents. The use of intra-content information alone, however, is still inadequate. To progress further, we need the equivalent of an XML-like meta-level model to encode domain knowledge, and the judicious use of external knowledge, such as the redundancy of the web, ontologies, and linguistic resources (dictionaries, encyclopedias).

This research presents a framework for semantic news video processing that incorporates multi-modality features, a domain model and external knowledge. We illustrate our framework on the tasks of segmenting news video into story units, extracting semantic concepts in images and video, and retrieval. The research has been applied to over 120 hours of news video in TRECVID 2003/04, and effective results have been obtained.


 

Image Indexing

Some Work on Symbolic Photograph Indexing and Retrieval
Dr. Philippe MULHEM, CNRS Researcher (CLIPS Laboratory, MRIM Group, France)

This talk will present some of the work done during the fruitful Singapore-France collaboration in the years 1999-2003 on Symbolic Image Indexing and Retrieval, and some ongoing research in the Modeling and Retrieval of Multimedia Information Group of the CLIPS laboratory. I will first introduce the reasons why I consider Symbolic Image Indexing and Retrieval so important. Then, I'll explain parts of the work done in cooperation with I2R on a 2-level image indexing and retrieval model and system that combines keywords and conceptual graph modeling, as well as some contextual information (date and time data). The current work that will be presented covers:
- fast relational symbolic indexing and retrieval
- personalized home photograph retrieval
- signal/symbol relational indexing and retrieval
I will conclude by relating the presented work to the ISERE project goals.

Download slides here:

Learning and Integrating Semantics for Image Indexing and Retrieval
Lim Joo Hwee (I2R, Singapore)

To bridge the semantic gap in content-based image retrieval, detecting meaningful visual entities (e.g. faces, sky, foliage, buildings etc) in image content and classifying images into semantic categories based on trained pattern classifiers have become active research trends. In this seminar, we present dual cascading learning frameworks that extract and combine intra-image and inter-class semantics for image indexing and retrieval.

In the supervised learning version, support vector detectors are trained on semantic support regions without image segmentation. The reconciled and aggregated detection-based signatures then serve as input for support vector learning of image classifiers to generate class-based image indexes. During retrieval, similarities based on both indexes are combined to rank images.

In the semi-supervised learning approach, image classifiers are first trained on local image blocks from a small number of labelled images. Then local semantic patterns are discovered from clustering the image blocks with high classification output. Training samples are induced from cluster memberships for support vector learning to form local semantic pattern detectors. During retrieval, similarities based on local class pattern index and discovered pattern index are combined to rank images.

Query-by-example experiments on 2400 unconstrained consumer photos with 16 semantic queries show that the combined matching approaches are better than matching with single indexes. Both the supervised semantics design and the semantics discovery approaches also significantly outperformed the linear fusion of color and texture features in average precision, by 55% and 39% respectively.
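
The abstract says that similarities from two indexes are combined to rank images, without stating the combination rule. As a minimal sketch, assuming a simple weighted average (the function `combine_and_rank`, the `weight` parameter and the sample scores are all hypothetical), the ranking step could look like this:

```python
def combine_and_rank(sim_a, sim_b, weight=0.5):
    """Rank image ids by a weighted average of two similarity scores.

    sim_a and sim_b map image id -> similarity to the query under two
    different indexes. The weighted average is an assumed combination
    rule chosen for illustration; only images scored by both indexes
    are ranked.
    """
    common = sim_a.keys() & sim_b.keys()
    combined = {
        img: weight * sim_a[img] + (1.0 - weight) * sim_b[img]
        for img in common
    }
    # Highest combined similarity first; ties broken by image id.
    return sorted(combined, key=lambda img: (-combined[img], img))


# Made-up scores for three images under two indexes.
index_one = {"a": 0.9, "b": 0.2, "c": 0.5}
index_two = {"a": 0.3, "b": 0.8, "c": 0.9}
ranking = combine_and_rank(index_one, index_two)
```

In this toy example, image "c" ranks first under the combined score even though it is not the top result of the first index alone, which is the kind of effect a combined match is meant to capture.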

Download slides here:

Object recognition for semantic image indexing
Monique Thonnat and Nicolas Maillot (Orion, INRIA Sophia Antipolis, France)

This talk will present how to automatically analyse the contents of images for semantic indexing. In particular, the content of the images is described in terms of the category of the main object. This objective is achieved by using both computer vision and artificial intelligence techniques. Computer vision techniques are used for image segmentation and feature extraction. Two kinds of artificial intelligence techniques are used: 1) ontology-based knowledge acquisition techniques for describing object categories in terms of visual concepts; 2) automatic numerical learning techniques for mapping between numerical image features and visual concepts. First results are shown on the indexing of image databases from different sources (TV news and web images) containing objects like airplanes and ships. Within the ISERE project we could share common data and tools (image databases, performance evaluations, ontologies, ...).

Download slides here:


 

Integrating Context

Multichannel smart sound sensor for perceptive spaces
Eric Castelli (MICA, Vietnam)

Sound in smart homes is usually used to build friendly man-machine interfaces, but extracting information from sound is a complex task because of environmental noise and the need for multichannel processing. We present a multichannel sound processing system capable of detecting and identifying sound events in noisy conditions. Multichannel processing allows us to localize the sound in the smart home and to select the appropriate signal for the identification procedure. The sensor is implemented in real time on a PC. The event detection module runs on each channel in real time. The classification module is launched as a parallel task on the channel chosen by the data fusion process, whose aim is to select the channel with the highest signal-to-noise ratio when an event is detected on several channels. The system is validated on a test set and presented together with the proposed evaluation methodology for a medical telemonitoring application. The results obtained allow us to develop smart home applications.
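
The channel selection step described above can be sketched as follows. This is a minimal illustration, assuming that each detecting channel reports an estimated signal power and noise power; the function names and the channel labels in the usage example are hypothetical and not taken from the presented system.

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two positive power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)

def select_channel(detections):
    """Pick the channel with the highest SNR among those that detected an event.

    detections maps channel id -> (signal_power, noise_power).
    Returns None when no channel detected anything.
    """
    if not detections:
        return None
    return max(detections, key=lambda ch: snr_db(*detections[ch]))


# Three microphones detect the same event with different noise levels.
chosen = select_channel({
    "kitchen": (4.0, 1.0),
    "hall": (9.0, 1.0),
    "bedroom": (2.0, 2.0),
})
```

Here the "hall" channel is chosen because its signal power is nine times its noise power, the largest ratio of the three; only that channel would then be passed to the classification module.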

Download slides here:

Context Aware Health Care Delivery System
Pau-Choo Chung (Department of Electrical Engineering, Institute of Computer and Communication Engineering, National Cheng Kung University, Tainan, Taiwan ROC)

This presentation will give an introduction to our ongoing work on a context aware health care delivery system. The system consists of a daily behavior repository, a behavior analysis module, a service repository and a service presentation module. The daily behavior repository stores the daily behavior acquired from statistics. The behavior analysis module analyzes the behavior of the target people. The service repository stores basic service functions, while the service presentation module determines the services to be delivered based on the results of behavior analysis and the environment contexts. The context aware health care delivery system is also integrated with the medical teleconsultation system, so that a real-time video conference can be activated when the situation requires it.

Context aware infrastructure for smart spaces
Zhang Daqing (Context Aware Systems Department, Institute for Infocomm Research, Singapore)

Starting from Mark Weiser's Pervasive Computing vision, the presentation discusses the challenges in smart spaces and introduces an open-standards-based service infrastructure for context aware services in the pervasive environment. Issues such as context modelling, representation, aggregation, discovery and query are addressed using solutions such as the Semantic Web, UPnP, AI and databases. A Semantic Space is proposed and built to support automatic device/service plug-and-play, dynamic inference of high-level events and rapid application development.


 

Related Projects

SnapToTell: Ubiquitous Information Access from Camera
Lim Joo Hwee (I2R, Singapore), Jean-Pierre Chevallet (IPAL-CNRS, France)

With the proliferation of camera phones, many novel applications and services will emerge. In this talk, we present the SnapToTell system, which provides an information directory service to tourists based on pictures taken with camera phones and on location information. We discuss key issues that motivate the design of the system and describe the system architecture. Next we present preliminary experimental results on scene recognition based on a realistic data set of scenes and locations in Singapore. Last but not least, we also discuss directions to be taken in the near future and the relation with the ISERE project.

Download slides here:

Adaptive and Personalized Delivery of Multimedia Information
Leow Wee Kheng (Dept. of Computer Science, National University of Singapore)

Thematic Strategic Research Project – UWB & Pervasive Computing Program

Download slides here:

 

Page updated February 9th, 2005