First meeting, 27-28 September 2004

Thematic Sessions
Video Mining & Retrieval
Video analysis research at NII: towards mining video archives
Shin'ichi Satoh (NII, Japan)
In this talk, we introduce our approaches to mining
useful knowledge from large-scale video archives. We
first briefly review our research in video analysis and multimedia
information handling, including efficient data structures for
high-dimensional feature spaces, high-speed face detection, identical
video segment detection, and face-name association in news videos. We
then present our recent work on mining video archives, including
threading analysis of news topics, a news video browser based on identical
video segments, and news video abstraction combining both thread
structure and identical video relations. Through these examples, we
aim to show our strengths in video analysis technologies,
high-dimensional feature space handling, and mining video archives.
Download slides here:

Evaluation in content-based video
retrieval: the TREC video program
Georges Quénot (CLIPS, France)
TREC (Text REtrieval Conference) is an
annual series of conferences dedicated to the evaluation of IR
(Information Retrieval) systems and methods. Since 2001 it has included a
video track, which was spun off as a separate workshop in 2003. The TREC
video track and workshops conduct evaluations of video IR systems and/or
components. The last two editions had tasks in shot segmentation,
feature extraction, story segmentation and content-based search. The
TREC video 2003 and 2004 corpus contains 180+ hours of MPEG-1 American
TV news. The talk will give an overview of the TRECvid activity and of
the participation of CLIPS in it (see the
official TRECvid
web site).
Download slides here:

Human activity recognition for video
understanding
Francois Bremond and Monique Thonnat
(Orion, INRIA Sophia Antipolis, France)
This talk will present how to automatically extract
semantic information from videos captured by fixed
cameras. In particular, this information consists of the detection of the
mobile objects present (persons or vehicles), their 3D trajectories,
their behaviours, and the recognition of complex scenarios. This
objective is achieved by using both computer vision techniques (motion
detection, tracking, 3D analysis) and artificial intelligence techniques
(a priori knowledge representation of the 3D environment and of the
scenario models, ontology-based knowledge acquisition and
spatio-temporal reasoning). Results are shown on several real
applications, such as video surveillance of metro stations and bank
monitoring. We are currently working on extending these techniques for
medical purposes (e.g. surveillance of the elderly) in cooperation with
NCKU in Taiwan. Within the ISERE project we could propose our video event
ontology (work done in cooperation with American vision laboratories
within ARDA workshops) as a basis for semantic multimedia video
description.
Download slides here:

From Person Identification To Gesture
Analysis
Philippe Joly (IRIT, France)
This talk will present two different lines
of work carried out at IRIT in Toulouse for surveillance applications. The
first one aims at automatically labelling characters appearing on the
screen over time. The goal is to give the same label to the same
character for each of its occurrences in a given document. The main
feature used to achieve this goal is the colour of the costume. Some
examples will be shown. The second part of the talk will present
work on gesture analysis. On the basis of a hierarchical model of
the human body, the limbs of a human shape extracted from a video stream
are localized and used to characterize some predefined gestures. Here
again, some examples will be shown.
Download slides here:

Content-based Retrieval of Object Movies
Yi-Ping Hung (National Taiwan
University)
I shall present an approach to retrieving
object movies based on their content. Here, an object movie refers to a
set of images captured from different perspectives around a 3D object.
In our digital museum project, carried out together with the National
Palace Museum and the National Museum of History, we have adopted object
movies as the 3D representation, for their photo-realistic view effect and
for their ease of acquisition. In order to retrieve the desired object movie from
the database, we first map an object movie into a manifold in the
feature space. Two different sets of feature descriptors, one dense and
one condensed, are designed to sample the manifold. Based on these
descriptors, we define the dissimilarity measure between the query and
the target in the object movie database. The query we consider can be
either a complete object movie or simply a subset of views. At the end,
I'll also introduce some of our other projects on visual surveillance.

Multimedia Fusion & Filtering
Experiential Sampling in Multimedia
Systems
Mohan Kankanhalli (NUS, SoC, Singapore)
Multimedia systems must deal with multiple
data streams. Each data stream usually contains a significant volume of
redundant noisy data. In many applications, it is essential to focus
computing resources on a relevant subset of data streams at any given
time instant and use it to build the model of the environment. We
formulate this problem as an experiential sampling problem and propose
an approach for utilizing computing resources efficiently on the most
informative subset of data streams. We generalize the notion of static
visual attention to multimedia data streams in a dynamical systems
setting. The goal-driven generalized attention is maintained by a
sampling representation that uses the current context and past
experience for attention evolution. We will briefly talk about the
theory as well as its applications and areas for future work.
This work has been done jointly with Jun
Wang of the Delft University of Technology and Ramesh Jain of
Georgia Tech.

Semantic Processing of News Video by
Fusing Multi-Modal Information Sources and External Knowledge
Tat-Seng CHUA (School of Computing,
National University of Singapore)
For many years, we have been working on
different isolated islands of technologies in tackling the complex
problems of multimedia information processing. Only recently have we
begun to use different media types, such as visual, audio, and text
routinely to analyze video contents. The use of only intra-contents,
however, is still inadequate. To progress further, we need the equivalent
of an XML-like meta-level model to encode domain knowledge, and the
judicious use of external knowledge, such as the redundancy of the web,
ontologies, and linguistic resources (dictionaries, encyclopedias), etc.
This research presents a framework for
semantic news video processing that incorporates multi-modality features,
a domain model and external knowledge. We illustrate our framework based
on the tasks of segmenting news video into story units, extracting
semantic concepts in images/video, and retrieval. The research has been
applied to over 120 hours of news video in TRECVID 2003/04, and effective
results have been obtained.

Image Indexing
Some Work on Symbolic Photograph Indexing
and Retrieval
Dr. Philippe MULHEM, CNRS Researcher,
(CLIPS Laboratory, MRIM Group, France)
This talk will present some of the work
done during the fruitful Singapore-France collaboration in the years
1999-2003 on Symbolic Image Indexing and Retrieval, and some ongoing
research in the Modeling and Retrieval of Multimedia Information Group
of the CLIPS laboratory. I will first introduce the reasons why I
consider Symbolic Image Indexing and Retrieval so important. Then, I'll
explain parts of the work done in cooperation with I2R on image
retrieval, dedicated to a 2-level image indexing and retrieval model and
system that combines keywords and conceptual graph modeling, and also
some contextual information (date and time data). The current work that
will be presented covers:
- fast relational symbolic indexing and retrieval
- personalized home photograph retrieval
- signal/symbol relational indexing and retrieval
I will conclude by relating the presented work
to the ISERE project goals.
Download slides here:

Learning and Integrating Semantics for
Image Indexing and Retrieval
Lim Joo Hwee (I2R, Singapore)
To bridge the semantic gap in content-based
image retrieval, detecting meaningful visual entities (e.g. faces, sky,
foliage, buildings, etc.) in image content and classifying images into
semantic categories based on trained pattern classifiers have become
active research trends. In this seminar, we present dual cascading
learning frameworks that extract and combine intra-image and inter-class
semantics for image indexing and retrieval.
In the supervised learning version, support
vector detectors are trained on semantic support regions without image
segmentation. The reconciled and aggregated detection-based signatures
then serve as input for support vector learning of image classifiers to
generate class-based image indexes. During retrieval, similarities based
on both indexes are combined to rank images.
In the semi-supervised learning approach,
image classifiers are first trained on local image blocks from a small
number of labelled images. Then local semantic patterns are discovered
from clustering the image blocks with high classification output.
Training samples are induced from cluster memberships for support vector
learning to form local semantic pattern detectors. During retrieval,
similarities based on local class pattern index and discovered pattern
index are combined to rank images.
Query-by-example experiments on 2400
unconstrained consumer photos with 16 semantic queries show that the
combined matching approaches are better than matching with single
indexes. Both the supervised semantics design and the semantics
discovery approaches also significantly outperformed the linear fusion of
color and texture features in average precision, by 55% and 39%
respectively.
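
The abstract says that similarities computed on two indexes are combined to rank images, but it does not say how. As a minimal sketch only, assuming a simple weighted sum (the function name `rank_images`, the `weight` parameter and the sample scores are all hypothetical and not taken from the talk), the combination could look like this:

```python
def rank_images(sim_a, sim_b, weight=0.5):
    """Rank image ids by a weighted sum of two per-index similarity scores.

    sim_a and sim_b map each image id to its similarity with the query
    under two different indexes. The weighted sum is an assumption made
    for illustration; the talk does not specify the combination rule.
    """
    combined = {
        image_id: weight * sim_a[image_id] + (1.0 - weight) * sim_b[image_id]
        for image_id in sim_a
    }
    # Highest combined similarity first.
    return sorted(combined, key=combined.get, reverse=True)


# Made-up similarity scores for three images under two indexes.
local_sim = {"img1": 0.9, "img2": 0.4, "img3": 0.6}
class_sim = {"img1": 0.2, "img2": 0.8, "img3": 0.7}
ranking = rank_images(local_sim, class_sim)
```

With equal weights, an image that scores moderately well on both indexes can outrank one that scores very well on only one of them.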
Download slides here:

Object recognition for semantic image
indexing.
Monique Thonnat and Nicolas Maillot
(Orion, INRIA Sophia Antipolis, France)
This talk will present how to automatically
analyse the contents of images for semantic indexing. In particular the
content of the images is described in terms of the category of the main
object. This objective is achieved by using both computer vision and
artificial intelligence techniques. Computer vision techniques are used
for image segmentation and feature extraction. Two kinds of artificial
intelligence techniques are used: 1) ontology-based knowledge
acquisition techniques for object category description in terms of
visual concepts; 2) automatic numerical learning techniques for mapping
numerical image features to visual concepts. First results are shown on
the indexing of image databases from different sources (TV news and web
images) containing objects such as airplanes and ships. Within the
ISERE project we could share common data and tools (image databases,
performance evaluations, ontologies, ...).
Download slides here:


Integrating Context
Multichannel smart sound sensor for
perceptive spaces
Eric Castelli (MICA, Vietnam)
Sound in smart homes is usually used
for friendly man-machine interfaces, but sound information extraction is
a complex task because of environmental noise and the need for
multichannel processing. A multichannel sound processing system capable of
detecting and identifying sound events in noisy conditions is presented.
Multichannel sound processing allows us to localize the sound in the smart
home and to select the appropriate signal for the identification procedure.
The sensor is implemented in real time on a PC. The event detection module
runs on each channel in real time. The classification module is
launched as a parallel task on the channel chosen by the data fusion process.
The aim of this process is to select the channel with the highest
signal-to-noise ratio when multiple detections occur. The system is
validated on a test set and presented with the proposed evaluation
methodology for a medical telemonitoring application. The results
obtained allow us to develop smart home applications.
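
The channel selection step described above (pick the channel with the highest signal-to-noise ratio when several channels detect an event at once) can be illustrated with a minimal sketch. The function names and the per-channel power estimates below are hypothetical; the abstract does not describe how the actual system estimates signal and noise power.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from two power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)


def select_channel(detections):
    """Pick the channel with the highest SNR among those that detected an event.

    `detections` maps a channel id to a (signal_power, noise_power) pair.
    Returns None when no channel reported a detection.
    """
    if not detections:
        return None
    return max(detections, key=lambda ch: snr_db(*detections[ch]))


# Hypothetical power estimates for three microphones that fired together.
detections = {0: (4.0, 1.0), 1: (9.0, 1.0), 2: (2.0, 1.0)}
best = select_channel(detections)
```

In this sketch only the selected channel would be passed on to the classification module, which matches the single parallel classification task mentioned in the abstract.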
Download slides here:

Context Aware Health Care Delivery System
Pau-Choo Chung (Department of
Electrical Engineering, Institute of Computer and Communication
Engineering, National Cheng Kung University, Tainan, Taiwan ROC)
This presentation will introduce
our ongoing work on a context-aware health care delivery system. The
system consists of a daily behavior repository, a behavior analysis
module, a service repository and a service presentation module. The
daily behavior repository stores the daily behavior acquired from
statistics. The behavior analysis module analyzes the behavior of the
target people. The service repository stores basic service functions, while
the service presentation module determines the services to be delivered
based on the results of the behavior analysis and the environment
context. The context-aware health care delivery system is also integrated
with the medical teleconsultation system, so that a real-time
video conference can be activated when the situation requires it.

Context aware infrastructure for smart
spaces
Zhang Daqing (Context Aware Systems
Department, Institute for Infocomm Research, Singapore)
Starting from the Pervasive Computing
vision of Mark Weiser, the presentation discusses the challenges of
smart spaces and introduces an open-standards-based service infrastructure
for context-aware services in the pervasive environment. Issues such as
context modelling, representation, aggregation, discovery and query are
addressed using solutions such as Semantic Web, UPnP, AI and databases. A
Semantic Space is proposed and built to support automatic device/service
plug-and-play, dynamic inference of high-level events and rapid application
development.

Related Projects
SnapToTell Ubiquitous Information Access
from Camera
Lim Joo Hwee (I2R, Singapore),
Jean-Pierre Chevallet, (IPAL-CNRS, France)
With the proliferation of camera phones,
many novel applications and services will emerge. In this talk, we
present the SnapToTell system, which provides an information directory
service to tourists based on pictures taken with camera phones and on
location information. We discuss key issues that motivate the design of
the system and describe the system architecture. Next, we present
preliminary experimental results on scene recognition based on a
realistic data set of scenes and locations in Singapore. Last but not
least, we also discuss directions to be taken in the near future and
the relation with the ISERE project.
Download slides here:

Adaptive and Personalized Delivery of
Multimedia Information
Leow Wee Kheng (Dept. of Computer
Science, National University of Singapore)
Thematic Strategic Research Project – UWB &
Pervasive Computing Program
Download slides here: