Chairperson: Antonio Fernández-Caballero.
Co-chair: Rafael Martínez-Tomás.
High-level vision (HLV) is defined as scene interpretation rather than mere object recognition and
tracking. This implies recognising situations, activities and interactions among the different agents
participating in a scene. Observation need not rely on video cameras alone; modern
systems tend to combine other sensors to support, complement or confirm the information extracted
from the video signal.
The task is to link the physical signals that reach a camera and other sensors with the
interpretation of their meaning (particularly in HLV). When a human observer interprets the meaning
of a scene, they draw on their knowledge of the world: the behaviour of things, the
laws of physics and the set of intentions governing the agents’ activities. It therefore seems logical,
when designing an HLV system, to use an explicit and declarative representation of this additional
knowledge, which is not contained in the signals. Techniques from artificial intelligence used to
represent this knowledge include logic, rules, graphs, finite automata, frames, agents, Bayesian
networks, neural networks and constraint programming.
The spectrum of situations and needs is very wide: from the mere detection of movement in a
controlled space to an integral control system for the scene, which would include (1) multisensory
monitoring of complex scenes with different actors and actions occurring in parallel, requiring
the concatenation of scenes in different spaces, etc., and (2) diagnosis of the observed situation
(abstraction) and control of actuators to search for new data and findings.