Page last update: 2010-05-31
In most of the preceding discussion, a feedback perception-action structure has been assumed, which is primarily reminiscent of robotics applications. Does that mean that the preceding methodological structure is applicable only to robotics?
No! Our belief is that the structure discussed is advantageous for all applications of vision, including static imagery, as well as cases where the output is not a physical action but the communication of a message. An example of the latter type is man-machine interfaces, where the actions and speech of a human are registered, interpreted and communicated symbolically to a system in order to implement very sophisticated control of its functions. Sufficient flexibility and adaptation require learning, so that the system can deal with all the contextual variations encountered in practical situations.
The training of vision systems for such advanced but non-robotic applications requires the development of mixed real-virtual training environments. In these, the system will gradually build up knowledge of its environment and the objects in it, including humans. The learning is again implemented as association between the learning system's own state parameters and the impinging perceptual parameters. The typical case, as discussed earlier, is that the system moves an object in front of its camera input. The movement parameters are known to the system and can be associated with the percepts that result from the movements. In training environments this can be simulated in various ways, such that corresponding state and percept information is made available to the system.
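As a rough illustration of learning by association between known state parameters and the resulting percepts, the following minimal sketch may help. It is not the method used by the authors: the one-dimensional state, the linear percept model inside simulate_percept, and the least-squares fit are all assumptions chosen for brevity, and every function name is hypothetical.

```python
import random


def simulate_percept(state, noise=0.0, rng=random):
    """Hypothetical virtual environment. The percept is a function of the
    movement parameter that the learner does not know in advance
    (assumed linear here), plus optional sensor noise."""
    return 3.0 * state - 1.0 + rng.gauss(0.0, noise)


def collect_pairs(n, noise, seed=0):
    """The system generates its own movements, so each state is known
    to it and can be paired with the percept that results."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        state = rng.uniform(-1.0, 1.0)  # self-generated movement parameter
        percept = simulate_percept(state, noise, rng)
        pairs.append((state, percept))
    return pairs


def fit_association(pairs):
    """Least-squares line mapping percept -> state.
    This stands in for the learned association (an assumption)."""
    n = len(pairs)
    mean_p = sum(p for _, p in pairs) / n
    mean_s = sum(s for s, _ in pairs) / n
    cov = sum((p - mean_p) * (s - mean_s) for s, p in pairs)
    var = sum((p - mean_p) ** 2 for _, p in pairs)
    slope = cov / var
    intercept = mean_s - slope * mean_p
    return slope, intercept


def estimate_state(model, percept):
    """Use the learned association to infer the state behind a percept."""
    slope, intercept = model
    return slope * percept + intercept


pairs = collect_pairs(200, noise=0.05)
model = fit_association(pairs)
```

Once fitted, estimate_state(model, percept) recovers an approximate movement parameter from a percept alone. A real system would of course deal with high-dimensional percepts and non-linear mappings; the point of the sketch is only that the training signal comes from the system's own known state.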
In this way, competence can be built up step by step in a mixture of real and virtual training environments. With a design that allows incremental learning, it should be possible to start out with reasonably crude virtual environments, which give the system some tentative knowledge of the structure of object and environment space, and then to refine this knowledge in the real environment. An important feature is that copies can be made of a trained system, which makes production more efficient. On the other hand, it may not be possible to easily copy particular information from one trained system to a differently trained system. This is because new information has to be incorporated on the terms of the system acquiring it, i.e. connected to other stored information. This cannot be implemented as a copying process; instead, the information has to be supplied over the normal channels, where corresponding state and percept information can be made available to the system for organisation on its own terms.
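The staged training described above can be sketched as follows, under strong simplifying assumptions. The IncrementalAssociator class, its LMS (least-mean-squares) update rule, and the two toy environments crude_virtual and real_world are all hypothetical and are not taken from the original discussion. The sketch only shows the pattern: tentative knowledge from a crude virtual environment, refinement in the real one, and duplication of the whole trained system.

```python
import copy
import random


class IncrementalAssociator:
    """Hypothetical online learner: state ~ w * percept + b,
    updated one (state, percept) pair at a time with the LMS rule."""

    def __init__(self, rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.rate = rate

    def predict(self, percept):
        return self.w * percept + self.b

    def update(self, state, percept):
        # Move the parameters a small step towards reducing the error.
        error = state - self.predict(percept)
        self.w += self.rate * error * percept
        self.b += self.rate * error


def crude_virtual(state):
    """Assumed crude simulation of how a state produces a percept."""
    return 2.0 * state


def real_world(state):
    """Assumed 'real' relation, which differs from the simulation."""
    return 2.5 * state + 0.2


def run_environment(learner, percept_of, n, seed):
    """Supply state/percept pairs over the learner's normal channel."""
    rng = random.Random(seed)
    for _ in range(n):
        state = rng.uniform(-1.0, 1.0)
        learner.update(state, percept_of(state))


def mean_abs_error(learner, percept_of, n=200, seed=99):
    rng = random.Random(seed)
    states = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return sum(abs(s - learner.predict(percept_of(s))) for s in states) / n


system = IncrementalAssociator()
run_environment(system, crude_virtual, 2000, seed=1)  # tentative knowledge
error_before = mean_abs_error(system, real_world)
run_environment(system, real_world, 2000, seed=2)     # refinement
error_after = mean_abs_error(system, real_world)

# A trained system can be duplicated as a whole.
clone = copy.deepcopy(system)
```

Note that only the complete system is copied here. Passing knowledge to a differently trained learner would, in this sketch, mean calling its update method with state/percept pairs, so that the learner integrates them with what it already holds.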
From this derives our view that the development of powerful Cognitive Vision systems inevitably has to follow the path of Perception-Action mapping and Learning, as in Robotics, even if the systems will be used for the interpretation of static imagery or to generate and communicate messages to other systems.