The core of dynamic vision is considered to be the animation of spatio-temporal generic models for the 3-D shape and motion of objects and subjects, driven by feature sets evaluated in parallel from several image streams. Subjects are a special class of objects that can sense environmental parameters and, drawing on stored knowledge, initiate their own actions. Object/subject recognition and scene understanding are achieved at different levels and scales. Multiple objects are tracked individually in the image streams to perceive their current state (‘here and now’). The situation relevant for decision making is assessed by analyzing the motion of all relevant objects/subjects over a longer time scale, at the level of state variables in the ‘scene tree’ representation known from computer graphics.
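To illustrate what a scene tree means here, the following is a minimal sketch, assuming each node stores its pose relative to its parent as a 4×4 homogeneous transformation (restricted to yaw plus translation for brevity). The frame names, offsets and pan angle are hypothetical and are not taken from the described system; the sketch only shows how a feature measured in camera coordinates is mapped into road coordinates by chaining the transforms up the tree.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of two 4x4 homogeneous transformation matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def pose(yaw: float, x: float, y: float, z: float) -> Matrix:
    """Rotation by `yaw` about the z-axis followed by translation (x, y, z).
    Only yaw is modeled to keep the sketch short (an assumption)."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [[c, -s, 0.0, x],
            [s, c, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]


@dataclass
class Node:
    """One coordinate frame in the (hypothetical) scene tree."""
    name: str
    local: Matrix                    # pose relative to the parent frame
    parent: Optional["Node"] = None

    def to_root(self) -> Matrix:
        # Chain the local transforms from this node up to the root frame.
        t = self.local
        node = self.parent
        while node is not None:
            t = matmul(node.local, t)
            node = node.parent
        return t

    def point_in_root(self, p: Tuple[float, float, float]) -> Tuple[float, float, float]:
        # Express a point given in this node's frame in root coordinates.
        t = self.to_root()
        v = (p[0], p[1], p[2], 1.0)
        x, y, z = (sum(t[i][k] * v[k] for k in range(4)) for i in range(3))
        return (x, y, z)


# Hypothetical tree: road -> ego vehicle -> camera on a gaze platform panned 90 degrees.
road = Node("road", pose(0.0, 0.0, 0.0, 0.0))
vehicle = Node("vehicle", pose(0.0, 10.0, 0.0, 0.0), parent=road)
camera = Node("camera", pose(math.pi / 2, 2.0, 0.0, 1.5), parent=vehicle)

# A feature 5 m along the camera's x-axis, expressed in road coordinates.
p = camera.point_in_root((5.0, 0.0, 0.0))
print(tuple(round(c, 6) for c in p))  # (12.0, 5.0, 1.5)
```

In this structure, when the vehicle moves or the gaze platform pans, only the local pose of the corresponding node has to be updated; every frame below it follows automatically.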
Behavioral capabilities of subjects are represented explicitly at an abstract level to characterize their potential behaviors. The behaviors themselves are generated by stereotypical applications of feed-forward and feedback control on a separate systems-dynamics level, with corresponding methods implemented close to the actuator hardware. This dual representation, at an abstract level for decision making and at the implementation level, provides flexibility and makes adaptation or extension easy. Results are presented for road-vehicle guidance using three cameras mounted on a gaze-control platform.
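The dual representation can be pictured with a minimal sketch, assuming a registry that links each abstract capability description (used by the decision level) to a control law (used by the implementation level). All names (`Capability`, `lane_keeping`, `lane_change`, `steering_command`), gains, durations and speed thresholds are hypothetical and illustrative only; the sketch is not the authors' implementation. The lane change combines a stereotypical feed-forward steering doublet with a feedback term, as the text describes in general terms.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Implementation level: a control law maps (time since start [s],
# lateral deviation from the nominal path [m]) to a steering command [rad].
ControlLaw = Callable[[float, float], float]

K_LAT = 0.1  # illustrative feedback gain, not a tuned value


@dataclass(frozen=True)
class Capability:
    """Abstract level: only what the decision layer needs to reason about."""
    name: str
    duration_s: float     # nominal duration of the maneuver (inf = open-ended)
    min_speed_mps: float  # precondition checked before the behavior is chosen


def lane_keeping(t: float, deviation: float) -> float:
    # Pure feedback: steer against the lateral deviation.
    return -K_LAT * deviation


def lane_change(t: float, deviation: float) -> float:
    # Stereotypical feed-forward steering doublet (illustrative amplitudes)
    # superimposed on the same feedback term.
    if t < 2.0:
        feed_forward = 0.02
    elif t < 4.0:
        feed_forward = -0.02
    else:
        feed_forward = 0.0
    return feed_forward - K_LAT * deviation


# Dual representation: each abstract description is linked to its implementation.
CAPABILITIES: Dict[str, Tuple[Capability, ControlLaw]] = {
    "lane_keeping": (Capability("lane_keeping", float("inf"), 0.0), lane_keeping),
    "lane_change": (Capability("lane_change", 4.0, 10.0), lane_change),
}


def available(speed_mps: float) -> List[str]:
    """Decision level: list behaviors whose abstract preconditions hold."""
    return [name for name, (cap, _) in CAPABILITIES.items() if speed_mps >= cap.min_speed_mps]


def steering_command(name: str, t: float, deviation: float) -> float:
    """Implementation level: evaluate the control law behind a chosen capability."""
    _, law = CAPABILITIES[name]
    return law(t, deviation)


print(available(5.0))   # ['lane_keeping']
print(available(20.0))  # ['lane_keeping', 'lane_change']
```

In this sketch, extending the repertoire means registering one more (description, control law) pair; the decision level reasons only over the abstract fields and never touches the controller code.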