

Video Information Servers and CaML

Maxwell Sayles (2002)

This document contains 4 sections:

  1. Description of CaML.
  2. Description of the CaML Protocol.
  3. How to Run the Server.
  4. Description of the class files used by the CaML Server.

Description of CaML

CaML stands for Camera Markup Language. It is meant to be a protocol for distributing almost any type of processed camera data across a network. The principal idea is that in many video processing applications, certain tasks have been well researched, well tested, and well published. Rather than reimplementing this basic framework from application to application, a CaML server performs the basic operations required by the application and then communicates the results of those operations in an easy-to-read markup language. This markup language is CaML.

CaML is meant to be a negotiation between the client and the server in which both requests and responses take the same form. All tags except the initial <CaML> and final </CaML> are optional, and if a tag is present in a document, either the client or the server may ignore it.

An example CaML document describing the system state is as follows:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0" id="morse" received_at="2557.036" generated_at="2557.037">
        <video>
                <URL>136.159.10.66:47772</URL>
                <width>320</width>
                <height>240</height>
                <format>JPEG</format>
                <quality>10</quality>
        </video>

        <objects>
                <URL>136.159.10.66:47773</URL>
        </objects>

        <calibration_matrix>
                <rows>3</rows>
                <columns>4</columns>
                <elements>
                        12.0111142336 -21.5643221086 4.4931787861 0.2897473789
                        -0.8053618985 -1.6692540724 22.8213908659 -53.2487828207
                        -0.0358006705 -0.0754323871 0.0275177356 0.0124607047
                </elements>
        </calibration_matrix>
</CaML>

This document is more likely to be generated by the server than sent by the client, because the server defines the URLs for the video and centroid streams as well as the calibration matrix used by the camera.
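
The 3x4 calibration matrix above maps world coordinates to image coordinates. As a sketch (assuming the matrix is a standard homogeneous pinhole projection matrix, which the document does not state explicitly), a 3D point can be projected as follows:

```python
# Sketch: projecting a 3D world point through the 3x4 calibration matrix
# from the example CaML document. Assumes a standard homogeneous
# projection convention (an assumption, not stated in the document).

def project(matrix, point3d):
    """matrix: 3 rows of 4 floats; point3d: (x, y, z) in world units."""
    x, y, z = point3d
    homogeneous = (x, y, z, 1.0)
    u, v, w = [sum(m * p for m, p in zip(row, homogeneous)) for row in matrix]
    return (u / w, v / w)  # divide out the homogeneous coordinate

matrix = [
    [12.0111142336, -21.5643221086, 4.4931787861, 0.2897473789],
    [-0.8053618985, -1.6692540724, 22.8213908659, -53.2487828207],
    [-0.0358006705, -0.0754323871, 0.0275177356, 0.0124607047],
]
u, v = project(matrix, (1.0, 1.0, 1.0))
```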

Many of the tags may be provided by the client for on-the-fly configuration. A document of similar form sent by the client might look like:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0">
        <video>
                <width>320</width>
                <height>240</height>
                <format>JPEG</format>
                <quality>10</quality>
        </video>
</CaML>

This document would be requesting that the video be streamed at a resolution of 320x240 in JPEG format with a quality of 10%.

However, compression and streaming of video data are expensive, and since a CaML server would typically support multiple connections, allowing a custom stream type for each connection would be costly. Therefore, the type of video stream provided is set up at configuration time rather than on the fly. If this document were sent to such a server (which is the case with our implementation of a CaML server), the video tag (and its subtags) would be ignored.

In the above case (and again in the case of our server) the minimal document necessary to send to a CaML server is:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0">
</CaML>

This would cause the server to respond with a CaML document that included all the tags recognized by the server.

Another example of a document used to describe and configure the system state occurs in a system that supports a motorized pan/tilt/zoom camera head. The following document may be sent by the server to describe the current pan/tilt/zoom settings:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0">
        <camera_control>
                <min_pan>-880</min_pan>
                <max_pan>880</max_pan>
                <pan>0</pan>
                <min_tilt>-300</min_tilt>
                <max_tilt>300</max_tilt>
                <tilt>0</tilt>
                <min_zoom>0</min_zoom>
                <max_zoom>1023</max_zoom>
                <zoom>0</zoom>
        </camera_control>
</CaML>

After determining the desired pan/tilt/zoom settings from the current settings and the minimum and maximum values, the client could configure the server by sending the following document:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0">
        <camera_control>
                <pan>400</pan>
                <tilt>400</tilt>
                <zoom>400</zoom>
        </camera_control>
</CaML>

The server would parse this document and then apply the new settings by panning, tilting, and zooming the camera. The server would then return a new document reflecting the new settings.
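
The client-side step of reconciling desired settings with the advertised minimum and maximum values can be sketched as follows, using the limits from the example document above:

```python
# Sketch: clamping requested pan/tilt/zoom values to the ranges the
# server advertises, before sending a camera_control document back.

def clamp(value, lo, hi):
    """Restrict value to the inclusive range [lo, hi]."""
    return max(lo, min(hi, value))

# Ranges taken from the example camera_control document above.
limits = {'pan': (-880, 880), 'tilt': (-300, 300), 'zoom': (0, 1023)}
requested = {'pan': 400, 'tilt': 400, 'zoom': 400}

settings = {}
for name, value in requested.items():
    lo, hi = limits[name]
    settings[name] = clamp(value, lo, hi)
# The requested tilt of 400 exceeds max_tilt, so it is clamped to 300.
```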

As mentioned above regarding the video stream parameters, many parameters of the CaML server configuration are fixed when the server begins running. Such parameters can be specified in a configuration document, itself a CaML document, that the server reads during initialization. Our implementation of a CaML server requires the server name, the port the main document server runs on, the port for the streaming video, the port for the streaming object information, and the calibration matrix, along with optional video parameters and centroid processing parameters. The configuration file follows (the video and centroid processing parameters are optional; the remaining tags are mandatory):

 <?xml version="1.0" encoding="UTF-8"?>
 <CaML version="1.0" id="morse">
         <URL>localhost:47771</URL>

         <video>
                 <URL>localhost:47772</URL>
                 <device_path>/dev/video0</device_path>
                 <width>320</width>
                 <height>240</height>
                 <format>JPEG</format>
                 <quality>10</quality>
         </video>

         <objects>
                 <URL>localhost:47773</URL>
         </objects>

         <calibration_matrix>
                 <rows>3</rows>
                 <columns>4</columns>
                 <elements>
                         12.0111142336 -21.5643221086 4.4931787861 0.2897473789
                         -0.8053618985 -1.6692540724 22.8213908659 -53.2487828207
                         -0.0358006705 -0.0754323871 0.0275177356 0.0124607047
                 </elements>
         </calibration_matrix>
 </CaML>

Description of the CaML Protocol

CaML defines two protocols for the transfer of documents. The first defines how a client would send a document to the server specifying the desired configuration, and then how the server sends a document to the client specifying the current configuration. This is similar to a request/response system. The second involves streaming CaML documents.

In the first protocol, the client connects to the CaML server port and sends its CaML document. The client should then shut down the sending end of the socket so that no further data may be sent from the client to the server; this is how the server knows the CaML document is complete. At this point, the CaML server processes the document and makes setting adjustments as it sees fit (which may be no adjustments at all). The CaML server then sends back a CaML document describing the system state using all the tags it recognizes. Finally, the server shuts down its sending end of the socket so that no further data may be received by the client from the server. This effectively closes the socket and disconnects the client from the server. The client (like the server) then parses the document, applying the tags it finds useful and ignoring those that do not provide relevant information.
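
A minimal client for this request/response protocol might be sketched as follows. This is an illustration, not code from the CaML distribution; the host and port are assumptions taken from the example configuration.

```python
# Sketch of a client for the request/response protocol: connect, send a
# CaML document, shut down the sending side to mark the end of the
# request, then read the server's response until end-of-stream.

import socket

REQUEST = b'''<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0">
</CaML>
'''

def query_caml_server(host, port):
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.connect((host, port))
    sock.sendall(REQUEST)
    sock.shutdown(socket.SHUT_WR)   # tells the server the document is complete
    chunks = []
    while True:
        data = sock.recv(4096)
        if not data:                # server shut down its end: response complete
            break
        chunks.append(data)
    sock.close()
    return b''.join(chunks)

# Example usage (hypothetical host/port from the example config):
# response = query_caml_server('localhost', 47771)
```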

In the case of streaming data such as video (in the above CaML document example) or centroid data, a URL tag is provided describing the location for the client to connect to in order to begin receiving streaming data. The protocol for each stream is custom defined; however, it is recommended that, when possible, the data be in an XML (preferably CaML-conforming) format.

In our implementation of a CaML server, we provide a simple protocol for streaming video. First the client must send one byte of data to the server. This is for synchronization purposes, so the server does not flood the pipe with sequential images that the client may not be fast enough to download. The server ignores the byte's value; in response, it sends the image timestamp as an ASCII string representing a floating-point value in seconds, followed by one space. Then the size of the JPEG image is sent as an ASCII string representing an integer value (the number of bytes in the JPEG file), followed by one space. Finally the raw JPEG bytes are sent. The server then waits for another byte to be received from the client (again for synchronization purposes).
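
A client-side reader for this framing might be sketched as follows. This is an illustration, not code from the CaML distribution; `sock` is assumed to be an already-connected socket.

```python
# Sketch: reading one frame of the video stream protocol described
# above (send one byte; receive "timestamp ", "size ", then raw JPEG).

def read_until_space(sock):
    """Read ASCII bytes up to (and consuming) a single space."""
    chars = []
    while True:
        c = sock.recv(1)
        if not c or c == b' ':
            break
        chars.append(c)
    return b''.join(chars)

def read_frame(sock):
    """Request and read one frame: timestamp, then size, then JPEG bytes."""
    sock.sendall(b'x')                      # any single byte unblocks the server
    timestamp = float(read_until_space(sock))
    size = int(read_until_space(sock))
    jpeg = b''
    while len(jpeg) < size:                 # recv may return partial data
        data = sock.recv(size - len(jpeg))
        if not data:
            raise EOFError('stream closed mid-frame')
        jpeg += data
    return timestamp, jpeg
```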

In addition, our implementation of a CaML server calculates centroids from the camera image data. We selected this feature because processing image data for centroids is a common requirement of many video applications. As techniques for computing centroid information are well researched, documented, and implemented, it does not make sense for developers who want centroid information to reimplement centroid processing algorithms. This is the primary benefit of CaML: common processing is performed by the CaML server, so clients no longer need to do this work; they only need to connect to the CaML server to receive the streaming information.

The URL for the streaming centroids is sent in the main CaML document using the following tags.

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0" generated_at="2445.629">
        <objects>
                <URL>136.159.10.66:47773</URL>
        </objects>
</CaML>

The client can receive the centroid stream by connecting to the specified URL. The protocol is similar to the streaming video in that the server will wait until it has received one byte of data from the client and then send the current centroid data. The centroid data is sent as a CaML document, and the server then waits for another byte of data to be received. An example CaML document containing centroid information is as follows:

<?xml version="1.0" encoding="iso-8859-1"?>
<CaML version="1.0" generated_at="2445.629">
        <objects count="2">
                <object id="3067">
                        <center>
                                <u>29</u>
                                <v>65</v>
                        </center>
                </object>
                <object id="3068">
                        <center>
                                <u>41</u>
                                <v>17</v>
                        </center>
                </object>
        </objects>
</CaML>

We had some difficulty streaming consecutive XML documents because the SAX parser we used did not return control until the document was finished. Because an open socket will continue to receive data (or block) until the socket is closed, the SAX parser never thought the document was finished. Our solution was to encapsulate the socket class and simulate the socket being closed when the </CaML> tag was received (this tag denotes the completion of the CaML document). The socket was then simulated as reopened and another byte was sent to the server. The server would respond by sending the next set of centroids in another CaML document.
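
The splitting technique can be sketched as follows (an illustrative class, not the actual wrapper from our implementation): buffer incoming bytes and cut a complete document at each closing </CaML> tag, so each document can be handed to the XML parser as if the connection had closed.

```python
# Sketch of the stream-splitting approach described above: accumulate
# raw socket data and emit one complete document per closing </CaML>.

class CaMLStreamSplitter:
    END_TAG = b'</CaML>'

    def __init__(self):
        self.buffer = b''

    def feed(self, data):
        """Add raw socket data; return a list of complete CaML documents."""
        self.buffer += data
        documents = []
        while True:
            index = self.buffer.find(self.END_TAG)
            if index < 0:
                break                        # no complete document buffered yet
            end = index + len(self.END_TAG)
            documents.append(self.buffer[:end])
            self.buffer = self.buffer[end:].lstrip()
        return documents
```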

Further subtags could be added within the enclosing <object> tag, as in the following example:

 <?xml version="1.0" encoding="iso-8859-1"?>
 <CaML version="1.0" generated_at="2445.629">
        <objects count="1">
                <object id="3067">
                        <center>
                                <u>29</u>
                                <v>65</v>
                        </center>

                        <bounds>
                                <top>5</top>
                                <left>89</left>
                                <bottom>157</bottom>
                                <right>117</right>
                                <width>28</width>
                                <height>152</height>
                                <aspect_ratio>0.18421052631578946</aspect_ratio>
                        </bounds>

                        <mean_rgb>
                                <r>12</r>
                                <g>34</g>
                                <b>87</b>
                        </mean_rgb>
                </object>
        </objects>
 </CaML>

The above example tags would provide information about the object bounding box and mean color.
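
The derived fields in the <bounds> example can be reproduced from the top/left/bottom/right values:

```python
# Sketch: reproducing the derived <bounds> fields from the example
# document above (width, height, and aspect ratio of the bounding box).

top, left, bottom, right = 5, 89, 157, 117
width = right - left                 # 28
height = bottom - top                # 152
aspect_ratio = float(width) / height # width/height, as in the example
```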

How to Run the Server

Contained in the parent directory of this HTML document is our implementation of a CaML server. To run the server you will need the Python language interpreter; we used version 2.2.1. The latest version can be found at http://www.python.org. You will need Video For Linux (VFL) installed with an appropriate capture device. If the Linux device is not /dev/video0, you will need to make the appropriate change in the configuration files. You will also need the Python Imaging Library (known as PIL, located at http://www.pythonware.com/products/pil/) as well as the IJG JPEG library (http://www.ijg.org), needed for the JPEG video streaming. Finally, you will need the Intel Integrated Performance Primitives (IPP), used for the optical flow algorithms involved in object centroid calculation. This library can be found at http://www.intel.com/software/products/ipp/ippvm20/index.htm.

You will need to add the path to the CaML server to your LD_LIBRARY_PATH environment variable, as well as the path to the IPP shared objects. For example:

 tcsh: setenv LD_LIBRARY_PATH .:/usr/lib:/usr/local/lib:/opt/intel/ipp/lib
 bash: export LD_LIBRARY_PATH=".:/usr/lib:/usr/local/lib:/opt/intel/ipp/lib"

You will need to know a 4x3 calibration matrix for your video camera, and you will need to make a configuration file in the form of a CaML document. Example configuration files (also the configuration files we used in our CaML system) are morse_config.caml and forst_config.caml.

To start up the server, type python CaMLServer.py morse_config.caml at the command line, where morse_config.caml is the name of your configuration file. If there are no errors, your display should look something like:

>>> Reading config file: morse_config.caml
>>> Finished parsing config document.
Server Name: morse
CaML Server Port: 47771
Video Server Port: 47772
Objects Server Port: 47773

VideoProcessor: starting up...
VideoProcessor: starting up processing thread...
VideoProcessor: finished startup
VideoServer: starting up listening thread...
VideoModule: Successfully initialized.
ObjectsServer: starting up listening thread...
VideoServer: waiting for connection on port 47772
ObjectsModule: Successfully initialized.
VideoModule: Startup was successfull.
ObjectsModule: Startup was successfull.
CaMLServer: waiting for a connection on port 47771
ObjectsServer: waiting for connection on port 47773

To terminate the server, simply hit CTRL-C. If there is an error during initialization or shutdown, the server may not terminate properly. You may need to hit CTRL-Z to suspend the task and you may then need to kill the corresponding python processes.

Once the server is up and running, you can run some tests on it by changing to ../testing. To test the CaML output, run python TestCaMLServer.py morse, where morse is the machine name or IP of the machine running the CaML server; this program assumes the CaML server is running on port 47771. To test the video and centroid output, run python TestCentroids.py morse, again where morse is the machine name or IP of the machine running the CaML server; this program assumes the video server is running on port 47772 and the object server on port 47773.

Description of the class files used by the CaML Server.

In the parent directory to this document are the Python files used to implement a CaML server.

A brief description of each file is as follows:

  • AbstractModule.py - An abstract CaML module. Extend this to implement your own CaML modules.
  • AbstractVideoProcessor.py - The interface to a video processor. The video processor is meant to process images in another thread.
  • CalibrationModule.py - CaML module containing calibration matrix information. Only returns the calibration matrix; performs no operations.
  • CaMLDispatcher.py - Called by the CaMLParser to dispatch CaML tags to the appropriate CaML modules.
  • CaMLDocument.py - A Python class that represents the tags of a CaML document.
  • CaMLGenerator.py - Used to generate a CaML document.
  • CaMLParser.py - Essentially a wrapper between the GenericParser and the CaMLDispatcher.
  • CaMLServer.py - The core program. Instantiates all necessary modules and starts the main CaML server.
  • ControllerModule.py - CaML module for controlling a pan/tilt/zoom camera. Adjusts the pan/tilt/zoom settings from a client document and returns the current settings to the client.
  • EVI_Controller.py - Class for controlling a pan/tilt/zoom camera.
  • GenericParser.py - A generic data structure for any XML document.
  • ObjectsModule.py - CaML module for object data. Dispatches the URL of the ObjectsServer to the client, and dispatches new object data to the objects server.
  • ObjectsServer.py - Serves a stream of CaML documents containing object data.
  • Settings.py - Contains general settings for the CaML server, such as port info and the calibration matrix.
  • TimeServer.py - Serves time... marries bubba. (Actually, just provides a time function that gives time relative to server startup.)
  • VideoModule.py - CaML module for the video stream. Contains the URL of the video server, and dispatches new video data to the video server.
  • VideoProcessor.py - Performs all the image processing for object information. Uses Lucas-Kanade optical flow to determine object centers.
  • VideoServer.py - Serves up a JPEG image stream.

The files that will be of concern to people who want to extend our implementation of a CaML server will be: AbstractModule.py, AbstractVideoProcessor.py, and CaMLServer.py.

Suppose you want to write a module that provides image settings such as brightness, hue, and contrast. You would need to write a module that extends AbstractModule and provides implementations for the methods getName and generateStats (this is the minimum set of methods that must be implemented). For example:

from AbstractModule import AbstractModule

class ImageSettingsModule (AbstractModule):

        def getName (self):
                return "image_settings"

        def generateStats (self, xmlGenerator):
                xmlGenerator.writeElement ('brightness', str(self.brightness))
                xmlGenerator.writeElement ('hue', str(self.hue))
                xmlGenerator.writeElement ('contrast', str(self.contrast))

Then you would need to register this module with the CaMLDispatcher. In the file CaMLServer.py, around line 70, you would add the line:

 dispatcher.addModule (ImageSettingsModule())

You would also need to make sure that your module is imported at the top of CaMLServer.py.

After starting the server up, documents that the client receives from the server would include the additional tags:

 <?xml version="1.0" encoding="iso-8859-1"?>
 <CaML version="1.0">
        <image_settings>
                <brightness>100</brightness>
                <hue>50</hue>
                <contrast>25</contrast>
        </image_settings>
 </CaML>

See the file AbstractModule.py for a complete list of methods that may be overridden to provide functionality in a CaML module.

If you would like to do additional processing other than centroid calculation, there is no fixed way this must be done. It may, however, be helpful to implement the interface found in AbstractVideoProcessor.py. By implementing this interface, many CaML modules can be notified when new information becomes available from the processor. Furthermore, the VideoModule can still be used to provide streaming video images. For an example of how to extend the AbstractVideoProcessor, see VideoProcessor.py. Data made available from an AbstractVideoProcessor will likely need to be streamed; there are two examples of this in VideoServer.py and ObjectsServer.py.

Page last modified on June 24, 2009, at 02:21 PM