<--planning, continued ^--Soigan--^ development-->

Soigan - a Multicast XML monitoring system - even more planning

Even more planning

Configuration files - XML

Now that we've got an idea of how Workers, Servers and plugins will communicate with each other, we should figure out how each of them is configured. We'll start by figuring out the kinds of things that need to be configured. Here are a few off the top of my head: networking, permissions, plugins and services.

A Server will usually act as a Worker as well, since a Server's local, anonymous plugins can be run in the same manner as remote plugins on Workers. Whether the Server executable has a Worker built in or runs it as a separate process is up to the implementation.

Networking

In this section, we need to configure what port we listen to for unicast (direct) connections, as well as any multicast configuration. Since I don't know what's needed to set up a multicast connection, I'll have to come back and add that once I've read up on it.

As mentioned above, a Server usually has to act as a Worker to itself, when supporting local plugins. This might cause a problem if we consider the following.

Let's say that Soigan works over a specified port. Say, port 5016. This is the port that a Server will use to talk to a Worker, and is also the port a Worker uses to talk back to the Server. What port, then, does the Worker that's running on the Server listen to?

If the Server and Worker are the same process, then it can receive all connections and distribute them to the appropriate code. But what if we want to run them as separate processes (and really, this makes sense, since we're writing a Worker-only piece of code anyway)?

Instead, I think we should use one port for Workers and one for Servers. A Worker gets its Request on port 5016, and then sends its Response to the Server on port 5017. In the case of a Server talking to its own Worker, it too will talk on port 5016, and listen on port 5017, both through localhost.

<network>
  <worker>
    <port>5016</port>
  </worker>
  <server>
    <port>5017</port>
  </server>
</network>
Note that Workers would only have the <worker> entry above. I'm not sure we need much more than that for now. Maybe implementation will reveal more configuration options.
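As a rough illustration of how an implementation might read this section, here is a minimal Python sketch. It assumes the ports from the discussion above (5016 for Workers, 5017 for Servers); the function name `load_ports` is made up for this example and is not part of any Soigan design.

```python
import xml.etree.ElementTree as ET

# Sample <network> section, using the ports discussed in the text.
NETWORK_XML = """
<network>
  <worker>
    <port>5016</port>
  </worker>
  <server>
    <port>5017</port>
  </server>
</network>
"""

def load_ports(xml_text):
    """Return a dict mapping each role present in <network> to its port."""
    root = ET.fromstring(xml_text)
    ports = {}
    for role in ("worker", "server"):
        port = root.findtext(f"{role}/port")
        if port is not None:  # a Worker-only config has no <server> entry
            ports[role] = int(port)
    return ports

ports = load_ports(NETWORK_XML)
```

A Worker-only configuration would simply yield a dict with the single key `worker`.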

Permissions

This section will define who is and is not allowed to access the different parts of Soigan. There are three things we need to be concerned with: permission to send Requests to Workers, permission to send Responses to Servers, and permission to get Results from a Server.

The first two we will differentiate by having two different sections, much as we did for the <network> section.

<permissions>
  <worker>
    <allow>soigan.cpsc.ucalgary.ca</allow>
  </worker>
  <server>
    <allow>*.cpsc.ucalgary.ca</allow>
    <deny>public.cpsc.ucalgary.ca</deny>
  </server>
</permissions>
Permission will work on an "only if allowed AND not denied" system. In the above example, all Computer Science machines may send up Responses to the Server, except for public.cpsc.ucalgary.ca. Now, this might be a problem. There's no way of telling whether the Response from a machine is from a Soigan Worker or from any other XML-RPC program.

Optimally, this would be solved by using XML-RPC the way it was meant to be used, where the Worker would respond to the Request over the connection that was made by the Server, so the Server could know to trust the Response. But this clashes with one of the main tenets of Soigan, which is the support of multicast Requests, which in themselves don't have a connection made.

So, we have to ask ourselves, "how important is security?" Well, we're already talking about a <permissions> section in our configuration, so it must be somewhat important. Right now, none of our communication is encrypted, so it seems that, if we are to continue using XML-RPC, we're not concerned with others reading the information (unless XML-RPC can be done over HTTPS? That's something to look into). So we're then concerned with people writing information -- that is, injecting information into Soigan, right?

Perhaps. So how do we fix it? One easy way with the unicast connections is to have the Worker return a secret key when it responds to the Request from the Server. The Server would then require that a Response from that Worker contained that key. But the key is in plaintext, so anyone who is listening to the communication while trying to inject information can still do so. And this still wouldn't solve the problem of receiving Responses after a multicast Request. Even if the Server sent a key in the Request, listeners could still grab it from plaintext.

So for now, we're going to ignore permissions until we get the system going in an environment that we trust (by obscurity: no one has heard of Soigan, so no one would know how to talk to it, and therefore how to mess with it). But for "completeness", we'll address the third permissions issue, which is for Clients to get Results from a Server:

<permissions>
  <worker>
    <allow>soigan.cpsc.ucalgary.ca</allow>
  </worker>
  <server>
    <allow>*.cpsc.ucalgary.ca</allow>
    <deny>public.cpsc.ucalgary.ca</deny>
    <allow access="client">public.cpsc.ucalgary.ca</allow>
  </server>
</permissions>
By default, the access attribute of <allow> and <deny> would be all, so if a machine is denied access to all, but allowed access to client, it could retrieve the data but not supply it.
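To make the interaction between "allowed AND not denied" and the access attribute concrete, here is a small Python sketch. It rests on an assumption the text does not spell out: rules that name the requested access take precedence over rules with access "all". The tuple layout, the function names, and the access name "response" (for a Worker sending Responses) are all invented for this example.

```python
from fnmatch import fnmatch

# (kind, host pattern, access) tuples mirroring the <server> rules above.
SERVER_RULES = [
    ("allow", "*.cpsc.ucalgary.ca", "all"),
    ("deny", "public.cpsc.ucalgary.ca", "all"),
    ("allow", "public.cpsc.ucalgary.ca", "client"),
]

def _decide(host, rules):
    """Apply 'allowed AND not denied' to the rules whose pattern matches host."""
    kinds = {kind for kind, pattern, _ in rules if fnmatch(host, pattern)}
    if not kinds:
        return None  # no rule mentions this host
    return "allow" in kinds and "deny" not in kinds

def is_permitted(host, access, rules):
    """Return True if host may perform the given kind of access."""
    # Assumption: rules naming the requested access win over "all" rules.
    specific = _decide(host, [r for r in rules if r[2] == access])
    if specific is not None:
        return specific
    general = _decide(host, [r for r in rules if r[2] == "all"])
    return bool(general)  # hosts matched by no rule are refused
```

Under these assumptions, public.cpsc.ucalgary.ca is permitted for "client" access but refused for "response", while any other cpsc machine is permitted for both.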

When we come back to look at permissions again, we might also consider only allowing permission to certain plugins from certain hosts. Whether this belongs in the <plugin> section or the <permissions> section, I'm not sure, but I can waffle on it for now and ignore it completely.

Plugins

We're finally at a configuration section that will take a little thought. Here we need to tell Workers what plugins they have available, what the corresponding command-line is, and how to pass parameters from the Request to that command. Additionally, we might need to specify how to take the output of the command and turn it into the right structure needed for the Response.

Let's start with the two example plugins from before.

<plugins>
  <plugin>
    <name>ping</name>
    <params>
      <param name="pinghost">host</param>
      <param>timeout</param>
    </params>
    <command type="nagios">
      <exec>check_ping</exec>
      <run>-H $pinghost -t $timeout -w1000,100% -c1000,100%</run>
    </command>
  </plugin>

  <plugin>
    <name>users</name>
    <params>
    </params>
    <command>
      <exec>soigan_users</exec>
    </command>
  </plugin>
</plugins>
The first half of the example, for ping, shows off two different features in the plugins configuration. First is the way that Soigan should pass parameters to the command-line programs. Each <param> specifies a parameter from the Request that should be put into an environment variable before the command line is executed. We see that the host parameter is being set, but instead of placing it into $host (in case it's already used, maybe), we tell it to use $pinghost instead. The timeout parameter keeps its name. We then see the command line for running ping, using the environment variables we've just set, as well as supplying any other flags or parameters needed. Second, note that the <command> has the type attribute set to "nagios", so the Worker knows to treat the output accordingly. The <run> element is optional, but specifies the command-line options to use if any are required.
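The parameter-to-environment mapping described here could look something like the following Python sketch. The dict layout of `PING_PLUGIN`, the function name `build_invocation`, and the choice to run the command through `/bin/sh` so that the shell expands `$pinghost` and `$timeout` are all assumptions made for illustration, not decisions the design has made.

```python
import os
import shlex

# Hypothetical in-memory form of the ping <plugin> entry:
# "params" maps Request parameter name -> environment variable name.
PING_PLUGIN = {
    "exec": "check_ping",
    "run": "-H $pinghost -t $timeout -w1000,100% -c1000,100%",
    "params": {"host": "pinghost", "timeout": "timeout"},
}

def build_invocation(plugin, request_params):
    """Return (argv, env) for running a plugin with the Request's parameters."""
    env = dict(os.environ)  # start from the Worker's own environment
    for req_name, env_name in plugin["params"].items():
        if req_name in request_params:
            env[env_name] = str(request_params[req_name])
    # Quote the executable name; the shell expands the $variables in <run>.
    command = shlex.quote(plugin["exec"])
    if plugin.get("run"):  # <run> is optional
        command += " " + plugin["run"]
    return ["/bin/sh", "-c", command], env

argv, env = build_invocation(PING_PLUGIN, {"host": "mailhost", "timeout": 1})
```

The resulting pair could then be handed to something like `subprocess.run(argv, env=env)`, with the output interpreted according to the `type` attribute of <command>.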

The second half is the simplest configuration possible; there are no parameters passed in from the Request, and no command-line options for the plugin.

Services

Here the Server is told which Workers to contact, and when; which Requests should be sent over multicast; and which Workers might be sending unsolicited information to it.
<workers>
  <worker>
    <host>public</host>
    <services>
      <service>
        <name>users</name>
        <params/>
        <period units="minutes">5</period>
      </service>
      <service>
        <name>df</name>
        <params>
          <param name="flags">-k</param>
        </params>
        <period units="hours">1</period>
      </service>
    </services>
  </worker>

  <worker>
    <host>localhost</host>
    <services>
      <service>
        <name>ping</name>
        <params>
          <param name="host">mailhost</param>
          <param name="timeout">1</param>
        </params>
        <period units="minutes">5</period>
      </service>
    </services>
  </worker>

  <worker type="multicast">
    <host>224.0.0.1</host>
    <services>
      <service>
        <name>zombie</name>
        <params/>
        <period units="hours">1</period>
      </service>
    </services>
  </worker>

  <worker type="unsolicited">
    <host>imap</host>
    <services>
      <service>
        <name>ps</name>
        <params/>
        <period/>
      </service>
    </services>
  </worker>
</workers>
Let's look at each part separately.
  <worker>
    <host>public</host>
    <services>
      <service>
        <name>users</name>
        <params/>
        <period units="minutes">5</period>
      </service>
      <service>
        <name>df</name>
        <params>
          <param name="flags">-k</param>
        </params>
        <period units="hours">1</period>
      </service>
    </services>
  </worker>
This configures two Services (I'm not sure it's necessarily the best term, but it'll do for now) for a single host. This is a unicast configuration, where the Server will call this Worker directly with XML-RPC. We see two Services, users and df.

users takes no parameters (this version of the plugin just returns how many users there are and their logins, as we saw before), but the <params> element is currently required, so it is left empty. The <period> element has a required attribute, units, that states what the following value denotes. I don't know if I really like the term "period", or the idea of supporting units this way, but until I come up with something better, it'll do. Nagios uses minutes as its unit, and sets a notification_interval. I might decide that's the way to go.

So we've configured our Server to call the Worker public every 5 minutes to get a list of users. The other Service, df, will return the disk usage on public every hour. We also have a parameter here called flags with the value of "-k". This will be passed in the Request and onto the plugin, which will return the disk usage in kilobytes. We might also support supplying a device or filesystem to specifically check, but in this case, we did not.
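As a sketch of how a Server might turn a <period> into a polling interval, the following Python converts the value and its units attribute into seconds. The unit table is an assumption: only "minutes" and "hours" appear in the examples, and "seconds" is added here purely for illustration. The function name `period_seconds` is likewise made up.

```python
import xml.etree.ElementTree as ET

# Assumed unit names; the examples only use "minutes" and "hours".
SECONDS_PER_UNIT = {"seconds": 1, "minutes": 60, "hours": 3600}

def period_seconds(service_xml):
    """Return the polling interval of a <service> in seconds, or None."""
    service = ET.fromstring(service_xml)
    period = service.find("period")
    if period is None or not (period.text or "").strip():
        return None  # an empty <period/> means the Server does not poll
    units = period.get("units")
    if units not in SECONDS_PER_UNIT:
        raise ValueError(f"unknown or missing units: {units!r}")
    return int(period.text) * SECONDS_PER_UNIT[units]

USERS_SERVICE = """
<service>
  <name>users</name>
  <params/>
  <period units="minutes">5</period>
</service>
"""

interval = period_seconds(USERS_SERVICE)
```

Normalizing to one internal unit like this would keep the scheduler simple even if the configuration syntax for units changes later.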

  <worker>
    <host>localhost</host>
    <services>
      <service>
        <name>ping</name>
        <params>
          <param name="host">mailhost</param>
          <param name="timeout">1</param>
        </params>
        <period units="minutes">5</period>
      </service>
    </services>
  </worker>
Here we're again setting up a unicast call, but this time it's to the Worker running on the Server, so it'll be an anonymous plugin. Sure enough, it's our ping plugin from before. The setup is the same as in the previous example; it is included to show that calling local plugins is no different from calling remote ones.
  <worker type="multicast">
    <host>224.0.0.1</host>
    <services>
      <service>
        <name>zombie</name>
        <params/>
        <period units="hours">1</period>
      </service>
    </services>
  </worker>
The multicast type of <worker> sets up a multicast Request. The <host> element contains the multicast address to use -- this may not be correct, as I haven't learned how to do multicast yet -- and might be accompanied by other information needed to set up a multicast Request. Other than that, the rest of the <worker> section is pretty much the same as the others. Here we've got a plugin called zombie that will report back any zombie processes running on any Worker, every hour.
  <worker type="unsolicited">
    <host>imap</host>
    <services>
      <service>
        <name>ps</name>
        <params/>
        <period/>
      </service>
    </services>
  </worker>
</workers>
The final example lets the Server be prepared for unsolicited Responses from imap. Note that both the <params> and <period> elements are empty; this is because the Server doesn't have any control over these -- the Worker has been told to send the plugin's results to the Server at some unknown frequency, and with some unknown parameters.
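One way a Server might act on the type attribute is sketched below in Python: it walks the <workers> section and separates the Services it must request itself from those it should merely expect. Treating a missing type attribute as "unicast" follows the examples, but the name "unicast", the function `plan_workers`, and the tuple layout are assumptions for this illustration.

```python
import xml.etree.ElementTree as ET

def plan_workers(workers_xml):
    """Split <worker> entries into Services to poll and Services to await."""
    polled, awaited = [], []
    for worker in ET.fromstring(workers_xml).findall("worker"):
        kind = worker.get("type", "unicast")  # no type attribute: unicast
        host = worker.findtext("host")
        for service in worker.findall("services/service"):
            entry = (kind, host, service.findtext("name"))
            if kind == "unsolicited":
                awaited.append(entry)  # the Worker decides when to send
            else:
                polled.append(entry)   # the Server sends the Request
    return polled, awaited

SAMPLE = """
<workers>
  <worker>
    <host>public</host>
    <services><service><name>users</name><params/></service></services>
  </worker>
  <worker type="multicast">
    <host>224.0.0.1</host>
    <services><service><name>zombie</name><params/></service></services>
  </worker>
  <worker type="unsolicited">
    <host>imap</host>
    <services><service><name>ps</name><params/><period/></service></services>
  </worker>
</workers>
"""

polled, awaited = plan_workers(SAMPLE)
```

Here the users and zombie Services end up in the polled list, and the ps Service from imap ends up in the awaited list.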

But... how does a Worker know that it's supposed to send these unsolicited Responses? This is the only use for the Services section of the configuration file for a Worker:

<workers>
  <worker type="unsolicited">
    <host>soigan</host>
    <services>
      <service>
        <name>ps</name>
        <params>
          <param name="process">imapd</param>
        </params>
        <period units="minutes">5</period>
      </service>
    </services>
  </worker>
</workers>

Other considerations

Grouping

There might be a need to collect Workers, Servers or plugins into groups, to make it easier to supply rules for them. For instance, we might want to allow a specific set of 34 machines to have access to five different plugins. Instead of having many, many lines in the configuration file, a <group> tag might help. Nagios does this quite well, and uses it for host groups and contact groups.
©2002-2017 Wayne Pearson