
Soigan - a Multicast XML monitoring system - even more of the next stage

Even more of the next stage

So what now? Well, during my stress tests, I did see some behaviour I didn't like.

Right now, when service.run() is called, a one-time event is added to the queue. To ensure that service.run() gets the correct response (in case the same Service already exists on the Server as a periodic request), the name is kludged to be onetimeplugin@host. Hokey.

Also, my LoggerListener class, which dumps all content to the screen, doesn't differentiate between plugin Responses and schema Responses. In fact, the ResultListener setup I have treats Listeners as objects that register their interest in specific Services. But what if I'm returning a Result for a one-off run, or from a duplicate added with service.add(), or it's a schema instead of the plugin's output?

I think I'm going to expand the Service class once again, to go from "plugin@host" to "plugin[:specifier]@host". This specifier would be "schema" for schemas (who:schema@nsd), and perhaps an identifier for one-off runs or specific instances (who:0x12ab34@nsd). Maybe "callback" for unsolicited Services (last:callback@worker). This allows the following scenario:
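Here is a minimal sketch, in Python, of how a "plugin[:specifier]@host" name might be split into its parts. The regular expression, ServiceName and parse_service are illustrative names and are not taken from Soigan itself. It assumes a plugin name never contains ":" or "@", and that "*" is just an ordinary string in the plugin and host positions.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical grammar: plugin, then an optional ":specifier", then "@host".
SERVICE_RE = re.compile(
    r"^(?P<plugin>[^:@]+)(?::(?P<specifier>[^@]+))?@(?P<host>[^@]+)$"
)


class ServiceName(NamedTuple):
    plugin: str
    specifier: Optional[str]  # None for plain "plugin@host" names
    host: str

    def __str__(self) -> str:
        # Rebuild the textual form so names round-trip.
        spec = f":{self.specifier}" if self.specifier else ""
        return f"{self.plugin}{spec}@{self.host}"


def parse_service(name: str) -> ServiceName:
    """Split a Service name into plugin, optional specifier and host."""
    m = SERVICE_RE.match(name)
    if m is None:
        raise ValueError(f"not a valid Service name: {name!r}")
    return ServiceName(m.group("plugin"), m.group("specifier"), m.group("host"))


for n in ["who@nsd", "who:schema@nsd", "who:0x12ab34@nsd", "last:callback@worker"]:
    print(parse_service(n))  # each name prints back unchanged
```

Because names without a specifier parse with specifier=None, existing "plugin@host" Services keep working unchanged.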

A Client wants to make a Query about the ps@* Service (this would be the showproc program we've talked about). As it stands, the Server would make a one-time Request over multicast and set up a short-lived Listener within the Server to handle the Responses and turn them into Results. All the Workers would send Responses back to the Server. The Server would pass them to the Listener, and the Listener would pass them back to the Client.

It sounds good, but there are two problems (well, three, but we'll address the third one separately). The first is that there may be other Listeners to "ps@*", "*@worker" or "ps@worker" that don't want to know about this one-off call -- they're set up to hear only the Results from a regularly-scheduled event. The second is that this short-lived Listener is listening for "ps@*", so it might also get other "ps@*" Responses that it doesn't want. These would be Responses generated by a different Request, most likely the one that the first problem's Listener was waiting for.

This can be solved by using the specifier. The Client would still run service.run("ps@*",["user=crwth"]). The Server would create a Listener, as before, but it might use the Listener's address in memory as the specifier (or a timestamp, or some other identifier). We want it to be distinct from the "regular" Requests going around. Even if two Clients end up with the same specifier (say they make the same request in the same millisecond and get the same timestamp), the Results will be equivalent, unless the Plugin has a side-effect, of course! So the Listener is told to listen for "ps:0x12ab34@*", and the Server then creates a Request for "ps:0x12ab34@*" over multicast.
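The Server side of this can be sketched as follows, using the Listener's id() as its specifier. In CPython, id() is the object's memory address. ShortLivedListener, make_one_off_request and the dict-shaped Request are hypothetical stand-ins and not Soigan's real classes. The point is only that the Listener's pattern and the outgoing multicast Request carry the same specifier.

```python
class ShortLivedListener:
    """Hypothetical one-off Listener created by the Server for service.run()."""

    def __init__(self, plugin: str, host: str) -> None:
        # id() is the object's memory address in CPython, so hex(id(self))
        # gives a specifier in the same style as "0x12ab34".
        self.specifier = hex(id(self))
        self.pattern = f"{plugin}:{self.specifier}@{host}"
        self.responses = []


def make_one_off_request(plugin: str, host: str, args: list) -> tuple:
    """Create the Listener and the matching multicast Request (hypothetical)."""
    listener = ShortLivedListener(plugin, host)
    # The Request names exactly the Service the Listener is waiting for.
    request = {"service": listener.pattern, "args": args}
    return listener, request


listener, request = make_one_off_request("ps", "*", ["user=crwth"])
print(request["service"])  # something like ps:0x7f...@*
```

Two Listeners that are alive at the same time always get different ids. An id can be reused once a Listener has been discarded, which is the kind of harmless duplicate the paragraph above already allows for.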

When the Workers get the Request, they check the specifier to see if it affects them (if it were ":schema", for instance, they would respond differently). The Plugin is run, and the Response sent to the Server is for the "ps:0x12ab34@worker" Service. When the ResultListener looks for Listeners for that, it ignores anyone listening for "ps@worker", "ps@*" and "*@worker", because this Response has a specifier. It does, however, give the Response to "ps:0x12ab34@worker", "*:0x12ab34@worker", "ps:0x12ab34@*" and "*:0x12ab34@*". All (and only) the correct Listeners get the Response and send a Result back to the Client.
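The matching rule the ResultListener applies can be sketched as a small predicate. The function names are hypothetical. The sketch assumes that "*" is a wildcard only in the plugin and host positions, and that specifiers must match exactly: a plain Listener never sees a specified Response, and a specified Listener never sees a plain one.

```python
def split_service(name: str):
    """Split "plugin[:specifier]@host" into (plugin, specifier or None, host)."""
    plugin_part, _, host = name.partition("@")
    plugin, _, specifier = plugin_part.partition(":")
    return plugin, (specifier or None), host


def matches(listener: str, response: str) -> bool:
    """Does a Listener's pattern want this Response's Service?"""
    l_plugin, l_spec, l_host = split_service(listener)
    r_plugin, r_spec, r_host = split_service(response)
    # Specifiers must agree exactly (including both being absent).
    if l_spec != r_spec:
        return False
    # "*" stands in for any plugin or any host, but never for a specifier.
    return l_plugin in ("*", r_plugin) and l_host in ("*", r_host)


listeners = ["ps@worker", "ps@*", "*@worker", "ps:0x12ab34@worker",
             "*:0x12ab34@worker", "ps:0x12ab34@*", "*:0x12ab34@*"]
print([l for l in listeners if matches(l, "ps:0x12ab34@worker")])
# ['ps:0x12ab34@worker', '*:0x12ab34@worker', 'ps:0x12ab34@*', '*:0x12ab34@*']
```

A regular "ps@worker" Response still reaches "ps@worker", "ps@*" and "*@worker" as before. So the scheduled Listeners from the first problem are unaffected, and the one-off Listener no longer picks up their traffic.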

I think this looks good. I don't see any immediate problems (except that we're using the specifier for more than just an identifier, by using "schema" and possibly other strings in the future). Because we're dealing with "specialized" versions of Services, the schema really seems to fall into this category, and we've hinted at that just now. Because of this, I'm going to drop the two functions schema.get() and schema.results() from the Worker and Server respectively. Instead, the Server will send a Service with ":schema" in it to the Worker, and the Worker will handle this specifically by returning the schema instead of the data. This will still be instigated by the Client calling service.schema() on the Server, and from the Client's point of view, the Results are the same.
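On the Worker, the ":schema" case might reduce to a branch on the specifier. Plugin, handle_request and the placeholder schema string below are assumptions for illustration, not Soigan's actual Worker code. Any other specifier (a one-off id, for example) just runs the plugin and is echoed back in the Response name.

```python
from typing import Callable, Dict, List, Tuple


class Plugin:
    """Hypothetical Worker-side plugin: a run function plus its XML schema."""

    def __init__(self, run: Callable[[List[str]], str], schema: str) -> None:
        self.run = run
        self.schema = schema


def handle_request(service: str, args: List[str],
                   plugins: Dict[str, Plugin], worker: str) -> Tuple[str, str]:
    """Run (or describe) the requested plugin and name the Response."""
    plugin_part, _, _host = service.partition("@")
    name, _, specifier = plugin_part.partition(":")
    plugin = plugins[name]
    if specifier == "schema":
        body = plugin.schema        # stands in for the old schema.get()
    else:
        body = plugin.run(args)     # regular or one-off run
    # Keep the specifier and name this Worker as the host.
    spec = f":{specifier}" if specifier else ""
    return f"{name}{spec}@{worker}", body


plugins = {"who": Plugin(lambda args: "<who/>", "<schema-for-who/>")}
print(handle_request("who:schema@*", [], plugins, "nsd"))
# ('who:schema@nsd', '<schema-for-who/>')
```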

So what was that third problem I mentioned? It comes down to the multicasting. When a Client makes a Query for a Service like "ps@*", how long should we wait for Responses from Workers? This is tied in with my original thinking about multicasting, where Workers might need to add a self-imposed delay to their Responses to ensure the Server doesn't get overwhelmed. I haven't added that yet, but if it turns out to be necessary, I'll have to think about how to do it (most likely with an additional parameter to plugin.run()).
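If that self-imposed delay is ever needed, one common approach is random jitter. Each Worker waits a random time, up to some maximum, before replying, so a single multicast Request doesn't produce one simultaneous burst of Responses at the Server. The max_delay value (which might arrive as that extra plugin.run() parameter) and the send callback are hypothetical. This is an illustration of the idea, not something Soigan does.

```python
import random
import time
from typing import Callable


def respond_with_jitter(send: Callable[[str], None], response: str,
                        max_delay: float) -> float:
    """Wait a random 0..max_delay seconds, then send the Response.

    Spreading the Workers' replies across a window smooths the load on
    the Server. Returns the delay actually used.
    """
    delay = random.uniform(0.0, max_delay)
    time.sleep(delay)
    send(response)
    return delay
```

The catch is that max_delay has to stay well under however long the Server is willing to listen, which leads straight into the timeout question.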

The nice thing about using the Request/Response method is that there's no inherent time limit -- the Worker can take all the time it wants and the Server isn't going to care. A Client might, though, especially since the service.run() Query it just made is waiting for an answer (service.register() allows the Client to relax and wait for Results). Right now, I've got a hard-coded value in there - 3 seconds - during which the short-lived Listener gathers all of the Responses it hears and bundles them into one Result. After that, any straggler Responses aren't heard by any Listeners, and their information gets lost. So what if 3 seconds isn't enough? What if the Plugin takes longer than that to run? Or the Worker is busy running twenty other Requests first? (Okay, it's threaded, but the CPU load might be high or something.)
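The current gather-then-bundle behaviour can be sketched as a loop that drains a queue until a deadline passes. The queue.Queue feeding the Listener and the gather name are assumptions. The timeout is a parameter that defaults to the hard-coded 3 seconds. Whatever arrives after the deadline stays in the queue uncollected, and that is the straggler problem.

```python
import queue
import time
from typing import List


def gather(responses: "queue.Queue[str]", timeout: float = 3.0) -> List[str]:
    """Collect every Response that arrives within `timeout` seconds."""
    deadline = time.monotonic() + timeout
    result = []
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            # Block only for the time left before the deadline.
            result.append(responses.get(timeout=remaining))
        except queue.Empty:
            break
    return result  # bundled into one Result by the caller
```

Using time.monotonic() keeps the deadline immune to wall-clock changes. Passing timeout in, rather than hard-coding it, leaves room for whoever ends up deciding the value (Client, Server or Worker).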

I'm not sure what the answer to this is, yet. Right now, the classes I use allow me to pass in this timeout value for the Listener. But should it be up to the Client to say how long it's willing to wait for Results? Or should the Server decide? Maybe the Workers are best suited to say how long they'll take -- but if they're taking too long, maybe the Worker can't even tell us THAT.

I'm off to implement the specifier change, and will think about this timeout issue. Security is still an issue that I haven't addressed. Also, I should probably stop writing "test" clients and actually write a showproc client as a proof-of-concept, now that multicasting has been added.

©2002-2017 Wayne Pearson