In Klir, G.J., Ed. Applied General Systems Research. pp. 91-104. New York, USA: Plenum Press,1978
B. R. Gaines
Man-Machine
Systems Lab., Dept. of E. E. Science
University of Essex, Colchester, Essex, U.K.
The term identification was introduced by Zadeh (1956) as a generic expression for the problem of "determining the input-output relationships of a black box by experimental means." He cited the various terminologies then prevalent for the same problem: "characterization" "measurement," "evaluation," "gedanken experiments" etc., and noted that the term "identification" states "the crux of the problem with greater clarity than the more standard terms above."
Zadeh formulates the general identification problem as: given
determine, by observing the response of x to various inputs, a member of M which is equivalent to x in the sense that its responses to all time functions in the input space of x are identical to those of x.
Over the next twenty years the identification problem became an essential area of study for modern control theory justifying a continuing series of major I.F.A.C. symposia concerned with this topic alone, and books such as that of Eykhoff (1974) concerned primarily with system identification. The major effort in control research has tended to be with systems modelled as linear and continuous in their state variables and either continuous, or uniformly sampled, in time. Such work has found a wide range of practical applications in plant measurement and, to a lesser but significant, extent in on-line adaptive control. Similar developments have found important applications in signal processing for radar and telecommunications where adaptive filtering techniques based on channel identification are now routinely applied in commercial systems.
As usual, techniques based on linearity and continuity begin to break down when applied to non-artificial systems, for example biological and economic modelling. Significant practical use of the linear "describing function" has been made, for example of the pilot in aircraft design. However, human motor control is known to be based on discontinuous decision-making leading to discrete corrections, rather than the smooth, linear motion of classical servomechanisms. Interesting studies have been made of the human operator using alternative modelling techniques, ranging from sampled-data linear systems, through Wiener kernels, to finite-state machines (Gaines, 1969).
As the extremes of biological systems behavior are approached, for example, in animal ethological studies (Dawkins and Dawkins, 1974; Dawkins, 1976; Vowles, 1970), where the data is often purely descriptive having no metrical structure, linear systems techniques become totally inapplicable. Here, the times-series to be modelled are strings of arbitrary symbols and one has crossed into the domain of automata theory and the problems of grammatical inference (Fu and Booth, 1975). Similarly, in studies of picture processing, one obtains spatial, rather than time, series which are most appropriately studied in grammatical terms (Fu, 1974).
It is interesting to note that Zadeh recognized this spectrum of problems in his discussion of system "identification" some twenty-one years ago, although the main part of his paper was concerned with continuous system identification. Indeed, it was probably Moore's classic paper on "Gedanken Experiments on Sequential Machines," published the previous year, that triggered off the generalization of a variety of approaches to system measurement to being a general problem of identification.
This paper is a brief survey of the current state-of-the-art of system identification, concentrating on the general problem rather than the massive amount of work on special cases. Eykhoff's book, the I.F.A.C. symposia, and I.E.E.E. transactions on Control, Information Theory, Circuits and Systems, System Man and Cybernetics, etc., give accounts of specific states-of-the-art. I have previously published a comprehensive formulation of general systems identification giving over 200 related references (Gaines, 1977a). This present note abstracts the key features of that formulation and brings the earlier report up-to-date with notes on recent developments.
System identification is, in one sense, a comparatively new concept, yet in another it is ancient -- the process of acquisition of knowledge about the nature, or structure, of the world from observations of its behaviour generates the classic epistemological problem described so graphically in Plato's Republic through the "simile of the cave." It is a central problem in modern philosophies of science, and, as demonstrated by Hume, in an important sense it has no solution (the derivation of a "solution" itself being an epistemological problem requiring a further "solution," ad infinitum).
This lack of a general solution to the problem of knowledge acquisition leads to a new problem, one of ontology -- we actually have to presuppose the nature of "reality," of being, of the world, before we can acquire knowledge about it. Here lies a basic conflict at the heart of the problem of identification -- between pure epistemology on the one hand (knowledge is the raw material of our experience and is prior to all "metaphysical" speculation about being) and ontology on the other (we have a priori reasons to suppose that the world has a certain nature independent of our knowledge of it).
In general systems theory we cannot avoid operating in this region of conflict. The ontological approach to many questions is very satisfying -- one can hypothesize and create structures that are adequate, complete and consistent. Yet invariably the question will arise as to whether this comforting sense of closure is "real" -- if I cannot know what is my justification in saying -- if there is not at least a potential test what is the value of speculation?
There has been a tendency in system theory to adopt the extreme positivist view that the ontological problem is subsumed under the epistemological one. We can talk only of structures we can identify, and our preferred way of talking about them is in terms of the procedures for their identification. We are interested primarily in the best procedures for system identification -- the optimum, correct and unique procedures that would define our channels of access to knowledge and hence, in terms of the above rationale, the nature of being.
In recent years there have been developments that appear to bring us closer to a formulation and solution of the problem of identification than ever before. In concrete terms these command attention because they weave together the many strands of uncertain system theory, probability theory, etc., into a coherent whole, and give the correct answers when applied to such problems as stochastic grammatical inference (Gaines, 1976b, 1977a). In abstract terms, however, these developments turn the wheel a full circle because they highlight starkly the degree of ontological commitment necessary to formulate an epistemological problem in an operationally soluble form. There are decisions to be made that are necessarily arbitrary -- not only is uniqueness and a clear sense of "correctness" lacking, but they are replaced by a degree of universality in which almost any decisions are viable.
I have argued elsewhere (Gaines, 1977b) that it is such dialectical conflicts that have been, and are, crucial to the development of systems theory. It is at the point that we accept that there is no ultimate resolution and begin to examine the conflict itself (rather than indulge in it!) that progress is most rapid. In this context the interesting problems now are, for example, the interplay between presupposition, observation, and modelling. What can we gain in terms of resources saved through pre-suppositions, and conversely, what do we lose in terms of incorrect results when the pre-suppositions are incorrect? There also arise meta-problems, such as, what kind of data will lead us to change not just our model, but rather our presuppositions?
Zadeh's definition given previously forms a convenient framework in which to discuss the general problem of identification. It already exhibits two key features of the problem:
What Zadeh's original definition did not attempt to cover are the additional key features:
These six aspects of the identification problem take it out of the realm of passive data analysis and lead to its rich philosophical foundations requiring rather more subtle formulation in system-theoretic terms than might be expected. In the following section, I will summarize the formulation of the problem given in detail previously, and then go on to discuss some of the problems stated and the current state-of-the-art within this framework.
It is early yet to give a "state-of-the-art" review of identification -- there has been much progress in recent years, but problems remain, particularly in the practical feasibility of applications based on theoretical and computational techniques.
In the sixties, the key developments were the studies of inductive inference by: Watanabe (1960) arising out of his fundamental studies in physics (Watanabe, 1955); Solomonoff (1964) triggered off (Solomonoff, 1959) by Chomsky and Miller's (1957) studies of grammar discovery; and Gold (1967) who was concerned with the fundamental constraints on such inductive inference. These lead to two main centres of activity in the late sixties and early seventies: that developed by Feldman (1967, 1972; Biermann and Feldman, 1972) at Stanford, concentrating on deterministic grammars and leading to the key theoretical studies of Horning (1969) and the applications to automatic programming of Biermann and Krishnaswamy (1976); and that developed by Booth at Connecticut, concentrating on stochastic grammar and leading to the studies of Patel (1972) and Maryanski (1974) on computational procedures for stochastic grammatical inference.
These developments all depended largely on heuristic methods that had high degrees of plausibility but lacked secure formal foundations. A key breakthrough here came in the early seventies, with the results of Goguen (1973), Arbib & manes (1974), and Ehrig (1974), who showed that for wide classes of deterministic systems the solution to the identification problem could be expressed as an adjunction between categories of behaviour and structure. This was a very elegant general framework in which to place identification, and the common characterization of the problem for linear continuous systems and for nonlinear discrete systems gave substance to Zadeh's suggestion some eighteen years earlier (echoed in Arbib's work (1966) on automata and control theory) that there was a single, system-theoretic principle involved.
The next step forward was clearly to extend these results to non-deterministic systems. This appeared a small step, but was not -- in retrospect, it involved the foundations of probability theory which were themselves going through an era of change in the late sixties and early seventies with both "subjective" and "complexity" based approaches gaining strength (Fine, 1973). The problem in going from deterministic to stochastic systems is that the behaviour/structure adjunction seemed crucially dependent on the uniqueness (up to an isomorphism) of the solution to the identification problem. For stochastic systems, such a unique relationship between the minimal structure which will account for a behaviour and the behaviour itself disappears. The structure can now only approximate the behaviour and the more complex the structure, the better the approximation. A similar phenomenon had been noted by Feldman (1972) for the inference of deterministic grammars when the example set is incomplete (as it will usually be for practical examples).
In the limit, this process of improved approximation with increased complexity can lead to ridiculous results. Gaines (1976a) showed that if zero-memory (Bernoulli) probabilistic sequences were modelled by deterministic automata then the ratio of the mean number of states in the automaton required to the length of the observed behaviour was asymptotical unity! In retrospect again, this is a computational-complexity result that nearly all random sequences are maximally complex (Kolmogorov, 1968). However, its epistemological significance is profound. Finite automata form an adequate set of models for all finite sequences of behaviour, i.e., for every behaviour we can always find a model which exactly matches it. The result showed that this concept of "adequacy" was not sufficient to give meaningful solutions to the identification problem.
The solution to these difficulties proved to be to face the lack of uniqueness of the solution to the stochastic system identification problem head-on and define the "solution" to be an admissible subset of models, such that any member of the solution subset, if it is more complex than another, is also a better approximation. This establishes a Galois connection between the orders of approximation and complexity in the subset which Ralescu (1978) integrated recently into an elegant category-theoretic formulation of the identification problem for deterministic, stochastic and fuzzy systems, as an adjunction. No longer is the solution unique, but the subset of admissible solutions is, and it can be characterized in this way.
In one sense the work of Gaines and Ralescu completes the earlier studies of Zadeh, et al., in that both the identification problem and its solution have been formulated in a general system-theoretic framework embedded in category-theory. However, identification is a real-world problem so that the question of the applicability of the results is also highly significant.
A first question is clearly: where do these measures of approximation and complexity come from? At one level the answer is that they are arbitrary -- complexity is our preference ordering on models in themselves, and approximation is our preference ordering on the relationship between a behaviour and models. Any step beyond this involves additional presuppositions.
I doubt that many people will find the statement of the arbitrariness of complexity and approximation satisfactory or, perhaps, satisfying. However, the only resolution is through statement (3) of the previous section that identification has a purpose; the specific purpose of a specific proposal for system identification may lead to acceptable presuppositions that constrain otherwise arbitrary decisions. For example, if we are modelling a time-series and our purpose is prediction and we know the pay-offs for errors, etc., then a definite measure of approximation may suggest itself. This has been studied by Savage (1971), de Finetti (1972), Pearl (1975), et al., in the context of subjective probabilities. Wharton (1974) has given a survey of approximation measures for grammatical inference.
The problem of complexity definition has been treated in the literature by attempts to give intensional definitions of complexity classes, i.e. a logical structure that is plausible in itself and generates a complexity order over a class of models. Sober (1975) discussed this in a philosophical context in his book on Simplicity, and it has become an important topic in system theory (Cornacchio, 1977). One very interesting approach developed by Horning (1969) in the context of grammatical inference is to define grammar-grammars that are themselves meta-grammars for generating grammars. This gives a natural complexity measure for a grammar in terms of the length of its derivation, and has been used as the basis of an effective stochastic grammar inference program by Van der Mude and Walker (1978). Their algorithm has the interesting property of taking into account the precision of estimates of probability in the complexity measure, analogous to the intuitive feeling that there is less content in describing a probability as 0.5 rather than 0.523. Gaines (1976b, 1977a) has so far used only measures of complexity based on simple enumeration of states, or links, in automata -- such measures and their variants have been analysed by Zeigler (1976).
Computational algorithms to determine the admissible subset of models require the controlled enumeration of models in order of complexity -- "controlled" so that time is not wasted in generating duplicate or impossible candidates. Wharton (1977) has recently analysed and surveyed approaches to such enumeration. The practical algorithms for structure inference that exist are crucially dependent on the efficient control of candidate model generation, and Klir (1976; Klir and Uyttenhove, 1976, 1977), in particular, has studied structure generation as a central component to realistic identification schemes. In particular his notion of an epistemological hierarchy itself gives structure to the ontological decisions necessary at various levels of any identification scheme.
Another potentially fruitful approach to the removal of arbitrariness in measures of complexity and approximation is to study the effects of varying requirements for linguistic invariance, i.e., that the model should not change when the behaviour description language is varied, the basis of Goodman's "grue" paradox (Sober, 1975). While a requirement for total invariance is clearly impossible, limited forms of invariance act as intentional constraints upon possible complexity/ approximation orders and may prove to give intuitive meaning to what would otherwise be arbitrary formal relations.
The links of system identification to problems in the philosophy of science are obvious and important. In general, we are trying to form a model of the world through our observations of it. Carnap's confirmation theory appears as our incremental adjustment of a model resulting from new data. Popper's falsification theory appears as our removal of models from the admissible subset following new data that decreases their degree of approximation. Kuhnian "scientific revolutions" appear as changes in the complexity ordering on models which maintain the simplicity of all previously simple behavior but allow more behavior to be regarded as simple. And so on -- there is a rich literature essentially relating to system identification in the philosophy of science.
The links of stochastic system identification to probability theory are also close. Gaines (1976b, 1977a) obtains a solution to the problem of inferring stochastic finite-state grammars by using the subjective probability eliciting measures of Finetti and Savage. The expected loss of an optimal identifier turns out to be an entropy function. A random sequence may be defined as one that is maximally complex relative to a class of models. One may, in fact, reformulate probability theory in these terms thus integrating the "subjective," "complexity," "information-theoretic," and "axiomatic" foundations.
The links of system identification at the general level discussed in this paper to practical problems are still tenuous, and yet sufficiently strong to be the driving force behind most current studies. Biermann and Krishnaswamy (1976) have used their (deterministic) inferencer as a component of an experimental auto-programming system, and such program inference is the goal of several ongoing studies. Blum and Blum (1975) have given theoretical foundations for what can be achieved in terms of program compression programs based on grammatical inference, and Crespi-Reghizzi, Melankoff and Lichten (1973) have studied its use in programming language design. Gaines (1976b) used autoprogramming with errors to illustrate the use of stochastic grammatical inference and is currently studying the analysis of animal ethological data. Walker (1976) is applying his program to medical case histories.
One limitation to all current inferencers is speed of computation. Models with up to about 10 states of sequences of length about 1,000 over 10-symbol alphabets seem to be the point at which most implemented algorithms run of steam. One way of obviating this is to look for sub-optimal but fast algorithms that give good models but not necessarily admissible ones. Witten (1977) has re-analysed some of Gaines' examples using such a scheme based on the modelling techniques of Andreae's "Purr-Puss" learning scheme (Andreae and Cleary, 1976). Another approach, however, is to note that when the current schemes break down they are involved in searching millions of models; might it not be that the set of models is unsuited to the application? Gaines obtained a ten times speed-up in reanalysing Maryanski's examples by changing from a complexity measure based on states to one based on links between states. This was a sensible Kuhnian "paradigm shift" because, in retrospect, the artificial examples used to generate the data were not highly connected. Hence, a change in complexity ordering that differentially decreased the complexity of low connectivity models relative to high connectivity models made the "real-world" generated by Maryanski simpler and easier to understand. Such results point to the need to consider carefully the ontological implications of the class of models used (including the complexity order on it) when attempting to analyse real-world data.
I have perhaps over-emphasized in a paper on "system identification" the story of grammatical inference. This is because the inference of automata and Turing machines solves a very general form of systems identification. However, there are other important studies that would require papers in themselves to describe, in particular: Hajek and Havranek's (1977) studies in Czechoslovakia during the last decade of the GUHA system of inductive inference that has been applied to a variety of medical data analysis problems (Hajek, 1978), and Michalski's work at Champaign, Illinois, on inductive logics applied to the inference of plant disease taxonomies (Chilausky, Jacobson and Michalski, 1976). These, together with a wide range of studies making more specialized assumptions about the class of models (eventually impinging on classical statistical estimation and linear systems theory) fit into the general framework for identification outlined in this paper. Much work needs to be done on the role, and interrelationships, of the various pre-suppositions involved and the complexity/approximation measures used or implied.
I also do not have the space to discuss points (2) and (4) raised earlier, that identification is an active process of guiding data acquisition and may conflict with other objectives. We have been so far away from formal foundations for identification without these added problems that they have so far received little attention, although they are key aspects of practical "learning machines." Now that the basic problem of the semantics behavior/structure inference is acquiring rigorous foundations, one may expect more attention to be paid to the dynamics of the process.
Work in this area has been made most pleasant by the wide range of correspondence and discussion it has generated. My thanks to John and Peter Andreae, Michael Arbib, Richard Dawkins, Joe Goguen, Susan Haack, James Horning, George Klir, Ladislav Kohout, David Miller, Judea Pearl, Dan Ralescu, Ian Witten and Lotfi Zadeh for information, comments and discussion that influenced these studies.
J. H. Andreae, and J. G. Cleary (1976), "A New Mechanism for the Brain," International Journal of Man-Machine Studies, 8, pp. 89-119, 1976.
M. A. Arbib (1966), "Automata and Control Theory -- A Rapprochement," Automatica, 3, pp. 161-189.
M. A. Arbib, and E. G. Manes (1974), "Foundations of Systems Theory: Decomposable Systems," Automatica, 10, pp. 285-302.
A. Biermann, and J. Feldman (1972), "A Survey of Results in Grammatical Inference," in S. Watanabe (ed), Frontiers of Pattern Recognition, Academic Press, New York, pp. 31-53.
A. W. Biermann, and R. Krishnaswamy (1976), "Constructing Programs from Example Computations," IEEE Trans. Software Engineering, SE-2, P2. 141-153.
L. Blum, and M. Blum (1975), "Toward a Mathematical Theory of Inductive Inference," Information and Control, 28, pp. 125-155, 1975.
R. Chilausky, B. Jacobson, and R. S. Michalski (1976), "An Application of Variable-Valued Logic to Inductive Learning of Plant Disease Diagnostic Rules," Proc. Sixth Int. Symp. on Multiple Valued Logic, IEEE 76 CH 1111-4C, Utah, pp. 233-240.
N. Chomsky, and G. A. Miller (1957), Pattern Conception, Report AFCRC-TN-57-57.
J. V. Cornacchio (1977), "System Complexity -- A Bibliography," International Journal of General Systems, 3, pp. 267-271.
S. Crespi-Reghizzi, M. A. Melankoff, and L. Lichten (1973), "The use of Grammatical Inference for Designing Programming Languages," Communications of Association for Computing Machinery, 16, pp. 83-90.
M. Dawkins, and R. Dawkins (1974), "Some Descriptive and Explanatory Stochastic Models of Decision-Making," Motivational Control Systems Analysis, Academic Press, pp. 119-168.
R. Dawkins (1976), "Hierarchical Organization: A Candidate Principle for Ethology," in P. P. G. Bateson and R. A. Hinde (ed), Growing Points in Ethology, Cambridge University Press, pp. 7-54.
H. Ehrig (1974), Universal Theory of Automata, B. G. Teubner, Stuttgart.
P. Eykhoff (1974), System Identification, J. Wiley, London.
J. A. Feldman (1967), "First Thoughts on Grammatical Inference," A. I. Memo 55, Stanford University.
J. Feldman (1972), "Some Decidability Results on Grammatical Inference and Complexity," Information and Control, 20, pp. 244-262.
T. L. Fine (1973), Theories of Probability, Academic Press, New York.
B. de Finetti (1972), Probability, Induction and Statistics, John Wiley, London.
K. S. Fu (1974), Syntactic Methods in Pattern Recognition, Academic Press, New York.
K. S. Fu, and T. L. Booth (1975), "Grammatical Inference: Introduction and Survey," IEEE Trans, Systems, Man and Cybernetics, SMC-5, pp. 95-111 (Part I), 409-423 (Part II).
B. R. Gaines (1969), "Linear and NonLinear Models of the Human Controller," International Journal of Man-Machine Studies, 1, pp. 333-360.
B. R. Gaines (1976a), "On the Complexity of Causal Models," IEEE Trans. On Systems, Man and Cybernetics,. SMC-6, pp. 56-59.
B. R. Gaines (1976b), "Behaviour/Structure Transformations Under Uncertainty," International Journal of Man-Machine Studies, 8, pp. 337-365.
B. R. Gaines (1977a), "System Identification, Approximation and Complexity," International Journal of General Systems, 3, pp. 145-174.
B. R. Gaines (1977b), "Progress in General Systems Research," Proceedings of First International Conference on Applied General Systems Research: Recent Developments and Trends, Binghamton, N.Y.
J. A. Goguen (1973), "Realization is Universal," Mathematical Systems Theory, 6, pp. 359-374.
E. M. Gold (1967), "Language Identification in the Limit," Information and Control, 10, pp. 447-474, 1967.
P. Hajek, Ed., (1978) Special issue on the GUHA method and its applications, International Journal of Man-Machine Studies, 10.
P. Hajek, and T. Havranek (1977), "On Generation of Inductive Hypotheses," International Journal of Man-Machine Studies, 9.
J. J. Horning (1969), A Study of Grammatical Inference, Ph.D. Thesis, Stanford University, (University Microfilms 70-10, 465).
G. J. Klir (1976), "Identification of Generative Structures in Empirical Data," International Journal of General Systems, 3, pp. 89-104.
G. J. Klir, and H. J. J. Uyttenhove (1976), "Computerized Methodology for Structure Modelling," Annals of Systems Research, 5, DD. 29-66.
G. J. Klir, and H. J. J. Uyttenhove (1977), "On the Role of ComputerAided Structure Identification: Some Experimental Observations and Guidelines," International Journal of Man-Machine Studies, 9.
A. Kolmogorov (1968), "Logical Basis for Information Theory and Probability Theory," IEEE Trans. Information Theory, IT , pp. 662-664.
F. J. Maryanski (1974), Inference of Probabilistic Grammars, Ph.D. Thesis, University of Connecticut, (University Microfilms 75-10 645).
A. Van der Mude and A. Walker (1978), "On the Inference of Stochastic Regular Grammars," Information and Control, 38(3): 310-329.
A. R. Patel (1972), Grammatical Inference for Probabilistic Finite State Grammars, Ph.D. Thesis, University of Connecticut, (University Microfilms 72-32 241).
J. Pearl (1975), "An Economic Basis for Certain Methods of Evaluating Probabilistic Forecasts," UCLA-ENG-7561, School of Engineering and Applied Science, UCLA, California, July 1975.
D. A. Ralescu (1978), "Abstract Models for System Identification," FacultŽ de Science ƒconomique et de Gestion, Institut de MathŽmatiques ƒconomiques, Dijon, France.
L. J. Savage (1971), "Elicitation of Personal Probabilities and Their Assessment," Journal Americcan Statistical Association, 66, pp. 783-801.
E. Sober (1975), Simplicity, Clarendon Press, Oxford.
R. Solomonoff (1959), "A New Method for Discovering the Grammars for Phrase Structure Languages," Proc. Int. Conf. Information Processing, UNESCO, Paris, Butterworths, pp. 285-290.
R. Solomonoff (1964), "A Formal Theory of Inductive Inference," Information and Control, 7, pp. 1-22 (Part 1), 224-254 (Part 2).
D. M. Vowles (1970), "Neuroethology, Evolution and Grammar," in L. R. Aronson et al. (eds), Development and Evolution of Behaviour, W. H. Freeman, pp. 194-215.
A. Walker (1976), "A Framework for Model Construction and Model-Based Deduction in Systems with Causal Loops," Proceedings 3rd, Illinois Conf. on Med. Inf. Processing, Department of Information Engineering, University of Illinois at Chicago Circle.
S. Watanabe (1955), "Symmetry in Physical Laws, Part III, Prediction and Retrodiction," Reviews of Modern Physics, 27, pp. 179-186.
S. Watanabe (1960), "Information-Theoretical Aspects of Inductive and Deductive Inference," IBM Journal Research and Development, 4, pp. 208-231.
R. M. Wharton (1974), "Approximate Language Identification," Information and Control, 26, pp. 236-255.
R. M. Wharton (1977), "Grammar Enumeration and Inference," Information and Control, 33, pp. 253-272.
I. H. Witten (1977), "Non-Deterministic Modelling of Behaviour Sequences," EES-MMS-MOD-77, Department of Electrical Engineering Science, University of Essex.
L. A. Zadeh (1956), "On the Identification Problem," IRE Transactions on Circuit Theory, 3, pp. 277-281.
B. P. Zeigler (1976), "Simulation Based Structural Complexity," International Journal of General Systems, 2, pp. 217-223.