
Chapter 6

An Interpretation of the logic of J. S. Mill

September 1999

This chapter was updated starting January 25, 2006, with a focus on applications of Service Oriented Architectures, as part of an OntologyStream Inc. contract.

 


 

Abstract

An interpretation of Quasi Axiomatic Theory and Mill’s logic is made to support the concept of situational logic.

Introduction

Aristotle described an inference method called induction by simple enumeration. The method proposes that if we have a number of uniform facts, and we know of no contrary facts, we can make a generalization about these facts. This type of induction is “weaker” than a method that would falsify a theory. However, induction by simple enumeration may be close to how natural language forms through use. It is conjecture on my part to suggest that Aristotle viewed natural language formation in this way, but I do conjecture that natural language forms in a fashion that involves categorical processes.

Using Quasi Axiomatic Theories (QAT) developed by V. Finn (1991), a set of "facts" can be placed inside a deductive framework. The framework can become situationally grounded through categorization and induction. However, the validity of deductive algorithms depends on the validity of a class of underlying assumptions.

Some of the Aristotelian assumptions can be understood by considering his theory of causation. The Aristotelian classes of causation (formal, material, efficient, and final) provide examples of inductive reasoning about causal relationships.

For Aristotle, at least in some interpretations, the phenomenon of cause is related to similarities within a temporal sequence. Similarity relates elements and can become a model of the states of situations. The similarity between two things can be stated as < a, r, b >, where r is the relationship. Dissimilarity can be given a corresponding notation.

At least as Aristotle’s metaphysics was incorporated into Newtonian science, these similarities must be crisp, and no critical hidden entanglement between similarity classes can be tolerated. It is this crispness and absence of entanglement that might be challenged by modern science and by the modern understanding of phenomena such as natural language and human consciousness.

It is conjectured that living systems have hidden causation due to intention and other phenomena. We suggest that living systems be regarded as open complex systems, and further suggest that Aristotle’s logic is closed and simple. The complexity of real systems is revealed only by modern evidence.

Hidden categorical entanglement is a sufficient reason why Aristotelian logic does not describe all causation in open complex systems. Something may be transformed from one category into another, as in metabolic activity where a catalytic process gives a molecular element a specific function. A number of elements may be brought together and transformed into a whole that is not the same as the crisp sum of its parts. Again, there is a non-reductionist categorical entanglement (in formal models of such processes).

In addition to categorical entanglement, we consider possible insufficiency in sampling and in description. Measuring the behavior of a human being is an example of measurement insufficiency.

Aristotelian logic states laws of causation by generalizing from descriptions of many possible cases of causation. The generalization is from a specific set of examples and assumes that the descriptions of those examples are valid. However, the choice of examples, and their description in some type of formal syllogistic language, is more problematic than Aristotelian logic presupposes.

"Logic, in the Middle Ages, and down to the present day in teaching, meant no more than a scholastic collection of technical terms and rules of syllogistic inference. Aristotle has spoken, and it was the part of humbler men merely to repeat the lesson after him. . . . Ever since the beginning of the seventeenth century, all vigorous minds that have concerned themselves with inference have abandoned the medieval tradition, . . . .

The first extension was the introduction of the inductive method by Bacon and Galileo – by the former in a theoretical and largely misunderstood form, by the latter in actual use in establishing the foundations of modern physics and astronomy. ... But induction, important as it is when regarded as a method of investigation, does not seem to remain when its work is done: in the final form of a perfected science, it would seem that everything ought to be deductive. If induction remains at all, . . . , it remains merely as one of the principles according to which deductions are effected. Thus the ultimate result of the introduction of the inductive method seems not the creation of a new kind of non-deductive reasoning, but rather a widening of the scope of deduction . . ." (Russell, 1914)

If the system under observation is very stable, as with the invariants of falling objects that Galileo observed, then deductive syllogisms might eventually be constructed. In open systems, however, the internal dynamics change fundamentally, and non-monotonic logics can arise as the sample set is updated.

The metaphysics of Aristotle does not have the richness of modern theories of causation. Even though Aristotelian logic has been applied to a range of phenomena, its methods only work if the phenomenon is fully constrained by known universal laws. This is clearly not the case for a class of phenomena such as psychological motivation.

It is difficult to regard human inference in terms of the monotonic / non-monotonic fulcrum. Induction is a cognitive process. It has a temporal aspect that accounts for fundamental changes in a non-stationary ontology. The notion of computed truth is also not so clear. However, quasi axiomatic theories provide algorithms that are suited to modeling open systems, and thus to modeling the inductive processes involved in human understanding of open systems. Non-monotonic reasoning is supported algorithmically through plausible reasoning and periodic updates to the axiom sets.

Finn’s work is based on the work of Francis Bacon and J. S. Mill. All three developed a theory of causation based on "induction by simple enumeration". At the core of this theory is a similarity analysis that defines classes of instances, together with the facts and conjectures framed within the context of these instances. Mill gave a general analysis of theories of inductive proof and provided a set of formulas and criteria related to the problems of scientific reasoning. More specifically, Mill formulated five "canons of reasoning" about causal hypotheses.

In private discussions with the author, Peter Kugler summarized Mill’s Canons as follows (Mill, 1872).

First Canon [Method of Agreement]: If two or more instances of the phenomenon under investigation have only one circumstance in common, the circumstance in which alone all the instances agree is the cause (or effect) of the given phenomenon.

Second Canon [Method of Difference]: If an instance in which the phenomenon under investigation occurs, and an instance in which it does not occur, have every circumstance in common save one, that one occurring only in the former; the circumstance in which the two instances differ is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.

Third Canon [Joint Method of Agreement and Difference]: If two or more instances in which the phenomenon occurs have only one circumstance in common, while two or more instances in which it does not occur have nothing in common save the absence of that circumstance, the circumstance in which the two sets of instances differ is the effect, or the cause, or an indispensable part of the cause, of the phenomenon.

Fourth Canon [Method of Residues]: Subduct from any phenomenon such part as is known by previous inductions to be the effect of certain antecedents, and the residue of the phenomenon is the effect of the remaining antecedents.

Fifth Canon [Method of Concomitant Variations]: Whatever phenomenon varies in any manner whenever another phenomenon varies in some particular manner, is either a cause or an effect of that phenomenon, or is connected with it through some fact of causation.

Kugler describes Mill’s motivation as formulating a theory of inferential inductive knowledge based on the concept of natural law. For Mill, natural law referred to relationships between antecedent and consequent events that are universally invariant.

The validity of the inductive generalization was grounded in the invariance of these natural laws. As we will see, this is a point of disagreement between Mill and Peirce. We use this point to build situational logics that are bi-level: the two levels are kept separate except during the meta-phenomenon of emergence.

We take a "Peircean interpretation" of the canons by making two changes in philosophy. First, the cause we are looking for is a "compositional cause", in which basic elements are composed into emergent wholes. Peirce used the metaphor of chemical compounds composed of atoms. The compositional cause of a chemical property is then ascribed to the presence or absence of specific atoms in the chemical composition.

Second, the invariances that we look for are situational invariants that are defined across basins of similarity within specific organizational contexts. These issues are addressed in a "General Framework for Computational Intelligence" where a deeper motivation is given for the application of Mill’s logic to text understanding.

All of the canons have a common feature:

there are descriptions of an occurrence of some phenomenon under investigation and there are related descriptions in which the phenomenon does not occur.

Using formal means, conclusions are drawn about the causes of the phenomenon in a situational context.

The starting situation assumes that the proposition "p is a property of object O" (written $p \Rightarrow_1 O$ in [1]) is given to be true. The assumption is taken as an empirical observation.

When objects have more than one property, or can have more than one property, the situation is more complex but still empirical in nature. This situation is addressed in the last two of Mill’s canons.

The Canon of Agreement.

This Canon consists of three variations. All of them begin with the same starting situation:

a property, p, of a class of objects $\{O_j\}$ has been identified, and we require evidence regarding the possible cause, c, of an object having this property.

The Variation for Direct Agreement lists all situations in which property p is present. An intersection, c, is defined over the descriptions of all situations in this list.

$$c = \bigcap_i T_i$$

where $\{T_i\}$ is the collection of representational sets describing all objects $O_i$ that are known to have property p.

If this intersection exists and is not empty, it is added to a list of meaningful "positive" descriptive components, and a conjecture is made that property p is connected by a plausible relation to the descriptive intersection element c:

If the descriptive structure c is part of the description of an object, then it is plausible that the object has property p.

Whereas the analysis is over a class of objects $\{O_j\}$ that each have a specific property p, the inference is about whether a specific object O, not in $\{O_j\}$, has this property.
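The Variation for Direct Agreement can be sketched with Python sets standing in for representational sets. This is a minimal illustration under our own assumptions: representational elements are plain strings, and the function names `direct_agreement` and `plausibly_has_property` are hypothetical, not part of QAT or of any library.

```python
def direct_agreement(positive_reps):
    """Return c, the intersection of the representational sets T_i of objects known to have p."""
    reps = [frozenset(t) for t in positive_reps]
    if not reps:
        return frozenset()
    return reps[0].intersection(*reps[1:])


def plausibly_has_property(c, t_new):
    """Apply the conjecture c =>_2 O: if a non-empty c is part of O's description, p is plausible."""
    return bool(c) and c <= frozenset(t_new)


# Hypothetical representational sets for three objects known to have property p.
positives = [{"a", "b", "x"}, {"a", "b", "y"}, {"a", "b", "z"}]
c = direct_agreement(positives)
print(sorted(c))                                   # ['a', 'b']
print(plausibly_has_property(c, {"a", "b", "q"}))  # True
print(plausibly_has_property(c, {"a", "q"}))       # False

# Adding an example to the sample set can revise the conjecture.
c_updated = direct_agreement(positives + [{"a", "w"}])
print(sorted(c_updated))                           # ['a']
```

As positive examples are added, c can only shrink, so conjectures drawn from an earlier sample may have to be revised when the sample set is updated.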

Let p be from a list of possible properties of an object O. We assume that the truth of p has been assessed. This assessment is stated in the form:

$$p \Rightarrow_1 O$$

This is read: "it is plausible that object O has property p."

We then interpret the Variation for Direct Agreement using the following modification of QAT’s second partially defined relationship, $\Rightarrow_2$:

$$c \Rightarrow_2 O$$

This should be read: "it is plausible that a description c is related to a cause of a property similar to property p, and that object O has this property." However, using some equivalence classes, we get the following statement:

substructure c is a plausible cause of property p being a property of object O.

Again, note that the expression $c \Rightarrow_2 O$ does not explicitly state which property is under discussion. We speak here only of a single property.

The Variation for Inverse Agreement first lists all situations in which property p is absent. An intersection, d, is defined over the descriptions of all situations in this list.

$$d = \bigcap_i T_i$$

where $\{T_i\}$ is the collection of representational sets describing all objects $O_i$ that are known not to have property p.

If this intersection, d, of the descriptions’ representational elements exists and is not empty, then it is added to a list of meaningful "negative" descriptive components, and a conjecture is made that the absence of property p is connected to d by a plausible relation:

$$d \Rightarrow_2 \neg O$$

Figure 1: The representational set c - d.

 

This is read: "the presence of substructure d in O makes it plausible that object O does not have property p."

We could also interpret this to mean:

$$\neg d \Rightarrow_2 O,$$

but only under restricted circumstances.

Looking for an invariance across multiple situations gives the method a statistical character.

Let $M^+_p$ be a set of positive examples of objects having a specific property p, and let $M^-_p$ be the class of similar objects that do not have this property.

Note that $\neg d \Rightarrow_2 O$ and $c \Rightarrow_2 O$ together could imply $c - d \Rightarrow_2 O$, where $c - d$ is the set c with the elements of set d taken away. In this case, $c \cap d$ is said to "block" some of the representational elements in c. The consequences of this are hard to interpret in general. However, if c is already an intersection of "positive" representational sets, then removing some of its elements may provide a more minimal concept structure by which to refer to a cause of property p. In each case, this possibility must be tested empirically.

The Double Variation of Agreement is exactly this combination of the Variation for Direct Agreement and the Variation for Inverse Agreement.

It seems that two different possibilities exist for the Double Variation of Agreement. In both cases, we identify an intersection of a class of examples. One is a class of negative examples and one is a class of positive examples. In both cases, we treat the agreement as over a number of examples.

An intersection c of representational sets can be the basis of a conjecture about a positive cause of property p. Likewise, the intersection d of representational sets can be related to a conjecture about a negative cause of property p. The subsets $c - d$ (read "c take away d"; see the shaded area in Figure 1) and $d - c$ can be used in some cases to refine the relationship between causes and properties. Thus three types of conjectures can be derived with the first canon.
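The Double Variation of Agreement combines both intersections. The sketch below, again assuming string-valued representational elements and using hypothetical function names, computes c over the positive examples, d over the negative examples, and the refinements c - d and d - c that give the three types of conjecture.

```python
def intersect_all(reps):
    """Intersection of a list of representational sets (empty if the list is empty)."""
    sets = [frozenset(t) for t in reps]
    return sets[0].intersection(*sets[1:]) if sets else frozenset()


def double_variation(positive_reps, negative_reps):
    c = intersect_all(positive_reps)   # direct agreement over M+_p: c =>_2 O
    d = intersect_all(negative_reps)   # inverse agreement over M-_p: d =>_2 not O
    return {
        "positive": c,
        "negative": d,
        "refined_positive": c - d,     # elements of c not blocked by c ∩ d
        "refined_negative": d - c,
    }


# Hypothetical positive and negative examples.
result = double_variation(
    positive_reps=[{"a", "b", "x"}, {"a", "b", "y"}],
    negative_reps=[{"b", "e", "u"}, {"b", "e", "v"}],
)
for name, elements in result.items():
    print(name, sorted(elements))
# positive ['a', 'b']
# negative ['b', 'e']
# refined_positive ['a']
# refined_negative ['e']
```

Whether the refined set c - d is a better description of a cause than c itself is not decided by the code; that still has to be checked empirically.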

How the classes of positive and negative examples are selected is relevant, and this selection criterion is also at the root of variations on the second and third Mill canons.

The Canon of Difference.

For the Canon of Difference we again obtain descriptions of a class of situations. Certain of the objects in the situations are described as having property p. For example, again we may consider the properties as related strongly to the declarative placement of objects into one of q categories.

Again, we assume that the description includes a list of representations of the composition of the objects. These descriptions are made as logical statements, such as Structured Query Language (SQL) statements, that use representational elements from the set A. As before, let $M^+_p$ be a set of positive examples of objects having a specific property p, and let $M^-_p$ be the class of similar objects that do not have this property.

Figure 2: The intersection between representational sets c and d.

Let $O_i$ be a single element of $M^+_p$ and $O_j$ be a single element of $M^-_p$. Let c be the representational set for $O_i$ and d be the representational set for $O_j$. The intersection $c \cap d$ can be conjectured to be the description of how the two objects are "entangled". The set $c - d$ is the effect, or the cause, or an indispensable part of the cause, of property p. Note that the object might be a category representational set, or even an intersection of some type derived from the Canon of Agreement.
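For a single pair of objects, the Canon of Difference reduces to set operations. In this sketch (hypothetical function name, string-valued elements), `shared` is the intersection c ∩ d and `difference` is c - d; the `strict` flag is our addition and checks Mill's stronger wording, that the two descriptions have every circumstance in common save one, occurring only in the positive example.

```python
def method_of_difference(c, d):
    """c: representational set of O_i in M+_p; d: representational set of O_j in M-_p."""
    c, d = frozenset(c), frozenset(d)
    shared = c & d                            # how the two objects are "entangled"
    difference = c - d                        # effect, cause, or indispensable part of the cause
    strict = len(difference) == 1 and d <= c  # Mill: everything in common save one
    return shared, difference, strict


shared, difference, strict = method_of_difference({"a", "b", "k"}, {"a", "b"})
print(sorted(shared), sorted(difference), strict)   # ['a', 'b'] ['k'] True
shared, difference, strict = method_of_difference({"a", "b", "k"}, {"a", "m"})
print(sorted(shared), sorted(difference), strict)   # ['a'] ['b', 'k'] False
```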

Joint Canon of Agreement and Difference.

Again, suppose we have a set of positive examples, $M^+_p$, of objects having a specific property p, and a set of negative examples, $M^-_p$, of similar objects that do not have this property.

We let $C_q$ be the category defined by $M^+_p$. An intersection $V^+$ of the compositional representations of the positive examples $M^+_p$ is made. The intersection $V^-$ is defined over the set $M^-_p$.

We also look for one example of an object, O, that was not placed into category $C_q$ while, at the same time, this object’s representational set, d, has a non-empty intersection with $V^+$.

In this case we have

$$V^+ \cap d \Rightarrow_2 \neg O$$

The same is done with the negative examples, using the intersection $V^-$. One positive example is chosen, and its representational set, c, is used to produce a conjecture about a positive cause:

$$V^- \cap c \Rightarrow_2 O$$

The plausible inferences $V^+ \cap d \Rightarrow_2 \neg O$ and $V^- \cap c \Rightarrow_2 O$ are defined as "dual formal (positive and negative) causes" of p. The use of such dual statements produces a distributed assessment of category placement.
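The joint canon can be sketched in the same style. This follows the text literally and is only an illustration: `dual_formal_causes` is a hypothetical name, d is the representational set of an object O outside the category C_q, and c is the representational set of a chosen positive example.

```python
def intersect_all(reps):
    sets = [frozenset(t) for t in reps]
    return sets[0].intersection(*sets[1:]) if sets else frozenset()


def dual_formal_causes(positive_reps, negative_reps, c, d):
    """Return (V+ ∩ d, V- ∩ c): the conjectured negative and positive formal causes."""
    v_plus = intersect_all(positive_reps)    # V+ over M+_p
    v_minus = intersect_all(negative_reps)   # V- over M-_p
    negative_cause = v_plus & frozenset(d)   # V+ ∩ d =>_2 not O
    if not negative_cause:
        raise ValueError("d must have a non-empty intersection with V+")
    positive_cause = v_minus & frozenset(c)  # V- ∩ c =>_2 O
    return negative_cause, positive_cause


positives = [{"a", "b", "f"}, {"a", "b", "g"}]         # M+_p, defining category C_q
negatives = [{"b", "e", "f"}, {"a", "e", "f", "u"}]    # M-_p
neg, pos = dual_formal_causes(positives, negatives, c={"a", "b", "f"}, d={"a", "e", "f", "u"})
print(sorted(neg), sorted(pos))   # ['a'] ['f']
```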

Extension of notation

We will follow and further develop a notation already introduced in discussions of bi-level voting procedures. However, the objects will be generalized from text passages to generic objects. Categorization policies are generalized to similarity classes. We are interested in the property that a "description" is the "formal cause" of an object being placed into a similarity class. This is clearly a "synthetic" property that is to be defined by careful empirical methods and by forming good representations of objects.

The introduction of the category theory behind the class of voting procedures [5] requires some motivation. Let

$$O = \{O_1, O_2, \ldots, O_m\}$$

be some collection of objects.

Some device is used to compute an "observation" $D_r$ about the objects. We use the following notation to indicate this:

$$D_r : O_i \to \{t_1, t_2, \ldots, t_n\}$$

This notation is read: "the observation $D_r$ of the object $O_i$ produces the representational set $\{t_1, t_2, \ldots, t_n\}$."

We now combine these object level representations to form a category representation.

• Each observation $D_r$ of an object $O_k$ in the training set O produces a representational set

$$D_r : O_k \to T_k = \{t_1, t_2, \ldots, t_n\}$$

• Let P be the union of the individual object representational sets $T_k$:

$$P = \bigcup_k T_k$$

This set P is the representation set for the complete collection O.

• The set P can be divided, with overlaps, to match the assignment of objects to categories $C = \{C_q\}$. Let $T^*_q$ be the union of all elements of the representational sets $T_k$ for all objects that are assigned to $C_q$:

$$T^*_q = \bigcup \{\, T_k \mid \text{object } O_k \text{ is assigned to category } C_q \,\}$$

In this way, the category representation set, T*q, is defined for each category Cq.

The overlap between the category representations $T^*_q$ and $T^*_s$ is one statistical measure of the "entanglement" between categories $C_q$ and $C_s$. This fact leads to a method for identifying the minimal intersections of descriptions of structural features from the category representational sets, and for matching these minimal intersections to logical atoms in quasi axiomatic theory.
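The bookkeeping above can be sketched with dictionaries of sets. Assumptions: objects and categories carry hypothetical string labels, an object may be assigned to several categories, and the Jaccard ratio returned by `entanglement` is our own choice of normalization for the overlap, not one prescribed by the text.

```python
from collections import defaultdict


def representation_sets(observations, assignments):
    """
    observations: object label -> representational set T_k produced by D_r
    assignments: object label -> iterable of category labels C_q
    Returns (P, {C_q: T*_q}).
    """
    p = frozenset().union(*observations.values())   # P = union of all T_k
    t_star = defaultdict(frozenset)
    for obj, t_k in observations.items():
        for q in assignments.get(obj, ()):
            t_star[q] |= frozenset(t_k)              # T*_q = union over objects in C_q
    return p, dict(t_star)


def entanglement(t_q, t_s):
    """Size of the overlap T*_q ∩ T*_s and its Jaccard ratio."""
    shared = t_q & t_s
    union = t_q | t_s
    return len(shared), (len(shared) / len(union) if union else 0.0)


observations = {"O1": {"a", "b"}, "O2": {"b", "c"}, "O3": {"c", "d"}}
assignments = {"O1": ["C1"], "O2": ["C1", "C2"], "O3": ["C2"]}
p, t_star = representation_sets(observations, assignments)
print(sorted(p))                                 # ['a', 'b', 'c', 'd']
print(entanglement(t_star["C1"], t_star["C2"]))  # (2, 0.5)
```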

On lattices

We now introduce some additional mathematical constructions that are used in QAT-like systems to keep track of the set of all subsets of the representational elements used in descriptions. These subsets are the nodes of the lattice of subsets, whose smallest element is the empty set and whose largest element is the universal set of all representational elements drawn from a class of descriptions.

The notion of minimal meaningful intersections can be seen in a picture of the lattice. In Figure 3 we see some representational sets and some subsets. The nodes of the lattice stand for subsets, arranged by the partial order of set inclusion. The nodes form a large diamond shape with the universal set at the top and the empty set at the bottom.

 

Note that set inclusion is not a total order since, for example, $T_1$ and $T_2$ are not ordered by this relationship. In the figure, the node $m_1$ could be the intersection of $T_1$ and $T_2$, and $m_2$ could be the intersection of $T_1$, $T_2$, and $T_i$.

Figure 3: Some substructures and relationships in the lattice of all subsets of the set of all representational elements.

Note that if some manageable set of lattice nodes are identified as having properties and internode relationships then we have some of the constructions seen in semantic nets. These constructions have the syntagmatic form < a, r, b > where a and b are locations and r is a relational property.

It is also worth noting that the number of nodes in the lattice is 2 raised to the power of the size of the universal set. In text understanding systems, the universal set can have many thousands of elements, so the lattice is very large indeed. However, all intersections of passage (object) representational sets will lie in a relatively small part of the bottom of the lattice.
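The contrast between the full lattice and the part of it reached by intersections can be made concrete on a toy example. The sketch below assumes three hypothetical representational sets and enumerates intersections by brute force, which is only practical for a handful of sets. `minimal_nodes` keeps the intersections that have no other intersection strictly below them; treating these as the "minimal meaningful intersections" is our assumption.

```python
from itertools import combinations


def intersection_nodes(reps):
    """All non-empty intersections of one or more of the given representational sets."""
    sets = [frozenset(t) for t in reps]
    nodes = set()
    for k in range(1, len(sets) + 1):
        for combo in combinations(sets, k):
            meet = combo[0].intersection(*combo[1:])
            if meet:
                nodes.add(meet)
    return nodes


def minimal_nodes(nodes):
    """Nodes with no other node strictly below them in the inclusion order."""
    return {n for n in nodes if not any(m < n for m in nodes)}


T1, T2, T3 = {"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "f", "g"}
universe = T1 | T2 | T3
nodes = intersection_nodes([T1, T2, T3])
print(2 ** len(universe))                         # 128 subsets in the full lattice
print(len(nodes))                                 # 5 nodes reached by intersections
print([sorted(n) for n in minimal_nodes(nodes)])  # [['a']]
```

Here T1 ∩ T2 = {a, b} plays the role of m1 and T1 ∩ T2 ∩ T3 = {a} the role of m2 in the description of Figure 3.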

The case where the object has multiple properties

An object O not only has the possibility of having one of several different properties, but also has the possibility of having multiple properties at the same time. The first three canons assume that only one property is being considered. The last two canons treat the more complex case.

In QAT-like systems, we have three classes of logical atoms: O (objects), P (properties), and A (substructures). We can speak about residues and concomitant variation only to the degree that it is reasonable to assume independence between the causes of properties.

Suppose we have established k conjectures of the form:

$$p_i \Rightarrow_1 O \quad \text{and} \quad a_i \Rightarrow_2 O, \qquad i = 1, \ldots, k.$$

This is read: "for i = 1, ..., k, the property $p_i$ is a property of the object O, and substructure $a_i$ is the cause of property $p_i$ in object O." Under the assumption of k independent causal linkages, we can use the compact notation:

$$(p_1, \ldots, p_k) \Rightarrow_1 O \quad \text{and} \quad (a_1, \ldots, a_k) \Rightarrow_2 O$$

or just

$$(a_1, \ldots, a_k) \Rightarrow_2 O$$

in the case that the property set $(p_1, \ldots, p_k)$ has already been identified.

Where it is necessary to make the relationship between substructure and property explicit, we use the notation:

$$a_i \Rightarrow_2 (O, p_i)$$

which is read: "substructure $a_i$ is the plausible cause of object O having property $p_i$." This notation assumes that $p_i \Rightarrow_1 O$.

Canon of Residues

Both the Canon of Residues and the Canon of Concomitant Variation deal with complex causes and complex properties.

The first three canons can be used to identify the meaningful subsets of the set of representational elements A. The last two canons are used, in our interpretation, to further delineate causal linkages between substructures and properties.

Let $C = \{C_i\}$ be a class of categories of objects. We assume that this class is a reasonably complete description of the similarity classes of the set of emergent wholes that are produced by a set of substructural elements (atoms).

Consider again the example of the chemical elements, the periodic table, and chemistry. Properties are observed at a level separate from that of the atoms.

Let T be a generalized product of some subsets, $\{a_1, \ldots, a_q\}$, of the set A of substructures:

$$T = (a_1, \ldots, a_q)$$

that are observed to describe a complex set of properties P:

$$P = \{p_1, \ldots, p_r\}$$

Suppose further that r = q and that, for each i = 1, 2, ..., $q - 1$, we know

$$a_1 \Rightarrow_2 (O, p_1), \quad a_2 \Rightarrow_2 (O, p_2), \quad \ldots, \quad a_{q-1} \Rightarrow_2 (O, p_{q-1}).$$

Using the Canon of Residues, we conjecture that $a_q \Rightarrow_2 (O, p_q)$.

This conjecture also has a context: the set of substructures involved in composing objects that belong to one of the categories in $C = \{C_i\}$.
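Under the independence assumption, the Canon of Residues is a bookkeeping step over the linkages $a_i \Rightarrow_2 (O, p_i)$ that are already established. In this sketch the linkages are a hypothetical dictionary from substructure label to property label, and the function (a hypothetical name) returns the conjectured residual pair only when exactly one substructure and one property remain unexplained.

```python
def canon_of_residues(substructures, properties, known):
    """
    substructures: [a_1, ..., a_q] composing T
    properties: [p_1, ..., p_r] observed for the whole (the canon assumes r == q)
    known: {a_i: p_i} for linkages a_i =>_2 (O, p_i) already established
    Returns the conjectured (a_q, p_q), or None if the residue is not a single pair.
    """
    if len(substructures) != len(properties):
        return None
    explained = set(known.values())
    residual_subs = [a for a in substructures if a not in known]
    residual_props = [p for p in properties if p not in explained]
    if len(residual_subs) == 1 and len(residual_props) == 1:
        return residual_subs[0], residual_props[0]
    return None


known = {"a1": "p1", "a2": "p2"}
print(canon_of_residues(["a1", "a2", "a3"], ["p1", "p2", "p3"], known))          # ('a3', 'p3')
print(canon_of_residues(["a1", "a2", "a3"], ["p1", "p2", "p3"], {"a1": "p1"}))   # None
```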

The Canon of Concomitant Variation

In this Canon we have descriptions of the properties of two objects A and B.

Linkages are conjectured. Perhaps the objects are two winter storms, A and B, and we note that two of the system observables seem to vary proportionately. The connection is observed through differences in a common property. The variation in the property is conjectured to be caused by a specific variation in the substructure.

Define a non-specific composition function comp(·) to be a transformation of some set of substructural elements into a whole that has a set of properties. We suppose here that the properties are all functional properties of whole objects. We again suppose that structural / functional relationships have some degree of independence; i.e., that the functional properties are distinct and that, at least as parts of the whole, distinct structural components are composed into distinct properties.

This is expressed:

$$\mathrm{comp}(d + c) \sim \mathrm{comp}(d) + \mathrm{comp}(c)$$

where ~ is the connective "is similar to", and d , c are substructures. Of course, this is a strong assumption which is hedged by the use of the similarity connective.

Let A and B have a complex of properties:

$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c)) \Rightarrow_1 A$$

$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d)) \Rightarrow_1 B$$

and the degree of presence of substructures c and d is ordered. We suppose that

$$c \Rightarrow_2 (A, p_n)$$

and

$$d \Rightarrow_2 (B, p_{n+m}),$$

where $p_n = p_{n+m}$ is a common property shared by object A and object B.

Let $c^+$ and $d^+$ denote an increase in c and d respectively, and $c^-$ and $d^-$ denote a decrease in c and d. Since d is a substructure, $d^+$ and $d^-$ may be defined either quantitatively or qualitatively (through substructural similarity analysis).

Then, if the situation

$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c^+)) \Rightarrow_1 A$$

coincides with the situation

$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d^+)) \Rightarrow_1 B,$$

we can say that c and d are directly related. A similar relationship exists when $\mathrm{comp}(c^-)$ and $\mathrm{comp}(d^-)$ vary directly to produce A and B.

In the opposite case, if the situation

$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c^-)) \Rightarrow_1 A$$

coincides with the situation

$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d^+)) \Rightarrow_1 B,$$

we say that c and d are inversely related. A similar relationship exists when $\mathrm{comp}(c^+)$ and $\mathrm{comp}(d^-)$ vary inversely.
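The direct and inverse cases can be read off paired observations of how the two substructures change. This is a minimal sketch under our own encoding: +1 stands for an increase ($c^+$ or $d^+$), -1 for a decrease, and 0 for no change. The function name is hypothetical, and it treats any observation in which one substructure varies without the other as breaking the concomitance.

```python
def concomitant_relation(changes_c, changes_d):
    """
    changes_c, changes_d: paired directions of change of c (in A) and d (in B),
    each +1 (increase), -1 (decrease), or 0 (no change).
    Returns "direct", "inverse", or None.
    """
    moved = []
    for x, y in zip(changes_c, changes_d):
        if x == 0 and y == 0:
            continue              # neither varied: no evidence either way
        if x == 0 or y == 0:
            return None           # one varied without the other
        moved.append((x, y))
    if not moved:
        return None
    if all(x == y for x, y in moved):
        return "direct"           # c+ with d+, c- with d-
    if all(x == -y for x, y in moved):
        return "inverse"          # c- with d+, c+ with d-
    return None


print(concomitant_relation([1, -1, 1], [1, -1, 1]))   # direct
print(concomitant_relation([-1, 1], [1, -1]))         # inverse
print(concomitant_relation([1, 1], [1, -1]))          # None
```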

Clearly, the above notation only begins to define the full set of possibilities for an algorithmic calculus based on Mill’s reasoning. There must, however, be some finesse in its application to complex problems. Mill’s logic breaks down to the degree that the set of observables, of both properties and substructures, cannot be composed into independent causal linkages. Moreover, natural complex systems might not be fully reducible to independent causal linkages, and a degree of skepticism is required regarding both reductionism and its alternatives.

The problem we see is not the viability of complex descriptions of bi-level causation, but rather that these descriptions must be situational in nature. A viable situational extension of Mill’s logic might be based on behavioral evidence that natural systems behave more predictably in well-defined situational contexts.

Situational language and bi-level reasoning

Using our interpretation of QAT-like formal languages, we have conjectured that J. S. Mill’s method creates deductive machinery that is situational in nature. Acting on this conjecture, we have initiated an experimental program applying this interpretation to autonomous text understanding. We use formal categorical properties of declarative human message-routing decisions in commercial and governmental activities [5]. We assume that message-routing decisions are produced by human inductive decisions based on individual memories and personal awareness of a ground truth related to the routing environment.

The apparatus of the QAT languages stores information and provides a meta-formalism that does not depend on specific situations, since we have developed a separate formalism that deals only with the "disembodied" substructure of classes of objects. The methodology attempts to build a complete set of representational symbols for sufficient reference to all formal causes for any object property.

The representational problem is treated independently. This independence is justified on practical grounds. First, the representational problem is not solved perfectly by any known algorithmic system – nor even, it is debated, by cognitive processes. Second, a certain amount of failure in representational fidelity is compensated by adaptation.

In situational logics, the logical atoms are specific to situational classes. The object of analysis is assumed to be in a context that maps to one of the known situational classes. When the current situation cannot be mapped to the assumed situational class, the logic must be recomputed from an elementary re-measurement of class and substructure invariants. In this case, either the representational fidelity or the logical formalism is inadequate.
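The mapping step can be sketched as a nearest-prototype match with a rejection option. Everything specific here is an assumption for illustration: situational classes are represented by hypothetical prototype representational sets, similarity is the Jaccard ratio, and `threshold` is an arbitrary cut-off. A `None` result stands for the case in which the logic must be recomputed.

```python
def map_situation(context, situational_classes, threshold=0.5):
    """
    context: representational set describing the current situation
    situational_classes: {class name: prototype representational set}
    Returns the best-matching class name, or None if no class is similar enough.
    """
    context = frozenset(context)
    best_name, best_score = None, 0.0
    for name, prototype in situational_classes.items():
        prototype = frozenset(prototype)
        union = context | prototype
        score = len(context & prototype) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


classes = {"routing": {"msg", "route", "desk"}, "billing": {"invoice", "amount"}}
print(map_situation({"msg", "route", "urgent"}, classes))   # routing
print(map_situation({"storm", "pressure"}, classes))        # None: recompute the logic
```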

The logic is bi-level. Object prototypes are considered as situational classes, as are modal properties of the environment. Substructural elements are also considered prototypes, but at a distinct level of organization that is not locally meaningful to the situational classes of assembled wholes.

The meaningful subsets of representational elements have both internal and external linkages, the discovery of which leads to one of many possible situational logics. We interpret the internal linkages to be structural in nature and the external linkages to be functional in nature. Two levels of organization are identified and maintained in separated data structures.

Structural components are the cause of functional properties that result from the formation of a whole that is greater than the sum of the structural components. Water from hydrogen and oxygen is an example. The compound, water, does not depend on having specific examples of an oxygen atom, but rather any one of a class of atoms that is the prototype class for all oxygen atoms.