Chapter 6
An Interpretation of the logic of J. S. Mill
September 1999
This chapter was updated starting January 25, 2006, with a focus on applications of Service Oriented Architectures, as part of an OntologyStream Inc contract.
Abstract
An interpretation of Quasi Axiomatic Theory and Mill’s logic is made to
support the concept of situational logic.
Introduction
Aristotle described an inference method called induction by simple enumeration. The method proposes that if we have a number of uniform facts and we do not know of any contrary facts, we can make a generalization about these facts. This type of induction is “weaker” than a method that would falsify a theory. However, induction by simple enumeration may be close to how natural language forms through use. It is conjectural on my part to suggest that Aristotle viewed natural language formation in this way, but I do make the conjecture that natural language forms in a fashion that involves categorical processes.
Using Quasi Axiomatic Theories (QAT) developed by V. Finn (1991), a set of "facts" can be placed inside a deductive framework. The framework can become situationally grounded through categorization and induction. However, the validity of deductive algorithms depends on the validity of a class of underlying assumptions.
Some of the Aristotelian assumptions can be understood by considering his theory of causation. The four classes of Aristotelian causes (formal, material, efficient, and final) provide examples of inductive reasoning about causal relationships.
For Aristotle, at least in some interpretations, the phenomenon of cause is related to similarities within a temporal sequence. The similarity relates elements and can become a model of the states of situations.
The similarity between two things can be stated as < a, r, b >, where r is the relationship. Dissimilarity can be given a corresponding notation.
At least in the way Aristotle’s metaphysics was incorporated into Newtonian science, these similarities must be crisp in nature, and no critical hidden entanglement between similarity classes can be tolerated. It is this crispness and absence of entanglement that might be challenged by modern science and by the modern understanding of phenomena such as natural language and human consciousness.
It is conjectured that living systems have hidden causation due to intention and other phenomena. We suggest that living systems be regarded as open complex systems, and further suggest that Aristotle’s logic is closed and simple. The complexity of real systems has been exposed only through modern evidence.
Hidden categorical entanglement is a sufficient reason why Aristotelian logic does not describe all causation in open complex systems. Something may be transformed from one category into another, as in metabolic activities where a molecular element is given a specific function by a catalytic process. A number of elements may be brought together and transformed into a whole that is not the same as the crisp sum of the parts. Again, formal models of such processes exhibit a non-reductionist categorical entanglement.
In addition to categorical entanglement, we consider possible
insufficiency in sampling and in description.
The measurement of behaviors of a human being is an example of
measurement insufficiency.
Aristotelian logic comes to state laws of causation by generalizing from descriptions of many possible cases of causation. The generalization is from a specific set of examples and assumes that the descriptions of the examples are valid. However, the choice of examples, and their description in some type of formal syllogistic language, is more problematic than Aristotelian logic presupposes.
" Logic,
in the Middle Ages, and down to the present day in teaching, meant no more than a scholastic collection of technical terms and rules of syllogistic inference. Aristotle has spoken, and it was the part of humbler men merely to repeat the lesson after him. . . . Ever since the beginning of the seventeenth century, all vigorous minds that have concerned themselves with inference have abandoned the medieval tradition, . . . .
The first extension was the introduction of the inductive method by Bacon and Galileo – by the former in a theoretical and largely misunderstood form, by the latter in actual use in establishing the foundations of modern physics and astronomy. ... But induction, important as it is when regarded as a method of investigation, does not seem to remain when its work is done: in the final form of a perfected science, it would seem that everything ought to be deductive. If induction remains at all, . . . , it remains merely as one of the principles according to which deductions are effected. Thus the ultimate result of the introduction of the inductive method seems not the creation of a new kind of non-deductive reasoning, but rather a widening of the scope of deduction . . ." (Russell, 1914)
If the system under observation is very stable, as with the invariants of falling objects observed by Galileo, then deductive syllogisms might eventually be constructed. In open systems, however, the internal dynamics change fundamentally, and non-monotonic logics can arise as the sampling set is updated.
The metaphysics of Aristotle does not have the richness of modern theories of causation. Even though Aristotelian logic has been applied to a range of phenomena, his methods work only if the phenomenon is fully constrained by known universal law. This is clearly not the case for a class of phenomena such as psychological motivation.
It is difficult to regard human inference in terms of the monotonic/non-monotonic fulcrum. Induction is a cognitive process. It has a temporal aspect that accounts for fundamental changes in a non-stationary ontology. The notion of computed truth is also unclear. However, quasi axiomatic theories provide algorithms that are suited to modeling open systems, and thus to modeling the inductive processes involved in human understanding of open systems. Non-monotonic reasoning is supported algorithmically through plausible reasoning and periodic updates to axiom sets.
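The kind of non-monotonic update described above can be illustrated with a minimal sketch. It assumes, hypothetically, that the sample is a mapping from property names to lists of boolean observations, and that an "axiom" is simply a property that every sampled instance exhibits (induction by simple enumeration). The name `induce_axioms` is illustrative and is not part of QAT.

```python
from typing import Dict, List, Set


def induce_axioms(sample: Dict[str, List[bool]]) -> Set[str]:
    """Recompute the axiom set from the current sample (hypothetical helper).

    A property becomes an axiom only if it has been observed at least once
    and no contrary observation is known (simple enumeration).
    """
    return {prop for prop, values in sample.items() if values and all(values)}


sample = {"falls": [True, True], "flies": [True, True]}
axioms_before = induce_axioms(sample)

# A periodic update adds a contrary observation to the sampling set.
sample["flies"].append(False)
axioms_after = induce_axioms(sample)

# More information has produced fewer conclusions: the reasoning is non-monotonic.
print(sorted(axioms_before), sorted(axioms_after))  # ['falls', 'flies'] ['falls']
```

The axiom set is recomputed from the whole sample on each update rather than patched incrementally, which keeps the sketch close to the idea of periodically refreshing the axioms.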
Finn’s work is based on the work of Francis Bacon and J. S. Mill. All three further developed a theory of causation based on "induction by simple enumeration". At the core of this theory is a similarity analysis that defines the classes of instances, and the facts and conjectures framed within the context of these instances. Mill gave a general analysis of the theories of inductive proof and provided a set of formulas and criteria related to the problems of scientific reasoning. More specifically, Mill formulated five "canons of reasoning" about causal hypotheses.
In private discussions with the author, Peter Kugler summarized Mill’s Canons as follows (Mill, 1872).
First Canon [Method of Agreement]: If two or more
instances of the phenomenon under investigation have only one circumstance in
common, the circumstance in which alone all the instances agree is the cause
(or effect) of the given phenomenon.
Second Canon [Method of Difference]: If an instance
in which the phenomenon under investigation occurs, and an instance in which it
does not occur, have every circumstance in common save one, that one occurring
only in the former; the circumstance in which the two instances differ is the
effect, or the cause, or an indispensable part of the cause, of the phenomenon.
Third Canon [Joint Method of Agreement and
Difference]: If two or more instances in which the phenomenon occurs have only
one circumstance in common, while two or more instances in which it does not occur
have nothing in common save the absence of that circumstance, the circumstance
in which the two sets of instances differ is the effect, or the cause, or an
indispensable part of the cause, of the phenomenon.
Fourth Canon [Method of Residues]: Subduct from any
phenomenon such part as is known by previous inductions to be the effect of
certain antecedents, and the residue of the phenomenon is the effect of the
remaining antecedents.
Fifth Canon [Method of Concomitant Variations]:
Whatever phenomenon varies in any manner whenever another phenomenon varies in
some particular manner, is either a cause or an effect of that phenomenon, or
is connected with it through some fact of causation.
Kugler describes Mill’s motivation as formulating a theory of inferential
inductive knowledge based on the concept of natural law. For Mill, natural law
referred to relationships between antecedent and consequent events that are
universally invariant.
The validity of the inductive generalization was grounded in the invariance of these natural laws. As we will see, this is a point of disagreement between Mill and Peirce. We use this point of disagreement to build situational logics that are bi-level, with the two levels separated except during a meta-phenomenon of emergence.
We take a "Peircean interpretation" of the canons by making two changes in philosophy. First, the cause we are looking for is a "compositional cause", where basic elements are composed into emergent wholes. Peirce used the metaphor of chemical compounds composed of atoms. The compositional cause of chemical properties is then ascribed to the presence or absence of specific atoms in the chemical composition.
Second, the invariances that we look for are situational invariants
that are defined across basins of similarity within specific organizational
contexts. These issues are addressed in a "General Framework for
Computational Intelligence" where a deeper motivation is given for the
application of Mill’s logic to text understanding.
All of the canons have a common feature: there are descriptions of an occurrence of some phenomenon under investigation, and there are related descriptions in which the phenomenon does not occur. By formal means, conclusions are drawn regarding the causes of the phenomenon in a situational context.
The starting situation assumes that the proposition "p is a property of object O" (written $p \Rightarrow_1 O$ in [1]) is given to be true. The assumption is taken as an empirical observation.
When objects have more than one property, or have the possibility of having more than one property, the situation is more complex, but still empirical in nature. This situation is addressed in the last two of Mill’s canons.
The Canon of Agreement.
This Canon consists of three variations. All of them begin with the same starting situation: a property, p, of a class of objects $\{O_j\}$ has been identified, and we require evidence regarding the possible cause, c, of an object having this property.
The Variation for Direct Agreement lists all situations in which the property p is present. An intersection, c, is defined over the descriptions of all situations in this list:
$$c = \bigcap_i T_i$$
where $\{T_i\}$ is the collection of representational sets for the descriptions of all objects $O_i$ that are known to have the property p.
If this intersection exists and is not empty, it is added to a list of meaningful "positive" descriptive components, and a conjecture is made that property p is connected by a plausible relation to the descriptive intersection element c: if the descriptive structure c is a part of the description of the object, then it is plausible that the object has property p.
Whereas the analysis is over a class of objects $\{O_j\}$ that each have a specific property p, the inference is about whether a specific object O, not in $\{O_j\}$, has this property.
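The Variation for Direct Agreement can be sketched with set operations, assuming each description is a set of string-valued representational elements. The names `direct_agreement`, `plausibly_has_p`, and `Rep`, and the toy element labels, are hypothetical; this is a sketch of the intersection step, not Finn's implementation.

```python
from typing import FrozenSet, List, Optional

Rep = FrozenSet[str]  # a representational set: the elements describing one object


def direct_agreement(positives: List[Rep]) -> Optional[Rep]:
    """Return c, the intersection of the sets T_i of all objects known to have p.

    Returns None when there are no examples or the intersection is empty,
    in which case no "positive" descriptive component is recorded.
    """
    if not positives:
        return None
    c = frozenset.intersection(*positives)
    return c or None


def plausibly_has_p(c: Rep, description_of_o: Rep) -> bool:
    """If c is part of O's description, it is plausible that O has property p."""
    return c <= description_of_o


positives = [frozenset({"h", "o", "x"}), frozenset({"h", "o", "y"}), frozenset({"h", "o"})]
c = direct_agreement(positives)
print(sorted(c), plausibly_has_p(c, frozenset({"h", "o", "z"})))  # ['h', 'o'] True
```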
Let p be from a list of possible properties of an object O. We assume that the truth of p has been assessed. This assessment is stated in the form
$$p \Rightarrow_1 O$$
which is read: "it is plausible that object O has property p."
We then interpret the Variation for Direct Agreement using the following modification of QAT’s second partially defined relationship, $\Rightarrow_2$:
$$c \Rightarrow_2 O$$
This should be read: "it is plausible that a description c is related to a cause of a property similar to property p and that object O has this property." However, using some equivalence classes, we get the following statement:
substructure c is a plausible cause of property p being a property of object O.
Again, note that the question of which property is under discussion is not explicitly stated in the expression $c \Rightarrow_2 O$. We talk here only of a single property.
The Variation for Inverse Agreement first lists all situations where a property p is absent. An intersection, d, is defined over the descriptions of all situations in this list:
$$d = \bigcap_i T_i$$
where $\{T_i\}$ is the collection of representational sets for the descriptions of all objects $O_i$ that are known not to have the property p.
If this intersection, d, of the descriptions’ representational elements exists and is not empty, then it is added to a list of meaningful "negative" descriptive components, and a conjecture is made that property p is connected to d by the plausible relation
$$d \Rightarrow_2 \neg O.$$

Figure 1: The representational set c - d.
This is read: "the presence, in O, of the substructure d implies that the object O does not have the property p."
We could also interpret this to mean
$$\neg d \Rightarrow_2 O,$$
but only under restricted circumstances.
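The Variation for Inverse Agreement mirrors the direct case over the negative examples. The following is a minimal sketch, again assuming descriptions are sets of string elements; `inverse_agreement`, `plausibly_lacks_p`, and the toy data are hypothetical.

```python
from typing import FrozenSet, List, Optional

Rep = FrozenSet[str]


def inverse_agreement(negatives: List[Rep]) -> Optional[Rep]:
    """Return d, the intersection of the sets T_i of all objects known NOT to have p."""
    if not negatives:
        return None
    d = frozenset.intersection(*negatives)
    return d or None  # an empty intersection yields no "negative" component


def plausibly_lacks_p(d: Rep, description_of_o: Rep) -> bool:
    """d =>_2 not-O: if d is part of O's description, O plausibly lacks p."""
    return d <= description_of_o


negatives = [frozenset({"n", "k", "x"}), frozenset({"n", "k"})]
d = inverse_agreement(negatives)
print(sorted(d), plausibly_lacks_p(d, frozenset({"n", "k", "q"})))  # ['k', 'n'] True
```

The sketch deliberately does not implement the reverse reading (absence of d implying p), since the text allows that reading only under restricted circumstances.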
Looking for an invariance across multiple situations gives this analysis a statistical nature.
Let $M^+_p$ be a set of positive examples of objects having a specific property, p, and let $M^-_p$ be the class of similar objects that do not have this property.
Note that $\neg d \Rightarrow_2 O$ and $c \Rightarrow_2 O$ could imply that $c - d \Rightarrow_2 O$, where $c - d$ is the set c with the elements of set d taken away. In this case, $c \cap d$ is said to "block" some of the representational elements in c. The consequences of this are hard to interpret in general; however, if c is already an intersection of "positive" representational sets, then the additional removal of some elements may provide a more minimal concept structure by which to refer to a cause of the property p. In each case, however, this possibility must be tested empirically.
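The blocking step can be sketched as two set operations on the positive intersection c and the negative intersection d. The function name `refine_positive_cause` and the toy sets are hypothetical; the refined set is only a candidate conjecture, in keeping with the requirement that it be tested empirically.

```python
from typing import FrozenSet, Tuple

Rep = FrozenSet[str]


def refine_positive_cause(c: Rep, d: Rep) -> Tuple[Rep, Rep]:
    """Split the positive intersection c using the negative intersection d.

    Returns (refined, blocked): refined = c - d is a candidate, more minimal,
    positive cause of p; blocked = c & d are the elements of c that d blocks.
    """
    return c - d, c & d


c = frozenset({"h", "o", "s"})  # intersection over positive examples
d = frozenset({"s", "n"})       # intersection over negative examples
refined, blocked = refine_positive_cause(c, d)
print(sorted(refined), sorted(blocked))  # ['h', 'o'] ['s']
```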
The Double Variation of Agreement is exactly this combination of
Variation for Direct Agreement and Variation for Inverse Agreement.
It seems that two different possibilities exist for the Double
Variation of Agreement. In both cases, we identify an intersection of a
class of examples. One is a class of negative examples and one is a class of
positive examples. In both cases, we treat the agreement as over a number of
examples.
An intersection c of representational sets can be the basis of a conjecture about a positive cause of the property p. Likewise, the intersection d of representational sets can be related to a conjecture about a negative cause of the property p. The subsets $c - d$ (read "c take away d") (see the shaded area in Figure 1) and $d - c$ can be used in some cases to refine the relationship between causes and properties. Thus three types of conjectures can be derived with the first canon.
How the classes of positive and negative examples are selected is
relevant, and this selection criterion is also at the root of variations on the
second and third Mill canons.
The Canon of Difference.
For the Canon of Difference we again obtain descriptions of a class of situations. Certain of the objects in the situations are described as having property p. For example, we may again consider the properties as strongly related to the declarative placement of objects into one of q categories.
Again, we assume that the description includes a list of representations of the composition of the objects. These descriptions are made as logical statements, such as Structured Query Language (SQL) statements, that use representational elements from the set A. As before, let $M^+_p$ be a set of positive examples of objects having a specific property, p, and $M^-_p$ be the class of similar objects that do not have this property.

Figure 2: The intersection between representational sets c and d.
Let $O_i$ be a single element of $M^+_p$ and $O_j$ be a single element of $M^-_p$. Let c be the representational set for $O_i$ and d be the representational set for $O_j$. The intersection $c \cap d$ (Figure 2) can be conjectured to be the description of how the two objects are "entangled". The set $c - d$ is the effect, or the cause, or an indispensable part of the cause, of the property p. Note that the object might be a category representational set or even an intersection of some type derived from the Canon of Agreement.
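A single positive/negative comparison for the Canon of Difference can be sketched as follows. The function `method_of_difference` and the `strict` flag are hypothetical additions: `strict` checks Mill's literal wording (the instances share every circumstance save one, occurring only in the positive instance), while the returned difference follows the looser set reading given above.

```python
from typing import FrozenSet, Tuple

Rep = FrozenSet[str]


def method_of_difference(c: Rep, d: Rep) -> Tuple[Rep, Rep, bool]:
    """Compare a positive object's set c with a negative object's set d.

    Returns (entangled, difference, strict):
      entangled  = c & d, conjectured to describe how the two objects are entangled;
      difference = c - d, the candidate effect, cause, or indispensable part of the cause;
      strict     = True when every element of d is in c and c has exactly one extra.
    """
    difference = c - d
    strict = d <= c and len(difference) == 1
    return c & d, difference, strict


c = frozenset({"a", "b", "x"})  # positive object O_i in M+_p
d = frozenset({"a", "b"})       # negative object O_j in M-_p
entangled, difference, strict = method_of_difference(c, d)
```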
Joint Canon of Agreement and Difference.
Again, suppose we have a set of positive examples, $M^+_p$, of objects having a specific property, p, and a set of negative examples, $M^-_p$, of similar objects that do not have this property.
We let $C_q$ be the category defined by $M^+_p$.
An intersection $V^+$ of the compositional representations of the positive examples $M^+_p$ is made. The intersection $V^-$ is defined in the same way over the set $M^-_p$.
We also look for one example of an object, O, that was not placed into category $C_q$ while at the same time this object’s representational set, d, has a non-empty intersection with $V^+$. In this case we have
$$V^+ \cap d \Rightarrow_2 \neg O.$$
The same is done with the negative examples, using the intersection $V^-$. One positive example is chosen, and its representational set, c, is used to produce a conjecture about a positive cause:
$$V^- \cap c \Rightarrow_2 O.$$
The plausible inferences $V^+ \cap d \Rightarrow_2 \neg O$ and $V^- \cap c \Rightarrow_2 O$ are defined as "dual formal (positive and negative) causes" of p. The use of such dual statements produces a distributed assessment of category placement.
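The two plausible inferences of the Joint Canon can be computed directly as intersections. This sketch follows the formulas literally; the function name `joint_method`, its argument layout, and the toy data are hypothetical, and an empty result is read as "no conjecture".

```python
from typing import FrozenSet, List, Tuple

Rep = FrozenSet[str]


def joint_method(positives: List[Rep], negatives: List[Rep], c: Rep, d: Rep) -> Tuple[Rep, Rep]:
    """Dual formal causes of p.

    positives: representational sets of M+_p; negatives: those of M-_p.
    c: set of one chosen positive example; d: set of an object O not placed in C_q.
    Returns (positive_cause, negative_cause):
      positive_cause = V- & c   (V- ∩ c =>_2 O)
      negative_cause = V+ & d   (V+ ∩ d =>_2 not-O)
    An empty result means the corresponding conjecture is not made.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    v_plus = frozenset.intersection(*positives)
    v_minus = frozenset.intersection(*negatives)
    return v_minus & c, v_plus & d


positives = [frozenset({"a", "b", "x"}), frozenset({"a", "b", "y"})]  # V+ = {a, b}
negatives = [frozenset({"n", "b", "u"}), frozenset({"n", "b", "w"})]  # V- = {n, b}
d = frozenset({"a", "n", "z"})  # object not placed in C_q
positive_cause, negative_cause = joint_method(positives, negatives, positives[0], d)
```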
Extension of notation
We will follow and further develop a notation already introduced in
discussions of bi-level voting procedures. However, the objects will be
generalized from text passages to generic objects. Categorization policies are
generalized to similarity classes. We are interested in the property that a
"description" is the "formal cause" of an object being
placed into a similarity class. This is clearly a "synthetic"
property that is to be defined by careful empirical methods and by forming good
representations of objects.
The introduction of the category theory behind the class of voting procedures [5] requires some motivation. Let
$$O = \{O_1, O_2, \ldots, O_m\}$$
be some collection of objects.
Some device is used to compute an "observation" $D_r$ about the objects. We use the following notation to indicate this:
$$D_r : O_i \to \{t_1, t_2, \ldots, t_n\}$$
This notation is read: "the observation $D_r$ of the object $O_i$ produces the representational set $\{t_1, t_2, \ldots, t_n\}$."
We now combine these object-level representations to form a category representation.
· Each "observation" $D_r$ of the objects in the training set O has a representational set
$$D_r : O_k \to T_k = \{t_1, t_2, \ldots, t_n\}$$
· Let P be the union of the individual object representational sets $T_k$:
$$P = \bigcup_k T_k$$
This set P is the representational set for the complete collection O.
· The set P can be partitioned, with overlaps, to match the assignment of objects to categories $C = \{C_q\}$. Let $T^*_q$ be the union of all elements of the representational sets $T_k$ for all objects that are assigned to $C_q$:
$$T^*_q = \bigcup \{T_k \mid O_k \text{ is assigned to } C_q\}$$
In this way, the category representational set $T^*_q$ is defined for each category $C_q$.
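The construction of P and the category sets can be sketched as follows. The observation $D_r$ is stood in for by a hypothetical `observe` function that splits a text object into lowercase word tokens; the real observation device is not specified in the text, and `representation_sets` and the example messages are likewise illustrative.

```python
from typing import Dict, FrozenSet, Tuple

Rep = FrozenSet[str]


def observe(text: str) -> Rep:
    """Hypothetical observation D_r: the lowercase word tokens of a text object."""
    return frozenset(text.lower().split())


def representation_sets(objects: Dict[str, str], assignment: Dict[str, str]) -> Tuple[Rep, Dict[str, Rep]]:
    """Return P (union of all T_k) and T*_q (union of T_k over objects assigned to C_q)."""
    reps = {name: observe(text) for name, text in objects.items()}
    universe: Rep = frozenset().union(*reps.values())  # P = union over k of T_k
    categories: Dict[str, Rep] = {}
    for name, q in assignment.items():
        categories[q] = categories.get(q, frozenset()) | reps[name]
    return universe, categories


objects = {"O1": "budget report due", "O2": "budget meeting", "O3": "server outage report"}
assignment = {"O1": "finance", "O2": "finance", "O3": "ops"}
P, T_star = representation_sets(objects, assignment)
```

Because every object is assigned to some category here, the category sets together cover P, and they overlap where a token ("report") occurs in more than one category.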
The overlap between the category representations $T^*_q$ and $T^*_s$ is one statistical measure of the "entanglement" between categories $C_q$ and $C_s$. This fact leads to a method for identifying the minimal intersections of descriptions of structural features from the category representational sets, and for matching these minimal intersections to logical atoms in quasi axiomatic theory.
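The overlap between two category sets can be computed directly. The text does not fix a normalization, so the Jaccard ratio $|T^*_q \cap T^*_s| / |T^*_q \cup T^*_s|$ used here is an assumption, and `entanglement` is a hypothetical name.

```python
from typing import FrozenSet, Tuple

Rep = FrozenSet[str]


def entanglement(t_q: Rep, t_s: Rep) -> Tuple[Rep, float]:
    """Return the overlap of two category sets and a normalized score.

    The Jaccard normalization is an assumption; the text only says that the
    overlap is one statistical measure of entanglement.
    """
    shared = t_q & t_s
    union = t_q | t_s
    return shared, (len(shared) / len(union) if union else 0.0)


finance = frozenset({"budget", "report", "due", "meeting"})
ops = frozenset({"server", "outage", "report"})
shared, score = entanglement(finance, ops)
print(sorted(shared), round(score, 3))  # ['report'] 0.167
```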
On lattices
We now introduce some additional mathematical constructions that are used in QAT-like systems to keep track of the set of all subsets of the representational elements used in descriptions. These subsets are the nodes of the lattice of subsets, whose smallest element is the empty set and whose largest element is the universal set: the set of all representational elements from a class of descriptions.
The notion of minimal meaningful intersections can be seen using a picture of the lattice. In Figure 3 we see some representational sets and some subsets. The nodes of the lattice stand for subsets, arranged by the partial order of set inclusion. The nodes form a large diamond shape with the universal set at the top and the empty set at the bottom. Note that set inclusion is not a total order since, for example, $T_1$ and $T_2$ are not ordered by this relationship. In the figure, the node $m_1$ could be the intersection of $T_1$ and $T_2$, and $m_2$ could be the intersection of $T_1$, $T_2$, and $T_i$.

Figure 3: Some substructures and relationships in the lattice of all subsets of
the set of all representational elements.
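The relationships pictured in Figure 3 can be checked with Python's subset operators, which implement set inclusion. The three sets below are hypothetical toy values chosen so that $T_1$ and $T_2$ are incomparable while $m_2 \subseteq m_1 \subseteq T_1$.

```python
t1 = frozenset({"a", "b", "c"})
t2 = frozenset({"b", "c", "d"})
t_i = frozenset({"c", "e"})

m1 = t1 & t2        # intersection of T1 and T2
m2 = t1 & t2 & t_i  # intersection of T1, T2 and Ti


def comparable(x: frozenset, y: frozenset) -> bool:
    """Two lattice nodes are ordered only if one is a subset of the other."""
    return x <= y or y <= x


print(sorted(m1), sorted(m2), comparable(t1, t2), m2 <= m1 <= t1)  # ['b', 'c'] ['c'] False True
```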
Note that if some manageable set of lattice nodes is identified as having properties and internode relationships, then we have some of the constructions seen in semantic nets. These constructions have the syntagmatic form < a, r, b >, where a and b are locations and r is a relational property.
It is also worth noting that the size of the lattice is $2^n$, where n is the number of elements in the universal set. In text understanding systems the universal set can contain many thousands of elements, so the lattice is very large indeed. However, all intersections of passage (object) representational sets lie in a relatively small part of the bottom of the lattice.
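The arithmetic behind this remark can be sketched with exact integers. An intersection of passage sets has no more elements than the smallest passage set, so if passages contain at most k elements, every such intersection is a node of rank at most k. The sizes n = 5000 and k = 50 are illustrative assumptions, not figures from the text.

```python
from math import comb


def lattice_size(n: int) -> int:
    """Number of nodes in the lattice of all subsets of an n-element universal set."""
    return 2 ** n


def nodes_up_to_rank(n: int, k: int) -> int:
    """Number of subsets with at most k elements (the bottom of the lattice)."""
    return sum(comb(n, r) for r in range(k + 1))


n = 5000  # illustrative universal-set size for a text system (assumption)
k = 50    # illustrative bound on the size of one passage representational set
total = lattice_size(n)
bottom = nodes_up_to_rank(n, k)
print(len(str(total)), len(str(bottom)))  # decimal digits in each count
```

Both counts are exact Python integers, so the comparison is not affected by floating-point overflow or underflow.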
The case where the object has multiple properties
An object O not only has the possibility of having one of several
different properties, but also has the possibility of having multiple
properties at the same time. The first three canons assume that only one
property is being considered. The last two canons treat the more complex case.
In QAT-like systems, we have three classes of logical atoms: O (objects), P (properties), and A (substructures). We can speak about residues and concomitant variation only to the degree that it is reasonable to assume independence between the causes of properties.
Suppose we have established k conjectures of the form:
$$p_i \Rightarrow_1 O \quad\text{and}\quad a_i \Rightarrow_2 O, \qquad i = 1, \ldots, k.$$
This is read, "For $i = 1, \ldots, k$, the property $p_i$ is a property of the object O and substructure $a_i$ is the cause of property $p_i$ in object O." Under the assumption of k independent causal linkages, we can use the compact notation:
$$(p_1, \ldots, p_k) \Rightarrow_1 O \quad\text{and}\quad (a_1, \ldots, a_k) \Rightarrow_2 O,$$
or just
$$(a_1, \ldots, a_k) \Rightarrow_2 O$$
in the case that the property set $(p_1, \ldots, p_k)$ has already been identified.
In the case where it is necessary to make the relationship between substructure and property explicit, we use the notation:
$$a_i \Rightarrow_2 (O, p_i),$$
which is read: "substructure $a_i$ is the plausible cause of the object O having the property $p_i$." This notation assumes that $p_i \Rightarrow_1 O$.
Canon of Residues
Both the Canon of Residues and the Canon of Concomitant Variation deal
with complex causes and complex properties.
The first three canons can be used to identify the meaningful subsets
of the set of representational elements A. The last two canons are used,
in our interpretation, to further delineate causal linkages between
substructures and properties.
Let $C = \{C_i\}$ be a class of categories of objects. We assume that this class is a reasonably complete description of the similarity classes of the set of emergent wholes that are produced by a set of substructural elements (atoms).
Consider again the example of the atomic elements, the periodic table, and chemistry: the level at which properties are observed is separate from the level of the atoms.
Let T be a generalized product of some subsets, $\{a_1, \ldots, a_q\}$, of the set A of substructures:
$$T = (a_1, \ldots, a_q)$$
that are observed to describe a complex set of properties P:
$$P = \{p_1, \ldots, p_r\}.$$
Suppose further that $r = q$ and that we know, for each $i = 1, 2, \ldots, q-1$,
$$a_i \Rightarrow_2 (O, p_i).$$
Using the Canon of Residues, we conjecture that $a_q \Rightarrow_2 (O, p_q)$.
There is also a context for this conjecture. The context is the set of substructures involved in composing objects belonging to one of the categories in $C = \{C_i\}$.
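Under the independence assumption, the Canon of Residues amounts to bookkeeping: once $q-1$ links are established, the one unexplained substructure is conjectured to cause the one unexplained property. The function `residue_conjecture` and its string labels are hypothetical; it returns None whenever the residue is not a single pair.

```python
from typing import Dict, List, Optional, Tuple


def residue_conjecture(
    substructures: List[str], properties: List[str], known: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Canon of Residues, assuming independent causal linkages.

    known maps each established substructure a_i to its property p_i.
    If exactly one substructure and one property remain unexplained,
    conjecture that the remaining substructure causes the remaining property.
    """
    if len(substructures) != len(properties):
        return None  # the text assumes r == q
    residual_a = [a for a in substructures if a not in known]
    residual_p = [p for p in properties if p not in known.values()]
    if len(residual_a) == 1 and len(residual_p) == 1:
        return residual_a[0], residual_p[0]
    return None


print(residue_conjecture(["a1", "a2", "a3"], ["p1", "p2", "p3"], {"a1": "p1", "a2": "p2"}))  # ('a3', 'p3')
```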
The Canon of Concomitant Variation
In this Canon we have descriptions of the properties of two objects, A and B.
Linkages are conjectured. Perhaps the objects are two winter storms, A and B, and we note that two of the system observables seem to vary proportionately. The connection is observed through differences in a common property. The cause of the variation in the property is conjectured to be a specific variation in the substructure.
Define a non-specific composition function comp(.) to be a transformation of some set of substructural elements into a whole that has a set of properties. We suppose here that the properties are all functional properties of whole objects. We again suppose that structural/functional relationships have some degree of independence, i.e., that the functional properties are distinct and that, at least as parts of the whole, distinct structural components are composed into distinct properties.
This is expressed:
$$\mathrm{comp}(d + c) \sim \mathrm{comp}(d) + \mathrm{comp}(c)$$
where $\sim$ is the connective "is similar to", and d, c are substructures. Of course, this is a strong assumption, which is hedged by the use of the similarity connective.
Let A and B have a complex of properties:
$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c)) \Rightarrow_1 A$$
$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d)) \Rightarrow_1 B$$
and suppose the degree of presence of substructures c and d is ordered. We suppose that
$$c \Rightarrow_2 (A, p_n) \quad\text{and}\quad d \Rightarrow_2 (B, p_{n+m}),$$
where $p_n = p_{n+m}$ is a common property shared by object A and object B.
Let $c^+$ and $d^+$ denote an increase in c and d, respectively, and $c^-$ and $d^-$ denote a decrease in c and d. Since d is a substructure, $d^+$ and $d^-$ may be defined either quantitatively or qualitatively (through substructural similarity analysis).
Then, if the situation
$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c^+)) \Rightarrow_1 A$$
coincides with the situation
$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d^+)) \Rightarrow_1 B,$$
we can say that c and d are directly related. A similar relationship exists when $\mathrm{comp}(c^-)$ and $\mathrm{comp}(d^-)$ vary directly to produce A and B.
In the opposite case, if the situation
$$(p_1, p_2, \ldots, p_{n-1}, \mathrm{comp}(c^-)) \Rightarrow_1 A$$
coincides with the situation
$$(p_{n+1}, p_{n+2}, \ldots, p_{n+m-1}, \mathrm{comp}(d^+)) \Rightarrow_1 B,$$
we say that c and d are inversely related. A similar relationship exists when $\mathrm{comp}(c^+)$ and $\mathrm{comp}(d^-)$ vary inversely.
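The direct/inverse distinction can be sketched for the quantitative case only. The sketch assumes, hypothetically, that each pair of coinciding situations yields a numeric degree of presence for c in A and for d in B; it compares the direction of change ($c^{\pm}$, $d^{\pm}$) between consecutive pairs. The function `concomitant_relation` is illustrative, and qualitative (similarity-based) degrees are not handled.

```python
from typing import List, Optional


def _sign(x: float) -> int:
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def concomitant_relation(c_levels: List[float], d_levels: List[float]) -> Optional[str]:
    """Classify substructures c and d as 'direct' or 'inverse' (Fifth Canon).

    c_levels[k] and d_levels[k] are the ordered degrees of presence of c in A
    and of d in B in the k-th pair of coinciding situations.
    """
    if len(c_levels) != len(d_levels) or len(c_levels) < 2:
        return None
    products = []
    for k in range(1, len(c_levels)):
        dc = _sign(c_levels[k] - c_levels[k - 1])  # c+ (1) or c- (-1)
        dd = _sign(d_levels[k] - d_levels[k - 1])  # d+ (1) or d- (-1)
        if dc and dd:  # skip steps in which either substructure did not vary
            products.append(dc * dd)
    if products and all(p > 0 for p in products):
        return "direct"   # c+ with d+, or c- with d-
    if products and all(p < 0 for p in products):
        return "inverse"  # c- with d+, or c+ with d-
    return None           # no consistent concomitant variation


print(concomitant_relation([1, 2, 3], [10, 20, 30]))  # direct
print(concomitant_relation([1, 2, 3], [3, 2, 1]))     # inverse
```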
Clearly, the above notation only begins to define the full set of possibilities for an algorithmic calculus based on Mill’s reasoning. There must, however, be some finesse in its application to complex problems. Mill’s logic breaks down to the degree that the set of observables, both of properties and substructures, is not composable into independent causal linkages. Moreover, natural complex systems might not be fully reducible to independent causal linkages, and a degree of skepticism is required regarding both reductionism and its alternatives.
The problem we see is not the viability of complex descriptions of bi-level causation, but rather that these descriptions must be situational in nature. A viable situational form of extensions to Mill’s logic might be based on behavioral evidence that natural systems behave more predictably in well-defined situational contexts.
Situational language and bi-level reasoning
Using our interpretation of QAT-like formal languages, we have conjectured that J. S. Mill’s method creates deductive machinery that is situational in nature. Acting on this conjecture, we have initiated an experimental program applying this interpretation to autonomous text understanding. We use formal categorical properties of declarative human message routing decisions in commercial and governmental activities [5]. We assume that message routing decisions are produced by human inductive decisions based on individual memories and personal awareness of a ground truth related to the routing environment.
The apparatus of the QAT languages stores information and provides a
meta-formalism that does not depend on specific situations, since we have
developed a separate formalism that deals only with the "disembodied"
substructure of classes of objects. The methodology attempts to build a
complete set of representational symbols for sufficient reference to all formal
causes for any object property.
The representational problem is treated independently. This
independence is justified on practical grounds. First, the representational
problem is not solved perfectly by any known algorithmic system – nor even, it
is debated, by cognitive processes. Second, a certain amount of failure in
representational fidelity is compensated by adaptation.
In situational logics the logical atoms are specific to situational classes. The object of analysis is assumed to be in a context that maps to one of a known set of situational classes. When the current situation cannot be mapped to the assumed situational class, the logic must be recomputed from an elementary re-measurement of class and substructure invariants. In this case, either the representational fidelity or the logical formalism is inadequate.
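The mapping step can be sketched as a nearest-prototype test with a rejection threshold. The Jaccard similarity, the 0.5 threshold, the prototype sets, and the name `map_situation` are all assumptions; a result of None stands for the case in which the logic must be recomputed.

```python
from typing import Dict, FrozenSet, Optional

Rep = FrozenSet[str]


def map_situation(current: Rep, classes: Dict[str, Rep], threshold: float = 0.5) -> Optional[str]:
    """Map the current situation to the most similar known situational class.

    Similarity is the Jaccard ratio (an assumption). Returns None when no class
    reaches the threshold, signalling that the logic must be recomputed.
    """
    best_name: Optional[str] = None
    best_score = 0.0
    for name, prototype in classes.items():
        union = current | prototype
        score = len(current & prototype) / len(union) if union else 0.0
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None


classes = {"storm": frozenset({"wind", "rain", "cold"}), "drought": frozenset({"heat", "dry"})}
print(map_situation(frozenset({"wind", "rain"}), classes))  # storm
print(map_situation(frozenset({"snow", "ice"}), classes))   # None
```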
The logic is bi-level. Object prototypes are considered as situational
classes, as are modal properties of the environment. Substructural elements are
also considered prototypes, but at a distinct level of organization that is not
locally meaningful to the situational classes of assembled wholes.
The meaningful subsets of representational elements have both internal
and external linkages, the discovery of which leads to one of many possible
situational logics. We interpret the internal linkages to be structural in
nature and the external linkages to be functional in nature. Two levels of
organization are identified and maintained in separated data structures.
Structural components are the cause of functional properties that
result from the formation of a whole that is greater than the sum of the
structural components. Water from hydrogen and oxygen is an example. The
compound, water, does not depend on having specific examples of an oxygen atom,
but rather any one of a class of atoms that is the prototype class for all
oxygen atoms.