Saturday, February 11, 2006
Generative Methodology Glass Bead Games
On using RDF to model web services
Link into the discussion on the Rosen eforum -> [167]
Index to sections
-------- Closed World Assumption versus Open World Assumption
-------- Non-monotonicity vs Monotonicity
-------- Approach to the solution
Notes by Paul in italics
Hi Paul,
as usual, this reply is very verbose and pedantic, but it is the only way I am able to give an effective (I hope) explanation of what I think are the answers to your questions. I know some people do not like to be overwhelmed with words, so please, if I am saying something that you already know or that you find straightforward, then skip to the end and proceed backwards to find the motivations for my statements :-) [1]
OK, I am not sure it is not a matter of theory, as you say.
At least two theoretical issues are involved here: the first is monotonicity vs non-monotonicity, and the second is the so-called "Closed World vs Open World Assumption". I will first address them separately, but it will turn out that they actually are intertwined. [2]
Adopting the Closed World Assumption (CWA) means that, given a set of statements (an RDF document, say) S and a set of inference rules R (which you can think of as obtained from a set of semantic conditions), you are implicitly specifying a total truth-value assignment to the set of all possible statements P.
Let's call S' the set of statements that contains all of the statements in S plus all of the statements which can be proved to be true by applying inference rules in R to statements in S (formally, the closure of S w.r.t. R). Then, if the CWA is adopted, any document containing a set of statements defines a *total* function isTrue:P-->{T,F} such that every statement in S' is assigned the truth value "TRUE", and every statement in (P-S') is assigned the truth value "FALSE". There is no "uncertainty".
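A minimal sketch of this in plain Python (the triple and rule representations are my own, just for illustration, not an actual RDF encoding):

```python
# Statements are (subject, predicate, object) triples; a rule is a function
# mapping a set of statements to the statements it infers from them.

def closure(S, R):
    """S' = closure of S w.r.t. R: apply every rule until a fixpoint."""
    inferred = set(S)
    changed = True
    while changed:
        changed = False
        for rule in R:
            new = rule(inferred) - inferred
            if new:
                inferred |= new
                changed = True
    return inferred

def is_true_cwa(statement, S, R):
    """Under the CWA, isTrue is total: anything outside S' is FALSE."""
    return statement in closure(S, R)
```

The point is in the last function: membership in the closure is the whole story, so every statement in P gets a definite T or F.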
On the contrary, under the Open World Assumption (OWA), it is never the case that a statement is considered to be "FALSE" just because you cannot prove (or, as a particular case, just because you have not explicitly asserted) that it is "TRUE". [3]
In order to say that, you would have to prove that it is actually "FALSE" [4]. A straightforward example:
ontologystream:Paul mynamespace:knowsAbout rosentheory:complexity
computerscience:Andrea mynamespace:isFriendOf ontologystream:Paul
In the above, ontologystream, mynamespace, rosentheory, and computerscience are all namespaces, where what follows after the “:” is supposed to be defined within a context where the term is unique within that namespace.
What does this document tell? If you want to give an "a priori" answer, it tells the things it asserts. Not that great. Now, if we adopt the CWA, then we are able to conclude many more things than those it asserts: for example, that it is "FALSE" that "computerscience:Andrea mynamespace:knowsAbout rosentheory:complexity", and that it is also "FALSE" that "ontologystream:Paul mynamespace:isFriendOf computerscience:Andrea", because these statements are neither asserted nor inferred (we have no inference rule yet).
Now, let's keep our CWA assumption but add some inference rule. Let's suppose that friends of a person know about everything that person knows about [5] (because of communication, say), and that friendship is a symmetrical relationship [6]. Then, you can now assign a value of "TRUE" also to statements "computerscience:Andrea mynamespace:knowsAbout rosentheory:complexity" and "ontologystream:Paul mynamespace:isFriendOf computerscience:Andrea", and "FALSE" to all others. [7]
Instead, under the OWA, you can say nothing of what is neither asserted nor proved by inference rules: mathematically, the above isTrue function is not total; it is simply undefined on such statements.
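Continuing the example, here is a self-contained sketch (plain Python; the encoding of the two rules and of the triples is mine, for illustration only) of how CWA and OWA differ on the very same document:

```python
# The two asserted triples of the example document.
S = {
    ("Paul", "knowsAbout", "complexity"),
    ("Andrea", "isFriendOf", "Paul"),
}

# Rule 1: friendship is a symmetrical relationship.
def symmetric_friendship(stmts):
    return {(o, p, s) for (s, p, o) in stmts if p == "isFriendOf"}

# Rule 2: friends of a person know about everything that person knows about.
def friends_share_knowledge(stmts):
    friends = {(a, b) for (a, p, b) in stmts if p == "isFriendOf"}
    knows = {(x, t) for (x, p, t) in stmts if p == "knowsAbout"}
    return {(a, "knowsAbout", t) for (a, b) in friends
            for (x, t) in knows if x == b}

def closure(stmts, rules):
    """Apply the rules to a fixpoint."""
    inferred = set(stmts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new = rule(inferred) - inferred
            if new:
                inferred |= new
                changed = True
    return inferred

def truth_value(statement, stmts, rules, cwa):
    """CWA: total assignment, unproved means 'F'. OWA: unproved means 'U'."""
    if statement in closure(stmts, rules):
        return "T"
    return "F" if cwa else "U"
```

With both rules active, "Andrea knowsAbout complexity" and "Paul isFriendOf Andrea" become "T" either way; a statement like "Paul knowsAbout rosentheory" is "F" under CWA but "U" under OWA.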
Now, given an RDF document, it is important to recognize that while the choice of the inference rules is a "late binding", in the sense that semantic conditions are not a property of the document but rather of the answering engine (as I told you in a previous post), with regard to the OWA vs CWA issue there is an "early binding", in the sense that everything you write in RDF syntax *must* be seen under the OWA.
I am slowly approaching your questions, please be patient. At this point, I need to generalize. First, let's define function truthValueOf:P-->{T,F,U} (where U stands for "UNKNOWN") as:
truthValueOf(s) = "U" if isTrue is not defined on s,
truthValueOf(s) = isTrue(s) otherwise
And let's introduce the monotonicity issue. Suppose you have two sets of RDF statements (i.e. RDF documents) T1 and T2, with T1 a subset of T2, and denote the function truthValueOf for document T1 as tv1, and the function truthValueOf for document T2 as tv2. Then, you have a *monotonic* logic whenever
for all statements t, (not(tv1(t)=tv2(t)))->(tv1(t)=U),
that is, whenever adding statements *does not change* the truth value of any statement which was already known to be "TRUE" or "FALSE", but can change the truth value of statements that were "UNKNOWN" to either "TRUE" or "FALSE". In other words, adding information only reinforces your certainties.
Vice versa, in a non-monotonic environment this does not always occur: in such a setting, in fact, adding new statements to a document can change the truth assignments of statements that were previously known to be "TRUE" or "FALSE".
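The monotonicity condition above can be stated directly as a check over the two truth-value functions (a plain-Python sketch of my own, with tv1 and tv2 as dicts that default to "U" for statements they do not mention):

```python
def is_monotonic_step(tv1, tv2, statements):
    """True iff going from T1 to T2 never flips a known truth value:
    whenever tv1(t) != tv2(t), the old value tv1(t) must have been 'U'."""
    return all(
        tv1.get(t, "U") == tv2.get(t, "U") or tv1.get(t, "U") == "U"
        for t in statements
    )
```

A step that only turns "U" into "T" or "F" passes the check; a step that turns a "T" into an "F" fails it, which is exactly the non-monotonic case described next.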
To illustrate the difference, the following example is commonly used. Suppose one is in a monotonic setting and has the following set of statements (they need not be RDF; it's a general issue):
"Birds can fly. Penguins are birds. Jack is a penguin."
From them, you can infer that "Jack can fly". The statement "Jack can fly" assumes the value "TRUE". Now, suppose you want to model an *exception*, that is, birds *usually* can fly but penguins cannot, and add the statement "Penguins cannot fly": from this, you can infer that "Jack cannot fly". However, in a monotonic setting, statements that were previously "TRUE" remain "TRUE", so having both "Jack can fly" and "Jack cannot fly" assigned to "TRUE" yields an inconsistency. Full stop. Your document is nonsensical.
In a non-monotonic setting, instead, you are allowed to assert something like this:
"By default, birds can fly. Penguins are birds. Jack is a penguin."
And this is equivalent to telling your answering engine "well my friend, if you cannot otherwise prove that a bird cannot fly, then please assume that every bird can fly". Here, you are establishing an inference meta-rule. Now, suppose you add the sentence "Penguins cannot fly": no inconsistency arises, because your engine can now "retract" the previously inferred statement "Jack can fly" (more precisely, it can reconsider its assignment of the value "TRUE"), and you do not end up with two conflicting statements both evaluated as "TRUE". This process is called "default reasoning". [8]
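The penguin example can be sketched as a tiny default-reasoning step (plain Python; a deliberately naive encoding of mine for "assume birds fly unless proved otherwise", not a real default-logic engine):

```python
# A naive encoding of "by default, birds can fly" with a set of exceptions.
BIRDS = {"penguin", "sparrow"}
SPECIES = {"Jack": "penguin"}

def can_fly(animal, flightless_species):
    """Default reasoning: a bird flies unless its species is proved flightless."""
    sp = SPECIES.get(animal)
    if sp in flightless_species:
        return "F"   # exception proved: the default conclusion is retracted
    if sp in BIRDS:
        return "T"   # the default assumption for birds
    return "U"
```

Before "Penguins cannot fly" is asserted, can_fly("Jack", set()) is "T"; after adding the exception, can_fly("Jack", {"penguin"}) is "F". The truth value of an already-decided statement changed, with no inconsistency: exactly the non-monotonic behaviour described above.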
Ah, eventually, I come to the point: what does this stuff have to do with your questions?
In the case of "Action" and "Event", you say:
"I have an 'entity' called 'Action' with what the reference document terms 'a parent' of 'Event'", and "Some of the attributes of Action are not attributes of Event".
Assuming that 'parent' means rdfs:subClassOf, that is to say "all Events are also Actions", then what you have is a non-monotonic modeling. BY DEFAULT, Actions have this and this and this property, BUT some of them do not apply to Events (remember? BY DEFAULT, birds can fly, BUT that does not apply to penguins).
Non-monotonicity is a convenient facility, but unfortunately (for you; obviously there are motivations for having monotonicity in RDF) it is not trivial to rewrite a model written for a non-monotonic environment into an equivalent model written for a monotonic environment, where by *equivalent* I mean that their truthValueOf functions are identical (they assign the same truth value to all statements).
So, in order to correctly map your specification into an RDF document, you have to transform it so that it assigns every statement involving Events, Actions, their properties, and so on, the same truth value it is assigned in the original, non-monotonic environment. [9]
The good news is that this is a very general problem, and that design patterns exist. I am not a guru, though. So, I will just give you some examples in order to let you understand the nature of the problem, but please do not expect that I can give you the universal solution or algorithm, because I do not even know whether one exists.
The rules of thumb for converting an OO data schema to OWL would be useful if these can be enumerated.
I had come to feel the same as your note below expresses regarding there being "no" OWL-type subclass relationships in the OO paradigm. The concept in OO is that there is an object with private data. Messages are sent between objects. An object hierarchy is possible. But perhaps the object hierarchy has more to do with the model of how the computer is working (like the GUI) than with the "external world". A certain type of GUI window is a more basic window with some changes to the internal data and the behavior (methods). On the other hand, the OWL class hierarchy is designed specifically to support description logic.
Frames would seem to be more consistent with the OO model, where slot = a property that is a relationship. But again, it is my understanding, likely wrong though I do not know how to correct it, that the Protégé Frames notion of a frame is not the same as the notion of a frame (a context) that was developed by Roger Schank.
http://www.informatics.susx.ac.uk/books/computers-and-thought/chap3/node9.html
So, it might be that the rules for mapping between OO and OWL would be to flatten the set of "objects", in this case the SOA entities, into a set of classes having no subsumption relationships. Then two types of properties would be defined. One type would be restrictions such as functional and inverse functional restrictions on individuals? Many of the class restrictions (if I am saying this correctly), like disjointness or intersection, might not come up in the mapping rules.
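One way to read this flattening proposal (a sketch of my own in plain Python, not an established mapping rule; the schema and class names are hypothetical): copy every inherited attribute down into each class, then discard the hierarchy, so that no subsumption relationship remains.

```python
# Hypothetical OO schema: class name -> (parent class, its own attributes).
OO_SCHEMA = {
    "Event":  (None,    {"timestamp"}),
    "Action": ("Event", {"actor", "effect"}),
}

def flatten(schema):
    """Map each class to the full set of attributes it would inherit,
    producing flat classes with no class/sub-class relationships."""
    def attrs(cls):
        parent, own = schema[cls]
        return own | (attrs(parent) if parent else set())
    return {cls: attrs(cls) for cls in schema}
```

After flattening, "Action" simply carries all of its attributes explicitly, and nothing is left to be inherited by default, which sidesteps the exception problem discussed earlier at the cost of duplicating attribute declarations.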
One consequence of working through all of the issues regarding the SOA IM model is that we might be able to address the question "is there any implicit information that can be shown using an OWL reasoner, once the rules for mapping have been applied to an OO model?"
At this point I cannot even guess. In spite of the perhaps, in some respects, different intended uses for OO and OWL, both are used to organize data, persist data, and offer various means to use the data selectively.
A foundational and informed paper on this mapping task is needed.
More extended discussion at -> [4]
[1] The purpose of a colloquy is to expose the issues in a scholarly way. This requires the type of extended discussion that you are presenting, and we are appreciative of this extended discussion.
[2] Yes, it is of course true that the debate has been couched in this way. There is the appearance of completeness, since there is one side and then the opposing viewpoint. But a third viewpoint exists outside of this debate, and should not be shaped by the specifics of the debate between these two intertwined theoretical issues. The third viewpoint is informed by this debate, but not shaped by it, because the third viewpoint states a definition of natural complexity as being something that cannot be reduced “perfectly” to the formalism of classical logics, the so-called expressive logics, or Hilbert-type mathematics.
This is the position of the second school of semantic science. The two intertwined theoretical issues that you raise are those of the first school. The closed versus open world discussion does not allow the principled discussion of complexity as defined by Rosen, because the emphasis in this discussion is on the assertion that the closed world assumption can make sense as a complete theory of the natural world. Once this discussion stops, then the consequences of Rosen’s definition of natural complexity can be explored without getting derailed by the debate occurring within the first school. The second school must avoid stepping into that debate, and this avoidance is very difficult.
[3] Here is precisely where the polemic starts. Someone in the second school would never say that the open world assumption does not allow false statements. Certainly this is not the interpretation that someone in the second school would allow to start the “definition” of the position of the second school.
Openness means, to the second school, that an induction has to occur to measure the natural world, and that this induction has to occur in real time, based on the functioning of quantum neuro properties as discussed by Hameroff and Penrose and others. Formalism is a subject that comes in quite a lot later, in a discussion of the consequences of cognitive category formation and the resulting definition of formalisms like logic and mathematics.
[4] This attempt to prove something false is not how the second school approaches the question of reification, or the acceptance of a statement as having interest to (future) science. The second school might point out that the reduction of statements of truth, such as “I love my wife”, is deeply problematic. But this point draws the second school into the first school’s polemics. The point is that the true alternative to the first school does not approach the question of truth in a simple fashion. Precision and exactness are fine in cases where the dynamics (the entailment) is simple (as defined by Rosen). But in cases where the entailment is complex, one has to regard a “demand” for precision and exactness with reservations. The reservations are not merely hasty, but come from a mature viewpoint about the limitations of formalism and the nature of reality.
[5] This is, in practical terms, a hypothetical. In diplomatic conversation there is often an unwillingness to understand what the other person is saying.
[6] In fact, a formal property of symmetry regarding any two “friends” is a truncation of the reality that actually exists between those two individuals. In a strong way, one can say, as Wittgenstein did, that there is a language game, and that formal inference rules of this type are used only as a convenience. In the Blue and Brown Books (later Wittgenstein) he talks about the patterns that a key chain makes when thrown on a table. Like Hostettler later on, this part of the Blue and Brown Books talks about how similarity of patterns is the key to understanding how categories form that are then used to point to similarities and differences. This type of “logic” is then extended by quasi-axiomatic theory (Finn and Pospelov) to an analysis of the similarities of function given similarity of structure in biological expression, see:
[7] This is the key criticism of the first school by the second school, namely that this inferred assertion is taken as a computed truth. The problems only start with taking the result of computation as an asserted truth. But rarely can the first school get beyond this first problem.
[8] See the works by Alan Rector on the problems with the OWL definition of exceptions and defaults.
[9] This is exciting. In my mind one has to start out with all concepts of an ontology as having no subsumption relationships, i.e., no class/sub-class distinctions. Then the properties can be defined as the attributes are defined in the OASIS SOA-IM model. The properties can be aggregated into categories using “sameAs” and “differentFrom” properties on properties. The resulting properties now have almost the sense of slots within the frames concept of Schank. This frames concept of Schank is overwritten in Protégé-Frames to allow the class/sub-class distinction to be introduced. As a result, the original insights of Schank are overwritten by the Protégé paradigm.