Thursday, October 14, 2004

Manhattan Project to Integrate

Human-centric Information Production

Algorithm development using Orbs

(under development : 10/14/2004 9:01 PM)

Abstract

An algorithmic notation is presented relating Orbs and the Hilbert encoding, e.g. the 2003 Primentia patent. The Orb encoding of the voting procedure (Prueitt 1995) allows a sorting of textual units into the categories of occurrence expressed as a voting policy.

(See appendix to Knowledge Foundations)

The notation suggests a provably optimal process that works with any quantity of textual units. The algorithm performs the Prueitt Voting Procedure in what might be provably optimal time. The result is both routing of textual elements based on a map of concepts, as well as an evolutional set of concept map artifacts that can be situational or universal.

The voting procedure produces an adjusting set of categories of occurrence at three levels of organization, consistent with the foundational concepts in mathematical formulations on stratification theory. At the lower level the set of categories, called a category policy, are composed of elements derived from the measurement of

1) letter n-grams;

2) a compound of, 2 3 or 4, letters; acquired using a rule set and generalized n-gram algorithm; or

3) word level n-grams as in the CCM patent

4) generalized work level n-grams

These elements are encoded into the simple Orb construction, i.e., into a set of “syntagmatic units” (Pospelov’s term):

{ < a, r, b > }

As Ballard has remarked on many occasions, the more general construction is an n-ary in the form:

{< r, a(1), . . . , a(i) > }

with some dependencies on the nodes, { a(j) | j = 1 to n }, and on the relationship r.

The n-aries are encoded into either a Berkeley hash table management system or into a Hilbert-type encoding developed by myself.

The atoms of the n-aries, the a(i)s, are ordered and encoded into a “loaded number”. By this, we mean that, like hash tables, the data encoding has the form:

( number | container )

In the Hilbert encoding, the number and the container both have a known size, and thus the set of Orb elements, what we call “loaded points”, can be laid out sequentially in real memory, or in virtual memory. It is due to this fact that the set membership problem is solved in a provably fewest steps.

The set membership problem

The set membership problem is stated in the following way. Given an ordered set of elements and an element that is randomly selected from all possible elements for this set, discover if the randomly selected element is in the ordered set and if so provide the location of that element.

Note 1: There are a number of reasons why this set membership problem is hard to state in the abstract. The first reason is that the set of all possible element of a set is a troublesome concept. In practice terms one is simply trying to say that a set can have only certain types of elements. The set of words cannot have apples in that set. In fact an apple cannot ever be in a set because an apple is a non-abstract thing and a set is an abstraction.

Note 2: A set and an ordered set are usually regarded as being two different things. In the theory underlying the Hilbert encoding, one shifts back and forth between regarding a set as being an ordered set and a set as being a set. The issue is that the way we order the elements of the set is by categories that collapse all occurrences of the same thing into a set “element”. In category theory, we actually allow any random element of a category to stand in for all elements of the category. The “loaded points” are exactly these representatives of the category. Example, the ordered set { run, run, quick } is collapsed to the ordered set { run, quick }. One consequence of this “collapsed ordered set” is that the size of the Orb “ordered set” is the same as the size of the set of Orb elements [1].

The ordering of the categories in the form of these points creates a formal construction, and this formal construction has a precise relationship to a means to map the loaded points in computer memory. We can take an illustrative example. Suppose that the set of points are the even integers between 2 and 1000.

[1] One simply has to play around with the very elementary notions here to convenience oneself that there is in fact something that is being talked about here. It may not seem that this is important at first. But the proof on the optimality of the Hilbert encoding depends on these distinctions.