Referential Bases; Part II


Continuation of the Paper:

Foundational Paper on Referential Bases


Paul S Prueitt, PhD

November 1, 2006



One of the Founders wrote:




I'd like to point out that you're talking about two well known concepts in computer science in here.  First, binary search is the name of the method for successively choosing the middle element in an ordered array.  (Of course, this is used in binary insertion of elements as well.)


Second, mapping strings to unique numbers: for example, given 26 letters, one can map any string to a unique number of the form


n1 * 26^0 + n2 * 26^1 + n3 * 26^2 + ...


<end quote>

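The mapping in the quote can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the alphabet is taken to be the 26 lowercase ASCII letters, and the function names are hypothetical. One detail the quoted formula leaves open is the range of the digits n1, n2, ... If the digits run from 0 to 25, then "a" and "aa" both map to 0, so the sketch assumes digits from 1 to 26, which makes the mapping one-to-one.

```python
import string

# Assumption: the alphabet is the 26 lowercase ASCII letters.
ALPHABET = string.ascii_lowercase


def string_to_number(s: str) -> int:
    """Map a string to n1*26^0 + n2*26^1 + n3*26^2 + ...

    Digits run from 1 to 26 (a=1, ..., z=26) so that strings such as
    "a" and "aa" receive different numbers.
    """
    total = 0
    for position, ch in enumerate(s):
        digit = ALPHABET.index(ch) + 1
        total += digit * 26 ** position
    return total


def number_to_string(n: int) -> str:
    """Invert string_to_number for any positive integer."""
    chars = []
    while n > 0:
        # Peel off the lowest digit (1..26) and shift the rest down.
        n, remainder = divmod(n - 1, 26)
        chars.append(ALPHABET[remainder])
    return "".join(chars)


if __name__ == "__main__":
    for word in ["a", "aa", "cab"]:
        print(word, string_to_number(word))
```

Because the mapping is invertible, the integer can stand in for the string wherever an ordered key is wanted, which is the point made in the quote.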

First, we observe that the legacy problem carries with it processors and processor instructions that were shaped by the cost of memory and by first-generation operating systems such as DOS.  The year 2000 problem was only one example of such legacy problems.


In binary search in general, there is often an index.  If the data are already integers, then there is no reason to regard the string as a number expressed with the elements of the alphabet (whichever alphabet is being used).  But if one does this, then there is a technical infringement on the 2002 Gruenwald patent.


Let us set the ownership issues aside.  The steps (machine cycles) required to use the index are sometimes "many".  This "many" introduces conceptual issues in the design of information systems, and its cost is multiplied by scalability issues as the number of transactions grows into the trillions.  Even the best system that one gets using indices has an overhead that eventually breaks the system down.


We (OntologyStream Inc and BCNGroup Inc) are looking to use a derivative of keyless hash (of our own design) to (locally) control the encoding of data patterns and the decompression of these patterns in support of streaming high definition video.  We cannot achieve the scalability needed if there are any indices.


In the ".vir" standard, the encoding and decoding processes also have to communicate with a micro-transaction recording (accounting) system, which in turn has to communicate with a hierarchical system for aggregating information on the use of the digital objects (providing provable digital rights management worldwide).  The hierarchical system conceivably handles a trillion transactions per hour.  Indexed and centralized systems cannot begin to do this.


And then there is the work on the measurement of the entire field of transactions using the SLIP (shallow link analysis and iterated parcelation) techniques that are now part of some cyber security systems (based on work given to TASC in 1999).  Non-published work on a theory of vulnerabilities and threats using the SLIP techniques is maturing.  This work supports the claim that standardized “.vir” instrumentation will provably identify precisely where digital property is in violation of licensing agreements.  This transparency opens the door to new types of positive commerce. [1]   


During the aggregation, the “.vir” standards use stratified theory and category formation into "n"-ary ontological models of the patterns in the data stream. [2] We cannot have indices producing artificial data invariance.  We have to measure the structure of the data without any other overhead structure.  This measurement has to be pure, or else the "compression of data invariance" into relevant information structures cannot be done.  The problem is that removing the invariance due to indices is a huge artificial task that must be avoided.  (It might be done using noise-in-the-channel techniques, but why introduce this problem?)


Just look at Protege and the OWL standard, the premier ontology tool and standard invested in by DARPA and the W3C.  The effort is all about getting the GUI to work when the underlying standards are based on a very restrictive triple, when a non-restrictive "n"-ary is needed.  The OWL foundation is wrong and overly complicated.  So the system produces confusion and dysfunction, which is precisely what is of great value to the software vendors, since this ensures continuation funding to solve artificial problems (like the data non-interoperability problem).


The structure of information and the machine loops required to perform a binary search (say, on a column) carry some overhead.  Let us take a column in a database and see how many events occur in completing a binary search.
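To make the count of events concrete, here is a minimal Python sketch. It assumes the column is already sorted and held in memory as a list, and it counts one "event" per probe of the middle element. The function name and the probe counter are illustrative assumptions and do not describe any particular database engine.

```python
def binary_search_with_count(column, target):
    """Search a sorted column; return (index, probes).

    index is -1 when the target is absent. probes is the number of
    middle elements that were examined along the way.
    """
    low, high = 0, len(column) - 1
    probes = 0
    while low <= high:
        mid = (low + high) // 2
        probes += 1
        if column[mid] == target:
            return mid, probes
        if column[mid] < target:
            low = mid + 1   # discard the lower half
        else:
            high = mid - 1  # discard the upper half
    return -1, probes


if __name__ == "__main__":
    column = list(range(0, 2000, 2))  # 1000 sorted even numbers
    print(binary_search_with_count(column, 1234))
    print(binary_search_with_count(column, 1235))
```

For a column of n values the loop runs at most floor(log2 n) + 1 times, so a column of a trillion values needs at most 40 probes per lookup. Each probe is small, but the count is paid on every transaction.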


Now let us go to a pure one-dimensional array, a vector.  Here the binary search has to start by rewriting the elements of the vector into the natural order determined by the contents of the cells.  To keep information about the original position, one changes each vector cell into a container with a front part and a back part.  The front part holds the content, and the back part holds a number that records the original location.
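The front-and-back container can be sketched as a pair (content, original position). The Python below is a minimal illustration, assuming the contents are mutually comparable values such as strings; the function names are hypothetical. It uses the standard library's bisect_left for the binary search itself.

```python
from bisect import bisect_left


def build_sorted_cells(vector):
    """Pair each content with its original position, then sort by content.

    Each cell is (front, back): front is the content, back is the
    position the content held in the unsorted vector.
    """
    cells = [(content, position) for position, content in enumerate(vector)]
    cells.sort()
    return cells


def find_original_position(cells, target):
    """Binary search the sorted cells; return the original position or None."""
    fronts = [front for front, _ in cells]
    i = bisect_left(fronts, target)
    if i < len(cells) and cells[i][0] == target:
        return cells[i][1]
    return None


if __name__ == "__main__":
    vector = ["pear", "apple", "fig", "kiwi"]
    cells = build_sorted_cells(vector)
    print(cells)
    print(find_original_position(cells, "kiwi"))
```

Note that the sort and the extra back part are exactly the kind of overhead structure under discussion: neither is part of the data itself.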


But to "regard the database or hash table encoded string as a base 64 number for the purposes of organizing data" is an infringement of the Gruenwald patent.  I have written about this elsewhere.


The current legal fact is that anyone actually doing this is in technical violation of the law.  (This may seem interesting, but I claim that the legal facts on the ground as of today would make the use of this representation the same as the theft of physical property.)  There are many millions of programs (perhaps billions) that make this infringement.


The patent is the reason why I talk about optimality in "Foundational Paper on Referential Bases".


The mere replacement of the hash function with a change in the definition of the string, from a text string to an integer, is the core starting point of the Gruenwald patent.  There is more to the Hilbert Technology Inc technology than the simplification to a provably optimal data encoding (where provably is defined within the ".vir" standard), and this work is very good.  But the problem remains that a direction for improving information systems is blocked by the ownership issue.


One of my problems is that there cannot, one assumes, be an open and completely informed discussion about this due to the proprietary issues.


However, let us discuss this further.  Being "correct" is essential to the capitalization of Orb and Rib technology as part of the ".vir" standard. 

[1] Prueitt, Paul S.  (2006) “The Coming Revolution in Information Science”, at URL:

[2] Prueitt, Paul S.  (2004) “Global Information Framework and Knowledge Management”