2/22/2004 9:30 AM


I'm looking for a lossless text compression format that finds repeated words or patterns in a text and stores them in a dictionary. In the body of the text, the words/patterns are 'transcluded,' to use a new word, by reference to the dictionary. In other words, if the word 'ontology' is repeated in a text (or web/wiki page/site) 1000 times, you only write it out once in the dictionary.


The text body holds just the ID# of the word 'ontology.' When you read the text, the word is there (transcluded), not the ID# of course. If you change the word in the dictionary, every occurrence of that word in the text changes.
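
A minimal sketch of the idea in Python, not a finished format. The function names `encode` and `decode`, the regex-based tokenizer, and the sample sentence are all assumptions made for illustration. Each distinct token (a run of word characters or a run of everything else) is stored once in a list that serves as the dictionary, and the body is just a list of ID numbers into that list. Real space savings would also depend on how compactly the IDs are stored, which this sketch does not address.

```python
import re

# Assumption: a "word" is a run of \w characters; everything between words
# (spaces, punctuation) is kept as its own token so the round trip is lossless.
TOKEN = re.compile(r"\w+|\W+")

def encode(text):
    """Return (dictionary, body): each distinct token once, plus a list of IDs."""
    dictionary = []   # ID -> token
    ids = {}          # token -> ID
    body = []
    for token in TOKEN.findall(text):
        if token not in ids:
            ids[token] = len(dictionary)
            dictionary.append(token)
        body.append(ids[token])
    return dictionary, body

def decode(dictionary, body):
    """Rebuild the readable text by looking up every ID in the dictionary."""
    return "".join(dictionary[i] for i in body)

text = "The ontology maps terms; the ontology grows."
dictionary, body = encode(text)
assert decode(dictionary, body) == text  # lossless round trip

# Changing the single dictionary entry changes every occurrence in the text.
dictionary[dictionary.index("ontology")] = "schema"
print(decode(dictionary, body))  # The schema maps terms; the schema grows.
```

Because the body never contains the word itself, only its ID#, editing the dictionary entry is enough to update the whole text when it is next read.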



Hi Ken,


Ken, would you have an interest (even just conversationally) in a bare-minimum parser that would run well in a document reader on a Pocket PC, regarding the content of your Ontology forum query below?


The use would fit into a larger picture of an eTown Hall and/or eGovernment with full-text retrieval of the history of information flow through the virtual collaboration (government).


Your question nearly describes the run-time category engine I've worked with in the past, and hope to rev up again as a Pocket PC project using a Microsoft development program (WinCE).


Your art is really nice!  Such a full categorizing parser can also render visual abstractions that are defined by the as-found categories with frequency information from the category "hits".


The visual abstraction can produce an iconic language of the reality-bound category parser, where all a human needs is to learn the significance of the patterns (by definition of semiotics) to be placed very close to a semiotic processor of a reality-bound event categorizer.  The last chore is to learn the middle grammar of useful "verbs" that fill in the middle between events and assumptions of a measured reality.


This seems to be close to what I think is the norm for forming a taxonomy of words from documents of a community of practice.  The verbs then are the functions of the ontology that emerges --if I get the picture right :)