
  ORB Visualization

(soon)

 

 

Instant Index - Python Library (II-PL)

 

Knowledge Technology Toolkit for Kids CD

 

We are working on the visual icons for the II-Python behavioral model. When completed, the set of visual icons will be an iconic, descriptive enumeration of the behavioral aspects of the II-PL. The icons are geared towards the open curiosity of kids (K–12) and will be used to animate the log files generated as the II-PL product is used.
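One way the log files mentioned above might be produced is to wrap each II-PL call so that its taxonomy code, function name and elapsed time are written as one log line. This sketch is an assumption, not part of the II-PL: `logged`, the partial `CATEGORY` map and the tab-separated line format are hypothetical, and only the standard library (`functools`, `logging`, `time`) is used.

```python
import functools
import logging
import time

# Hypothetical, partial map from II-PL function name to taxonomy code
# (C = index creation, S = index searching, V = retrieval verification).
CATEGORY = {
    "new_iindx99a": "C1",
    "iindx99a_index_create": "C5",
    "new_iisS99a": "S1",
    "iisS99a_OnSearch": "S5",
    "iisS99a_getuvhits": "V2",
}

log = logging.getLogger("iipl")


def logged(func):
    """Wrap an II-PL callable so each call is logged with its code and duration."""
    code = CATEGORY.get(func.__name__, "?")  # unknown names are logged as "?"

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            # Logged even if the call raises, so failed calls also appear.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            log.info("%s\t%s\t%.3f ms", code, func.__name__, elapsed_ms)

    return wrapper
```

For example, after `logging.basicConfig(filename="iipl.log", level=logging.INFO)`, wrapping a call as `on_search = logged(_iindexpy.iisS99a_OnSearch)` would append one tab-separated line (code, function name, milliseconds) per search; the log file name is only illustrative.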

 


 

By studying the original code for the II core engine, developed by its inventor, Ted, we isolated fourteen functions and grouped them into three types: index creators (5), index searchers (5), and retrieval verification (4).

 

The Instant Index core engine has a Python interface that creates objects and then uses these objects to interact with the core engine.

 

The descriptive enumeration of interaction events is exactly this two-level taxonomy: three function types, each containing its individual functions.

 

Figure 1: Fourteen Functions
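The two-level taxonomy can be written down as a nested mapping, which also makes it easy to look up the code of any function name that appears in a log. This is a sketch: `TAXONOMY` and `code_of` are hypothetical names, the function names are copied from the descriptions on this page, and the codes C2 to C5 follow the pattern of the C1 label by analogy.

```python
# Two-level taxonomy of the fourteen II-PL functions: type -> {code: function}.
TAXONOMY = {
    "index creation": {
        "C1": "new_iindx99a",
        "C2": "iindx99a_lowmem_invert_set",
        "C3": "iindx99a_selectIndexSize",
        "C4": "iindx99a_setup_fnlist",
        "C5": "iindx99a_index_create",
    },
    "index searching": {
        "S1": "new_iisS99a",
        "S2": "iisS99a_OnSelectDiskIndex",
        "S3": "iisS99a_ucodeflag_set",
        "S4": "iisS99a_getSegment",
        "S5": "iisS99a_OnSearch",
    },
    "retrieval verification": {
        "V1": "_iisS99a_slowoverride_set",  # spelled as documented for V1
        "V2": "iisS99a_getuvhits",
        "V3": "iisS99a_verify_by_sector",
        "V4": "iisS99a_verify_by_proximity",
    },
}


def code_of(function_name):
    """Return the taxonomy code (e.g. 'S5') of an II-PL function, or None."""
    for functions in TAXONOMY.values():
        for code, name in functions.items():
            if name == function_name:
                return code
    return None
```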

 

The core Instant Index shared object/library for Python, used for creating and searching indexes, includes the following object (to be described).


 

High-level description

 

A shared II-Python library of objects allows programmers to use the Instant Index engine capabilities within their Python programs.  The II core engine is used as a black box. 

 

The internal processes are preserved as a Trade Secret of Instant Index, and it is illegal to publish them if they are reverse engineered. (This is because the Trade Secret is property, and publishing this property would be a form of theft.)

 

The shared II-Python library is, however, public property and may be used within the standards of Open Source software common law. The shared library supports the following tasks:

 

1.     creating indexes,

 

2.     searching indexes, and

 

3.     retrieval verification.

 

 


Index Creation (Five Functions)

 

Index Creation One (C1):

_iindexpy.new_iindx99a(indexdir, tripletsfile)

Tells Instant Index that you wish to create an index. You must assign the result to a variable for use with the other index-creation functions. See the synopsis above for an example.

Arguments

          indexdir – the directory in which to put the index files

          tripletsfile – the location of a required file called triplets.txt
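Because `tripletsfile` must point to a required file called `triplets.txt`, a caller might check its inputs before calling `new_iindx99a`. The helper below is hypothetical and not part of the II-PL; in particular, requiring `indexdir` to exist and be writable beforehand is an assumption about what the engine expects.

```python
import os


def check_creation_inputs(indexdir, tripletsfile):
    """Hypothetical pre-flight check for new_iindx99a(indexdir, tripletsfile).

    Raises an exception up front instead of letting the engine fail later.
    """
    if os.path.basename(tripletsfile) != "triplets.txt":
        raise ValueError("tripletsfile must point to a file named triplets.txt")
    if not os.path.isfile(tripletsfile):
        raise FileNotFoundError(tripletsfile)
    # Assumption: the index directory must already exist and be writable.
    if not os.path.isdir(indexdir):
        raise FileNotFoundError(indexdir)
    if not os.access(indexdir, os.W_OK):
        raise PermissionError(indexdir)
```

A caller would run `check_creation_inputs(indexdir, tripletsfile)` and then `index = _iindexpy.new_iindx99a(indexdir, tripletsfile)`.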

Index Creation Two (C2):

_iindexpy.iindx99a_lowmem_invert_set(indexobject, flag)

Sets the inversion type used during indexing: either fast processing with high memory usage, or slow processing with low memory usage.

Arguments

          indexobject – the variable the new index was placed into

          flag

                    0 for fast inversion (uses more memory during the process)

                    1 for slow inversion (uses less memory during the process)
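Named constants make the 0/1 flag of `iindx99a_lowmem_invert_set` self-documenting at the call site. `Inversion` and `inversion_flag` are illustrative names, not part of the II-PL; since `IntEnum` members are plain integers, they can presumably be passed wherever the flag is expected.

```python
from enum import IntEnum


class Inversion(IntEnum):
    """Named values for the flag of iindx99a_lowmem_invert_set (C2)."""
    FAST = 0        # faster, more memory used during inversion
    LOW_MEMORY = 1  # slower, less memory used during inversion


def inversion_flag(memory_is_tight):
    """Pick the C2 flag: trade speed for memory only when memory is tight."""
    return Inversion.LOW_MEMORY if memory_is_tight else Inversion.FAST
```

At the call site this would read `_iindexpy.iindx99a_lowmem_invert_set(index, inversion_flag(memory_is_tight=True))`.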

Index Creation Three (C3):

_iindexpy.iindx99a_selectIndexSize(indexobject, flag)

Changes the size of the resulting index files, expressed as a percentage of the size of the original set of indexed files.

Arguments

          indexobject – the variable the new index was placed into

          flag

                    0 to make the index size 7% of the original files indexed (best for text sets under 500MB in size)

                    1 to make the index size 14% of the original files indexed (best for text sets over 500MB in size)
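The 500MB guideline and the 7%/14% ratios translate into a small calculation: a 100MB text set gets flag 0 and an index of about 7MB, while a 1,000MB set gets flag 1 and an index of about 140MB. The sketch assumes that 1MB is 2**20 bytes and that a set of exactly 500MB counts as "under"; the helper names are hypothetical.

```python
MB = 1024 * 1024            # assumption: "MB" means 2**20 bytes
THRESHOLD_BYTES = 500 * MB  # boundary named in the C3 description
PERCENT = {0: 7, 1: 14}     # flag -> index size as a percentage of the text set


def index_size_flag(corpus_bytes):
    """Return the C3 flag: 0 for text sets up to 500MB, 1 for larger ones."""
    # Assumption: a text set of exactly 500MB counts as "under".
    return 1 if corpus_bytes > THRESHOLD_BYTES else 0


def estimated_index_bytes(corpus_bytes):
    """Approximate index size implied by the chosen flag (7% or 14%)."""
    # Integer arithmetic avoids floating-point rounding in the estimate.
    return corpus_bytes * PERCENT[index_size_flag(corpus_bytes)] // 100
```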

Index Creation Four (C4):

_iindexpy.iindx99a_setup_fnlist(indexobject, filelist)

Tells Instant Index which files to index.

Arguments

          indexobject – the variable the new index was placed into

          filelist – a list of fully qualified paths of the files to index
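The `filelist` argument expects fully qualified file names, so a directory tree would typically be expanded into absolute paths first. `build_filelist` is a hypothetical helper built on `pathlib`; restricting it to `.txt` files is an illustrative choice, not an II-PL requirement.

```python
from pathlib import Path


def build_filelist(root, suffixes=(".txt",)):
    """Return sorted, fully qualified paths of files under root for C4.

    The suffix filter is illustrative; the II-PL only asks for a list of
    fully qualified file names.
    """
    root = Path(root).resolve()  # absolute path with symlinks resolved
    return sorted(
        str(p)
        for p in root.rglob("*")  # recurse into subdirectories
        if p.is_file() and p.suffix.lower() in suffixes
    )
```

The result can then be handed over as `_iindexpy.iindx99a_setup_fnlist(index, build_filelist("/path/to/texts"))`, where the directory is a placeholder.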

Index Creation Five (C5):

_iindexpy.iindx99a_index_create(indexobject, indexname)

After all the settings above have been made, creates the actual index.

Arguments

          indexobject – the variable the new index was placed into

          indexname – the title/name of the index being created
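Taken together, C1 to C5 form a fixed sequence. The sketch below runs them in the order of the process model further down this page (index size, then inversion type, then files). Passing the module in as `ii` (normally `_iindexpy`) is a design choice that lets the sequence be exercised without the engine; `create_index` and its keyword defaults are hypothetical, and the return value of `iindx99a_index_create` is not used because it is not documented.

```python
def create_index(ii, indexdir, tripletsfile, filelist, indexname,
                 low_memory=False, large_corpus=False):
    """Run C1 to C5 in order and return the index object.

    `ii` is the II-PL module (normally _iindexpy); passing it in keeps the
    sketch testable without the engine.
    """
    index = ii.new_iindx99a(indexdir, tripletsfile)                # C1
    ii.iindx99a_selectIndexSize(index, 1 if large_corpus else 0)  # C3
    ii.iindx99a_lowmem_invert_set(index, 1 if low_memory else 0)  # C2
    ii.iindx99a_setup_fnlist(index, list(filelist))               # C4
    ii.iindx99a_index_create(index, indexname)                    # C5
    return index
```

A call would look like `create_index(_iindexpy, indexdir, tripletsfile, files, "mydocs", large_corpus=True)`, where `"mydocs"` is a placeholder index name.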

 


Index Searching (Five Functions)

 

Index Searching One (S1):

_iindexpy.new_iisS99a(indexdir, tripletsfile)

Tells Instant Index that you wish to create a new search. The result must be assigned to a variable for use with the other searching functions.

Arguments

          indexdir – the location of the index files

          tripletsfile – the location of a required file called triplets.txt

Index Searching Two (S2):

_iindexpy.iisS99a_OnSelectDiskIndex(searchobject, base)

Selects a previously created index for searching. Several indexes can be used within one search operation.

Arguments

          searchobject – the variable your new search was placed into

          base – the base number of the index. When an index is created, it is given a base number, starting at 0 for the first index in that directory and incrementing by one for every index created after that.

Index Searching Three (S3):

_iindexpy.iisS99a_ucodeflag_set(searchobject, flag)

Sets whether or not the program uses Unicode for searching.

Arguments

          searchobject – the variable your new search was placed into

          flag

                    0 to indicate that Unicode is not in use

                    1 to indicate that Unicode is in use
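The first three searching functions are usually called together. In this sketch (hypothetical `open_search` helper, module passed in as `ii`), several indexes are selected by their base numbers, as S2 allows, before the Unicode flag is set.

```python
def open_search(ii, indexdir, tripletsfile, bases=(0,), use_unicode=False):
    """Run S1 to S3: create a search object, select indexes, set Unicode."""
    search = ii.new_iisS99a(indexdir, tripletsfile)          # S1
    for base in bases:                                       # S2
        # Base numbers count up from 0 in creation order within indexdir.
        ii.iisS99a_OnSelectDiskIndex(search, base)
    ii.iisS99a_ucodeflag_set(search, 1 if use_unicode else 0)  # S3
    return search
```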

Index Searching Four (S4):

_iindexpy.iisS99a_getSegment(searchobject, file, offset, size)

Retrieves a segment of text showing the area in which a hit occurs.

Arguments

          searchobject – the variable your new search was placed into

          file – the first element of an unverified hit, after splitting it; the name of the file in which the hit is found

          offset – the second element of an unverified hit, after splitting it; the offset of the body of text in which the hit occurred

          size – the size of the segment of text (in bytes) to retrieve
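S4 takes the two parts of an unverified hit separately, so a hit string has to be split first. The sketch assumes the offset is the last whitespace-separated field and is passed as an integer, and that file names may contain spaces; `segment_for_hit` and its 512-byte default size are hypothetical.

```python
def segment_for_hit(ii, search, hit, size=512):
    """Fetch the text around one unverified hit of the form "file offset" (S4)."""
    # Split on the last whitespace so file names containing spaces stay whole.
    file, offset = hit.rsplit(None, 1)
    # Assumption: the engine expects the offset as an integer.
    return ii.iisS99a_getSegment(search, file, int(offset), size)
```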

Index Searching Five (S5):

_iindexpy.iisS99a_OnSearch(searchobject, listlen, tokenlist)

Sets the strings (a set of words or characters) to search for. This function returns a negative number (< 0) if more than 5,000 unverified hits are returned.

Arguments

          searchobject – the variable your new search was placed into

          listlen – the length of the list of tokens used in the search

          tokenlist – a list of the search terms
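The `listlen`/`tokenlist` pair is simply a token list and its length, and the documented negative return value signals the 5,000-hit limit. `run_search` is a hypothetical wrapper; splitting the query on whitespace is an assumption, and because the meaning of non-negative return values is not documented, the sketch passes them through unchanged.

```python
def run_search(ii, search, query):
    """Split a query into tokens and run S5.

    Returns (tokens, status, too_many); too_many is True when the engine
    reports more than 5,000 unverified hits via a negative status.
    """
    tokens = query.split()  # assumption: whitespace-separated search terms
    status = ii.iisS99a_OnSearch(search, len(tokens), tokens)
    return tokens, status, status < 0
```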


Retrieval Verification (Four Functions)

 

Retrieval Verification One (V1):

_iindexpy._iisS99a_slowoverride_set(searchobject, flag)

Tells the search process whether or not to continue searching regardless of the number of unverified hits. If this is not set, the search stops after receiving more than 5,000 unverified hits.

Arguments

          searchobject – the variable your new search was placed into

          flag

                    0 do not continue the search if there are too many unverified hits

                    1 continue the search regardless of the number of unverified hits
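Since the override is described as lifting the 5,000-hit stop, one possible pattern is to retry a search that reported too many hits. This sketch rests on an explicit assumption: that `_iisS99a_slowoverride_set` takes effect on the next `iisS99a_OnSearch` call. `search_with_override` is a hypothetical name.

```python
def search_with_override(ii, search, tokens):
    """Run S5; if it reports too many unverified hits, set V1 and search again.

    Assumption: the override set by V1 applies to the next iisS99a_OnSearch call.
    """
    status = ii.iisS99a_OnSearch(search, len(tokens), tokens)
    if status < 0:  # negative: more than 5,000 unverified hits
        ii._iisS99a_slowoverride_set(search, 1)  # 1 = continue regardless
        status = ii.iisS99a_OnSearch(search, len(tokens), tokens)
    return status
```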

 

Retrieval Verification Two (V2):

_iindexpy.iisS99a_getuvhits(searchobject)

Gets the list of unverified hits. The result needs to be placed into a list (see synopsis). Each hit is placed into the list in the format:

          file offset

Arguments

          searchobject – the variable your new search was placed into
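Each entry returned by V2 has the form `file offset`, and V3, V4 and S4 need the two parts separately. `parse_hits` is a hypothetical helper; it assumes the offset is a decimal integer in the last whitespace-separated field, so that file names containing spaces survive the split.

```python
def parse_hits(raw_hits):
    """Turn V2's "file offset" strings into (file, offset) tuples.

    Assumptions: the offset is the last whitespace-separated field and is a
    decimal integer; file names may contain spaces.
    """
    hits = []
    for raw in raw_hits:
        raw = raw.strip()
        if not raw:
            continue  # skip blank entries
        file, offset = raw.rsplit(None, 1)
        hits.append((file, int(offset)))
    return hits
```

Usage would be along the lines of `hits = parse_hits(list(_iindexpy.iisS99a_getuvhits(search)))`.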

 

Retrieval Verification Three (V3):

_iindexpy.iisS99a_verify_by_sector(searchobject, file, offset, tokenlist)

Verifies a specific hit by checking that all search terms occur within a 2048-byte sector. Returns 1 if the hit is verified, and 0 if it is not.

Arguments

          searchobject – the variable your new search was placed into

          file – the first element of an unverified hit, after splitting it; the name of the file in which the hit is found

          offset – the second element of an unverified hit, after splitting it; the offset of the body of text in which the hit occurred

          tokenlist – the list of search strings used during the original search
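The engine's own check is part of the black box described earlier and is not reproduced here. Purely to picture the stated criterion, the function below tests whether every token occurs in the 2048-byte sector containing a given offset of some raw bytes. Aligned sectors, exact case-sensitive matching and UTF-8 encoding of tokens are all assumptions, and `all_tokens_in_sector` is a hypothetical name.

```python
SECTOR = 2048  # sector size stated for verify_by_sector (V3)


def all_tokens_in_sector(data, offset, tokens, sector=SECTOR):
    """Illustrate V3's criterion on raw bytes: 1 if every token is in the sector.

    Assumes aligned sectors and exact, case-sensitive matching; a token that
    straddles a sector boundary is not found.
    """
    start = (offset // sector) * sector  # start of the sector holding offset
    block = data[start:start + sector]
    return int(all(token.encode("utf-8") in block for token in tokens))
```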

 

Retrieval Verification Four (V4):

_iindexpy.iisS99a_verify_by_proximity(searchobject, file, offset, tokenlist)

Verifies a specific hit by checking that all the search terms occur within a 2-line proximity. Returns 1 if the hit is verified, and 0 if it is not.

Arguments

          searchobject – the variable your new search was placed into

          file – the first element of an unverified hit, after splitting it; the name of the file in which the hit is found

          offset – the second element of an unverified hit, after splitting it; the offset of the body of text in which the hit occurred

          tokenlist – the list of search strings used during the original search
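V3 and V4 are typically applied to every unverified hit, keeping only those that return 1. `verified_hits` is a hypothetical helper that chooses between the sector and proximity checks; passing the offset as an integer is an assumption.

```python
def verified_hits(ii, search, raw_hits, tokens, method="sector"):
    """Keep the hits that V3 ("sector") or V4 ("proximity") confirms."""
    verify = {
        "sector": ii.iisS99a_verify_by_sector,        # V3: same 2048-byte sector
        "proximity": ii.iisS99a_verify_by_proximity,  # V4: within 2 lines
    }[method]
    kept = []
    for raw in raw_hits:
        file, offset = raw.rsplit(None, 1)  # V2 format: "file offset"
        offset = int(offset)                # assumption: decimal offset
        if verify(search, file, offset, tokens) == 1:
            kept.append((file, offset))
    return kept
```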

 


Process model (this process model is a separate enumeration from the functional enumerations… and will be developed to give a different view of the process involved in creating and using the II-PL.)

 

First process

# create new index object

 

Second process

# set index size, inversion type, and files to index

 

Third process

# create an index with indexname as its name

 

Fourth process

# create new index search object

 

Fifth process

# select index to search

 

Sixth process

# don’t use Unicode

 

Seventh process

# split the search string into a list and execute the search

 

Eighth process

# tell search object to ignore number of unverified hits

 

Ninth process

# pick up list of unverified hits


Tenth process

# verify the hits

 

Eleventh process

# delete search object
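The eleven processes map onto the documented functions roughly as follows. This end-to-end sketch (hypothetical `index_and_search`, module passed in as `ii`) assumes that the new index is the first one in `indexdir`, so its base number is 0, uses sector verification for the tenth process, and picks flag values for illustration only. Deleting the search object is shown as `del search`, which only drops the Python reference; whether that releases engine resources depends on the wrapper.

```python
def index_and_search(ii, indexdir, tripletsfile, filelist, indexname, query):
    """Walk the eleven processes of the process model; return verified hits."""
    index = ii.new_iindx99a(indexdir, tripletsfile)       # 1 new index object
    ii.iindx99a_selectIndexSize(index, 0)                 # 2 index size: 7%
    ii.iindx99a_lowmem_invert_set(index, 0)               #   fast inversion
    ii.iindx99a_setup_fnlist(index, list(filelist))       #   files to index
    ii.iindx99a_index_create(index, indexname)            # 3 create the index

    search = ii.new_iisS99a(indexdir, tripletsfile)       # 4 new search object
    ii.iisS99a_OnSelectDiskIndex(search, 0)               # 5 assumed base 0
    ii.iisS99a_ucodeflag_set(search, 0)                   # 6 no Unicode
    tokens = query.split()                                # 7 split, then search
    ii.iisS99a_OnSearch(search, len(tokens), tokens)
    ii._iisS99a_slowoverride_set(search, 1)               # 8 ignore hit count
    raw_hits = list(ii.iisS99a_getuvhits(search))         # 9 unverified hits

    verified = []                                         # 10 verify each hit
    for raw in raw_hits:
        file, offset = raw.rsplit(None, 1)                # "file offset"
        if ii.iisS99a_verify_by_sector(search, file, int(offset), tokens) == 1:
            verified.append((file, int(offset)))

    del search                                            # 11 drop search object
    return verified
```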