Appendix

Description of the Minimal Voting Procedure (MVP)

First published in Moscow, Russia: 1997

To instantiate a voting procedure, we need the following triple < C, O_{1}, O_{2} >:

· A set of categories C = { C_{q} } as defined by a training set O_{1}.

· A means to produce a document representational set for members of O_{1}.

· A means to produce a document representational set for members of a test set, O_{2}.

We assume that we have a training collection O_{1}:

O_{1} = { d_{1} , d_{2} , . . . , d_{m} }

Documents that are not single passages can be substituted here. The notation introduced above can be generalized to replace documents with a more abstract notion of an "object".

Objects

O = { O_{1} , O_{2} , . . . , O_{m} }

can be documents, semantic passages that are discontinuously expressed in the text of documents, or other classes of objects, such as electromagnetic events, or the coefficients of spectral transforms.

Some representational procedure is used to compute an "observation" D_{r} about the semantics of the passages. The subscript r is used to remind us that various types of observations are possible and that each of these may result in a different representational set. For linguistic analysis, each observation produces a set of theme phrases. We use the following notation to indicate this:

D_{r} : --> { t_{1} , t_{2} , . . . , t_{n} }

This notation is read "the observation D_{r} produces the representational set { t_{1} , t_{2} , . . . , t_{n} }".

We now combine these passage level representations to form a category representation.

Each "observation", D_{r}, of a passage d_{k} produces a representational set T_{k}:

D_{r} : --> T_{k} = { t_{1} , t_{2} , . . . , t_{n} }

Let A be the union of the individual passage representational sets T_{k}:

A = Union T_{k}.

This set A is the representation set for the complete training collection O_{1}.

The set A can be partitioned, with overlaps, to match the categories to which the passages were assigned. Let T*_{q} be the union of the passage representational sets assigned to category q:

T*_{q} = Union T_{k} such that d_{k} is assigned to the category q.
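
The construction of A and of the category sets T*_{q} can be sketched in a few lines of Python. This is a minimal illustration, not the original implementation: the function name `category_sets`, the dictionary layout, and the toy phrases are all assumptions made for the example. A passage may be assigned to more than one category, which is what produces the overlaps.

```python
from collections import defaultdict

def category_sets(passage_sets, assignments):
    """Build A (union of all T_k) and T*_q (union of T_k over passages in q).

    passage_sets: hypothetical mapping passage id -> set of phrases (T_k)
    assignments:  hypothetical mapping passage id -> list of category ids
    """
    full_set = set()                 # A = Union T_k
    by_category = defaultdict(set)   # T*_q for each category q
    for k, phrases in passage_sets.items():
        full_set |= phrases
        for q in assignments[k]:
            by_category[q] |= phrases
    return full_set, dict(by_category)

# Toy training data (illustrative only).
passage_sets = {
    "d1": {"voting", "tally"},
    "d2": {"tally", "category"},
    "d3": {"spectral", "transform"},
}
assignments = {"d1": ["q1"], "d2": ["q1", "q2"], "d3": ["q2"]}

A, T_star = category_sets(passage_sets, assignments)
overlap = T_star["q1"] & T_star["q2"]   # phrases shared by both categories
```

Here passage d2 belongs to both categories, so its phrases appear in both T*_{q1} and T*_{q2}.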

The category representation set, T*

The overlap between category representation T*

J. S. Mill’s logic relies on the discovery of meaningful subsets of representational elements. The first principles of J. S. Mill’s argumentation are:

1. that negative evidence should be acquired as well as positive evidence

2. that a bi-level argumentation should involve a decomposition of passages and categories into a set of representational phrases

3. that the comparison of passage and category representation should generalize (provide the grounding for computational induction) from the training set to the test set.

It is assumed that each "observation", D

This general framework provides for situational reasoning and computational argumentation about natural systems.

For the time being, it is assumed that the set of basic elements is the full phrase representational set

A = Union T_{k}

for the training collection O_{1}.

Given the data:

T*_{q} for each C_{q} , q = 1, . . . , n

and the representational sets T

This hypothesis will be voted on by using each phrase in the representational set for D_{k}, through two inquiries:

1. Does an observation of a passage, D_{k}, have the property p, where p is the property that this specific representational element, t_{i}, is also a member of the representational set T*_{q} for category q?

2. Does an observation of a passage, D_{k}, have the property p, where p is the property that this specific representational element, t_{i}, is not a member of the representational set T*_{q} for category q?

Truth of the first inquiry produces a positive vote, from the single passage level representational element, that the passage is in the category.

Truth of the second inquiry produces a negative vote, from the single representational element, that the passage is not in the category. These votes are tallied.

Data structure for recording the votes

For each passage, d

Each element t

a_{i,j} = -1 if the phrase is not in T*_{q}

or

a_{i,j} = 1 if the phrase is in T*_{q}

Matrix A

This linear model produces ties for first place, and places a semi-order (having ties for places) on the categories by counting discrete votes for and against the hypothesis that the document is in that category.
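
The linear vote and its tendency to tie can be sketched as follows. This is only an illustration under stated assumptions: rows are taken to index the phrases of one passage and columns the categories, and the names `vote_matrix` and `tally`, as well as the toy category sets, are hypothetical.

```python
def vote_matrix(passage_phrases, category_sets):
    """Return (phrases, categories, matrix) with entries +1 or -1.

    Assumed layout: one row per phrase t_i of the passage, one column per
    category q. An entry is +1 if t_i is in T*_q and -1 otherwise.
    """
    categories = sorted(category_sets)
    phrases = sorted(passage_phrases)
    matrix = [
        [1 if t in category_sets[q] else -1 for q in categories]
        for t in phrases
    ]
    return phrases, categories, matrix

def tally(matrix):
    """Sum each column: votes for minus votes against each category."""
    return [sum(column) for column in zip(*matrix)]

# Toy category sets T*_q and one test passage (illustrative only).
T_star = {
    "q1": {"voting", "tally", "category"},
    "q2": {"tally", "category", "spectral"},
    "q3": {"spectral"},
}
phrases, categories, a = vote_matrix({"tally", "category"}, T_star)
linear_tallies = tally(a)   # one total per category, in sorted category order
```

In this toy case both phrases occur in q1 and in q2, so the two categories receive the same total and tie for first place, while q3 receives only negative votes.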

A second data structure to record weighted votes

A non-linear (weighted) model uses internal and external weighting to reduce the probability of ties to near zero and to account for structural relationships between themes.

Matrix B

b_{i,j} = a_{i,j} * weight of the phrase in T_{k}

if the phrase is not in T*_{q}

or

b_{i,j} = a_{i,j} * weight of the phrase in T*_{q}

if the phrase is in T*_{q}

This difference between the two multipliers is necessary and sufficient to break ties resulting from the linear model (matrix A_{k}).
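
A weighted vote along these lines might look like the sketch below. The text does not say how phrase weights are obtained, so the weights here are invented numbers, and the function name `weighted_votes` and the dictionary layout are assumptions. A negative vote is scaled by the phrase's weight in the passage set T_{k}; a positive vote is scaled by its weight in the category set T*_{q}.

```python
def weighted_votes(passage_weights, category_weights):
    """Return (categories, matrix, totals) for one passage.

    passage_weights:  hypothetical mapping phrase -> weight in T_k
    category_weights: hypothetical mapping category -> {phrase -> weight in T*_q}
    """
    categories = sorted(category_weights)
    phrases = sorted(passage_weights)
    matrix = []
    for t in phrases:
        row = []
        for q in categories:
            if t in category_weights[q]:
                # positive vote (a = +1) times the weight of t in T*_q
                row.append(category_weights[q][t])
            else:
                # negative vote (a = -1) times the weight of t in T_k
                row.append(-passage_weights[t])
        matrix.append(row)
    totals = [sum(column) for column in zip(*matrix)]
    return categories, matrix, totals

# Invented weights for illustration only.
passage_weights = {"tally": 2, "category": 1}
category_weights = {
    "q1": {"voting": 1, "tally": 3, "category": 1},
    "q2": {"tally": 1, "category": 2, "spectral": 1},
    "q3": {"spectral": 4},
}
categories, b, weighted_totals = weighted_votes(passage_weights, category_weights)
```

With unweighted votes these same phrases would give q1 and q2 an equal count; with the invented weights the totals differ, so q1 is placed ahead of q2.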

Data structure to record the results

For each passage representation and each category, the tally is made from the matrix B_{k} and stored in a matrix C with one row for each passage in the document collection and h columns, one column for each category.

The information in matrix C is transformed into a matrix D having the same dimension as C. The elements of each row in C are reordered by the tally values. To illustrate, suppose we have only 4 categories and passage 1 tallies {-1214,-835,451,1242} for categories 1, 2, 3 and 4 respectively. So

cat1 --> -1214, cat2 --> -835, cat3 --> 451 and cat4 --> 1242.

By holding these assignments constant and ordering the elements by size of tally, from largest to smallest, we have the permutation of the ordering (1, 2, 3, 4) to the ordering (4, 3, 2, 1):

(1, 2, 3, 4) --> (4, 3, 2, 1).

This result shows that for passage 1, first place goes to category 4, second place to category 3, etc. The matrix D would then have (4, 3, 2, 1) as its first row.
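
The reordering of one row of C into one row of D can be sketched as a sort of category numbers by tally. The function name `rank_categories` is hypothetical, and the treatment of ties (equal tallies keep their original category order) is an assumption of this sketch, not something the text specifies.

```python
def rank_categories(tallies):
    """Return 1-based category numbers ordered from highest to lowest tally.

    Python's sort is stable, so categories with equal tallies keep their
    original relative order (an assumption of this sketch).
    """
    order = sorted(range(len(tallies)), key=lambda j: tallies[j], reverse=True)
    return [j + 1 for j in order]

row_of_C = [-1214, -835, 451, 1242]   # tallies for categories 1..4
row_of_D = rank_categories(row_of_C)  # categories in order of placement
```

For the tallies used in the illustration, the largest value (1242) belongs to category 4 and the next largest (451) to category 3, followed by categories 2 and 1.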