SergeyBPshenichnikov, 23 April 2021, 13:01

Context category


Translation

Similarity and sameness

The mathematical model of sign sequences with repetitions (texts) is the multiset. The multiset was defined by D. Knuth in 1969 and later studied in detail by A. B. Petrovsky [1]. The defining property of a multiset is that it may contain identical elements. The limiting case of a multiset in which every element has multiplicity one is a set. The set obtained from a multiset by setting all multiplicities to one is called its generating set, or domain. A multiset in which every element has zero multiplicity is the empty set.
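As a rough illustration, Python's `collections.Counter` can stand in for a multiset. It maps each element to its multiplicity, and its keys form the generating set (domain). The sample sentence is hypothetical and only shows the definitions in action.

```python
from collections import Counter

# A toy text treated as a multiset of signs (illustrative sentence, not from the article).
text = "the set of sets is a set".split()
multiset = Counter(text)  # sign -> multiplicity

# Generating set (domain): the distinct signs, each taken once.
domain = set(multiset)


def is_set(ms: Counter) -> bool:
    """A multiset whose elements all have multiplicity one is an ordinary set."""
    return all(count == 1 for count in ms.values())


print(multiset["set"], sorted(domain), is_set(multiset), is_set(Counter(domain)))
```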

The problem is to determine whether elements are the same. Similarity depends on which properties of the elements are taken into account. Cucumbers and watermelons look alike in color, but one can hardly call them the same in cooking, even though their botanical descriptions largely coincide.

According to G. Frege, an object that has relations with other objects and their combinations has as many properties (values) as there are such relations. The part of the values taken into account in a given situation is called the meaning in which the object is represented. A short designation of an object by a number, symbol, word, picture, sound or gesture is called the sign of the object (the sign is itself one of its values).

A single sign corresponds to all possible subsets of an object's values (meanings). This is the main problem of recognizing meaning, but it is also what allows minimal sets of signs to suffice: a unique sign cannot be assigned to every subset of values. Information is exchanged by means of minimal sets of signs (notes, alphabets, language dictionaries). The meaning of signs is usually not computed but determined intuitively from their contexts (neighborhoods).

Semantic markup of the text resolves the ambiguity of signs. It can be explained with the extreme case in which all signs are identical. On a Russian abacus (schoty), the text is a sequence of identical signs (beads). According to [2], the dictionary of such a text consists of a single word, and such texts are unusable without semantic markup. The dictionary is therefore changed. The beads are divided into groups (units, tens, hundreds, etc.), and the names of these groups (place values) become unique word numbers. The dictionary D consists of the digits 0 to 9. On such a Cartesian (matrix) abacus, each bead is represented by a matrix unit. For example, the number 2021 is represented on the matrix abacus by the sum of four matrix units:

$E_{1000,2}+E_{100,0}+E_{10,2}+E_{1,1},$

where the subscripts are the Cartesian coordinates of the matrix word: the first is the place value and the second is the digit. Identical objects (beads) have thus been turned into similar ones, and the measure of similarity is given by the coordinates of the words. Besides positional notation, repetitions of dictionary digits arise during arithmetic operations, and the following equivalence relations are established:

$E_{i,0} \sim E_{i+1,1}$

If an arithmetic operation produces 9 + 1 in some position, a 0 appears in that position and 1 is carried to the next position. On the abacus, all beads on that wire are moved back to their original (zero) position, and one bead is added on the next wire. On the matrix abacus, the transformation is:

$E_{i+1,1}=E_{i+1,i}E_{i,0}E_{0,1}.$
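A small sketch of the matrix abacus follows. It assumes a sum of matrix units is stored as a Python dict from (row, column) pairs to integer coefficients, a representation chosen here for illustration only. The row is the place value and the column is the digit, as in the 2021 example. The hypothetical helper `mul` applies the standard matrix-unit rule E_{i,j}E_{k,l} = E_{i,l} if j = k and 0 otherwise.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def E(i, j):
    """A single matrix unit E_{i,j}."""
    return {(i, j): 1}


def abacus(n):
    """Represent n >= 0 as a sum of units E_{place value, digit}."""
    units, place = {}, 1
    while True:
        n, digit = divmod(n, 10)
        units[(place, digit)] = 1
        if n == 0:
            return units
        place *= 10


# 2021 -> E_{1000,2} + E_{100,0} + E_{10,2} + E_{1,1}
print(abacus(2021))

# With plain integer indices: left-multiplying by E_{i+1,i} moves a unit to the
# next row, and right-multiplying by E_{0,1} changes its column (digit) from 0 to 1.
i = 3
print(mul(mul(E(i + 1, i), E(i, 0)), E(0, 1)))  # E_{i+1,1}
```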

Once a measure of similarity of signs is fixed, the tolerance (similarity) relation can again be turned into an equivalence (sameness) relation with respect to that measure, for example by rounding numbers. Tolerance and equivalence can be told apart by transitivity: an equivalence is always transitive, whereas a tolerance may violate transitivity. For example, let A be similar to B in one sense. If the meaning of B does not coincide with the meaning of C, then A can be similar to C only with respect to the intersection of their meanings (part of their properties). Transitivity is then restored (closed), but only for this common part of the meaning. Once sameness is achieved by fixing the meaning, A becomes equivalent to C. For example, the transformation (closure) above, applied to certain coordinates, makes arithmetic possible on the matrix abacus.
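The rounding example can be checked mechanically. In this sketch, "differ by at most 1" is a tolerance that fails transitivity, while "equal after rounding down to tens" is an equivalence. Both relations and the helper names are hypothetical choices made for the illustration.

```python
from itertools import product


def similar(a, b, tol=1):
    """Tolerance relation: reflexive and symmetric, but not necessarily transitive."""
    return abs(a - b) <= tol


def same(a, b):
    """Equivalence obtained by fixing a measure: equal after rounding down to tens."""
    return a // 10 == b // 10


def is_transitive(rel, items):
    """True if rel(a, b) and rel(b, c) always imply rel(a, c) on the given items."""
    return all(
        rel(a, c)
        for a, b, c in product(items, repeat=3)
        if rel(a, b) and rel(b, c)
    )


items = range(30)
print(is_transitive(similar, items))  # False: 1~2 and 2~3, but not 1~3
print(is_transitive(same, items))     # True
```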

Chess is another example of the contextual dependence of signs, and the dependence is even stronger in double chess [3]. In this variant, a player may make a finite number of double moves at any point in the game, and the game remains consistent. The other rules are those of ordinary chess, with two exceptions: the first move is a single move, and castling is allowed while in check. The author of the variant in which all moves are double is Prof. G. A. Zaitsev.

For chess, the dictionary of the matrix text consists of the numbers of the pieces of each color and the move separator (from 1 to 11). A word of a chess text is a matrix unit. Its first coordinate is unique: the number of a square of the chessboard (from 1 to 64). Its second coordinate is taken from the dictionary. At any point in the game, the chess matrix text is the sum of matrix units, each showing a piece on the corresponding square. Repetitions arise in the text both because pieces are duplicated and because, during the game, all pieces except the king constantly pass from similarity to sameness and back. The game consists in carrying out these transitions as effectively as possible, which in effect classifies the pieces. Pawns that are initially the same later become similar only by their rule of movement, and a pawn may even become the same as a queen.

Transitivity control, which distinguishes similarity from sameness, is a tool for analyzing matrix texts. The lack of transitivity control is the algebraic explication of a misunderstanding in language texts, a loss in chess, or an error in numerical calculations.

Transitivity of relations is a condition for turning a set of objects into a mathematical category. Semantic markup of a text can be the computation of its categories by means of transitive closure. The objects of such a category are the contexts of matrix words [2], and the morphisms are the matrices transforming these contexts into one another.

Context

The context of the word E_k,j of a matrix text [2] is its fragment F^j_i,k, the sum of the matrix units (words) between two repetitions E_i,j and E_k,j of the same word:

$F^j_{i,k}=E_{i+1,D_R}+E_{i+2,D_R}+\ldots +E_{k-1,D_R}, \ \ \ \ \ \ \ \ \ \ \ (1)$

where the index D_R means that any index from the right dictionary D_R of the matrix text [2] can stand in this place, including the signs of text-forming fragments. The context is all the words of the matrix text between repeated signs of the dictionary D_R. Examples are the words between repeated words, periods, or paragraph, chapter and volume marks in language texts, or between repeated phrases, periods and parts in musical works.
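Below is a sketch of extracting the contexts (1) from a tokenized text. It makes two assumptions. First, the dictionary index of a word is the position of its first occurrence, so a repeated word at position i becomes E_{i,j} with j < i, which matches the article's numerical example. Second, the last context of a word runs to the end of the text. The token list and function names are hypothetical.

```python
def matrix_text(tokens):
    """Encode tokens as matrix units (position, dictionary index), 1-based.

    The dictionary index of a word is the position of its first occurrence.
    """
    first = {}
    units = []
    for pos, tok in enumerate(tokens, start=1):
        units.append((pos, first.setdefault(tok, pos)))
    return units


def contexts(units, j):
    """Contexts F^j_{i,k}: units strictly between consecutive repetitions of word j.

    The context after the last repetition runs to the end of the text.
    """
    positions = [i for i, col in units if col == j]
    ends = positions[1:] + [len(units) + 1]
    return [[(p, c) for p, c in units if i < p < k] for i, k in zip(positions, ends)]


units = matrix_text(["x", "a", "b", "x", "b", "c", "x", "a"])
print(units)
print(contexts(units, 1))
```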

The signs of text-forming fragments look the same, but they are also homonymous signs: their contexts are the fragments (1). The context of a language fragment (its explication or explanation) can be not only language text but also sound (for example, music), images (a photo) or a combination of these (video). The context of a musical text can be language text (for example, a libretto).

Matrix words correspond to their matrix contexts, represented as the algebraic objects (1). All possible relations between these objects are the subject of analysis when determining the meaning of words. Category theory is useful for studying such constructions because it is built on transitivity.

Context category

Let F₁^j, ..., F_n^j be all the contexts F^j_i,k of the word E_j,j ∈ D_R in a text P, and let D^j_1R, ..., D^j_nR be the right dictionaries of these contexts:

$F_1^j=F_1^jD^j_{1R}, \ldots ,F_n^j=F_n^jD^j_{nR}$

For k = i + 2, the fragment (1) reduces to the special case of a single matrix word E_i+1,D_R.

The context category Cat(E_j,j) of a text sign E_j,j ∈ D_R is defined as follows:

The objects of the category are the pairwise multiple [2] contexts F₁^j, ..., F_n^j.
For each pair of multiple objects there exists [2] a set of morphisms F_ij: F_i = F_ijF_j; each morphism corresponds to a single pair F_i, F_j.
For a pair of morphisms F_ij and F_jk, their composition is defined as the product of square matrices F_ijF_jk, so that if F_i = F_ijF_j and F_j = F_jkF_k, then F_i = F_ijF_jkF_k (transitivity condition).
For each object F_i, the identity morphism is the unit matrix E: F_i = EF_iE. Associativity of composition follows from the associativity of matrix multiplication.

Context reduction

The intersection (the common words) of matrix dictionaries is their product:

$\prod D_i \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (2)$

The proof follows from the defining property of matrix units (6) [2] and the definitions of dictionaries (9) [2] and (15) [2]. The matrix units of a dictionary have equal subscripts, so the product of two dictionary words with different indices is zero: E_i,iE_j,j = E_i,i if i = j and 0 otherwise. Hence only the words common to all factors remain in the product (2).

The union of any pair of dictionaries D_i and D_j is their sum minus the intersection (2):

$D_i+D_j - D_iD_j \ \ \ \ \ \ \ \ \ \ \ \ \ (3)$

By properties (10) [2], subtracting the intersection in (3) removes the repeated matrix units from the sum D_i + D_j.
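Formulas (2) and (3) can be checked with a few lines of code. This sketch keeps the assumption that a sum of matrix units is a dict from (row, column) pairs to coefficients. The helper names `mul`, `lin`, `dic`, `intersection` and `union` are hypothetical, and so are the two sample dictionaries.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def lin(a, b, s=1):
    """a + s*b for sums of matrix units (s = -1 subtracts)."""
    out = dict(a)
    for u, c in b.items():
        out[u] = out.get(u, 0) + s * c
    return {u: c for u, c in out.items() if c != 0}


def dic(*js):
    """Dictionary as a sum of diagonal matrix units E_{j,j}."""
    return {(j, j): 1 for j in js}


def intersection(*dicts):
    """(2): the intersection of dictionaries is their product."""
    result = dicts[0]
    for d in dicts[1:]:
        result = mul(result, d)
    return result


def union(a, b):
    """(3): D_i + D_j - D_i D_j; the subtraction cancels the doubled common units."""
    return lin(lin(a, b), mul(a, b), -1)


Da, Db = dic(1, 2, 5), dic(2, 5, 9)  # hypothetical dictionaries
print(intersection(Da, Db))  # E_{2,2} + E_{5,5}
print(union(Da, Db))         # E_{1,1} + E_{2,2} + E_{5,5} + E_{9,9}
```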

The minimal dictionary of a matrix text fragment P is a dictionary D_R such that D_R and P are mutually multiple:

$\begin{gathered} \exists F_{PD_R} : P =F_{PD_R}D_R, \\ \exists F_{D_RP} : D_R=F_{D_RP}P \end{gathered}$

For mutually multiple P and D_R, nonzero matrices F_PDR and F_DRP exist.

Such sums of matrix units F_PDR and F_DRP exist if the matrix units of P and D_R contain the same second indices (coordinates) and no other second indices.

The concept of a minimal dictionary is needed because matrix units always satisfy:

$PD_R=P(D_R+D_{1R})$

where D_1R may consist of words (matrix units) that are absent from D_R (the "other" second indices mentioned above). For example, if F₁^j = F₁^jD_1R, ..., F_n^j = F_n^jD_nR, then always:

$F_{1}^j= F_{1}^j(D_{1R}+ \ldots + D_{nR}), \ldots , F_n^j= F_n^j(D_{1R}+ \ldots + D_{nR})$

The minimal dictionaries D_minR1, ..., D_minRn of the fragments F₁^j, ..., F_n^j contain no matrix words (second indices of matrix units) that are absent from the corresponding fragment.
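Here is a sketch of mutual multiplicity for a minimal dictionary, using the same dict-of-units representation (an illustrative assumption). It takes F_PDR = P itself. It builds F_DRP from one unit E_{j,i} per dictionary word, which relies on each position i occurring only once in P. It also shows that a larger dictionary still satisfies P = P(D_R + D_1R). The fragment P and all function names are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def minimal_dictionary(p):
    """D_R: one diagonal unit E_{j,j} for each second index j occurring in p."""
    return {(j, j): 1 for (_, j) in p}


def text_to_dictionary(p):
    """A matrix F with F p = D_R: one unit E_{j,i} per index j, where E_{i,j} is in p.

    Assumes each first index (position) occurs once in p, as in a matrix text.
    """
    rows = {}
    for i, j in sorted(p):
        rows.setdefault(j, i)
    return {(j, i): 1 for j, i in rows.items()}


P = {(5, 2): 1, (6, 3): 1, (7, 2): 1}  # hypothetical fragment
D_R = minimal_dictionary(P)
print(mul(P, D_R) == P)                      # P = P D_R (here F_PDR = P)
print(mul(text_to_dictionary(P), P) == D_R)  # D_R = F_DRP P
# A larger dictionary also gives P = P (D_R + D_1R), hence the need for minimality.
print(mul(P, {**D_R, (9, 9): 1}) == P)
```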

Context equivalence classes are defined by common minimal right dictionaries D_minR. If a pair of contexts has a common minimal dictionary, the contexts are mutually multiple, and hence mutual transformations (matrices) between them exist.

If the contexts F₁^j, ..., F_n^j of the word sign E_j,j have a common minimal right dictionary D_R, they are multiples of each other. From here on, the dictionary of a text fragment means its minimal dictionary.

If the contexts F₁^j, ..., F_n^j are multiplied on the right by a dictionary D^j_R such that each resulting context has the (minimal) right dictionary D^j_R, the results are called reduced contexts:

$F_1^j D_R^j, \ldots , F_n^jD_R^j \ \ \ \ \ \ \ \ \ \ \ \ \ \ (4)$

Reduction (right multiplication) deletes from each of F₁^j, ..., F_n^j the matrix units whose second indices are not in D^j_R. If a resulting fragment lacks at least one index of the dictionary, it is not included in (4).
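The reduction (4), including the rule that drops fragments missing a dictionary index, can be sketched as follows. It keeps the dict-of-units representation. The sample contexts and the helper `reduce_contexts` are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def units(*pairs):
    """Sum of distinct matrix units E_{i,j}, each with coefficient 1."""
    return {p: 1 for p in pairs}


def reduce_contexts(contexts, d):
    """(4): right-multiply each context by d; keep it only if its dictionary becomes all of d."""
    wanted = {j for j, _ in d}
    reduced = []
    for f in contexts:
        r = mul(f, d)  # deletes units whose second index is not in d
        if {j for _, j in r} == wanted:
            reduced.append(r)
    return reduced


contexts = [
    units((2, 2), (3, 3), (4, 4)),
    units((6, 3), (7, 2), (8, 8)),
    units((10, 3), (11, 11)),  # lacks word 2, so it is excluded from (4)
]
d = units((2, 2), (3, 3))
print(reduce_contexts(contexts, d))
```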

Categorization

Contexts with common dictionaries, for example the contexts of the sign word E_j,j after reduction (4), are objects of the sign category Cat(E_j,j). By construction, all matrix texts (4) are multiples of each other by (20) [2] and have a common (and minimal) dictionary. Therefore, transformation matrices F^j_1,k always exist as morphisms of the sign category Cat(E_j,j):

$F_1^jD_R^j= F_{1,k}^jF_k^jD_R^j \ \ \ \ \ \ \ \ \ \ \ \ (5)$

Relations (5) are the smallest transitive relations on the set F₁^j, ..., F_n^j. They form the transitive closure of this set because operation (4) removes from the contexts F₁^j, ..., F_n^j all matrix words absent from the common dictionary D^j_R.

The remaining category axioms hold by the properties of square matrices of the same size.
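The existence of the morphisms in (5) can be made concrete with a sketch. It makes a simplifying assumption that is not from the article: each dictionary word occurs exactly once in each reduced context. Under that assumption, F_{1,k} just pairs up the positions that carry the same word. The composition check illustrates the transitivity condition. The contexts R1, R2, R3 and the function `morphism` are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def morphism(f_i, f_k):
    """A matrix F_{i,k} with f_i = F_{i,k} f_k for reduced contexts over one dictionary.

    Assumes each second index occurs exactly once in each context.
    """
    row_i = {j: r for (r, j) in f_i}
    row_k = {j: r for (r, j) in f_k}
    return {(row_i[j], row_k[j]): 1 for j in row_i}


R1 = {(2, 2): 1, (3, 3): 1}    # reduced contexts sharing the dictionary E_{2,2} + E_{3,3}
R2 = {(6, 3): 1, (7, 2): 1}
R3 = {(12, 2): 1, (13, 3): 1}
F12, F23 = morphism(R1, R2), morphism(R2, R3)
print(mul(F12, R2) == R1)                 # (5): R1 = F_{1,2} R2
print(mul(F12, F23) == morphism(R1, R3))  # transitivity: F_{1,2} F_{2,3} = F_{1,3}
```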

The transitive closure (5) can be defined for any subset (m < n)

$F_1^j, \ldots, F_m^j \subset F_1^j, \ldots, F_n^j, \ \ \ \ \ \ \ \ \ \ \ \ \ \ \ (6)$

by taking for F₁^j, ..., F_m^j their common dictionary D^j_mR ⊇ D^j_R computed by (2). Here D^j_R is a subset of D^j_mR, since the product (2) over the subset has fewer factors. The transitive closure (5) is then performed over the dictionary D^j_mR:

$F_1^jD_{mR}^j=F_{1,k}^jF_k^jD_{mR}^j. \ \ \ \ \ \ \ \ \ \ \ \ (7)$

Example

As an example we use the matrix text (5) [2], which contains four identical signs of the word «set»: E_1,1, E_5,1, E_10,1, E_14,1. These four signs have four contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17:

$\begin{gathered} F_{1,5}^{1}\equiv F_{1}^{1}=E_{2,2}+E_{3,3}+E_{4,4}=F_{1}^{1}D_{1}^{1} \\ F_{5,10}^{1}\equiv F_{2}^{1}=E_{6,3}+E_{7,7}+E_{8,8}+E_{9,2}=F_{2}^{1}D_{2}^{1}\\ F_{10,14}^{1}\equiv F_{3}^{1}=E_{11,3}+E_{12,12}+E_{13,4}=F_{3}^{1}D_{3}^{1}\\ F_{14,17}^{1}\equiv F_{4}^{1}=E_{15,3}+E_{16,16}+E_{17,7}=F_{4}^{1}D_{4}^{1}\\ D_1^1 = E_{2,2}+E_{3,3}+E_{4,4}, \\ D_2^1 = E_{2,2}+E_{3,3}+E_{7,7}+E_{8,8}, \\ D_3^1 = E_{3,3}+E_{4,4}+E_{12,12}, \\ D_4^1 = E_{3,3}+E_{16,16}+E_{7,7}, \end{gathered} \ \ \ \ \ \ \ \ \ \ (8)$

where D¹₁, D¹₂, D¹₃, D¹₄ are the dictionaries of the corresponding contexts. The last context, F¹_14,17, is not followed by another repetition of the sign. Its second subscript is therefore not the position of a repetition but the position of the last word of the text, which marks the end of the context.

The problem is to compute the similarity and difference of the words E_1,1, E_5,1, E_10,1, E_14,1 from the similarity and difference of their contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17 with respect to some measure (modulus). Similarity of contexts is determined by their common dictionaries, which serve as the modulus for comparing contexts. Difference is determined by the residues of the contexts modulo the same modulus. Residues define their own equivalence classes (residue classes) and residue categories, since transitive closure can be taken for them as well.

The common dictionary of the four contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17, by (2):

$D_R^1=D_1^1D_2^1D_3^1 D_4^1= E_{3,3} \ \ \ \ \ \ \ \ \ \ \ (9)$

Transitive closure by reduction (4), with the common dictionary as the modulus, removes the "extra" words:

$\begin{gathered} F_{1}^{1}\rightarrow F_{1}^{1}D_{R}^{1}=E_{3,3},\\ F_2^1\rightarrow F_2^1D_{R}^{1}=E_{6,3},\\ F_{3}^1\rightarrow F_{3}^{1}D_{R}^{1}=E_{11,3}, \\ F_{4}^{1}\rightarrow F_{4}^{1}D_{R}^{1}=E_{15,3} \end{gathered} \ \ \ \ \ \ \ \ \ \ \ \ \ \ (10)$
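Equations (9) and (10) can be reproduced with the data of (8), again storing sums of matrix units as dicts (an illustrative choice). The unit for the last word (position 17) is left out of F_4 here. Its second index is not 3, so right multiplication by E_{3,3} would remove it anyway. The helper names are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def units(*pairs):
    """Sum of distinct matrix units E_{i,j}, each with coefficient 1."""
    return {p: 1 for p in pairs}


# Contexts and dictionaries from (8); the position-17 unit of F_4 is omitted (see above).
F = {
    1: units((2, 2), (3, 3), (4, 4)),
    2: units((6, 3), (7, 7), (8, 8), (9, 2)),
    3: units((11, 3), (12, 12), (13, 4)),
    4: units((15, 3), (16, 16)),
}
D = {
    1: units((2, 2), (3, 3), (4, 4)),
    2: units((2, 2), (3, 3), (7, 7), (8, 8)),
    3: units((3, 3), (4, 4), (12, 12)),
    4: units((3, 3), (16, 16), (7, 7)),
}

common = D[1]
for k in (2, 3, 4):
    common = mul(common, D[k])               # (9): D_1 D_2 D_3 D_4
reduced = {k: mul(F[k], common) for k in F}  # (10): F_k D_R
print(common)
print(reduced)
```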

Thus, the reduced contexts of the sign word E_1,1 («set») are the four words E_3,3, E_6,3, E_11,3 and E_15,3. They all have the same sign E_3,3 («object») in the dictionary obtained by combining D¹₁, D¹₂, D¹₃, D¹₄ according to (3):

$\begin{gathered} D_1^1+D_2^1-D_1^1D_2^1 = E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8} \\ \left(E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8} \right) + \\ + D_3^1 - \left(E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8} \right) D_3^1= \\ = E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8}+E_{12,12} \\ \left(E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8}+E_{12,12} \right) + \\ + D_4^1 - \left(E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8}+E_{12,12} \right)D_4^1=\\ = E_{2,2}+E_{3,3}+E_{4,4}+E_{7,7}+E_{8,8}+E_{12,12}+E_{16,16}, \end{gathered}$

where each step is the next pairwise union (3) in the sequence.
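The chain of pairwise unions above can be verified step by step. The sketch keeps the dict-of-units representation and uses the dictionaries D¹₁ to D¹₄ from (8). The helper names are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def lin(a, b, s=1):
    """a + s*b for sums of matrix units (s = -1 subtracts)."""
    out = dict(a)
    for u, c in b.items():
        out[u] = out.get(u, 0) + s * c
    return {u: c for u, c in out.items() if c != 0}


def dic(*js):
    """Dictionary as a sum of diagonal matrix units E_{j,j}."""
    return {(j, j): 1 for j in js}


D = [dic(2, 3, 4), dic(2, 3, 7, 8), dic(3, 4, 12), dic(3, 16, 7)]  # D_1..D_4 from (8)

u = D[0]
for d in D[1:]:
    u = lin(lin(u, d), mul(u, d), -1)  # (3): u + d - u d
    print(sorted(j for j, _ in u))     # indices of the union after each step
```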

With respect to their reduced contexts E_3,3, E_6,3, E_11,3 and E_15,3, the words E_1,1, E_5,1, E_10,1, E_14,1 can be the same or different. The measure chosen for comparing E_3,3, E_6,3, E_11,3 and E_15,3 determines the result of comparing E_1,1, E_5,1, E_10,1, E_14,1. In the simplest case, if E_3,3, E_6,3, E_11,3 and E_15,3 are assumed to have the same value, then E_1,1, E_5,1, E_10,1, E_14,1 are the same as well. This is the case, for example, when words are understood only as signs (letters) of a dictionary (alphabet) and have no context dependence.

To compare the meanings of words, it is useful to compute the corresponding category of the signs of these words. The sign category Cat(E_3,3) has the four reduced contexts (10) as its objects:

$F_1^1 \sim E_{3,3}, F_2^1\sim E_{6,3}, F_3^1 \sim E_{11,3}, F_4^1\sim E_{15,3} \ \ \ \ \ \ \ \ \ \ \ \ (11)$

The morphisms of Cat(E_1,1) are the four matrices E_6,3, E_11,6, E_15,11 and E_15,3:

$F_2^1 = E_{6,3}F_1^1, F_3^1 = E_{11,6}F_2^1, F_4^1 = E_{15,11}F_3^1, F_4^1 = E_{15,3}F_1^1 \ \ \ \ \ \ \ \ \ (12)$

The composition of morphisms is the relation:

$E_{15,11}E_{11,6}E_{6,3}=E_{15,3}. \ \ \ \ \ \ \ \ \ \ \ \ \ \ (13)$
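The morphisms (12) and the composition (13) follow directly from the matrix-unit product rule. This sketch checks them, with sums of matrix units stored as dicts (an illustrative choice). Here `mul` is a hypothetical helper that multiplies any number of factors from left to right.

```python
def mul(*factors):
    """Left-to-right product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k."""
    result = factors[0]
    for b in factors[1:]:
        out = {}
        for (i, j), x in result.items():
            for (k, l), y in b.items():
                if j == k:
                    out[(i, l)] = out.get((i, l), 0) + x * y
        result = {u: c for u, c in out.items() if c != 0}
    return result


def E(i, j):
    """A single matrix unit E_{i,j}."""
    return {(i, j): 1}


F1, F2, F3, F4 = E(3, 3), E(6, 3), E(11, 3), E(15, 3)  # objects (11)
checks = {
    "F2 = E_{6,3} F1": mul(E(6, 3), F1) == F2,
    "F3 = E_{11,6} F2": mul(E(11, 6), F2) == F3,
    "F4 = E_{15,11} F3": mul(E(15, 11), F3) == F4,
    "F4 = E_{15,3} F1": mul(E(15, 3), F1) == F4,
    "(13)": mul(E(15, 11), E(11, 6), E(6, 3)) == E(15, 3),
}
print(checks)
```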

The composition (13) expresses the interval markup of the word E_3,3 (45) [2] in the language of category theory, and reduction (10) is an example of solving a system of congruences modulo F_m (39) [2]. The benefit of category theory is that its approach is more general and makes methods from different branches of algebra available.

So all four text fragments F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17 are the same (equivalent) in the sense of the sign word E_3,3 (congruent modulo E_3,3). The matrix morphisms E_15,11, E_11,6, E_6,3, E_15,3 transform these texts into one another according to (12). By analogy with a library catalog, all four texts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17 (objects of the sign category Cat(E_3,3)) are in the same catalog drawer labeled with the sign E_3,3. This is an example of a rough classification of texts by keywords. The contextual meaning of words is not taken into account, and all such words are treated as the same sign. All their occurrences in the text can then be added up to estimate the significance of keywords by frequency of use.

This result means that, to a first approximation, all four words «set» are contextually related to the word «object». The words «set» E_1,1, E_5,1, E_10,1, E_14,1 are the same or different to the extent that their reduced contexts E_3,3, E_6,3, E_11,3 and E_15,3 are the same or different.

It was shown in [2] that congruences modulo text fragments hold for matrix texts. Dividing fragments of matrix texts by other fragments (moduli) can leave remainders (residues), which, like the moduli, serve as classifying features.

A criterion for the divisibility (multiplicity, ⋮) of fragments of matrix texts is the divisibility (multiplicity) of their right dictionaries (20) [2]. The remainders of dividing the dictionaries of fragments (differences of dictionaries) are the dictionaries of the remainders of dividing these fragments.

To compute the similarities and differences of the words E_3,3, E_6,3, E_11,3 and E_15,3, the contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17 must be compared modulo E_3,3.

The residues of the contexts modulo E_3,3 are then:

$\begin{gathered} \textrm{res} (F_1^1) \sim \textrm{res} D_1^1 = \\ = E_{2,2}+E_{3,3}+E_{4,4} - (E_{2,2}+E_{3,3}+E_{4,4}) E_{3,3}= \\ = E_{2,2}+E_{4,4} \\ \textrm{res}(F_2^1)\sim \textrm{res} D_2^1 = \\ = E_{2,2}+E_{3,3}+E_{7,7}+E_{8,8} - (E_{2,2}+E_{3,3}+E_{7,7}+E_{8,8}) E_{3,3}= \\ =E_{2,2}+E_{7,7}+E_{8,8} \\ \textrm{res}(F_3^1)\sim \textrm{res} D_3^1 = \\ = E_{3,3}+E_{4,4}+E_{12,12} - (E_{3,3}+E_{4,4}+E_{12,12}) E_{3,3}= \\ =E_{4,4}+E_{12,12} \\ \textrm{res}(F_4^1)\sim \textrm{res} D_4^1 = \\ = E_{3,3}+E_{16,16}+E_{7,7} - (E_{3,3}+E_{16,16}+E_{7,7}) E_{3,3} = \\ = E_{16,16}+E_{7,7} \end{gathered} \ \ \ \ \ \ \ \ (14)$
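The residues (14) are dictionary differences res D = D − D·M with M = E_{3,3}. This sketch reproduces them from the dictionaries in (8), storing sums of matrix units as dicts (an illustrative assumption). The helper `residue` is hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def lin(a, b, s=1):
    """a + s*b for sums of matrix units (s = -1 subtracts)."""
    out = dict(a)
    for u, c in b.items():
        out[u] = out.get(u, 0) + s * c
    return {u: c for u, c in out.items() if c != 0}


def dic(*js):
    """Dictionary as a sum of diagonal matrix units E_{j,j}."""
    return {(j, j): 1 for j in js}


def residue(d, modulus):
    """res D = D - D M: the words of dictionary D not covered by the modulus M."""
    return lin(d, mul(d, modulus), -1)


D = {1: dic(2, 3, 4), 2: dic(2, 3, 7, 8), 3: dic(3, 4, 12), 4: dic(3, 16, 7)}  # from (8)
res = {k: residue(d, dic(3)) for k, d in D.items()}  # modulo E_{3,3}, as in (14)
for k, r in res.items():
    print(k, sorted(j for j, _ in r))
```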

It follows from (14) that the contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17, and hence the words «set» E_1,1, E_5,1, E_10,1, E_14,1, are not congruent modulo E_3,3. Their residues are not pairwise multiples and do not pairwise form any residue class. This means that all the words E_1,1, E_5,1, E_10,1, E_14,1 differ in meaning (context).

Similarity appears at the next step, for the residues, if we compute common dictionaries of pairs of residues by (2) and reduce by (4). There is no common dictionary D^j_res of all the residues:

$\left(E_{2,2}+E_{4,4}\right) \left(E_{2,2}+E_{7,7}+E_{8,8}\right) \left(E_{4,4}+E_{12,12}\right) \left(E_{16,16}+E_{7,7}\right) = 0 \ \ \ \ \ (15)$

Equality (15) is why there is no common residue class and no corresponding category Cat_res(E_3,3). However, some pairs of residues (14) have common dictionaries:

$\begin{gathered} (E_{2,2}+E_{4,4})(E_{2,2}+E_{7,7}+E_{8,8}) = E_{2,2},\\ (E_{2,2}+E_{4,4})(E_{4,4}+E_{12,12})= E_{4,4},\\ (E_{2,2}+E_{7,7}+E_{8,8})(E_{16,16}+E_{7,7}) =E_{7,7}. \end{gathered}$

After reduction (4), these pairs of residues form residue classes and categories named E_2,2, E_4,4 and E_7,7. The fragments F¹₁ and F¹₂ go into the folder named E_2,2, F¹₁ and F¹₃ into the folder E_4,4, and F¹₂ and F¹₄ into the folder E_7,7.
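This sketch checks (15) and the pairwise common dictionaries of the residues (14). It also files each pair of contexts into a "folder" named by the word of that common dictionary, mirroring the catalog analogy. The dict-of-units representation and the `folders` mapping are illustrative assumptions.

```python
from itertools import combinations


def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def dic(*js):
    """Dictionary as a sum of diagonal matrix units E_{j,j}."""
    return {(j, j): 1 for j in js}


res = {1: dic(2, 4), 2: dic(2, 7, 8), 3: dic(4, 12), 4: dic(16, 7)}  # residues (14)

# (15): the product of all residues is zero, so there is no common residue class.
all_common = res[1]
for k in (2, 3, 4):
    all_common = mul(all_common, res[k])

# Pairwise common dictionaries by (2); each nonzero one names a folder (class).
folders = {}
for a, b in combinations(sorted(res), 2):
    for j, _ in mul(res[a], res[b]):
        folders.setdefault(j, []).append((a, b))

print(all_common)  # {}
print(folders)     # {2: [(1, 2)], 4: [(1, 3)], 7: [(2, 4)]}
```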

The word E_8,8 is an annihilator (zero divisor) of three of the residues (14):

$E_{8,8}\textrm{res}(F_1^1) = E_{8,8}\textrm{res}(F_3^1) = E_{8,8}\textrm{res}(F_4^1) = 0 \ \ \ \ \ \ \ \ \ \ (16)$

The word E_12,12 is an annihilator of three residues:

$E_{12,12}\textrm{res}(F_1^1) = E_{12,12}\textrm{res}(F_2^1) = E_{12,12}\textrm{res}(F_4^1) = 0 \ \ \ \ \ \ \ \ \ (17)$

The word E_16,16 is an annihilator of three residues:

$E_{16,16}\textrm{res}(F_1^1) = E_{16,16}\textrm{res}(F_2^1) = E_{16,16}\textrm{res}(F_3^1) = 0 \ \ \ \ \ \ \ \ \ \ \ \ \ (18)$

These are the words of the matrix text that have no context (the last three terms of the context dictionary (49) [2]). The product of a residue and such an annihilator is nonzero only if the residue contains that annihilator.
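The annihilators in (16) to (18) can be found by brute force. For each word in some residue, the sketch lists the residues that E_{j,j} sends to zero and keeps the words that annihilate all residues but one. The dict-of-units representation and the variable names are illustrative assumptions.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def dic(*js):
    """Dictionary as a sum of diagonal matrix units E_{j,j}."""
    return {(j, j): 1 for j in js}


res = {1: dic(2, 4), 2: dic(2, 7, 8), 3: dic(4, 12), 4: dic(16, 7)}  # residues (14)

# For every word occurring in some residue, list the residues that E_{j,j} annihilates.
words = sorted({j for r in res.values() for j, _ in r})
annihilated = {j: [k for k in sorted(res) if not mul(dic(j), res[k])] for j in words}

# Words that annihilate all residues but one, as in (16)-(18).
three = {j: ks for j, ks in annihilated.items() if len(ks) == len(res) - 1}
print(three)  # {8: [1, 3, 4], 12: [1, 2, 4], 16: [1, 2, 3]}
```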

So, the problem in this example was to compute the similarity and difference of the words E_1,1, E_5,1, E_10,1, E_14,1 from the similarity and difference of their contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17 with respect to some measure (modulus).

The solution obtained: the words E_1,1, E_5,1, E_10,1, E_14,1 (like their contexts F¹_1,5, F¹_5,10, F¹_10,14, F¹_14,17) are congruent modulo E_3,3 and are not congruent (differ) modulo E_8,8, E_12,12, E_16,16.

This means that reduction (10) should not be performed by the common dictionary (9) alone, which consists of the single sign word E_3,3. As it turned out, this sign has different meanings in different places of the text. Taking (16), (17) and (18) into account:

$\begin{gathered} F_1^1 \rightarrow F_1^1D_R^1 = E_{3,3} \\ F_2^1 \rightarrow F_2^1(D_R^1 + E_{8,8}) = E_{6,3}+E_{8,8} \\ F_3^1 \rightarrow F_3^1(D_R^1 +E_{12,12})= E_{11,3}+E_{12,12} \\ F_4^1 \rightarrow F_4^1(D_R^1 + E_{16,16})= E_{15,3}+E_{16,16} \end{gathered} \ \ \ \ \ \ \ \ \ \ (19)$
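The refined reduction (19) multiplies each context by the common dictionary E_{3,3} plus the annihilator found for it. In this sketch the unit for position 17 is again left out of F_4, because its second index is neither 3 nor 16 and would vanish under the multiplier. The dict-of-units representation and the names are illustrative assumptions.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def units(*pairs):
    """Sum of distinct matrix units E_{i,j}, each with coefficient 1."""
    return {p: 1 for p in pairs}


F = {
    1: units((2, 2), (3, 3), (4, 4)),
    2: units((6, 3), (7, 7), (8, 8), (9, 2)),
    3: units((11, 3), (12, 12), (13, 4)),
    4: units((15, 3), (16, 16)),  # position-17 unit omitted (see above)
}
# Common dictionary E_{3,3} plus the annihilator found for each context in (16)-(18).
modulus = {
    1: units((3, 3)),
    2: units((3, 3), (8, 8)),
    3: units((3, 3), (12, 12)),
    4: units((3, 3), (16, 16)),
}
refined = {k: mul(F[k], modulus[k]) for k in F}
print(refined)
```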

The right dictionary D_R (9) [2] of text (5) [2] then requires the following extension:

$\begin{gathered} E_{3,3} \rightarrow E_{3,3}, \\ E_{6,3} \rightarrow E_{6,3}+E_{8,8}, \\ E_{11,3} \rightarrow E_{11,3}+E_{12,12}, \\ E_{15,3} \rightarrow E_{15,3}+E_{16,16} \end{gathered} \ \ \ \ \ \ \ \ \ \ \ \ (20)$

The source dictionary (9) [2] has thus been converted into the context dictionary (20). The category computation supplied the additional words E_8,8, E_12,12, E_16,16 for the word signs E_6,3, E_11,3 and E_15,3, while E_3,3 is unchanged. With these additional words, E_6,3, E_11,3 and E_15,3 differ from one another.

The classification above is a categorization of matrix texts by dictionary. In categorization, classes and their names are computed as algebraic functions of the text. Here the categorization was computed over dictionaries, since the classifying features (category names) were determined by mutual intersections of dictionaries (2). This categorization ignores word order, but it can later serve as the basis for a finer categorization that takes the mutual order of words into account. In that case, the moduli are not parts of dictionaries but fragments of contexts. When dictionary fragments are replaced with text fragments, word repetitions may appear in contexts, which makes division (the construction of morphisms of the category) ambiguous [2]. That is why comparison is first performed modulo dictionaries, and similarities and differences (divisors and residues) are determined by this measure. Then, once the similarity and difference of the repeated words in the contexts have been established, the dictionary modulus is replaced with a text fragment that takes word order into account. The category names are then text fragments.

The general method of computing classifying features yields an analog of the Chinese remainder theorem (CRT) for matrix texts.

Chinese Remainder Theorem (CRT)

The Chinese remainder theorem for matrix texts is formulated as follows. Let the following be given:

D_1R, ..., D_kR: pairwise non-multiple minimal dictionaries of matrix text fragments F₁, ..., F_k.
D_R = D_1R + ... + D_kR: the right dictionary of a text P.
D'_R = D'_1R + ... + D'_mR: the right dictionary of a text P', m < k.
P' ⊂ P: D'_R ⊂ D_R (the text P' is part of P in the sense that its dictionary D'_R is part of the dictionary D_R).
The tuple (r₁, ..., r_k), where r₁ ≡ P' (mod D'_1R), ..., r_k ≡ P' (mod D'_kR), that is, P' = P'D'_1R + r₁, ..., P' = P'D'_kR + r_k.

Then there is a one-to-one correspondence:

$P'\longleftrightarrow (r_1, \ldots, r_k) \ \ \ \ \ \ \ \ \ (21)$

The proof is by induction, using the definition of multiplicity for polynomials of matrix units and the minimality of the dictionary.

The residue tuple (r₁, ..., r_k) is a classifying feature of all texts that are multiples of one another and have the dictionary D'_R or any part of it. Classifiers of language and other sign sequences should be constructed according to (21).
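Below is a toy illustration of the residue tuple. It rests on strong simplifying assumptions that are not from the article. Each D'_iR is a single diagonal unit, the units are pairwise disjoint, together they cover the dictionary of P', and there are at least two of them. In that special case each unit of P' is missing from exactly one residue, so P' can be read back from the tuple. This is one instance of correspondence (21), not a proof. The text P and the function names are hypothetical.

```python
def mul(a, b):
    """Product of sums of matrix units: E_{i,j} E_{k,l} = E_{i,l} if j == k, else 0."""
    out = {}
    for (i, j), x in a.items():
        for (k, l), y in b.items():
            if j == k:
                out[(i, l)] = out.get((i, l), 0) + x * y
    return {u: c for u, c in out.items() if c != 0}


def lin(a, b, s=1):
    """a + s*b for sums of matrix units (s = -1 subtracts)."""
    out = dict(a)
    for u, c in b.items():
        out[u] = out.get(u, 0) + s * c
    return {u: c for u, c in out.items() if c != 0}


def residues(p, dictionaries):
    """Residue tuple (r_1, ..., r_k) with r_i = P' - P' D'_i."""
    return tuple(lin(p, mul(p, d), -1) for d in dictionaries)


def recover(rs):
    """Special case only: with disjoint covering dictionaries (k >= 2), the union of residues is P'."""
    out = {}
    for r in rs:
        out.update(r)
    return out


P = {(1, 1): 1, (2, 2): 1, (3, 1): 1, (4, 3): 1}  # hypothetical text P'
dictionaries = [{(1, 1): 1}, {(2, 2): 1}, {(3, 3): 1}]
rs = residues(P, dictionaries)
print(rs)
print(recover(rs) == P)  # True in this special case
```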

References

[1] A. B. Petrovsky. Theory of countable sets and multisets. Moscow: Nauka, 2018.
[2] S. B. Pshenichnikov. Algebra of text. ResearchGate preprint, 2021.
[3] S. B. Pshenichnikov. Computer game "Double chess". Certificate of state registration of a computer program No. 920129, 4 December 1992.
