KNOWLEDGE MANAGEMENT APPLICATIONS BASED ON USER ACTIVITIES FEEDBACK

This research is devoted to the problem of searching for information in a company's knowledge management (KM) system using keywords and a list of categories. The authors analyzed different approaches to data storage in management systems and considered several existing KM systems. The necessity of presenting knowledge in a simple and understandable form was noted, and the requirements for structuring and for the self-learning ability of the system were emphasized. The authors suggest using the request history together with category and keyword analytics to let the system change the weight coefficients of keywords and categories. This allows the system to learn from the results of user activity. An associative rules search task was implemented as well. This analysis makes it possible to find links between keywords inside user requests and helps to reduce the time needed to extract the requested knowledge. In addition, analysis of user request statistics helps to identify the individuals who are responsible for filling the system with new materials, which can be useful for motivational purposes.


INTRODUCTION
Knowledge bases (KB) and database management systems (DBMS), which were built using KB, are already widely used in corporate information systems (CIS) as intelligent components. Besides, they are also used in many decision support systems (DSS) [1,2]. However, they are mainly used for certain functional management tasks, which are solved in the CIS or DSS. Their knowledge bases use classical knowledge representation models: frames, semantic networks, production rules, etc. [3]. At the same time, there are certain procedures, regulations, rules and business processes based on them. Corporations and other objects of economic activity operate according to these business processes, which belong to so-called procedural knowledge.
Unlike classical formal knowledge representation models, procedural knowledge is presented verbally, in the form of different types of diagrams or schemes (SADT, DFD, UML, etc.). However, they cannot describe all possible situations that can take place during the life cycle of the controlled object. Therefore, there is a real need to create knowledge management systems that accumulate experience for future decision support in certain situations. Otherwise, we can face a lack of certain information and make wrong decisions, sometimes critical for a company. This area of CIS intellectualization is called Knowledge Management (KM). It should make it possible to search for and widely use employee experience, providing relevant information among interrelated individuals. Such systems will apparently use some technical solutions [4], and organizational measures can also take place. These approaches can provide stable behavior of an intellectual dynamic system regardless of the content of such a system [5,6].
Thus, knowledge management is the process of identification, storage and effective use of information and knowledge within the team of the organization's employees. The main purpose of the CIS knowledge management subsystem is to quickly provide the necessary knowledge to the employees (or employee groups) who need it. The result of a successful KM system should be a procedural knowledge base that can modify and improve itself while learning [7].

TOPICALITY
For several recent years, we have observed the creation of information portals or other information storage systems in many companies. These systems use the word "knowledge" in their names. Information in them is presented in the form of recommendations, problem-solving options, descriptions of certain circumstances, etc. For example, there may be a database connection error in an application and a list of solutions that resolve this kind of issue. In such cases, a request to the knowledge base may return the necessary technological solutions (tools), which can be valuable for practical application. Equally important in knowledge management are organizational tools, that is, the organizational structure of KM. Its functions include the development, monitoring and improvement of business processes, organizational structures and technologies of corporate information processing. Knowledge is now increasingly required even in highly standardized processes, such as order processing. This means giving more flexibility, driven by the desire to find an individual approach to each unique customer request.
Thus, to improve business processes, we need not only the analysis of function flows, data, organizational structure and management flows, but also the effective use of accumulated knowledge. Moreover, besides documented knowledge, we should also process knowledge that represents the hidden level of personal undocumented experience.
Since procedural knowledge in the KM system is presented mainly verbally, there is the task of identifying it and presenting it in a certain form. One of the approaches to detecting hidden knowledge in text data is Text Mining, including text parsing [8]. But these methods cannot be considered effective for the management of unformalized knowledge. The reason is that a formalized knowledge base created in this way needs to be transformed back into text before being given to the user. Therefore, while creating this type of KM system, verbally represented knowledge is first cleared of noise (lexical, syntactic, and semantic) and then stored as elements of the selected data structure. Then, a convenient knowledge search system with a self-learning ability is created. The latter can be implemented using feedback tools. The authors used this approach in [9]. In this article, the created KM system is presented in more detail. But first, let us give a brief description of existing systems.
Coredge's Logik package is an easy-to-use KM tool that gives some order to unstructured data and organizes it into logical libraries. Logik retrieves groups of keywords or topics from documents automatically and organizes them without end user participation. The process is characterized by high dynamics and flexibility. Coredge holds that developing the classification rules, on which the parent and child relationships between documents are based, should primarily be the program's task, not the person's. However, test center engineers agree that this can be justified in a dynamic environment [10].
Lotus's Discovery Server package completely automates knowledge and documentation management and is the most comprehensive solution among all those considered. The program tracks all the information needed for the daily work of users. Data from many different sources, such as e-mail, file servers, databases and teamwork platforms, are stored automatically without any action by the user (IBM calls these data "traffic signs") [11,12].
The Smartdiscovery package of the Inxight company is a multi-level server that categorizes unstructured text and can import any classification system written in XML format [13]. Smartdiscovery Server includes a keyword extraction program that instantly detects certain types of phrases in documents. However, the rules that the analyzer follows while identifying phrases are not included in any classification system, and that is why they cannot be changed and saved.
The Livelink package of the Open Text company includes several knowledge management tools and a teamwork platform. According to this system, managing content, documents and records are the most important knowledge management functions. Livelink also combines the functions of document circulation and business process automation by integrating the applications and corporate management systems of other developers [14]. The search results can be extended by an explicit user assessment to help users select documents more precisely.
There is a well-known business process management methodology named ARIS [15]. In addition to automating business process design, it provides a knowledge management subsystem. The advantage of ARIS is the availability of advanced tools for constructing different kinds of models (functional, procedural, organizational, data model, etc.), as well as for creating a knowledge base on the usage of these business process models [16]. This is also a certain disadvantage, because such a knowledge base cannot be created separately.
Therefore, as we can see, there is a task connected with optimizing the information search process in a company's KM system. It is also relevant when this process is built on feedback from the results of user activity analysis. The development of a knowledge management system aims to systematize and use the knowledge accumulated in a particular organization in order to improve the efficiency of individual employees and departments and, as a consequence, of the whole company.

KM APPLICATION DESIGN
"Initial" knowledge in the developed KM system is created by experts in the subject area or this type of activity.Current KM system includes the set of entities and dependencies between them.The entity "Topic" includes materials (another system entity) for solving some typical problem and has keywords (also an entity) to describe the entire group of materials.These keywords can also be included to other topics, which are considered as alternative recommendations, or problems of the same type.Consequently, we have 1:N join type between the entities "Topic" and "Material".Topics are described by keywords, with the relation type M:K, since one topic can be described by many keywords and one keyword can relate to several topics.Also, the system uses the "Category" entity, which includes several topics.On the other hand, the "Topic" can be categorized into several categories.That is, between entities "Category" and "Topic" we also have the type of connection M:K.Other entities of the KM system and their relationship (the "entityrelation diagram"), presented in 3 rd normal form [17], will be given soon.
The use of feedback from KM system users allows us to solve a number of issues. To begin with, the analysis of user queries makes it possible to obtain the categories, questions and keywords of the material topics. As a result, the initial description of knowledge provided by the expert can be adjusted to the view of the user who is searching for this data. Such an operation improves the quality of the topic and category classifiers and, as a result, minimizes the time needed to find the required knowledge. The evaluation of approaches that use knowledge query statistics as user feedback is as important as expert opinion. Using the history of operations that change knowledge and, more importantly, of operations that select knowledge while daily tasks are accomplished allows us to evaluate both the knowledge itself and the activities of staff who want to increase the efficiency of the KM system.
Below are the structural and functional models of the KM system, built by the SADT methodology in the form of IDEF0 diagrams [18]. The context diagram (level A-0) is given in Fig. 1. In this and several following figures, the term "knowledge" means textual or media data that contain information about ways of solving problems or accomplishing work tasks.
Let us take a closer look at the lists of incoming and outgoing messages, controls, and resources that are used by the system to implement the function "ordering, classifying and using the knowledge of the organization to improve the activities efficiency".
Input can be divided into two main categories. First, it is new knowledge that is relevant to optimizing the organization's current tasks. When solving the task of knowledge storage and use, mechanisms for the effective introduction of new information into the database should be implemented in any case. Second, an incoming message can be a request for information, instructions, or explanations regarding current employee tasks, coming from the users of the information system.
The main output stream is knowledge about the optimal ways of performing one or another daily task of company employees. This stream is generated by the subsystem that analyzes accumulated data. By its intelligence level, this subsystem is "information retrieval", which means it is implemented on the basis of a predetermined set of static queries to the database. In addition, there is an auxiliary output stream, which provides the statistics of user queries accumulated during system usage. This flow is not the main one because the system has its own statistical analysis module, but the accumulated statistics also need to be exportable to external data sources for further extended analysis by third-party software.
Next, Fig. 2 presents the first-level decomposition. Let us consider the main tasks of the system that need to be solved to achieve the goal of the previous A0 level. Guided by the principle of complexity limitation, the system's main function "ordering, classifying and using the knowledge of the organization to improve the activities efficiency" is decomposed into four main functions: filling the database with new knowledge, storing knowledge, getting the necessary and correct knowledge at a certain time, and updating (correcting) existing knowledge. During functional modeling of the knowledge management system, no stream tunneling was used at any modeling level. So, the general list of external input and output messages, control influences and resources remained unchanged at all modeling levels. However, internal input and output streams were added. Besides, each of the functions uses its own list of external controls and resources.
The task of filling the database with new knowledge takes new knowledge about task performance optimization as an input flow. In addition, the input stream is extended with the output stream of the knowledge correction module. This plays the role of feedback, which gives the system the ability to learn while previously stored knowledge is being used. As resources, it uses the company's employees, who are the knowledge carriers, and the hardware and software tools needed for data entry. In the process of data entry, the subsystem is guided by the rules for filling the knowledge database. As the output stream, we receive the list of knowledge that is used by the next functional block as an input stream.
The functional block of knowledge storing receives knowledge coming from the database filling block as an input. These data are already structured and classified according to approved classifiers and rules of data representation. The storage task uses hardware/software tools (resources) and is carried out according to the forms of knowledge storage. The output stream is the knowledge itself, which is used by the knowledge extraction and knowledge correction/optimization modules.
The block that extracts the necessary knowledge performs searching and filtering functions. The input streams are the knowledge coming from the knowledge storage module and the request for instructions and explanations (which is an external input for the whole management system). As resources, it uses employees as consumers of work instructions and the hardware/software for data filtering and presentation. The control influences are the knowledge search algorithms. The block has two output streams. The first is knowledge about the optimal performance of the organization's current tasks. It is passed out as the output of the system as a whole. The second forms the statistics (history) of searching for and using knowledge that was previously stored in the system. It is transmitted as an output too, but it is also used as an input by the functional module of knowledge correction and updating.
Correction and updating of knowledge requires the use of all available resources and mechanisms (Fig. 3) and is controlled by the conditions for correcting existing knowledge. The input stream is the knowledge that comes from the storage module and the knowledge usage statistics provided by the knowledge extraction module. The output stream is a change in the classification and search keywords of existing knowledge, whose purpose is to optimize the search for the necessary information. This stream, in turn, is the input for the data storage module.
Since the first-level decomposition does not provide a complete representation of the system, let us consider the decomposition of each of the functional modules. This is done in order to complete the detailed description of the features of the system's functional modules. Fig. 3 shows the decomposition of the module "filling the database with new knowledge". The incoming message carries new knowledge about the optimization of task execution (which is an input for the management system as a whole). The module is controlled by the rules for filling the database with new knowledge and uses software/hardware and employees (as knowledge carriers) as resources. At this stage, it is necessary to provide the input information in the proper form, according to the requirements for its future storage. At the output, we have a document-candidate that is passed to the reliability estimation module.
The reliability estimation module verifies the adequacy of the document-candidate against the reliability criteria of a given subject area. Reliability can be verified by an expert or by the management system users through an evaluation tool. The first option is preferable, since users' access to untrusted knowledge can be denied to prevent wrong decisions: the document is placed in a queue until the relevant experts confirm its status. However, this requires experts who provide their services to the system, which makes maintenance more expensive. Consequently, this approach can be afforded only by big organizations with a large staff and a big budget. The option of applying the user estimation subsystem is cheaper and does not require additional resources, but it is more vulnerable to mistakes and to the use of untrusted knowledge, which may in some cases lead to unpredictable consequences.
Once the status of the document-candidate has been confirmed, it is passed to the classification module. Here the verified knowledge is classified according to the subject area it belongs to. The input stream is the evaluated knowledge coming from the validation module. The resources are hardware/software and company employees as knowledge carriers. The classification process is controlled by the rules for filling the knowledge database and by the knowledge classifier. The latter is formed by the output flow of the knowledge classifier development and support module. Classification can be performed either by the knowledge carrier or by an expert. The latter option is, as in the previous case, preferable, but again requires experts who support the management system.
The classification process is based on the use of classifiers. The classifier can be pre-designed and approved at the software implementation stage. However, there is always a need to expand or modify the list of elements and the classifier hierarchy. The functional module "development/support of the knowledge classifier" receives, as an input stream, a request to expand (modify) the classifier from the module that classifies documents-candidates. It uses hardware/software tools and company employees as resources. The latter, as in the two previous cases, may be experts or ordinary knowledge carriers. The presence of an expert at this step is most preferable, since otherwise the classifier may be contaminated by a large amount of trash data, which in turn complicates the work of the classification, filtering and statistics collection modules.
The last of the database filling functional modules defines additional characteristics. Its input is classified and evaluated knowledge presented in a certain formalized format. Software/hardware and staff (as knowledge carriers) are used as resources. Company employees (alone or in conjunction with experts), following the rules for filling the knowledge database, define the keywords and questions that further characterize the knowledge and can be used for effective search. The output stream is ready-to-use knowledge, which is the input data for the knowledge storage module.
The decomposition of the storage module is shown in Fig. 4. We have three functional modules: entry into the database, updating existing data and removing out-of-date information. All modules work in parallel, independently of each other, and have input, output, control and resource flows according to the parent "knowledge storage" module.
Next is the task of collecting query statistics. Its input consists of the output streams of the filtering modules (by category, by keywords, and by questions). All these data are accumulated in the statistical block and form the output flow named "statistics of knowledge use". This flow is passed to the input of the knowledge correction and updating module.
In Fig. 6 the decomposition of the correcting and updating knowledge module is shown.

Figure 6 - Decomposition of module of correcting and updating knowledge
As a result of the decomposition, we get six functional modules: accumulation of search statistics; accumulation of knowledge usage statistics; determining whether changes in categories or keywords are needed; defining and changing the knowledge usefulness degree; efficiency evaluation; and assessment of the correctness of recommendations.
The accumulation of search statistics covers the history of using filters of all kinds that preceded the use of a particular piece of knowledge. Its input is the statistics of knowledge use coming from the module for receiving the necessary knowledge. Only software/hardware is used as a resource (human resources are not involved). The output stream is fed to the input of the module that determines whether changes to categories or keywords need to be made. This need may arise when relevant information is found using a set of keywords, or in a category, that does not match the initial description of the resource. The output flow from the module that determines the need to change categories is submitted to the input of the knowledge storage module. This is done to implement the appropriate changes in the document options and attributes.
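One way to picture the module that decides whether keywords need to change is a comparison between the keywords users actually typed before opening a material and the keywords in its initial description. The sketch below is a simplified assumption rather than the authors' algorithm: the function name, the `min_share` threshold and the in-memory search history are hypothetical.

```python
from collections import Counter


def suggest_missing_keywords(described, search_history, min_share=0.3):
    """Find keywords users searched with that are absent from the description.

    described      -- keywords from the initial (expert) description
    search_history -- list of keyword lists, one per search that ended
                      with the user opening this material
    min_share      -- hypothetical threshold: minimum share of searches
                      in which a keyword must appear to be suggested
    """
    if not search_history:
        return []
    known = {kw.lower() for kw in described}
    counts = Counter()
    for query in search_history:
        # Count each keyword once per search, ignoring letter case.
        counts.update({kw.lower() for kw in query})
    total = len(search_history)
    candidates = [
        (kw, n / total)
        for kw, n in counts.items()
        if kw not in known and n / total >= min_share
    ]
    # Most frequent candidates first, ties broken alphabetically.
    return sorted(candidates, key=lambda item: (-item[1], item[0]))
```

A keyword returned by this function is one that users rely on but the description lacks, so it is a candidate for the change request that is sent on to the storage module.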
The next pair of modules works similarly. They are the knowledge usage statistics accumulation module and the module that determines the knowledge usefulness degree. The first of them receives statistics on the frequency of resource usage at the input. The second, according to the results of data analysis, decides whether the usefulness level needs to be changed and submits the result to the input of the storage module. The operation is carried out automatically, without the use of staff resources. The remaining modules are responsible for changing the assessments of effectiveness and of the correctness of recommendations. Both criteria are produced manually by polling the system users. The first one is determined by the task executor, based on the results of use, in the form of a review. The second is given by the knowledge owner, without requiring practical use, in the form of an expert assessment. The output data are provided to the input of the data modification module.
Based on the functional models given above, the data-logical model is presented (Fig. 7). It is represented as a commonly known Entity-Relation Diagram (ERD) [17]. The presented ERD of the knowledge management system has three parts, one for each of the system's general modules: accounting of knowledge, access rights control and storing statistics of user activity.
As mentioned, the ERD of the knowledge management system covers three separate system modules corresponding to the major subsystems: "accounting of knowledge", "access rights control" and "storing statistics of user activity". The first and the last are the most interesting in the context of the current research goals. The "accounting of knowledge" module includes the following entities: theme; category; materials; keywords; attachments; theme keywords; theme categories; category keywords; users (authors of themes and materials). The "storing statistics of user activity" module has categories, sessions, users, keywords, query keywords, requests, request materials and materials. An important fact is that the connection between material themes and relevant categories carries an additional value, the priority coefficient. This approach allows us to order the list of categories to which the material belongs by their priority values. A similar technique is used when filling the list of keywords or key phrases that describe some material or its topic. The keywords and key phrases that describe categories are presented in the same way. In this way, the system can sort any list of materials selected and matched to the query words in descending order of the sum of all keyword priority coefficients. Unfortunately, it is very common for keywords and their priority values associated with materials or categories to be incorrect or undefined. That is why the system needs to correct them. For this purpose, it uses a separate module that can correct the priority coefficients of keywords and categories for one material or for a whole theme.
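The ordering by the sum of keyword priority coefficients can be illustrated with a few lines of Python. This is a minimal sketch under stated assumptions: the data are held in an in-memory dictionary instead of the database, and the function name and the case-insensitive matching are choices made for the example only.

```python
def rank_materials(query_keywords, keyword_priorities):
    """Rank materials by the summed priority of the keywords they match.

    query_keywords     -- keywords entered by the user
    keyword_priorities -- dict: material id -> {keyword: priority coefficient}
    Returns (material id, score) pairs in descending order of score;
    materials that match no query keyword are left out.
    """
    query = {kw.lower() for kw in query_keywords}
    scores = {}
    for material_id, priorities in keyword_priorities.items():
        matched = [p for kw, p in priorities.items() if kw.lower() in query]
        if matched:
            scores[material_id] = sum(matched)
    # Highest score first, ties broken by material id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With this ordering, a wrong or missing priority coefficient directly changes the position of a material in the result list, which is why the correction module matters.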
Next, we separate several simpler tasks from the main one. To begin with, we need to determine the list of elements for which we can adjust the values of the priority coefficients of categories and keywords. The required data can be obtained from the "titles", "title_keywords" and "title_categories" entities. The algorithm that extracts all the needed data, in a form close to a DBMS execution plan, is given in Fig. 8 [9]. The following step is to update the values of the priority coefficients that belong to the keywords of a material theme. We do this according to user activity statistics. In implementing this part of the task, additional tables of user activity statistics are joined to the statement: "requests", "sessions", "request_materials" and "request_keywords". The algorithm block diagram is given in Fig. 9.

Figure 9 - New values of priority coefficients select statement
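Since the statement of Fig. 9 is not reproduced here, the following sketch shows only one plausible way to derive statistical keyword priorities from the activity tables. It reuses the table names "request_materials" and "request_keywords" from the text, but the column names, the simplified "materials" table and the formula (the share of requests leading to a theme that contained the keyword) are all assumptions made for the illustration.

```python
import sqlite3

# Simplified, hypothetical columns for the statistics tables.
STATS_SCHEMA = """
CREATE TABLE materials (id INTEGER PRIMARY KEY, topic_id INTEGER NOT NULL);
CREATE TABLE request_materials (request_id INTEGER NOT NULL, material_id INTEGER NOT NULL);
CREATE TABLE request_keywords (request_id INTEGER NOT NULL, keyword TEXT NOT NULL);
"""

# Requests that ended with the user opening a material of the given theme.
TOPIC_REQUESTS = """
SELECT DISTINCT rm.request_id
FROM request_materials rm
JOIN materials m ON m.id = rm.material_id
WHERE m.topic_id = :topic_id
"""


def statistical_keyword_priorities(conn, topic_id):
    """Return {keyword: share of the theme's requests that used it}."""
    params = {"topic_id": topic_id}
    total = conn.execute(
        f"SELECT COUNT(*) FROM ({TOPIC_REQUESTS})", params
    ).fetchone()[0]
    if total == 0:
        return {}
    rows = conn.execute(
        f"""
        SELECT rk.keyword, COUNT(DISTINCT rk.request_id)
        FROM request_keywords rk
        WHERE rk.request_id IN ({TOPIC_REQUESTS})
        GROUP BY rk.keyword
        """,
        params,
    ).fetchall()
    return {keyword: count / total for keyword, count in rows}
```

The values returned here play the role of the new, purely statistical coefficients; how they are combined with the expert values is a separate decision.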
If we need to select information about the materials related to some category and their priority coefficients in order to change them, we can do it in a similar manner. The list of tables will be nearly the same, but the analytics will change from "request" to "session" (Fig. 10) [9]. The module that corrects the priority coefficients of keywords (and also of topics, theme categories and materials) can work in two different ways. The first mode simply calculates new values based on user activity statistics and then replaces the old values with the new ones. The main drawback of this approach is that the old values are not taken into account at all. This is not right, because the previous data may come from high-level experts and can be very valuable. Besides, the statistical sample can be too small to show the real picture, which can lead to partly wrong conclusions. The second mode, on the contrary, uses the old values of the priority coefficients. It uses the moving average algorithm with an additional lever in the form of a statistics trust level. Depending on the value of the statistics trust level, we move closer to the expert estimate or to the statistical data. The statistics trust level can be chosen from 0 to 1, where 0 means full trust in the expert and 1 means full trust in the statistics. The updating process is shown in Fig. 11.
Then, let us define the support value Supp(F), that is, the percentage of total transactions in which the base set F can be found. Using relational databases, where information is presented in table form, we can present the above-mentioned set of transactions (which has nothing in common with the term "transaction" in the relational database sense) as the table "Transactions (Object, Transaction)". Here "Transaction" is the unique identifier of some activity and "Object" is the identifier of one of the objects that is part of the activity [17,22]. If this sequence of (Object, Transaction) pairs is too long, we can face calculation problems. So, in this case, we will not be able to use the classic straightforward algorithm.
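For the second correction mode described above, one common reading of a moving average with a trust level is a weighted blend of the old (expert) value and the new statistical value. The sketch below assumes exactly that; the function names and the linear weighting are assumptions, not the formula used by the authors.

```python
def blend_priority(expert_value, statistical_value, trust):
    """Blend an old priority coefficient with a statistical estimate.

    trust is the statistics trust level from 0 to 1:
    0 keeps the expert value, 1 takes the statistical value.
    The linear weighting is an assumption made for this sketch.
    """
    if not 0.0 <= trust <= 1.0:
        raise ValueError("trust must lie between 0 and 1")
    return (1.0 - trust) * expert_value + trust * statistical_value


def blend_all(expert, statistics, trust):
    """Apply blend_priority to every keyword of one material or theme.

    Keywords without statistics keep their expert value unchanged.
    """
    return {
        keyword: blend_priority(old, statistics.get(keyword, old), trust)
        for keyword, old in expert.items()
    }
```

Applied repeatedly with the same trust level, such an update behaves like an exponential moving average, so a small statistical sample shifts the coefficients only gradually.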
But we can easily narrow the input sequence, because we are interested only in objects that by themselves occur in at least Supp(F) percent of transactions. So, we can use this heuristic and discard a large amount of unnecessary data [20]. First, we narrow the set of objects we are interested in, and then we consider only the transactions that include these objects. Using this approach and SQL temporary tables or table expressions, the following algorithm was built (Fig. 12). As input for the associative rules search task, which analyses the keyword sequences of user requests, the data that describe the list of transactions look like those shown in Fig. 13. Thus, the associative rules search task allows us to optimize the algorithm by which the KM system extracts the necessary knowledge. In general form, it is given in Fig. 14.
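The narrowing heuristic can be sketched with SQL table expressions run through Python's `sqlite3` module. This is an illustration and not the statement of Fig. 12: it is limited to pairs of objects, support is taken as the number of transactions containing the set divided by the total number of transactions, and the table is assumed to be `transactions(object, txn)`, where the column is called `txn` only because TRANSACTION is an SQL keyword.

```python
import math
import sqlite3

FREQUENT_PAIRS_SQL = """
WITH frequent_objects AS (
    -- Step 1: keep only objects that are frequent on their own.
    SELECT object
    FROM transactions
    GROUP BY object
    HAVING COUNT(DISTINCT txn) >= :min_count
),
narrowed AS (
    -- Step 2: keep only the rows that mention those objects.
    SELECT DISTINCT t.object, t.txn
    FROM transactions t
    JOIN frequent_objects f ON f.object = t.object
)
-- Step 3: count co-occurrences on the narrowed input only.
SELECT a.object, b.object, COUNT(*) AS n
FROM narrowed a
JOIN narrowed b ON a.txn = b.txn AND a.object < b.object
GROUP BY a.object, b.object
HAVING COUNT(*) >= :min_count
ORDER BY n DESC, a.object, b.object
"""


def frequent_pairs(conn, min_support):
    """Return [((object_a, object_b), support)] for pairs with enough support."""
    total = conn.execute(
        "SELECT COUNT(DISTINCT txn) FROM transactions"
    ).fetchone()[0]
    if total == 0:
        return []
    # Convert the support share into a minimum number of transactions.
    min_count = math.ceil(min_support * total)
    rows = conn.execute(FREQUENT_PAIRS_SQL, {"min_count": min_count}).fetchall()
    return [((a, b), n / total) for a, b, n in rows]
```

The heuristic is safe because a pair cannot be more frequent than either of its members, so dropping infrequent objects first never removes a frequent pair.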
Using the results of the associative rules search task, the KM system can predict with some probability what else the user would want to find and which keywords could describe it.
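Such a prediction can be illustrated with the confidence of a rule "entered keywords -> additional keyword", that is, the share of past requests containing the entered keywords that also contained the additional one. The sketch below works on an in-memory list of past requests; the function name and the `min_confidence` threshold are hypothetical, and a production system would more likely read precomputed rules from the database.

```python
from collections import Counter


def suggest_keywords(past_requests, entered, min_confidence=0.5):
    """Suggest keywords that often accompany the entered ones.

    past_requests  -- list of keyword lists from earlier user requests
    entered        -- keywords the user has typed so far
    min_confidence -- hypothetical threshold for the rule confidence
    Returns (keyword, confidence) pairs, most confident first.
    """
    entered = frozenset(entered)
    # Requests that contain every entered keyword (the rule's left side).
    matching = [set(r) for r in past_requests if entered <= set(r)]
    if not matching:
        return []
    counts = Counter(kw for request in matching for kw in request - entered)
    suggestions = [
        (kw, n / len(matching))
        for kw, n in counts.items()
        if n / len(matching) >= min_confidence
    ]
    return sorted(suggestions, key=lambda item: (-item[1], item[0]))
```

The returned confidence is the "some probability" mentioned in the text: it estimates how likely a user who entered these keywords is to need the suggested one as well.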

EVALUATION OF THE RESULTS
The knowledge management system was created and implemented in the IT department of a retail trading company. Its use allows us to increase the efficiency of the department staff. The module of automatic classifier correction and the results of associative rules analysis make it possible to process queries in a "user way" and reduce the number of search iterations. Fig. 15 shows the dependence of the average number of search iterations on the number of previous hits to the material required by the user of the KM system. As we can see, the main decrease in the number of search iterations takes place in the range between 30 and 70 requests, and the number does not decrease significantly afterwards. Fig. 16 shows the dependence of the share of requests for the wrong category (as a percentage) on the number of previous hits. With 60-70 previous requests, the volume of incorrect requests falls to about 10% and stays at this level afterwards. It should be noted that the given results were obtained for small volumes of materials in the system (several thousand) and users (about 100). Also, the analysis did not take into account that the same users can request the same materials again after some time.

SUMMARY AND CONCLUSION
The result of the research is a KM system with a module that corrects the priority coefficients of keywords and categories. The advanced mode allows us to tune the statistics trust level, which makes the averaging algorithm work more smoothly. The associative rules search task was also accomplished, which made it possible to make assumptions about probable keywords that a user can use to find what they need. To increase search speed, we can store these results in the database.
Among the future development perspectives of the system, mathematical (software) and organizational ones should be considered separately. As for the software of the KM system (mathematical support), filtering of abnormal values and the use of other data mining methods are being studied. As for the organizational issues, the system can be used to motivate the staff to share knowledge within the KM system and to update existing knowledge according to the demands of the time. Statistical information about how useful a material was or how frequently it was used would be helpful as well.

Figure 1 - IDEF0 for the knowledge management system development. Level A0

Figure 3 - Decomposition of "filling the database with new knowledge" module

Figure 5 - Decomposition of module for receiving the necessary knowledge

Figure 7 - The logical model of "accounting the knowledge" and "storing statistics of user activity" modules

Figure 8 - Category keywords with average priority value extract algorithm

Figure 10 - Select statement for priority coefficients of materials related to some category

Figure 11 - DFD for updating the priority coefficients (and first level decomposition)

Figure 12 - Narrowing the input set of transactions in selected statement

Figure 13 - The transaction set extraction for associative rules search task

Figure 14 - DFD for data search using association analysis results

Figure 15 - Decrease in the average number of search iterations

Figure 16 - Decreasing the percentage of requests for invalid categories