REASONING UNDER UNCERTAINTY WITH BAYESIAN BELIEF NETWORKS ENHANCED WITH ROUGH SETS

Abstract: The objective of this paper is to present a new approach to reasoning under uncertainty, based on the use of Bayesian belief networks (BBN’s) enhanced with rough sets. The role of rough sets is to provide additional reasoning to assist a BBN in the inference process, in cases of missing data or difficulties with assessing the values of related probabilities. The basic concepts of both theories, BBN’s and rough sets, are briefly introduced, with examples showing how they have been traditionally used to reason under uncertainty. Two case studies from the authors’ own research are discussed: one based on the evaluation of software tool quality for use in real-time safety-critical applications, and another based on assisting the decision maker in taking the right course of action, in real time, in a naval military exercise. The use of corresponding public domain software packages based on BBN’s and rough sets is outlined, and their application for real-time reasoning in processes under uncertainty is presented.


INTRODUCTION
Bayesian Belief Networks (BBN's) have been widely used in Industrial Information Systems for solving a wide range of computational problems involving insufficient information and uncertainty [1,2]. This includes applications such as: water contamination [3], fault detection in an industrial process [4], fog forecasting at airports [5], predicting software defects [6], inferring certification metrics of software [7], predicting emergency hospital admissions [8], multisensor fusion for landmine detection [9], evaluation of risk in software development [10], modeling air traffic control [11], cell signaling pathway modeling [12], reliability estimation [13], safety assessment [14] and risk evaluation [15] in computer-based systems, to name only a few from a long list. They have also been studied theoretically by a number of researchers, for example [16][17].
Although, in general, BBN's have been very effective, because they allow reasoning and making predictions based on small sets of probabilities with backwards inference, they are still based on probability theory. A significant disadvantage of BBN's is that, in realistic cases, they require extensive computations of the conditional probability values. Most of the studies cited above recognize this as one of the method's major limitations. Another disadvantage of BBN's is that they become less effective when probability values are missing.
With this in mind, one wants to look at a complementary method of evaluating data in the input data set, which would not rely strictly on probability densities and could deal with missing values. One of the theories that offer such an approach, with values of data attributes and events measured by likelihoods rather than probabilities, is rough set theory [18][19].
The objective of this paper is to look at the combined use of BBN's and rough sets in decision making under uncertainty, and to suggest the enhancement of pure Bayesian reasoning by the additional use of rough sets for preliminary evaluation of data. The paper is structured as follows. The next two sections briefly describe the basic concepts of Bayesian belief networks and rough sets, respectively. Then, in a separate section, several examples and two case studies are presented, giving an overview of the method developed for combining BBN's and rough sets for real-time computations. The final sections present general conclusions and suggestions for future work.

BAYESIAN BELIEF NETWORKS
The following section describes Bayesian belief networks from an application point of view rather than from that of the underlying mathematics and statistics. Rev. Thomas Bayes developed this method of updating probabilities based on new information in the mid-18th century. It has been widely applied in probability and statistics for over 250 years.
His "An Essay towards solving a Problem in the Doctrine of Chances" was published after Bayes' death, in 1763 [32]. It is the basis for the popular inversion formula for updating the belief in a hypothesis (H) from evidence (E), in which the prior probability of the hypothesis is updated to a posterior probability once the evidence is known: P(H|E) = ( P(E|H) * P(H) ) / P(E), where H is the hypothesis, E is the evidence, and P(x|y) is the conditional probability of x given y.
It is derived from the definition of joint probability, P(x, y) = P(x|y) * P(y) = P(y|x) * P(x), which is then rearranged as P(x|y) = P(y|x) * P(x) / P(y), where x = H and y = E.
Suppose we know from historical medical records that meningitis causes a stiff neck in 1 of 2 patients. We also know that 1 in 50,000 people have meningitis and that 1 in 20 people have a stiff neck. If you wake up with a stiff neck, what is the probability that you have meningitis? How do you estimate it?
The hypothesis is that you have meningitis and the evidence is your stiff neck. Applying the Bayes formula requires: P(E|H) = 1 / 2 = 0.5, the probability of a stiff neck when you have meningitis; P(H) = 1 / 50,000 = 0.00002, the probability of meningitis in the population; and P(E) = 1 / 20 = 0.05, the probability of a stiff neck in the population. This yields in turn: P(H|E) = ( P(E|H) * P(H) ) / P(E) = ( 0.5 * 0.00002 ) / 0.05 = 0.0002, which is the probability of having meningitis when you have a stiff neck. Much more complex models, with multiple hypotheses and evidence sources, can be constructed, usually in a graph form relating cause to effect.
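As a minimal sketch, the meningitis computation above can be reproduced in a few lines of Python. The function and variable names are illustrative only and do not come from any of the software packages discussed in this paper.

```python
def posterior(p_e_given_h: float, p_h: float, p_e: float) -> float:
    """Bayes' rule: P(H|E) = P(E|H) * P(H) / P(E)."""
    return p_e_given_h * p_h / p_e


# Figures quoted in the text.
p_stiff_given_meningitis = 1 / 2   # P(E|H)
p_meningitis = 1 / 50_000          # P(H)
p_stiff = 1 / 20                   # P(E)

p_meningitis_given_stiff = posterior(
    p_stiff_given_meningitis, p_meningitis, p_stiff
)
print(f"P(meningitis | stiff neck) = {p_meningitis_given_stiff:.4f}")
```

The small posterior illustrates how a rare hypothesis remains unlikely even when the evidence is fairly typical of it.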
Such belief networks are a more recent concept, credited to Professor Judea Pearl and his construction and solution algorithms, along with the work of many others [33].
A Bayesian belief network is a form of probabilistic graphical model. The belief network represents the joint probability distribution of a set of random variables with explicit independence assumptions described by a directed graph. In this research a Bayesian network is defined by a directed acyclic graph of nodes representing variables and arcs representing probabilistic dependency relations among the variables.
If there is an arc from node A to another node B, then variable B depends directly on variable A and A is called a parent of B. If the variable represented by a node has a known value, then the node is said to be observed as an evidence node. A node can represent any kind of variable, be it a measured parameter, a latent variable or a hypothesis. Nodes are not restricted to representing random variables; this is what is "Bayesian" about a belief network.
In the following, an example is presented of three-node networks that are structured as linear, converging, and diverging (Figure 1), built with the Netica software program [34]. A different software package for Bayesian belief networks, named Hugin [35], is equally effective and simple to use.
The above examples are causal Bayesian networks, where the directed arcs of the graph are interpreted as representing causal relations in some real domain with prior information. The directed arcs do not have to be interpreted as representing causal relations; however, in practice, knowledge about causal relations is very often used as a guide in drawing Bayesian network graphs, thus resulting in cause and effect Bayesian belief networks. In the linear case at the top of Figure 2, A causes B, which in turn causes C; in the converging model in the center, A and B are independent of each other (as long as C is unobserved) and both cause C; in the diverging model at the bottom, A causes both B and C. In each case an effect is observed at node C, illustrating the update of the joint probabilities when new information is incorporated into the network.
In the simplest case, a Bayesian network is specified by an expert and is then used to perform inference after some of the nodes are fixed to observed values. In order to fully specify the Bayesian network and fully represent the joint probability distribution, it is necessary to further specify, for each node X, the probability distribution for X conditional upon X's parents. The distribution of X conditional upon its parents may have many forms.
The following data (Table 1) are the conditional probabilities for the previous linear A, B, and C node network example (the top one) from Figure 2, where we observe that if C is true then column B is 0.8 true and 0.2 false. Prior to any observation, as shown in Figure 1, the symmetry of the conditional probabilities makes the probability of true and false states equal to 0.5 for all nodes. The goal of inference is typically to find the distribution of a subset of the variables, conditional upon some other subset of variables with known values, called the evidence or observations, with any remaining variables integrated out. This is known as the posterior distribution of the subset of the variables given the evidence. The posterior gives us a universal sufficient statistic for detection applications, when one wants to choose values for the variable subset which minimize some expected loss function, for instance the probability of decision error.
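The posterior computation for the linear network can be sketched by brute-force enumeration of the joint distribution. The conditional probability values below (0.8 and 0.2, mirrored for true and false parents) are an assumption chosen to agree with the statements in the text that all priors equal 0.5 and that observing C as true makes B 0.8 true; they are not copied from Table 1.

```python
from itertools import product

# Assumed, symmetric parameters for the chain A -> B -> C.
P_A_TRUE = 0.5
P_B_TRUE_GIVEN_A = {True: 0.8, False: 0.2}
P_C_TRUE_GIVEN_B = {True: 0.8, False: 0.2}


def bern(p_true, value):
    """Probability that a binary variable takes `value`, given P(True)."""
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    """Chain-rule factorization P(A) * P(B|A) * P(C|B)."""
    return (bern(P_A_TRUE, a)
            * bern(P_B_TRUE_GIVEN_A[a], b)
            * bern(P_C_TRUE_GIVEN_B[b], c))


def query(target, evidence):
    """P(target = True | evidence), summing the joint over all worlds."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(("A", "B", "C"), values))
        if any(world[name] != value for name, value in evidence.items()):
            continue  # world contradicts the evidence
        p = joint(*values)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


prior_b = query("B", {})
posterior_b = query("B", {"C": True})
posterior_a = query("A", {"C": True})
print(prior_b, posterior_b, posterior_a)
```

Enumeration is exponential in the number of nodes, so it is suitable only for small illustrative networks; packages such as Netica or Hugin use more efficient propagation algorithms.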
An example of inference in the center converging example from Figure 2 is to specify A as observed true and estimate the inferred values for B and C, with C updated by the new information but B unchanged since, with C unobserved, B is independent of A. Questions about the dependence among variables can be answered by studying the graph alone. It can be shown that conditional independence is represented in the graph by the graphical property of d-separation: nodes A and B are d-separated in the converging graph as long as neither C nor any of its descendants is among the evidence nodes.
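The behavior of the converging network can be checked numerically with the following sketch. The conditional probability table for C is a made-up assumption (it is not taken from the figures); the point is only that observing A leaves B unchanged, while observing C makes A and B dependent.

```python
from itertools import product

# Assumed parameters for the converging network A -> C <- B.
P_A_TRUE = 0.5
P_B_TRUE = 0.5
P_C_TRUE_GIVEN_AB = {
    (True, True): 0.9, (True, False): 0.7,
    (False, True): 0.7, (False, False): 0.1,
}


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(a, b, c):
    """Factorization P(A) * P(B) * P(C | A, B)."""
    return (bern(P_A_TRUE, a) * bern(P_B_TRUE, b)
            * bern(P_C_TRUE_GIVEN_AB[(a, b)], c))


def query(target, evidence):
    """P(target = True | evidence) by enumeration of the joint."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=3):
        world = dict(zip(("A", "B", "C"), values))
        if any(world[name] != value for name, value in evidence.items()):
            continue
        p = joint(*values)
        denominator += p
        if world[target]:
            numerator += p
    return numerator / denominator


p_b = query("B", {})
p_b_given_a = query("B", {"A": True})                   # C unobserved
p_b_given_c = query("B", {"C": True})
p_b_given_a_and_c = query("B", {"A": True, "C": True})  # C observed
print(p_b, p_b_given_a, p_b_given_c, p_b_given_a_and_c)
```

With C unobserved, the first two values coincide; once C is observed, additionally learning A changes the belief in B, which is the dependence that d-separation predicts.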
For belief reasoning, a typical network is organized into three layers. The top layer contains the causal variables, the middle layer the reasoning variables, and the bottom layer the effects variables. Four general classes of reasoning are defined for this three-layer architecture. As an example, the following five-node network with three layers, using binary random variables, is used to illustrate the four principal reasoning strategies used in belief networks. The prior distribution of the example network has an equal probability for each variable state and is symmetric, to illustrate the various conditional computations.
Diagnostic reasoning, illustrated in Figure 5, observes the effects evidence and updates the middle reasoning variables and the top layer causal variables as shown in the example. This reasoning process diagnoses from an effect E of True to the cause B or, in medical terms, it is reasoning from symptom to disease. It also adjusts the probabilities for the middle reasoning layer C, the causal variable A and the effect variable D.
Predictive reasoning, illustrated in Figure 6, observes causal evidence and updates the middle reasoning variables and bottom layer effects variables as shown in the example. This reasoning process predicts from cause B to the effects D and E; for example, a patient's statement that he is a smoker may focus attention on a certain set of symptoms. It also adjusts the probabilities for the middle reasoning layer C but not for the independent causal variable A.
Intercausal or explaining reasoning, illustrated in Figure 7, observes both causal evidence and the middle layer reasoning evidence to update other causal variables as shown in the example. This reasoning process on C relates the known cause B to the unknown cause A and to the common effects D and E. It is often interpreted as performing an experiment for explaining away cause A.
Combined reasoning, shown in Figure 8, observes causal evidence and effects evidence to update the middle reasoning variables as shown in the example. This reasoning process combines cause B and effect E to investigate the network conditional structure for reasoning variable C, the other cause A and the other effect D. This is a useful reasoning test for building and validating complex belief networks based on limited data and expert knowledge.
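The four reasoning strategies can be sketched on a five-node network with causes A and B, reasoning variable C, and effects D and E. The arc structure (A and B pointing to C, C pointing to D and E) and all probability values below are assumptions chosen so that every prior equals 0.5; they are not the values behind Figures 5 to 8.

```python
from itertools import product

NODES = ("A", "B", "C", "D", "E")
# Assumed, symmetric parameters: P(C = True | A, B) and P(effect = True | C).
P_C_TRUE = {(True, True): 0.9, (True, False): 0.5,
            (False, True): 0.5, (False, False): 0.1}
P_EFFECT_TRUE = {True: 0.8, False: 0.2}


def bern(p_true, value):
    return p_true if value else 1.0 - p_true


def joint(w):
    """P(A) P(B) P(C|A,B) P(D|C) P(E|C) for one assignment w."""
    return (0.5 * 0.5
            * bern(P_C_TRUE[(w["A"], w["B"])], w["C"])
            * bern(P_EFFECT_TRUE[w["C"]], w["D"])
            * bern(P_EFFECT_TRUE[w["C"]], w["E"]))


def query(target, **evidence):
    """P(target = True | evidence) by enumeration of the joint."""
    numerator = denominator = 0.0
    for values in product((True, False), repeat=len(NODES)):
        w = dict(zip(NODES, values))
        if any(w[name] != value for name, value in evidence.items()):
            continue
        p = joint(w)
        denominator += p
        if w[target]:
            numerator += p
    return numerator / denominator


diagnostic_b = query("B", E=True)            # effect -> cause
predictive_c = query("C", B=True)            # cause -> reasoning variable
predictive_a = query("A", B=True)            # independent cause, unchanged
intercausal_a = query("A", B=True, C=True)   # explaining away
a_given_c_only = query("A", C=True)
combined_c = query("C", B=True, E=True)      # cause and effect together
print(diagnostic_b, predictive_c, predictive_a,
      intercausal_a, a_given_c_only, combined_c)
```

Under these assumed numbers, learning that B is true lowers the belief in A once C is known, which is the explaining-away effect described for intercausal reasoning.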

ROUGH SETS
Rough set theory was invented by Zdzisław Pawlak to cope with limited perception of the surrounding world. The theory is especially helpful in dealing with vagueness and uncertainty in decision situations. Its main purpose is the "automated transformation of data into knowledge" [18]. The data are perceived in terms of objects and their features, i.e., values of the attributes used to characterize these objects. The knowledge deduced from these data is expressed in terms of "surely" and "possibly" statements describing notions of interest. More formally, such descriptions can be divided into so-called lower and upper approximations of entire notions. In the rest of this section, we describe a qualitative procedure containing all steps needed to form an appropriate description of the concepts under consideration.

EXPLANATION OF A NOTION OF A ROUGH SET
First, let us illustrate intuitively the concept of a rough set, comparing it to an ordinary set and a fuzzy set, in a single dimension. Figure 9 shows such an intuitive illustration. For an ordinary set, the interval [A,B] in Figure 9, all its elements, that is, real numbers from this interval (assuming x represents a real axis), have values of their membership function equal to 1.0.
For a fuzzy set, elements on the set boundaries, that is, in the intervals [A,C] and [D,B], have values of their membership function equal to a fraction, a number from the interval [0.0, 1.0]. This means that these elements only partially belong to the set, to the extent specified by the value of a membership function.

Fig. 9 - Intuitive illustration of a rough set vs an ordinary and a fuzzy set.
In contrast to the traditional concepts of a set, whether ordinary or fuzzy, for a rough set one cannot determine, even partially, the membership of the elements on the set boundary. Therefore, the value of the membership function for boundary elements of a set is undetermined. A rough set can only be described by its approximations, as illustrated in the lower part of Figure 9.
To express these intuitive concepts a bit more formally, we start from a relational database, i.e., a table with rows corresponding to objects and columns corresponding to the attributes. Each entry of the table represents the attribute value of a corresponding object (i.e., its feature). In rough set formalism the database is considered as an information system, i.e., a quadruple $IS = (U, A, V, f)$, where $U = \{u_1, \ldots, u_n\}$ stands for a (usually finite) set of objects, $A = \{a_1, \ldots, a_m\}$ is a set of attributes, $V = \bigcup_i V_i$, where $V_i$ is the domain of the $i$-th attribute, and $f: U \times A \to V$ is a so-called information function providing the description of objects, i.e., $f(u_i, a_j)$ assigns a value of the $j$-th attribute to the $i$-th object. The above-mentioned concepts are illustrated in Table 2.
Usually the set $A$ is decomposed into two disjoint subsets, $A = C \cup D$. The attributes from $C$ are used to characterize objects and form the so-called condition attributes, while the attributes in $D$ are the so-called decision attributes, used in decision-making or classification tasks. An information system with specified condition and decision attributes is called a decision table. For the example in Table 2, attributes $a_1$ and $a_2$ could be interpreted as condition attributes, that is, certain parameters of an object, with values from the range {Low, Med, High} and {Min, Under, Over, Max}, respectively, and attribute $a_3$ as a decision attribute, with values "yes" and "no." Hence, Table 2 can be viewed as a decision table.
Because of the limited knowledge, we cannot fully discern objects, i.e., there are objects $u, v$ in $U$ such that $f(u, c) = f(v, c)$ for all the condition attributes $c$. This fact leads to the notion of an indiscernibility relation $E$, which is in fact an equivalence relation on $U$. For example, for the information system in Table 2, objects $u_2$ and $u_8$ are indiscernible. So are objects $u_5$ and $u_7$.
It appears that in many cases we can identify proper subsets $C'$ of $C$ such that the indiscernibility relation $E_{C'}$ induced by the attributes in $C'$ is identical with the original relation $E$. Such sets of attributes are called reducts. The existence of reducts proves that not all of the attributes are necessary to form the equivalence classes. In other words, identifying reducts allows a more economical description of objects, as we need a smaller number of descriptors (features) to characterize these objects. Unfortunately, from a computational point of view, finding reducts is an NP-hard task. No such reducts exist for the example shown in Table 2.
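A minimal sketch of the indiscernibility classes and of a brute-force search for attribute subsets that preserve them is given below. Table 2 is not reproduced in this text, so the decision table in the code is a hypothetical one, constructed only to be consistent with the facts stated in this section (for instance, that $u_2$ and $u_8$, and $u_5$ and $u_7$, are indiscernible, and that no reducts exist).

```python
from itertools import combinations

# Hypothetical decision table (NOT the paper's Table 2).
TABLE = {
    "u1": {"a1": "Low",  "a2": "Min",   "a3": "yes"},
    "u2": {"a1": "High", "a2": "Over",  "a3": "no"},
    "u3": {"a1": "Med",  "a2": "Under", "a3": "no"},
    "u4": {"a1": "Med",  "a2": "Under", "a3": "yes"},
    "u5": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u6": {"a1": "Low",  "a2": "Under", "a3": "yes"},
    "u7": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u8": {"a1": "High", "a2": "Over",  "a3": "no"},
}
CONDITIONS = ("a1", "a2")


def partition(attrs):
    """Equivalence classes of the indiscernibility relation for attrs."""
    classes = {}
    for obj, row in TABLE.items():
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, set()).add(obj)
    return {frozenset(members) for members in classes.values()}


full_partition = partition(CONDITIONS)

# Proper, non-empty subsets of the condition attributes that induce the
# same partition as the full set (exhaustive search, exponential in |C|).
preserving_subsets = [
    subset
    for size in range(1, len(CONDITIONS))
    for subset in combinations(CONDITIONS, size)
    if partition(subset) == full_partition
]

print(sorted(sorted(c) for c in full_partition))
print(preserving_subsets)
```

For this hypothetical table the list of preserving subsets is empty, in line with the remark that no reducts exist for the example.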

DEFINITION OF A ROUGH SET
Now we are ready to introduce the key concepts of rough set theory. Let $B$ be a subset of the condition attributes and let $[v]_B$ stand for an equivalence class, i.e., the set of objects $u$ in $U$ with the same description (restricted to the set $B$) as the object $v$. The subset $X$ of $U$ can be characterized using information in $B$ by means of the so-called $B$-lower and $B$-upper approximations, defined as:
$$B(X)_* = \{u \in U : [u]_B \subseteq X\} \qquad (1a)$$
$$B(X)^* = \{u \in U : [u]_B \cap X \neq \emptyset\} \qquad (1b)$$
The lower approximation of $X$ is the collection of objects which can be viewed surely as members of the set $X$, while the upper approximation of $X$ is the collection of objects that possibly are members of $X$. Obviously $B(X)_* \subseteq B(X)^*$. If $B(X)_* = B(X)^*$ we say that $X$ is $B$-definable; otherwise it is only partially definable. The set $BN_B = B(X)^* - B(X)_*$ is called the $B$-boundary region; it specifies the objects that cannot be classified with certainty as being either inside $X$ or outside $X$.
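There are many grades of partial definability. We say that the set $X$ is:
• roughly $B$-definable, if $B(X)_* \neq \emptyset$ and $B(X)^* \neq U$;
• internally $B$-undefinable, if $B(X)_* = \emptyset$ and $B(X)^* \neq U$;
• externally $B$-undefinable, if $B(X)_* \neq \emptyset$ and $B(X)^* = U$;
• totally $B$-undefinable, if $B(X)_* = \emptyset$ and $B(X)^* = U$.
Obviously, if $B = C$, i.e., the full set of condition attributes is used, we omit the prefix $B$- in all the above definitions. In such a case, a set $X$ is characterized by the pair $(X_*, X^*)$ and we say that $X$ is a rough set (or $B$-rough set).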
To illustrate these newly introduced concepts for the information system in Table 2, let us distinguish between condition attributes $a_1$ and $a_2$, and a decision attribute $a_3$. Values of $a_1$ and $a_2$ can be interpreted as vague measurements (evaluations) of certain parameters of a technical system, and $a_3$ can be viewed as a decision (control) based on these evaluations. Let the set $B$ be the entire set of condition attributes, $B = \{a_1, a_2\}$, and let $X$ be the set of objects for which the decision attribute equals "yes". Then $B(X)_* = \{u_1, u_6\}$ and $B(X)^* = \{u_1, u_3, u_4, u_6\}$. This determination can also be illustrated in the table representing the information system under consideration (Table 3). The set $X$ for the decision variable's value equal to "yes" has three corresponding objects, $u_1$, $u_4$ and $u_6$. Given the values of the specific condition attributes $B = \{a_1, a_2\}$, two of these objects, $u_1$ and $u_6$, lead surely to this decision value (yes). Thus, $u_1$ and $u_6$ form the lower approximation of the set $X$. This is illustrated with heavy shading in Table 3. If, however, we take the third object, $u_4$, the values of its condition attributes, {$a_1$ = Med, $a_2$ = Under}, can produce two different values of the decision attribute: "yes" for object $u_4$, and "no" for object $u_3$. Thus, objects with these values of the condition attributes belong possibly to the set $X$, which is illustrated with light shading in Table 3. Obviously, objects in the non-shaded lines do not belong to $X$.
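The determination of the lower and upper approximations described above can be sketched as follows. The decision table is again a hypothetical one (Tables 2 and 3 are not reproduced here), chosen to agree with the objects and attribute values named in the text; the function names are illustrative.

```python
# Hypothetical decision table (NOT the paper's Table 2 or Table 3).
TABLE = {
    "u1": {"a1": "Low",  "a2": "Min",   "a3": "yes"},
    "u2": {"a1": "High", "a2": "Over",  "a3": "no"},
    "u3": {"a1": "Med",  "a2": "Under", "a3": "no"},
    "u4": {"a1": "Med",  "a2": "Under", "a3": "yes"},
    "u5": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u6": {"a1": "Low",  "a2": "Under", "a3": "yes"},
    "u7": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u8": {"a1": "High", "a2": "Over",  "a3": "no"},
}
B = ("a1", "a2")


def equivalence_class(obj, attrs):
    """Objects indiscernible from obj with respect to attrs."""
    return {other for other, row in TABLE.items()
            if all(row[a] == TABLE[obj][a] for a in attrs)}


def lower_approximation(concept, attrs):
    """Objects whose whole equivalence class lies inside the concept."""
    return {u for u in TABLE if equivalence_class(u, attrs) <= concept}


def upper_approximation(concept, attrs):
    """Objects whose equivalence class intersects the concept."""
    return {u for u in TABLE if equivalence_class(u, attrs) & concept}


X = {u for u, row in TABLE.items() if row["a3"] == "yes"}
lower = lower_approximation(X, B)
upper = upper_approximation(X, B)
boundary = upper - lower
print(sorted(lower), sorted(upper), sorted(boundary))
```

The boundary region consists of the two objects sharing the condition values Med and Under, which is exactly the pair that the text describes as belonging only possibly to X.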
To get a numerical characterization of the "roughness" of a set $X$ we introduce the so-called accuracy of approximation
$$\alpha_B(X) = \frac{|B(X)_*|}{|B(X)^*|}$$
where the symbol $|Y|$ stands for the cardinality of the set $Y$. $X$ is said to be crisp (or precise) with respect to the set of attributes $B$ if and only if $\alpha_B(X) = 1$, and otherwise $X$ is said to be rough (or vague) with respect to $B$. Another characterization of the set of objects can be obtained by introducing the so-called rough membership function $\mu_{B,X}: U \to [0,1]$, defined as follows:
$$\mu_{B,X}(u) = \frac{|[u]_B \cap X|}{|[u]_B|}$$
With such a definition a relationship between rough and fuzzy set theory is established. More particularly, $\mu_{B,X}(u)$ determines the degree to which object $u$, described by the set $B$ of attributes, belongs to the concept (equivalence class) $X$. Further, we can relax the definitions of the lower and upper approximation, namely
$$B_\beta(X)_* = \{u \in U : \mu_{B,X}(u) \geq \beta\}, \qquad B_\beta(X)^* = \{u \in U : \mu_{B,X}(u) > 1 - \beta\}$$
where $0 \leq \beta \leq 1$. If $\beta = 1$ we obtain the original definitions (1a) and (1b).
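The numerical characterizations can be sketched as below, on the same hypothetical decision table used for illustration (not the paper's Table 2). The relaxed approximations are implemented under the assumption that the lower one collects objects with rough membership of at least β and the upper one those with membership greater than 1 − β, which is one common formulation and reduces to the classical approximations for β = 1.

```python
# Hypothetical decision table (NOT the paper's Table 2).
TABLE = {
    "u1": {"a1": "Low",  "a2": "Min",   "a3": "yes"},
    "u2": {"a1": "High", "a2": "Over",  "a3": "no"},
    "u3": {"a1": "Med",  "a2": "Under", "a3": "no"},
    "u4": {"a1": "Med",  "a2": "Under", "a3": "yes"},
    "u5": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u6": {"a1": "Low",  "a2": "Under", "a3": "yes"},
    "u7": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u8": {"a1": "High", "a2": "Over",  "a3": "no"},
}
B = ("a1", "a2")
X = {u for u, row in TABLE.items() if row["a3"] == "yes"}


def equivalence_class(obj, attrs):
    return {other for other, row in TABLE.items()
            if all(row[a] == TABLE[obj][a] for a in attrs)}


def membership(obj, concept, attrs):
    """Rough membership: share of obj's class that lies in the concept."""
    cls = equivalence_class(obj, attrs)
    return len(cls & concept) / len(cls)


def beta_lower(concept, attrs, beta):
    """Assumed relaxed lower approximation: membership >= beta."""
    return {u for u in TABLE if membership(u, concept, attrs) >= beta}


def beta_upper(concept, attrs, beta):
    """Assumed relaxed upper approximation: membership > 1 - beta."""
    return {u for u in TABLE if membership(u, concept, attrs) > 1.0 - beta}


def accuracy(concept, attrs):
    """|lower| / |upper|, using the classical (beta = 1) approximations."""
    upper = beta_upper(concept, attrs, 1.0)
    return len(beta_lower(concept, attrs, 1.0)) / len(upper)


print(accuracy(X, B), membership("u4", X, B), membership("u1", X, B))
```

An accuracy below 1 confirms that, for this hypothetical table, the concept "yes" is rough with respect to the two condition attributes.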

ROUGH RULES
In practical applications, the sets of interest are the sets of objects with identical values of the decision attributes, that is, we define $X$ as the set of objects satisfying the equality $f(x_1, d) = f(x_2, d)$ for all attributes $d$ in $D$. If $D$ is, for example, a set of diseases, then $X$ is a set of persons suffering from a particular disease, and the equivalence classes $[x]_B$ contain patients with identical symptoms (restricted to the set $B$). Hence, it is natural to look for condition attributes which can be used to discriminate between different diseases. This leads us to the practical aspects of rough set theory: rough rules.
More formally, given an information system $IS = (U, A, V, f)$ and a subset $B \subseteq A$, we start from the set of atomic formulae, also called descriptors, being expressions of the form $a = v$, where $a \in B$ and $v \in V_a$. Next, we define the set of all possible formulae $F(B,V)$, containing all atomic formulae and closed with respect to the logical connectives: $\neg$ (negation), $\wedge$ (conjunction) and $\vee$ (disjunction).
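If $\phi$ is an atomic formula of the form $a = v$, then its meaning (semantics) is as follows: $\|\phi\| = \{u \in U : f(u, a) = v\}$. The meaning of compound formulae follows in the standard way: $\|\neg\phi\| = U - \|\phi\|$, $\|\phi \wedge \psi\| = \|\phi\| \cap \|\psi\|$ and $\|\phi \vee \psi\| = \|\phi\| \cup \|\psi\|$. A decision rule is an expression of the form $\phi \Rightarrow (d = v)$, where $d$ is a decision attribute; the formula $\phi$ is said to be the predecessor (or antecedent) of the rule and the formula $(d = v)$ its successor (or consequent). We say that the decision rule $\phi \Rightarrow (d = v)$ is true in the information system $IS$ if $\|\phi\| \subseteq \|(d = v)\|$ and $\|\phi\| \neq \emptyset$. A deeper classification of the rules is given in [36].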
For instance, the rule $r_1$ is true in the information system from Table 3, while the rule $r_2$ is not. Detailed remarks on inducing rules from information systems can be found, for example, in [37].
To characterize the rules numerically, a number of measures can be introduced; support and confidence are the most popular. The former is defined as the number of objects satisfying both the predecessor and the successor, while the latter is the conditional probability that the consequent is satisfied provided the antecedent is satisfied. In the case of rule $r_2$, its support is $\mathrm{sup}(r_2) = 1$ and its confidence is $\mathrm{conf}(r_2) = 1/2$. The already mentioned process of "transformation of data into knowledge" translates now into refining the dependencies between sets of attributes. Intuitively, if $C$ and $D$ are two sets of attributes, we say that $D$ depends totally on $C$ if all values of the attributes from $D$ are uniquely determined by the values of the attributes from $C$. This is the functional dependency known from database theory.
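Support and confidence can be computed directly from a decision table, as in the following sketch. Both the table and the two rules are hypothetical: the second rule was chosen so that it reproduces the support of 1 and confidence of 1/2 quoted above, but whether it coincides with the paper's rule $r_2$ is an assumption.

```python
# Hypothetical decision table (NOT the paper's Table 3).
TABLE = {
    "u1": {"a1": "Low",  "a2": "Min",   "a3": "yes"},
    "u2": {"a1": "High", "a2": "Over",  "a3": "no"},
    "u3": {"a1": "Med",  "a2": "Under", "a3": "no"},
    "u4": {"a1": "Med",  "a2": "Under", "a3": "yes"},
    "u5": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u6": {"a1": "Low",  "a2": "Under", "a3": "yes"},
    "u7": {"a1": "High", "a2": "Max",   "a3": "no"},
    "u8": {"a1": "High", "a2": "Over",  "a3": "no"},
}


def matches(row, descriptors):
    """True if the row satisfies every descriptor (attribute = value)."""
    return all(row[attr] == value for attr, value in descriptors.items())


def support(antecedent, consequent):
    """Number of objects satisfying both sides of the rule."""
    return sum(1 for row in TABLE.values()
               if matches(row, antecedent) and matches(row, consequent))


def confidence(antecedent, consequent):
    """Share of objects matching the antecedent that match the consequent."""
    covered = sum(1 for row in TABLE.values() if matches(row, antecedent))
    return support(antecedent, consequent) / covered if covered else 0.0


def is_true(antecedent, consequent):
    """A rule is true if its antecedent is non-empty and always implies it."""
    return confidence(antecedent, consequent) == 1.0


# Hypothetical rules: (a1 = Low) => (a3 = yes) and
# (a1 = Med) and (a2 = Under) => (a3 = yes).
rule_a = ({"a1": "Low"}, {"a3": "yes"})
rule_b = ({"a1": "Med", "a2": "Under"}, {"a3": "yes"})
for antecedent, consequent in (rule_a, rule_b):
    print(support(antecedent, consequent),
          confidence(antecedent, consequent),
          is_true(antecedent, consequent))
```

A rule with confidence below 1 is not true in the sense defined above, since some objects matching its antecedent carry a different decision value.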
Rough set theory enables relaxing this definition by introducing a dependency in a degree $k \in (0, 1]$. An interested reader is referred to [19] and [38] for details. There are at least two successful computer programs allowing rough data analysis: Rosetta [39], downloadable from the following website: http://rosetta.lcb.uu.se/general/, and LERS [40].
Finally, if a new object is introduced into the data set with an attribute value missing, one could attempt to determine this value by using the previously generated rules. This is explained in the next section.

HANDLING THE MISSING VALUE IN A ROUGH SET
Grzymala-Busse describes several algorithms for dealing with missing values in information systems, based on three types of such values [41]:
• those which are lost and no longer available,
• totally irrelevant values, and
• partially relevant values.
They are marked in Table 4 using the following symbols: a question mark "?" for not available values, an asterisk "*" for irrelevant values, and a dash "-" for partially relevant values. To calculate the approximations, one has to start with the meaning of the atomic formulas in a given information system. For the information system in Table 2, these meanings, also called blocks in [41], are the sets of objects that take a given value of a given attribute. These sets have to be modified for an information system with missing values, as in Table 4. For the missing value of the attribute $a_1$, which is not available for object $u_1$ and marked "?", object $u_1$ has to be removed from all blocks created for this attribute; that is, block $\|a_1 = \mathrm{Low}\|$ loses $u_1$, with the two other blocks for $a_1$ remaining unchanged, because they do not include objects with a lost value of $a_1$.
For the missing value of the attribute $a_2$, which is irrelevant and marked "*", its corresponding object, $u_8$, has to be included in the blocks for all values of this attribute. Finally, for the missing value of the attribute $a_1$ which is marked "-" as partially relevant, the respective object $u_4$ has to be added to the blocks containing objects whose decision attribute has the same value as the decision attribute of the object with the partially relevant value. In the case of Table 4, the partially relevant value of attribute $a_1$ for object $u_4$ corresponds to the decision attribute's value "yes". Thus, this attribute's value is relevant to this particular decision attribute, and this is the meaning of the term "partially relevant". Two other objects exist which have "yes" as their decision attribute's value: $u_1$, whose value of attribute $a_1$ is unavailable, so we drop it from consideration, and $u_6$, whose value of $a_1$ equals Low; therefore $u_4$ has to be added to the block for $a_1$ = Low, because it is partially relevant to the corresponding decision attribute.
These modifications yield the final list of blocks for Table 4. Because of the limited length of this paper, we can only mention here that for further computations the so-called characteristic sets have to be calculated for each object, which is done as follows:
1) The characteristic set K of an object is defined as an intersection of blocks for specific values of the attributes for this object.
2) If the value of an attribute is irrelevant "*" or unavailable "?", then the entire universe U is taken as a corresponding block for this attribute.
3) If the value of an attribute is partially relevant "-", then for this specific block it is substituted by a union of blocks representing particular values of the attributes for the corresponding decision attribute's value.
A more formal presentation of these concepts, with respective algorithms, is given in [41]. The characteristic sets are computed from the list of blocks corresponding to Table 4.
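The three steps above can be sketched as follows. Table 4 is not reproduced in this text, so the table in the code is hypothetical: it places "?" at $a_1$ of $u_1$, "-" at $a_1$ of $u_4$ and "*" at $a_2$ of $u_8$, as described in this section, while all other values are assumptions. The fallback to the whole universe when no specified value exists for a partially relevant entry is also an assumption of this sketch.

```python
LOST, IRRELEVANT, PARTIAL = "?", "*", "-"
MISSING = {LOST, IRRELEVANT, PARTIAL}

# Hypothetical table with missing values (NOT the paper's Table 4).
TABLE = {
    "u1": {"a1": "?",    "a2": "Min",   "d": "yes"},
    "u2": {"a1": "High", "a2": "Over",  "d": "no"},
    "u3": {"a1": "Med",  "a2": "Under", "d": "no"},
    "u4": {"a1": "-",    "a2": "Under", "d": "yes"},
    "u5": {"a1": "High", "a2": "Max",   "d": "no"},
    "u6": {"a1": "Low",  "a2": "Under", "d": "yes"},
    "u7": {"a1": "High", "a2": "Max",   "d": "no"},
    "u8": {"a1": "High", "a2": "*",     "d": "no"},
}
CONDITIONS = ("a1", "a2")
UNIVERSE = set(TABLE)


def concept_values(obj, attr):
    """Specified values of attr among objects sharing obj's decision."""
    decision = TABLE[obj]["d"]
    return {row[attr] for row in TABLE.values()
            if row["d"] == decision and row[attr] not in MISSING}


def blocks(attr):
    """Map each specified value v of attr to its block of objects."""
    values = {row[attr] for row in TABLE.values()} - MISSING
    result = {v: set() for v in values}
    for obj, row in TABLE.items():
        value = row[attr]
        if value == LOST:
            continue                              # lost: in no block
        if value == IRRELEVANT:
            targets = values                      # irrelevant: in every block
        elif value == PARTIAL:
            targets = concept_values(obj, attr)   # partially relevant
        else:
            targets = {value}
        for v in targets:
            result[v].add(obj)
    return result


def characteristic_set(obj):
    """Intersection of the blocks selected by obj's attribute values."""
    k = set(UNIVERSE)
    for attr in CONDITIONS:
        value = TABLE[obj][attr]
        if value in (LOST, IRRELEVANT):
            continue                              # step 2: whole universe
        attr_blocks = blocks(attr)
        if value == PARTIAL:                      # step 3: union of blocks
            relevant = concept_values(obj, attr)
            part = set().union(*(attr_blocks[v] for v in relevant))
            part = part or set(UNIVERSE)          # assumed fallback
        else:
            part = attr_blocks[value]             # step 1: ordinary block
        k &= part
    return k


K = {obj: characteristic_set(obj) for obj in TABLE}
for obj in sorted(K):
    print(obj, sorted(K[obj]))
```

Note how the irrelevant value widens the characteristic sets of several objects by placing $u_8$ in every block of $a_2$, whereas the lost value removes $u_1$ from the blocks of $a_1$ altogether.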
As explained in [41], the computation of lower and upper approximations depends on their definitions. The author presents three such definitions. For one of them, the interpretation of the result is that the missing values cause broadening of the potential span for the lower approximation, because they have to be inferred from the rest of the set. The upper approximation can change either way, because the missing values change the entire structure of a set.

COMBINATION OF BBN'S WITH ROUGH SETS
This section describes several examples and two case studies related to Bayesian networks and rough sets. First, we give a background on applying Bayesian networks to software quality evaluation. Next, we discuss a case study on the assessment and qualification of software tools for real-time safety-critical systems. Finally, we present a method for combining Bayesian networks and rough sets in decision making under uncertainty, and discuss the operation of two public domain tools assisting in real-time decision making.

USE OF BBN'S FOR SOFTWARE QUALITY EVALUATION
In recent years, the authors have dealt with various aspects of assessing software quality in real-time safety-critical applications [42][43][44]. The basic idea of applying Bayesian networks to such problems comes from multiple previous attempts to assess various software properties in critical applications, which are briefly outlined below.
A. Application of BBN's to Assess Software Quality. In one of the first studies reported [45], the authors addressed the eternal question: "Can we predict the quality of our software before we can use it?", by applying BBN's to assess the defect density as a measure of software quality. A simplified diagram from their study is presented in Figure 10. The nodes were built based on the understanding of life-cycle processes, from requirements specification through testing.
The probabilities of respective states were based on the analysis of literature and common-sense assumptions about the relations between variables. The node variables are shown on histograms of the predictions obtained by execution of the network after the evidence is entered (the evidence is represented by nodes with probabilities equal to 1.0). As the authors say, the advantage of their model is that it "provides a way of simulating different events and identifying optimum courses of action based on uncertain knowledge."
B. Use of BBN's in Assessment of Software Safety. Gran et al. [46] applied BBN's to address safety assessment of software for acceptance purposes, in a more comprehensive way, using multiple information sources, such as complexity, testing, user experience, system quality, etc.
Their BBN for system quality, which is only a part of the entire model, is shown in Figure 11. It involves two root nodes, UserExperience and VendorQuality, and a number of leaf nodes, corresponding to observable variables, of which QualityMeasures is of particular importance. This node shows evidence about the system quality, grouping quality attributes such as readability, structuredness, etc., and can be expanded further.

Fig. 11 - BBN for the system quality in safety assessment [46]
Other observable variables include FailuresInOtherProducts, those related to the user experience (NoOfProducts and TotalUseTime), as well as those related to quality assurance policy. When evidence becomes available, entering the respective observation data into these nodes and executing the network provides an assessment of the variable in question, which in this case is SystemQuality.
The authors note, however, that their example is intended more as an illustration of the method than as a real attempt to compute the quality of the system. Their probability assignments to node variables were chosen somewhat ad hoc, and not based on any deeper analysis of the problem. However, as the authors say in conclusion, the results of the study were positive and showed "that the method reflects the way of an assessor's thinking during the assessment process."
C. BBN's in Dependability and Reliability Assessment. The authors of [47] used BBN's to formalize reasoning about software dependability to facilitate the software assessment process. They constructed a network for evaluating the dependability of a software-based safety system. It used the data associated with two primary assumptions: the excellence in development (called a process argument) and failure-free statistical testing (called a product argument). The network topology takes into consideration variables such as: Test Failures, Operational Failures, Initial Faults, Faults Found, Faults Delivered, and System PFD (Probability of Failure on Demand). The probability distributions have been derived from a sample of programs from an academic experiment.
The authors were interested in estimating the probabilities of failure during acceptance testing and during the operational life of the product (represented by two variables mentioned in previous paragraph), given the prior probabilities and observed events.In particular, positive results of an acceptance test allowed deriving numerical estimates about the PFD and operational performance of the product.
Helminen [13] used BBN's to attack the problem of software reliability estimation.His primary motivation to apply BBN's was that they allow all possible evidence (large number of variables, different potential sources, etc.) to be used in the analysis of the reliability of a programmable safetycritical system.The essential characteristic of such systems is that they involve a significant number of variables related to reliability, with very limited evidence.
The reliability of such systems is modeled as a probability of failure, that is, the probability that the programmable system fails when it is required to operate correctly.To develop an estimate of probability of failure, the authors built a series of BBN models, using evidence from such sources, as the system development process, system design features, and pre-testing, before the system is deployed.This is later enhanced by data from testing and operational experience.
The essential part of this work was building BBN models for various operational profiles and multiple test cycles, involving continuous probability distributions. As a result, using the BUGS software, which combines Bayesian inference with Gibbs sampling via Markov chain Monte Carlo (MCMC) simulation, it was possible to estimate how many tests had to be run for a single system in a particular operational environment to achieve a certain level of reliability. To decrease the huge number of necessary tests, multiple operational profiles for the same system were used, which required building replicated BBN models to include evidence from the other profiles. In essence, by expanding the BBN models further, this approach also allows reliability estimation over the entire lifespan of the software product, but the respective experiments were not conducted in this study.

CASE STUDY IN SOFTWARE TOOL EVALUATION
To test the applicability of BBN's in software assessment, we applied this technique to evaluate the software development tools used in real-time safety-critical applications in avionics. The data for the project were taken from experiments described in detail elsewhere [43,48]. The experiments involved applying a number of specific criteria, including efficiency of the generated code, to conduct forward evaluation regarding the quality of the code, and traceability, to allow backward evaluation regarding the tool's capability of maintaining the right requirements. To evaluate the tool during its operation from the perspective of the functions it provides and its ease of use, two additional criteria seemed appropriate: functionality and usability. The exact process of choosing the criteria is described in [48]. For the criteria selected that way, a series of experiments was conducted, with six industry-strength tools applied to embedded software development. The above-mentioned criteria were quantified using the following measures: efficiency as code size (in LOC), usability as development effort (in hours), functionality via a questionnaire (on a 0-5 point scale), and traceability by manual tracking (as the number of defects). Data for some measures were collected in multiple aspects; for example, data on the development effort were divided into four categories: preparation, modeling and code generation, measurements, and postmortem (including report writing). Details of the software requirements and the actual experimental results are discussed in [43].

Based on the adopted model of the tool evaluation process, and on the results of experiments with the evaluation criteria outlined above, our high-level model of a BBN for tool assessment is illustrated in Figure 12. Its primary assumption is that the tool assessment process should involve the following mutually interrelated factors:
a) development of the tool itself (including the process, vendor quality and reputation, their quality assessment procedures, etc.),
b) the tool use (including experimental evaluation based on predefined criteria, but also previous user experiences with this tool, etc.),
c) quality of the products developed with this tool, based on product execution, static code analysis, etc.
Based on the results of this analysis and other acceptance procedures (such as legal aspects, independent experts' opinions, etc.), the tool qualification process can be completed, as reflected in a BBN in Figure 13. Because of the limited data obtainable from experiments, we deal only with the ToolUse part of the diagram in Figure 12. The logic of the BBN is similar to the ones reported in [14], where no real probability data were available, and in [46], where the conditional probability values "were estimated based on judgments in a brainstorming activity among the project participants." For the experimentally collected data for six tools, nicknamed L, M, N, O, P and Q, a sample tool assessment BBN is shown in Figure 13 for a tool that is likely to pass the qualification process with 80% confidence at the level MediumToHigh or High.

REAL-TIME APPLICATION: THE AUSTRALIAN NAVY EXERCISE
As the previous examples show, the principle of using a BBN for reasoning under uncertainty is that when evidence about the state of one of the nodes (variables) becomes available, the rest of the network is updated according to the conditional probability tables and the dependency relations among the nodes. However, the updating process becomes a problem if the new evidence is distorted or missing. This situation is less difficult in off-line computations, such as those discussed in the subsections above, because one can conduct additional experiments and wait until the data are complete. But if one wants to use BBN's for situation assessment in real time, and missing or distorted data come into play, as happens with sensor noise or sensor failure, especially over an extended period of time, then the value of Bayesian reasoning may become problematic.
In general, this issue arises when there is no information on certain behavior, or when some information previously available becomes scarce or unavailable. Rough set theory can then help fill the gap caused by such circumstances. To illustrate this concept, we present a case study of the Australian Naval Exercise [49].
In this case study, there are two naval military forces, called Blue and Orange, that are hostile towards each other, and a country from which the Orange forces obtain fuel supplies and which the Blue forces treat as neutral. The Blue forces have communications and surveillance facilities that the Orange forces want to destroy. Blue have set up a restricted area that contains the communication facilities and will consider any military activity or transportation of supplies as hostile. Orange have a supply route that passes through the restricted area and that they want to defend.
Blue monitor the restricted area via sensors and reconnaissance. Orange vessels that are likely to be detected are Guided Missile Frigates (FFG in Figure 14), Fremantle Class Patrol Boats (FCPB), and Communication vessels. Oil Tankers from the neutral country can also be detected. The position, mobility, and communications activity of each vessel are also recorded to try to determine the intent of the Orange forces.
The Bayesian network in Figure 14 is used to try to determine the intentions of the Orange forces and how to respond to them, by entering the findings from the sensors and reconnaissance into the appropriate nodes. In essence, based on the information entered into the bottom nodes, the Bayesian network recalculates the variables in all other nodes, and the value of the variable in node BlueCOA suggests to the decision maker the most appropriate Course of Action (COA) at any given time.
The situation is more complicated when some of the sensor or reconnaissance data are missing, for example, due to a sensor failure or the temporary or permanent unavailability of reconnaissance. The BBN, which does the calculations, still expects to receive new data, because the command unit has to assess the situation and make the respective decisions in real time. Even though the BBN can still operate, the missing data make its assessments less and less accurate as time progresses.
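The effect of a missing finding on the assessment can be seen in a toy example. The sketch below is not the network of Figure 14: it assumes a hypothetical three-node structure (Intent with two observed children, Mobility and Comms) and made-up probabilities, and it computes the posterior by direct enumeration. A missing finding is simply marginalised out.

```python
# Hypothetical prior and conditional probability tables (illustration only).
PRIOR = {"Hostile": 0.2, "Benign": 0.8}
CPT = {
    "Mobility": {
        "Hostile": {"Rapid": 0.7, "Slow": 0.3},
        "Benign": {"Rapid": 0.2, "Slow": 0.8},
    },
    "Comms": {
        "Hostile": {"Active": 0.6, "Silent": 0.4},
        "Benign": {"Active": 0.3, "Silent": 0.7},
    },
}


def posterior_intent(evidence):
    """Posterior over Intent given findings; None marks a missing finding."""
    scores = {}
    for intent, prior_prob in PRIOR.items():
        score = prior_prob
        for node, value in evidence.items():
            if value is None:
                # Missing finding: summing the child's CPT row over all of
                # its values gives 1, so the node contributes no information.
                continue
            score *= CPT[node][intent][value]
        scores[intent] = score
    total = sum(scores.values())
    return {intent: score / total for intent, score in scores.items()}


full = posterior_intent({"Mobility": "Rapid", "Comms": "Active"})
partial = posterior_intent({"Mobility": "Rapid", "Comms": None})
print(f"both findings:  P(Hostile) = {full['Hostile']:.3f}")
print(f"Comms missing:  P(Hostile) = {partial['Hostile']:.3f}")
```

With the Comms finding lost, the posterior moves back toward the prior. The network still produces an answer, but a less informative one, which is the degradation described above.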
In such a case, we employ rough set theory, particularly the part of it dealing with missing values. The essential idea is as follows. If we treat specific variables from the BBN as attributes of an information system (rough set), with one of them being the decision attribute and all remaining ones condition attributes, then we can determine (with some level of accuracy) the missing values of the attributes, using the reasoning presented briefly in the section on rough sets and described in more detail in [41]. In plain language, this is equivalent to deriving the approximate value of a certain variable from the context information. A sample of such an information system for the Australian Naval Exercise is illustrated in Figure 15, using the rough set tool Rosetta [39]. All fourteen nodes from the Bayesian network are mapped onto attributes of the information system. At each time instant, depending on the frequency of measurements in the decision making process, a new case (an object with fourteen attributes) is created. The values of the respective attributes may be obtained directly from the measurement process, or from a BBN if necessary. For example, the first attribute in Figure 15, SensorMobilityInt, corresponds to the node of the same name in the BBN in Figure 14, and has a value of RapidParallel. A missing measurement is marked by an asterisk in Figure 15.
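The idea of deriving a missing value from context can be sketched as follows. This is a simplified illustration and not the procedure of [41] or the algorithm used by Rosetta: a missing value is filled only when all other cases that agree with the incomplete case on its known attributes share a single value for the missing attribute; otherwise the asterisk is kept. Only SensorMobilityInt, RapidParallel, and BlueCOA come from the text; the attribute VesselType, the remaining values, and the function names are hypothetical.

```python
MISSING = "*"


def candidate_values(table, row_index, attribute):
    """Values of `attribute` seen in cases that match the target case
    on every attribute whose value is known in the target case."""
    target = table[row_index]
    known = [a for a, v in target.items() if a != attribute and v != MISSING]
    candidates = set()
    for i, row in enumerate(table):
        if i == row_index or row[attribute] == MISSING:
            continue
        if all(row[a] == target[a] for a in known):
            candidates.add(row[attribute])
    return candidates


def fill_missing(table):
    """Return a copy of the table with unambiguous missing values filled."""
    filled = [dict(row) for row in table]
    for i, row in enumerate(table):
        for attribute, value in row.items():
            if value == MISSING:
                candidates = candidate_values(table, i, attribute)
                if len(candidates) == 1:  # the context determines the value
                    filled[i][attribute] = next(iter(candidates))
    return filled


# Hypothetical cases; each dict is one object of the information system.
table = [
    {"SensorMobilityInt": "RapidParallel", "VesselType": "FFG", "BlueCOA": "Intercept"},
    {"SensorMobilityInt": "Slow", "VesselType": "Tanker", "BlueCOA": "Monitor"},
    {"SensorMobilityInt": MISSING, "VesselType": "FFG", "BlueCOA": "Intercept"},
    {"SensorMobilityInt": MISSING, "VesselType": "FCPB", "BlueCOA": "Monitor"},
]

filled = fill_missing(table)
for row in filled:
    print(row)
```

The third case is completed from the first, which matches it on all known attributes; the fourth has no matching case and keeps its asterisk. Leaving ambiguous values unfilled is a conservative design choice made for this sketch.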
The operation of the software tools conducting this process in real time is illustrated in Figure 16, with evidence meaning the new sensor measurements or reconnaissance data. Such a process can be easily automated with existing tools, since a Netica version exists that has a Java API and can read cases from a text file. In turn, Rosetta, which also has a command language interface, can export its tables as text files to be read by Netica. With converter software that reads Rosetta files, makes the respective adjustments if some data are missing, and transforms them to the Netica format, the whole system shown in Figure 16 can operate smoothly and enhance the decision making process in real time.
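A converter of the kind described above might look like the following sketch. Both file layouts are assumptions made for illustration: the exported table and the case file are each taken to be tab-delimited text with a header row and an asterisk for a missing value, which should be checked against the actual Rosetta export and Netica case-file formats. The function names, the VesselType column, and the rule inside fill_from_lookup are hypothetical; the latter merely stands in for the rough set step.

```python
import csv
import io


def convert(exported_text, fill_row=lambda row: row):
    """Rewrite an exported table as case-file text, adjusting each row.

    exported_text: tab-delimited text with a header row (assumed layout).
    fill_row: callable that receives one case as a dict and returns it,
              possibly with missing values replaced.
    """
    reader = csv.DictReader(io.StringIO(exported_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        writer.writerow(fill_row(dict(row)))
    return out.getvalue()


def fill_from_lookup(row):
    """Hypothetical stand-in for the rough set step (illustration only)."""
    if row["SensorMobilityInt"] == "*" and row["VesselType"] == "FFG":
        row["SensorMobilityInt"] = "RapidParallel"
    return row


exported = (
    "SensorMobilityInt\tVesselType\n"
    "RapidParallel\tFFG\n"
    "*\tFFG\n"
    "*\tTanker\n"
)
converted = convert(exported, fill_from_lookup)
print(converted)
```

Passing the adjustment step in as a function keeps the file handling separate from the reasoning that supplies the missing values, so either side can be replaced without touching the other.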

SUMMARY AND CONCLUSION
This paper discussed the basic concepts of Bayesian belief networks and rough sets, and showed how they can be combined to enhance the process of reasoning under uncertainty when the values of certain attributes of objects are missing. Bayesian networks and rough sets are each well suited to solving computational problems with insufficient information and to reasoning about uncertainty. The use of rough sets makes BBN's more valuable in the case of an occasional lack of evidence. This becomes particularly important when BBN's are used in applications such as real-time decision making or active safety diagnostics, with information being supplied to the nodes during operation. In such cases, losing the source of information for one of the BBN nodes impairs the inference process in the subsequent steps. Rough set reasoning helps keep the BBN in good standing despite the lost source of information.
The logic of this process is very similar to the use of a Kalman filter [50], where the information about the system is updated based on its previous behavior. However, in the case of rough sets the information does not have a statistical nature, as it does in Kalman filtering. Comparing the concepts outlined in this article with the operation of a Kalman filter would be a good topic for further study.
There are several important questions still to be addressed. For example, to apply this method in practice, one would need to know how computationally intensive the rough set calculations are. It seems that for typical applications of Bayesian belief networks, which are used in decision support systems, the deadlines for completing the computations are most likely on the order of minutes or hours, so this issue should not cause problems.

• Efficiency measured as code size (in LOC)
• Usability measured as development effort (in hours)
• Functionality measured via the questionnaire (on a 0-5 points scale)
• Traceability measured by manual tracking (in number of defects)

Fig. 16 - Real-time operation of a BBN tool with a rough set tool.