Copyright ACM, 2000


Dragan Bojic, Dusan Velasevic

University of Belgrade, Faculty of Electrical Engineering
Bulevar revolucije 73
11000 Belgrade, Yugoslavia
+381 11 3370 161



We propose a novel technique for recovering certain elements of the UML model of a software system. These include relationships between use cases as well as class roles in collaborations that realize each use case, identifying common functionality and thus establishing a hierarchical view of the model. The technique is based on dynamic analysis of the system for the selected test cases that cover relevant use cases. The theory of formal concept analysis is applied to obtain classification of model elements, obtained by a static analysis of code, in terms of use case realizations.


Object-oriented systems, UML modeling, reverse engineering, use cases.


A support for reverse engineering a system model from its code has become a standard part of modern analysis and design tools to satisfy the needs for round-trip engineering, updating a model to the evolved code, and taking the legacy context into account. Our technique is devised to supplement a model obtained by a typical static analyzer. Such models lack the support for reverse engineering of the use case architectural view. Therefore, we have decided to address this issue by recovering use case realizations, which serve as a basis for classifying model elements in a hierarchically nested package structure that is normally a part of a design model as prescribed by [9].

We have adopted the approach to conduct reverse engineering guided by use cases (i.e. scenarios of system use). A use case is a description of a set of sequences of actions, including variants, that a system performs to yield an observable result of value to an actor (a specific role of an external entity). Numerous advantages have been reported if a process of developing or reengineering a system is driven by use cases [7]:

The presented technique takes into account the following practical considerations:


There is a number of methods aimed at recovering system model at the architectural level. Structural clustering techniques define strictly structural criteria for decomposing a system in a hierarchy of subsystems – minimal coupling between subsystems and maximal coupling between entities in the same subsystem. For example, In [13], a suboptimal algorithms are proposed, based on search (hill climbing) and a genetic algorithm. Wiggerts [23] presents an overview of clustering algorithms that could be used for software remodularization.

Clustering techniques that focus on data abstraction, consider the use of data of the same type as the grouping criterion. These techniques find their use in discovering candidates for object and classes in non-object-oriented software. Lindig and Snelting [12] use a formal concept analysis on relation between procedures and global variables. However, experiments suggest a limited usability for discovering system structure (partly because of noise in input data). Siff and Reps [22] also use formal concept analysis in a similar context, but also make use of negative information, e.g. function f does not use structure x. They claim more promising results than [12].

Plan recognition techniques [11], [17], [18] need a library of predefined programming plans which are matched against parts of code to build a hierarchy of concepts (abstractions such as complex data structures or a specific functionality). Each plan consists of two parts: a pattern, used for matching and a concept that is recognized with the pattern. A pattern is a combination of components (the language items or subplans that must be recognized) and constraints (the relationships that must hold between these components to have the plan recognized). Plan matching can be performed by a variety of techniques including deductive inference, constraint satisfaction or pattern matching, and could be combined with the top-down search of plan library for a plan that matches a particular concept. These techniques require the existence of large plan database and at the present guarantee only partial model recovery. There is also a problem of varying code patterns that implement the same plan, and a scaling problem when considering large systems.

Recently, a work has been done on identifying standard design patterns in code. For example, Antoniol [2] present a multi-stage reduction strategy using software metrics and structural properties to extract some structural design patterns from code. For now, only these kinds of patterns can automatically be identified, and experiments show that they are rarely used in real programs.

Besides automatic decomposition techniques, with a possibility for a user to guide a process in some phases, there exist a number of manual and semi-automatic techniques. These techniques allow collecting, filtering and presenting in a suitable form, data obtained by means of either static or dynamic analysis of the system. The user is, however, responsible for identifying and grouping system entities into subsystems, or for connecting implementation entities with a functional model of the system. Rigi tool [14] allows a user to create multiple views, and also has some graph arranging algorithms to cluster related system entities. In [20], [19] static and dynamic information is modeled in terms of logic facts in Prolog, which allows creating views with high-level abstractions using declarative queries. In software reflexion model technique [15], a user defines a high-level model (a graph of logical subsystems) and specifies how the model maps to the source model (a call graph) – by assigning functions to logical subsystems. A tool then computes software reflexion model, a graphical representation that shows how well the connections between logical subsystems defined in high-level model agree with those actually found in source model.

De Hondt [6], among other methods, describes a method tagging technique for architectural model recovery that relies on comments in the code inserted manually during software development.

Finally, there has been some work on locating functionality in code using static or dynamic analysis. The Software Reconnaissance technique implemented in c Suds toolset [1] locates plans i.e. code fragments associated with a particular feature i.e. functionality by comparing traces of test cases that exhibit or do not exhibit a feature. Every feature must be considered separately from others, and the focus is on small-scale features. Wilde [24] proposed a technique to partition a program code in equivalence classes such that all code in the same class has the same execution count in each of multiple test cases. Experimental results revealed that only a tiny fraction of identified classes could be related to some functionality.

Slicing techniques make use of call and data dependency graphs to identify those parts of code that contribute the computation of the selected function, for all possible inputs, in case of static slicing, or for a given input, in case of dynamic slicing. [10] presents a dynamic slicing technique to help understanding of large programs, by visualizing the slices at the level of call graphs. In [5] a static slicing is used to identify user interface software layer.


There are several activities to perform when applying the proposed technique:

  1. Identify a set of use cases for a given system or a part of it.
  2. For each use case, define one or more test cases that cover relevant use case.
  3. Collect profiling information for each test case.
  4. Construct the context relation between use cases and covered system code entities.
  5. Construct the conceptual lattice (i.e. acyclic digraph) from a context relation by applying formal concept analysis. Informally, each concept is a pair of two components: a set of use cases and a set of system code entities that is a shared part of realization of these use cases. If an edge connects two concepts in the lattice, then the parent concept represent "broader" functionality that in some way "uses" the functionality of the child concept.
  6. Estimate the quality of decomposition from the topology of the lattice, and if necessary, modify the test cases or apply some purification of the context relation, to eliminate the effects of interleaving functionality in code and other undesired effects, and repeat activities necessary to obtain the updated lattice.
  7. Construct the basic UML model using static analyzer (an integral part of the design tool) and decompose the model using the concept lattice.

These activities are further elaborated in sections 3.2–3.6. In the next section we present the results obtained by applying the technique to a small sample application.

3.1. A Small Example

The sample application is Scribble step 3 tutorial application from Microsoft Visual C++. It is MDI application that supports elementary drawing, which can be saved and subsequently retrieved. It has a total of 34 classes and 401 member and non-member functions. MFC framework classes were excluded from the analysis. That leaves 11 user defined classes and 121 functions, which does not violate a limitation of ConImp in the number of objects. We defined the following use cases that roughly corresponds to items from the main menu:

U1. Working with Documents (specified with either U2 or U3)

U2. Creating, Opening and Saving

U3. Printing

U4. Working with Multiple Windows

U5. Drawing

U6. Configuring

U7. Help

We defined twelve test cases to cover these use cases (for most of them the profiling data were differed with Application Start and Exit test case). For example, the first test case was to Start the application, select File New from the menu, and then exit. It covered U1 and U2.

From the profiling data, we obtained the concept lattice shown in Figure 1.

Figures 26 present the logical structure of the model. For each concept, except the root and the leaf, there is one logical package. A lattice determines the containment relationship between packages. The package that corresponds to some concept U contains or references only those parts of classes relevant to realization of the use case U, which are not already present in contained packages. Those parts are shown in the pictures. The only exception is "Working With Multiple Windows" package, in which member functions are not shown at all to conserve space. A static analyzer that is a part of Rose induced relationships between classes.

Figure 1: The concept lattice for Scribble

Figure 2: Working With Multiple Windows package

Figure 3: Drawing package

Figure 4: Creating, Opening and Saving package

Figure 5: Printing package


Figure 6: Working with Documents, Configuring and Help packages

Now we shall discuss the decomposition process in detail.

3.2. Identifying Use Cases

Each use case represents one functional requirement. We are interested in those that are "architecturally significant" [9] so that separate use cases should not be used for too elementary system functions. A use case can also be introduced for complex functionality that includes other functions. Relationships between use cases (as described in [3]) are not introduced in this step. They will be determined automatically in the next phases.

Use case identification can be done manually, in a systematic manner similar to one described in [16]. For example, for Graphical User Interface applications, each menu item, toolbar button and other items that initiates some function can be assigned one use case. This procedure can even be automated by analyzing resource file that contains definitions of such items (resource descriptions can be extracted even from executable file for Windows applications).

3.3. Selecting and Executing Test Cases

A set of test cases should be selected, to cover all use cases. Each test case can be related to more than one use case. For example, a test case can cover general editing functionality and specific text replacement functionality; or a test can cover opening test file and then previewing that file for printing.

The profiling information is then collected for each test case. We have chosen to collect executions counts at the function (class members) level (the code lines level seemed too detailed). Using execution counts, and not just coverage information enables what we called "test case differing": suppose we conduct a test case in which a file is opened in editor, and another in which a file is opened and then previewed for printing. By subtracting execution counts of the first test case from those of the second, we obtain a profiling information for just the print preview use case, which can now be treated as a separate test case.

Figure 7 shows a Test Recorder application screen of our prototype recovery system. This application is used to activate profiling session in Visual Studio environment for each test case and collect profiling data using automation interface. The user then selects covered use cases from a list of all use cases. This information and profiling data can be either saved to file or memorized internally for test differing. A difference between previously memorized data and current data can also be saved to file as a separate test case.

3.4. Introduction to Formal Concept Analysis

Concept analysis provides a way to identify sensible groupings of objects that have common attributes [4].

In order to understand the basics of concept analysis, a few definitions are required. A context is a triple C = (O, A, R), where O and A are finite sets (the objects and attributes, respectively), and R is a binary relation between O and A.

Let X Í  O and Y Í A. The mappings

s (X) = {a Î  A | "Î  X : (o, a) Î  R}

(the common attributes of X) and

t (Y) = {o Î  O | "Î  Y : (o, a) Î  R}

(the common objects of Y) form a Galois connection. That is, the mappings are antimonotone:

X1 Í  X2 Þ  s (X2)  Í   s (X1)

Y1 Í  Y2 Þ  t (Y2)  Í  t (Y1)

and extensive:

Í  t (s (X)) and Y Í   s (t (X)).

A concept is a pair of sets – a set of objects (the extent) and a set of attributes (the intent) (X, Y) such that Y = s (X) and X = t (Y ). That is, a concept is a maximal collection of objects sharing common attributes. A concept (X0, Y0) is a subconcept of concept (X1, Y1) if X0 Í  X1 (or, equivalently, Y1 Í  Y0).

The subconcept relation forms a complete partial order (the concept lattice) over the set of concepts. The fundamental theorem for concept lattices [4] relates subconcepts and superconcepts as follows:

The significance of the theorem is that the least common superconcept of a set of concepts can be computed by intersecting their intents, and by finding the common objects of the resulting intersection.

There are several algorithms for computing the concept lattice for a given context. We describe a simple bottom-up algorithm here.

An important fact about concepts and contexts used in the algorithm is that, given a set of objects X, the smallest concept with extent containing X is (t (s (X)), s (X)). Thus, the bottom element of the concept lattice is (t (s (Æ )), s (Æ )) – the concept consisting of all those objects that have all the attributes (which is often the empty set, as in our example).

The initial step of the algorithm is to compute the bottom element of the concept lattice. The next step is to compute atomic concepts – smallest concepts with extent containing each of the objects treated as a singleton set. The atomic concepts correspond to those elements in the concept lattice reachable from the bottom element in one step.

The algorithm then closes the set of atomic concepts under join. Initially, a work list is formed containing all pairs of atomic concepts (c’, c) such that c c’ and c’ c. While the work list is not empty, remove an element of the work list (c0, c1) and compute c’’ = c0 ë û  c1. If c’’ is a concept that is yet to be discovered then add all pairs of concepts (c’’, c) such that c c’’ and c’’c to the work list. The process is repeated until the work list is empty.

Figure 7: Test Case Recorder Application

3.5. Applying Formal Concept Analysis

For our purposes, the set of objects O includes all functions, class member as well as non-member, in our application. The set of attributes A includes all use cases as defined by the user. Context relation between functions and use cases is defined with a meaning "F is related with U if, and only if, F is a part of code that implements U". Relation implements is established on the basis of profiling data obtained by executing test cases, and is defined as follows:

F implements U, if, and only if, the following conditions hold:

  1. There exist at least one test case that executes F (at least once) and covers U, and
  2. Every test case that executes F (at least once) also covers U.

Note that it is not required that every test case that covers U must also execute F, because there can exist different execution paths for U.

With this definition of context relation, the extent E of each concept C in a resulting concept lattice is a set of functions that are the shared part of implementation code of the set of use cases that forms the intent I of C.

In a real world situation, two problems can possibly appear with this interpretation. First, if all test cases that cover certain use case U1 also cover some unrelated use case U2, than all functions that implement U1 will be mistakenly related to U2 also. This problem can be eliminated by carefully choosing test cases, or by applying "test differing".

Second problem is code interleaving, a fact that a same function F can actually take part in implementing two unrelated use cases U1 and U2. The above definition works fine for this situation, relating F to both U1 and U2, but a resulting concept lattice mistakenly relates together U1 and U2. To eliminate this problem, a user is required to check the context matrix before performing lattice calculation, and in case he or she finds function F that implements two or more unrelated use cases, to split row F in two or more rows, one for each of implemented use cases. The second modification of context relation is the elimination of all functions that belong to standard library or foundation classes because they are used in many different contexts. This does not apply to classes that are inherited from these standard classes to perform some specific functionality

The quality of decomposition can be estimated from the topology of the lattice, and could serve as a basis for a decision to reformulate test cases or to modify the context matrix. Figure 8 shows an example of the preferred topology. When we remove the root concept (representing all functions and none of use cases) and the only leaf concept (if such exists, it represents all use cases), we should obtain a forest (the set of proper trees).

Figure 8: A preferred topology of the concept lattice

3.6. Updating Model from the Lattice

From the concept lattice, we can determine various elements of a UML model.

To specify model elements, let us first define the notions of object and attribute concepts. The function concept (E,I) that corresponds to a function f is the smallest (closest to the leaf) concept for which f Î E. The use case concept (E,I) that corresponds to a use case u is the largest (closest to the root) concept for which u Î I.

A concept can be use case concept for zero or more use cases. In the former case, a concept represents some common functionality that is not externally visible. In the latter case, if there is more than one use case corresponding to a particular concept, further test cases are needed to differentiate between those use cases. In case of one to one correspondence, each use case concept corresponds to the collaboration (use case realization) of the corresponding use case. Extents of these concepts define analysis classes (parts of real, implementation classes that take role in the realization of relevant use cases).

For every concept in a lattice, except the root and the leaf ones, we introduce one package in a logical view (more precisely, in a distinct subview of this view). The superconcept–subconcept relation corresponds to the containment relation between packages. If there is a concept in the lattice with more than one parent concept, a corresponding package will be owned by one parent package and referenced from others. Packages that correspond to use case concepts are named after the relevant use cases.

Each non-member function belongs to the package that corresponds to its function concept. In case of classes, there could be more than one function concept for its member functions. In that case, we assign the class to the package that corresponds to the concept that is function concept to most of its functions, and reference the class from other packages.

It is well known that the use case and the logical view of a UML model are related with each other by interaction diagrams describing particular use cases, showing interaction (function member calls) of instances of classes from the logical view that are implementing these use cases.

Having only the information obtained from static analysis and execution counts, one cannot determine exact interaction diagrams but only model elements relevant for the implementation of each use case. They all belong to the package that corresponds to use case concept for a particular use case.

Finally, the ordering relation on the set of concepts can be used to determine relationships among particular use cases. If the attribute concept of some use case U is a superconcept of the attribute concept of some use case U’ then there exists a unidirectional relationship between them, but a deeper analysis is needed to establish which kind.


The proposed technique is conceived to automate a part of task that system reengineer has to do manually after creating basic UML model with a static analyzer.

We certainly do not imply that a presented decomposition strategy is "the best" strategy, in the sense that the original designer would create the same decomposition, or that it works equally well for all kinds of software systems. But it is our opinion that a functional decomposition is useful for the system understanding task that faces a reengineer unfamiliar with system implementation. More experimentation should be done, but we do not expect problems with scaling the technique to large systems, because there are already experimental evidence of applying formal concept analysis, in a slightly different manner, to large systems [12].

A further work is needed to extend the technique to include creation of interaction diagrams, on the basis of more detailed dynamic data. The second possible research direction is to incorporate other criteria (possible through adding attributes in the context relation) such as layering, which is, as [6] points out, the decomposition criterion orthogonal to functional decomposition.


  1. H. Agrawal, J. L. Alberi, J. R. Horgan, J. Jenny Li, Saul London, W. Eric Wong, Sudipto Ghosh, Norman Wilde, Mining System Tests to Aid Software Maintenance, IEEE Computer, July 1998, pp. 64-73
  2. G. Antoniol, R. Fiutem, L. Cristoforetti, Design Pattern Recovery in Object-Oriented Software, Proceedings of the 6th International IEEE Workshop on Program Comprehension IWPC 98, Ischia, Italy, 1998, pp. 153-160
  3. G. Booch, J. Rumbaugh, I. Jacobson, The Unified Modeling Language User Guide, Addison-Wesley, 1998.
  4. P. Burmeister, Formal Concept Analysis with ConImp: Introduction to the Basic Features, Arbeitsgruppe Allgemeine Algebra und Diskrete Mathematik, Technische Hochschule Darmstadt, Schloßgartenstr. 7, 64289 Darmstadt, Germany, 1998
  5. G. Canfora, A. Cimitile, A. De Lucia, G. A. Di Lucca, Decomposing Legacy Programs, a First Step Towards Migrating to Client-Server Platforms, Proceedings of the 6th International IEEE Workshop on Program Comprehension IWPC 98, Ischia, Italy, 1998, pp. 136-144
  6. K.De Hondt, A Novel Approach to Architectural Recovery of Object-Oriented Systems, PhD Theses, Vrije Universiteit Brussel, 1998.
  7. M. Jarke, Scenarios for Modeling, Communications of the ACM, January 1999, Vol. 42, No. 1, pp. 47-48
  8. M. Jarke, R. Kurki-Suonio, Scenario Management – Introduction to the Special Issue, IEEE Transactions on Software Engineering, Vol. 24, No. 12, December 1998, pp. 1033-1035
  9. I. Jacobson, G. Booch, J. Rumbaugh, The Unified Software Development Process, Addison-Wesley, 1999.
  10. B. Korel, J. Rilling, Program Slicing in Understanding of Large Programs, Proceedings of the 6th International IEEE Workshop on Program Comprehension IWPC 98, Ischia, Italy, 1998, pp. 145-152
  11. W. Kozaczynski, J. Ning, A. Engberts, Program Concept Recognition and Transformation, IEEE Transactions on Software Engineering, Vol. 18, No. 12, December 1992, pp. 1065-1075
  12. C. Lindig, G. Snelting, Assessing Modular Structure of Legacy Code Based on Mathematical Concept Analysis, ICSE 97, Boston, USA, 1997, pp. 349-359
  13. S. Mancoridis, B. S. Mitchell, C. Rorres, Y. Chen, E. R. Gansner, Using Automatic Clustering to Produce High-level System Organizations of Source Code, IEEE Proceedings of 6th International Workshop on Program Understanding IWPC’98, Ischia, Italy, June, 1998, pp. 45-52
  14. H. Müller, M. Orgun, S. R. Tilley, J. S. Uhl, A Reverse Engineering Approach To Subsystem Structure Identification, Software Maintenance: Research and Practice, 5(4), December 1993, pp. 181-204
  15. G. Murphy, D. Notkin, K. Sullivan, Software Reflexion Models: Bridging the Gap between Source and High-Level Models, Proceedings of the ACM SIGSOFT ’95, Washington, D.C., 1995, pp. 18-28
  16. W.B. Noffsinger, R. Niedbalski, M. Blanks, N. Emmart, Legacy Object Modeling Speeds Software Integration, Communications of the ACM, Vol. 41, No. 12, December 1998, pp. 80-89
  17. A. Quilici, Q. Yang, S. Woods, Applying Plan Recognition Algorithms To Program Understanding, Journal of Automated Software Engineering , 5(3):347-372, 1998.
  18. A. Quilici, D. Chin, DECODE: A Cooperative Environment for Reverse-Engineering Legacy Software, Proceedings of the Second IEEE Working Conference on Reverse Engineering, pp. 156-165, July 1995.
  19. T. Richner, S. Ducasse, Recovering High-Level Views of Object-Oriented Applications from Static and Dynamic Information, to appear in IEEE Proceedings ICSM’99
  20. T. Richner, S. Ducasse, R. Wuyts, Understanding Object-Oriented Programs with Declarative Event Analysis, 12th European Conference on Object-Oriented Programming, Workshop on Experiences in Object-Oriented Re-Engineering, Brussels, Belgium, July 1998.
  21. M. Sefika, A. Sane, R. H. Campbell, Monitoring compliance of a software system with its high-level design models. In Proceedings ICSE-18, March 1996, pp 387–396
  22. M. Siff, T. Reps, Identifying Modules Via Concept Analysis, IEEE International Conference on Software Maintenance, Bary, Italy, Sept. 1997
  23. T. A. Wiggerts, Using Clustering Algorithms in Legacy System Remodularization, Working Conference of Reverse Engineering WCRE ’97, pp. 33-43
  24. N. Wilde, M. Cotten, S. London, Using Execution Counts to Identify Delocalized Program Plans, Tech. Rpt. SERC-TR-81F, Software Engineering Research Center, OSE-301, University of Florida, FL 32611, 1996.

Copyright 2000 ACM

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and or fee.
SAC 2000 March 19-21 Como, Italy
(c) 2000 ACM 1-58113-239-5/00/003>...>$5.00