The Problem Domain: The problem domain consists of transaction data recorded individually by a set of organizations when they interact with one another. Each record in a transaction log should contain at least the fields <event_type, sender_process, receiver_process, id>, where event_type is either ``send'' or ``receive''; sender_process is the local workflow process that requests services from a remote workflow; receiver_process is the workflow process of the remote workflow being requested; and id is a unique identifier of the workflow instance that initiated the transaction. The id is used to track semantic interactions across multiple transaction log entries.
Since transaction records are maintained in different logs (and timestamped by possibly unsynchronized clocks), it is first necessary to obtain an ordering of the events across the transaction logs. For this, we use Lamport's technique for obtaining a partial order using logical time. The essential idea is to assign a logical timestamp to each event occurrence such that every receive event occurs after the send event that sent the message.
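Lamport's assignment can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes events have already been interleaved so that each send precedes its matching receive, and the `process`, `type`, and `match` field names are invented for the example.

```python
def lamport_timestamps(events):
    """events: list of dicts with 'process', 'type' ('send'/'receive'),
    and, for receives, a 'match' key referencing the matching send dict.
    Assumes the list preserves per-process order and places each send
    before its matching receive. Writes a logical time into ev['lt']."""
    clock = {}  # per-process logical clock
    for ev in events:
        t = clock.get(ev["process"], 0)
        if ev["type"] == "receive":
            # A receive must be timestamped after the send it matches.
            t = max(t, ev["match"]["lt"])
        ev["lt"] = t + 1
        clock[ev["process"]] = ev["lt"]
    return events
```

With this assignment, any receive event is guaranteed a strictly larger logical time than its corresponding send, which yields the partial order needed to line up records from separate logs.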
Interaction Entities: An interaction entity is defined to be the set of all interactions that share the same id. An interaction entity is represented by a directed graph structure, where nodes correspond to services and edges correspond to recorded transactions among the services. An interaction entity is thus a logical unit of interaction, or a process structure that exists across the interacting actors. An interaction entity type is defined to be the set of all interaction entities that have the same interaction structure, and an interaction entity type is termed ``interesting'' if it satisfies an interestingness criterion defined by the user. An interestingness criterion especially suited to mining transaction logs is that of consistency, which was proposed earlier in .
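Grouping log records into interaction entities and entity types can be sketched as below. The tuple layout follows the record schema given above; treating each interaction once via its send record, and keying entity types on their (nodes, edges) structure, are assumptions made for this illustration.

```python
from collections import defaultdict

def build_interaction_entities(records):
    """records: iterable of (event_type, sender, receiver, id) tuples.
    Returns a dict mapping each id to its directed graph (nodes, edges)."""
    entities = defaultdict(lambda: (set(), set()))
    for event_type, sender, receiver, ident in records:
        if event_type != "send":
            continue  # each interaction also has a receive record; count it once
        nodes, edges = entities[ident]
        nodes.update((sender, receiver))
        edges.add((sender, receiver))
    return dict(entities)

def entity_types(entities):
    """Group interaction entities whose graph structure is identical."""
    types = defaultdict(list)
    for ident, (nodes, edges) in entities.items():
        types[(frozenset(nodes), frozenset(edges))].append(ident)
    return types
```

Two ids that produced the same graph structure thus fall into the same interaction entity type, which is the unit later tested for interestingness.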
After the mining process, interesting interaction entity types may be clustered based on the similarity between their structures. A measure called overlap is proposed for clustering interaction entity types. The overlap measure between any two interaction entity types represented by directed graphs G1 = <V1, E1> and G2 = <V2, E2> is computed as follows:
overlap = (|V1 ^ V2| + |E1 ^ E2|) / (|V1 U V2| + |E1 U E2|)
where ^ denotes set intersection, U denotes set union, V is the set of all nodes, and E is the set of all directed edges. The overlap measure is computed for every pair of interaction entity types. Clusters are formed by using a threshold on the overlap measure to filter away all non-overlapping structures from a cluster. A cluster C of interaction entity types <V1, E1>, <V2, E2>, ..., <Vi, Ei> is represented by
C = <(V1 U V2 U ... U Vi), (E1 U E2 U ... U Ei)>.
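The overlap formula and the cluster representation translate directly into set operations. A minimal sketch, representing each entity type as a (nodes, edges) pair of sets:

```python
def overlap(g1, g2):
    """Overlap between two interaction entity types, each a
    (nodes, edges) pair of sets, per the formula above."""
    v1, e1 = g1
    v2, e2 = g2
    return (len(v1 & v2) + len(e1 & e2)) / (len(v1 | v2) + len(e1 | e2))

def cluster_union(types):
    """Represent a cluster as the union of its members' nodes and edges."""
    nodes, edges = set(), set()
    for v, e in types:
        nodes |= v
        edges |= e
    return nodes, edges
```

For example, the overlap between <{a, b}, {(a, b)}> and <{a, b, c}, {(a, b), (b, c)}> is (2 + 1) / (3 + 2) = 0.6; a threshold below this value would place the two types in the same cluster.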
Behavioral Properties: Clusters of interaction entity types that are interesting are then selected to discern behavioral properties. Behavioral properties of interaction entity types are depicted in the form of a flowgraph, which illustrates the different interaction paths that could be taken as part of an interaction entity structure. In the flowgraph, each node depicts a transaction of the form a -> b or b <- a, denoting send and receive events respectively.
The flowgraph for an interaction cluster is constructed by building flowgraphs for each of the id values in the cluster, and merging the constructed graphs together. This is done as follows: for each id value in the cluster, the logical timestamps are normalized such that logical time starts from zero. Then an effective event is computed for each logical time, by computing an AND over all events at that time. For example, if events a -> b and c <- e occur at a single logical time, then the effective event is (a -> b) && (c <- e). A flow path is established from the effective event of the previous logical time to that of the present logical time.
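The per-id step above can be sketched as follows. The (logical_time, label) pair format and the textual "&&" joining of labels are assumptions for illustration; ordering the AND-ed labels alphabetically is just a way to make the result deterministic.

```python
from collections import defaultdict

def effective_events(events):
    """events: list of (logical_time, transaction_label) pairs for one id,
    e.g. (3, 'a -> b'). Normalizes logical time to start at zero and
    ANDs together the events sharing each logical time. Returns the
    sequence of effective events, one per normalized time step."""
    if not events:
        return []
    base = min(t for t, _ in events)  # normalize so time starts at zero
    by_time = defaultdict(list)
    for t, label in events:
        by_time[t - base].append(label)
    return [" && ".join(sorted(by_time[t])) for t in sorted(by_time)]
```

The resulting list is the per-id flowgraph as a path: consecutive entries are the effective events linked by flow edges.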
Flowgraphs so created for each id are merged to form a behavioral graph for the selected interesting cluster. Merging is done by computing a logical OR among the effective events from all the different ids in the cluster. Hence, if there are two ids in the cluster, and at a particular logical time the events recorded were a -> b and b -> c, then the merged flowgraph would show the effective event as (a -> b) || (b -> c).
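The merge step can be sketched in the same style: at each normalized logical time, the distinct effective events observed across ids are OR-ed together. Representing each per-id flowgraph as a list of effective-event strings is an assumption carried over from the sketch above.

```python
def merge_flowgraphs(per_id_paths):
    """per_id_paths: list of effective-event sequences, one per id.
    At each normalized logical time, ORs the distinct effective
    events observed across ids into one merged node."""
    length = max(len(p) for p in per_id_paths)
    merged = []
    for t in range(length):
        # collect the effective events the ids exhibit at this time
        seen = sorted({p[t] for p in per_id_paths if t < len(p)})
        merged.append(" || ".join(seen))
    return merged
```

When the ids agree at a time step, the OR collapses to a single event; where they diverge, the merged node records the alternatives, which is exactly the branching the behavioral graph is meant to expose.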
Functional Properties: If an interaction cluster is found interesting, it is likely to be implemented as a separate service in the workflow that connects the disparate workflows together. Hence, it is desirable to determine possible semantic states in such a service in order to design it appropriately.
Functional properties are represented in the form of semantic states in the interaction process. The task is to build a state machine that explains the transaction data that are part of the interaction cluster. There have been a few approaches in the existing literature on generating a state machine from execution data, for example , . A generated state machine is characterized as lying between two boundaries, ``most general'' and ``most specific''. Our approach generalizes from the data in a specific way and hence falls somewhere midway between the two boundaries. A more detailed presentation of the approach and an analysis of the generalization may be found in .
The essential idea of our approach is as follows: each send event is considered as an input to a hypothetical automaton from an actor, and each receive event is considered to be an output from the automaton to the actor. The states of the automaton are determined as follows: (a) Start the process with a hypothetical Start state and with a set of strings, each of which denotes a particular execution (discerned from the id values). (b) For the first string, build a state machine consisting of a state sequence that recognizes the string. (c) For subsequent executions, trace the string as far as possible on the built state machine. When there is no outgoing path from a state for a given symbol, create a breakaway path with new states. (d) Merge the final states of all the different traces, and perform optimization by merging together redundant string expressions leading to the end state.
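Steps (a)-(d) can be sketched as a prefix-tree acceptor construction. This is an illustrative simplification, not the paper's implementation: the final optimization of step (d) is omitted, and it is assumed that no execution string is a proper prefix of another, so all final states can safely be collapsed into one.

```python
def build_automaton(strings):
    """strings: one symbol sequence per execution (per id).
    Traces each string on the machine built so far, branches off with
    new states when no transition matches (the breakaway path), and
    merges all final states into a single 'END' state."""
    transitions = {}   # (state, symbol) -> state
    finals = set()
    counter = 1        # state 0 is the hypothetical Start state
    for s in strings:
        state = 0
        for sym in s:
            if (state, sym) not in transitions:
                # no outgoing path for this symbol: create a breakaway path
                transitions[(state, sym)] = counter
                counter += 1
            state = transitions[(state, sym)]
        finals.add(state)
    # Step (d), first half: merge the final states of all traces.
    return {k: ("END" if v in finals else v) for k, v in transitions.items()}
```

On the two executions [a, b] and [a, c], the first builds the chain Start -a-> 1 -b-> final; the second reuses Start -a-> 1, branches on c, and both final states are then merged into END.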