Knowledge transfer is very important to our knowledge-based society, and many approaches have been proposed to describe this transfer. However, these approaches take a rather abstract view of knowledge transfer, which makes implementation difficult. In order to address this issue, we introduce a layered model for knowledge transfer that structures the individual steps of knowledge transfer in more detail. This paper describes the process and gives an example of applying the layered model for knowledge transfer. The example is located in the area of business process modelling. Business processes contain the important knowledge describing the procedures a company uses to produce products and services. Knowledge transfer is the fundamental basis of the modelling and usage of business processes, which makes it an interesting use case for the layered model for knowledge transfer.
Online credit card fraud presents a significant challenge in the field of eCommerce. In 2012 alone, the total loss due to credit card fraud in the US amounted to $54 billion. Online games merchants in particular have difficulties applying standard fraud detection algorithms to achieve timely and accurate detection. This paper describes the special constraints of this domain and highlights the reasons why conventional algorithms are not very effective in dealing with this problem. Our suggested solution originates from the field of feature construction combined with temporal sequence data mining. We present feature construction techniques that are able to create discriminative features based on a sequence of transactions and to incorporate time into the classification process. In addition, a framework is presented that allows for an automated and adaptive change of features in case the underlying pattern is changing.
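As a rough illustration of the feature construction idea described in this abstract, the following Python sketch aggregates one account's transaction sequence into flat features and folds the time between transactions into them. All names, features, and thresholds are illustrative assumptions, not taken from the paper.

```python
from datetime import datetime, timedelta
from statistics import mean

def build_features(transactions, now):
    """Aggregate one account's transaction sequence into flat features.

    transactions: list of (timestamp, amount) pairs, oldest first.
    Returns a dict of constructed features that a standard classifier
    (decision tree, naive Bayes, ...) can consume.
    """
    amounts = [a for _, a in transactions]
    times = [t for t, _ in transactions]
    gaps = [(t2 - t1).total_seconds() for t1, t2 in zip(times, times[1:])]
    last_24h = [t for t in times if now - t <= timedelta(hours=24)]
    return {
        "tx_count": len(transactions),
        "amount_sum": sum(amounts),
        "amount_mean": mean(amounts) if amounts else 0.0,
        "seconds_since_last_tx": (now - times[-1]).total_seconds() if times else None,
        "mean_inter_tx_gap_s": mean(gaps) if gaps else None,
        "tx_count_last_24h": len(last_24h),  # simple velocity feature
    }

# Example: a short purchase history for one (hypothetical) account
history = [
    (datetime(2012, 3, 1, 10, 0), 9.99),
    (datetime(2012, 3, 1, 10, 5), 49.99),
    (datetime(2012, 3, 1, 10, 7), 49.99),
]
print(build_features(history, now=datetime(2012, 3, 1, 12, 0)))
```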
Recent years, and especially the Internet, have changed the way in which data is stored. It is now common to store data in the form of transactions, together with its creation time-stamp. These transactions can often be attributed to logical units, e.g., all transactions that belong to one customer. These groups, which we refer to as data sequences, have a more complex structure than tuple-based data. This makes it more difficult to find discriminatory patterns for classification purposes. However, the complex structure potentially enables us to track behaviour and its change over the course of time. This is quite interesting, especially in the e-commerce area, in which the classification of a sequence of customer actions is still a challenging task for data miners. However, before standard algorithms such as Decision Trees, Neural Nets, Naive Bayes or Bayesian Belief Networks can be applied to sequential data, preparations are required in order to capture the information stored within the sequences. Therefore, this work presents a systematic approach to revealing sequence patterns in data and constructing powerful features out of the primitive sequence attributes. This is achieved by sequence aggregation and the incorporation of the time dimension into the feature construction step. The proposed algorithm is described in detail and applied to a real-life data set, which demonstrates its ability to boost the classification performance of well-known data mining algorithms for binary classification tasks.
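A minimal sketch of how sequence aggregation with a time dimension can turn one data sequence into a fixed-length feature vector for tuple-based classifiers; the windowing scheme and all names are assumptions for illustration only.

```python
from datetime import datetime, timedelta

def windowed_aggregates(events, start, window=timedelta(days=7), n_windows=4):
    """Turn one data sequence into a fixed-length feature vector.

    events: list of (timestamp, numeric value) belonging to one logical
    unit (e.g., one customer). The sequence is cut into n_windows
    consecutive windows; per-window count and sum make the change of
    behaviour over time visible to tuple-based classifiers.
    """
    features = []
    for i in range(n_windows):
        lo = start + i * window
        hi = lo + window
        in_window = [v for t, v in events if lo <= t < hi]
        features += [len(in_window), sum(in_window)]
    return features  # e.g., one training row for a decision tree

events = [
    (datetime(2013, 1, 2), 10.0),
    (datetime(2013, 1, 9), 25.0),
    (datetime(2013, 1, 10), 5.0),
    (datetime(2013, 1, 27), 40.0),
]
print(windowed_aggregates(events, start=datetime(2013, 1, 1)))
# -> [1, 10.0, 2, 30.0, 0, 0, 1, 40.0]
```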
When forecasting sales figures, not only the sales history but also the future price of a product will influence the sales quantity. At first sight, multivariate time series seem to be the appropriate model for this task. Nonetheless, in real life history is not always repeatable, i.e., in the case of sales history there is only one price for a product at a given time. This complicates the design of a multivariate time series. However, for some seasonal or perishable products the price is rather a function of the expiration date than of the sales history. This additional information can help to design a more accurate and causal time series model. The proposed solution uses a univariate time series model but takes the price of a product as a parameter that systematically influences the prediction based on a calculated periodicity. The price influence is computed from historical sales data using correlation analysis and adjustable price ranges to identify products with a comparable history. The periodicity is calculated with a novel approach based on data folding and the Pearson correlation. Compared to other techniques, this approach is easy to compute and allows the price parameter to be preset for predictions and simulations. Tests with data from the Data Mining Cup 2012 as well as artificial data demonstrate better results than established sophisticated time series methods.
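The periodicity estimate via data folding and the Pearson correlation could, for example, be read as follows: fold the series onto itself at every candidate period and keep the period whose folded copies correlate best. The sketch below follows that reading and is not the authors' exact procedure.

```python
import numpy as np

def estimate_periodicity(series, max_period=60):
    """Guess a dominant period by folding the series onto itself.

    For every candidate period p, the series is compared with a copy of
    itself shifted by p; the candidate with the highest Pearson
    correlation wins. This is only one plausible reading of the
    'data folding + Pearson correlation' idea in the abstract.
    """
    x = np.asarray(series, dtype=float)
    best_p, best_r = None, -np.inf
    for p in range(2, min(max_period, len(x) // 2)):
        r = np.corrcoef(x[:-p], x[p:])[0, 1]
        if r > best_r:
            best_p, best_r = p, r
    return best_p, best_r

# Weekly pattern in artificial daily sales
t = np.arange(120)
sales = 100 + 20 * np.sin(2 * np.pi * t / 7) + np.random.default_rng(0).normal(0, 2, 120)
print(estimate_periodicity(sales))  # reports 7 or a small multiple of it
```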
This work presents a disconnected transaction model able to cope with the increased complexity of long-living, hierarchically structured, and disconnected transactions. We combine an Open and Closed Nested Transaction Model with Optimistic Concurrency Control and interrelate flat transactions with the aforementioned complex nature. Despite temporary inconsistencies during a transaction's execution, our model ensures consistency.
Recent years, and especially the Internet, have changed the way data is stored. We now often store data together with its creation time-stamp. These data sequences potentially enable us to track the change of data over time. This is quite interesting, especially in the e-commerce area, in which the classification of a sequence of customer actions is still a challenging task for data miners. However, before standard algorithms such as Decision Trees, Neural Nets, Naive Bayes or Bayesian Belief Networks can be applied to sequential data, preparations need to be done in order to capture the information stored within the sequences. Therefore, this work presents a systematic approach to revealing sequence patterns in data and constructing powerful features out of the primitive sequence attributes. This is achieved by sequence aggregation and the incorporation of the time dimension into the feature construction step. The proposed algorithm is described in detail and applied to a real-life data set, which demonstrates its ability to boost the classification performance of well-known data mining algorithms for classification tasks.
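Once such flat features have been constructed (see the sketches above), the standard algorithms mentioned in the abstract can be applied directly. The following hypothetical example feeds hand-made feature rows into scikit-learn classifiers purely for illustration; the data and labels are invented.

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB

# Rows are feature vectors produced by sequence aggregation (one row per
# customer sequence); labels mark the class to predict (e.g., buyer / non-buyer).
X = [
    [1, 10.0, 2, 30.0, 0, 0.0, 1, 40.0],
    [0,  0.0, 0,  0.0, 1, 5.0, 0,  0.0],
    [3, 75.0, 2, 20.0, 2, 9.0, 1, 15.0],
    [0,  0.0, 1,  2.5, 0, 0.0, 0,  0.0],
]
y = [1, 0, 1, 0]

for clf in (DecisionTreeClassifier(random_state=0), GaussianNB()):
    clf.fit(X, y)
    print(type(clf).__name__, clf.predict([[2, 50.0, 1, 10.0, 0, 0.0, 1, 20.0]]))
```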
This paper presents a concurrency control mechanism that does not follow a ‘one concurrency control mechanism fits all needs’ strategy. With the presented mechanism, a transaction runs under several concurrency control mechanisms and the appropriate one is chosen based on the accessed data. For this purpose, the data is divided into four classes based on its access type and usage (semantics). Class O (the optimistic class) implements a first-committer-wins strategy, class R (the reconciliation class) implements a first-n-committers-win strategy, class P (the pessimistic class) implements a first-reader-wins strategy, and class E (the escrow class) implements a first-n-readers-win strategy. Accordingly, the model is called O|R|P|E. Under this model, the TPC-C benchmark outperforms other concurrency control mechanisms such as optimistic Snapshot Isolation.
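A toy sketch of the class-per-data-item idea: each data item is tagged with one of the four classes and every access is routed to the corresponding concurrency control rule. The item names and class assignments below are invented (loosely TPC-C flavoured) and the code is far simpler than the O|R|P|E model itself.

```python
from enum import Enum

class CC(Enum):
    O = "optimistic"      # first-committer-wins
    R = "reconciliation"  # first-n-committers-win
    P = "pessimistic"     # first-reader-wins
    E = "escrow"          # first-n-readers-win (bounded quantity updates)

# Hypothetical mapping of data items to their concurrency control class;
# in O|R|P|E this choice follows the access type and semantics of the data.
ITEM_CLASS = {
    "customer.balance": CC.E,   # numeric, commutative updates -> escrow
    "order.lines":      CC.O,   # rarely conflicting -> optimistic
    "warehouse.ytd":    CC.R,   # hot aggregate -> reconciliation
    "district.next_id": CC.P,   # strictly ordered counter -> pessimistic
}

def control_for(item):
    """Pick the concurrency control mechanism for one data access."""
    return ITEM_CLASS.get(item, CC.O)  # default to optimistic

print(control_for("customer.balance"))  # CC.E
```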
When forecasting sales figures, not only the sales history but also the future price of a product will influence the sales quantity. At first sight, multivariate time series seem to be the appropriate model for this task. Nonetheless, in real life history is not always repeatable, i.e., in the case of sales history there is only one price for a product at a given time. This complicates the design of a multivariate time series. However, for some seasonal or perishable products the price is rather a function of the expiration date than of the sales history. This additional information can help to design a more accurate and causal time series model. The proposed solution uses a univariate time series model but takes the price of a product as a parameter that systematically influences the prediction. The price influence is computed from historical sales data using correlation analysis and adjustable price ranges to identify products with a comparable history. Compared to other techniques, this novel approach is easy to compute and allows the price parameter to be preset for predictions and simulations. Tests with data from the Data Mining Cup 2012 demonstrate better results than established sophisticated time series methods.
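One hedged reading of the price parameter: measure how strongly price and quantity correlate in the history, take the sales level observed within an adjustable price range around the planned price, and blend it with a univariate baseline forecast. The sketch below implements that reading with invented numbers; it is not the authors' algorithm.

```python
import numpy as np

def price_adjusted_forecast(history_qty, history_price, future_price,
                            baseline_forecast, price_band=0.10):
    """Adjust a univariate baseline forecast with a price parameter.

    Days whose price lies within +/- price_band of the planned
    future_price count as 'comparable history'; the Pearson correlation
    between price and quantity over the whole history gives the strength
    of the price influence used for blending.
    """
    qty = np.asarray(history_qty, dtype=float)
    price = np.asarray(history_price, dtype=float)
    r = np.corrcoef(price, qty)[0, 1]            # price influence (often negative)
    comparable = qty[np.abs(price - future_price) <= price_band * future_price]
    level = comparable.mean() if comparable.size else qty.mean()
    # Blend the univariate baseline with the level seen at comparable prices,
    # weighted by how strongly price explains quantity.
    return (1 - abs(r)) * baseline_forecast + abs(r) * level

qty   = [120, 115, 80, 78, 125, 82, 119]
price = [0.99, 0.99, 1.49, 1.49, 0.99, 1.49, 0.99]
print(price_adjusted_forecast(qty, price, future_price=1.49, baseline_forecast=105))
```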
At DBKDA 2019, we demonstrated that StrongDBMS, with simple but rigorous optimistic algorithms, provides better performance in situations of high concurrency than major commercial database management systems (DBMS). The demonstration was convincing, but the reasons for its success were not fully analysed. A brief account of the results is given below. In this short contribution, we wish to discuss the reasons for these results. The analysis leads to a strong criticism of all DBMS algorithms based on locking, and in the light of these results it is not fanciful to suggest that it is time to re-engineer existing DBMS.
Schema and data integration have been a challenge for more than 40 years. While data warehouse technologies are quite a success story, there is still a lack of information integration methods, especially if the data sources are based on different data models or do not have a schema. Enterprise Information Integration has to deal with heterogeneous data sources and requires up-to-date high-quality information to provide a reliable basis for analysis and decision-making. The paper proposes virtual integration using the Typed Graph Model to support schema mediation. The integration process first converts the structure of each source into a typed graph schema, which is then matched to the mediated schema. Mapping rules define transformations between the schemata to reconcile semantics. The mapping can be visually validated by experts. It provides indicators and rules to achieve a consistent schema mapping, which leads to high data integrity and quality.
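As a small illustration of schema mediation with mapping rules, the sketch below encodes a source structure and a mediated schema as typed node definitions and applies property-level mapping rules to one record. The types and the rule format are invented for illustration and are not the Typed Graph Model's actual notation.

```python
from dataclasses import dataclass, field

@dataclass
class NodeType:
    name: str
    properties: dict          # property name -> data type

@dataclass
class GraphSchema:
    node_types: dict = field(default_factory=dict)

    def add(self, node_type):
        self.node_types[node_type.name] = node_type

# Source schema derived from, say, a relational table (hypothetical names)
source = GraphSchema()
source.add(NodeType("CUST", {"cust_no": "int", "cname": "string"}))

# Mediated schema agreed on by the integration experts
mediated = GraphSchema()
mediated.add(NodeType("Customer", {"id": "int", "name": "string"}))

# Mapping rules: (source type, source property) -> (mediated type, property)
mapping_rules = {
    ("CUST", "cust_no"): ("Customer", "id"),
    ("CUST", "cname"):   ("Customer", "name"),
}

def transform(node_type, record):
    """Apply the mapping rules to one source record."""
    out = {}
    for (s_type, s_prop), (m_type, m_prop) in mapping_rules.items():
        if s_type == node_type and s_prop in record:
            out[m_prop] = record[s_prop]
    return out

print(transform("CUST", {"cust_no": 42, "cname": "Acme Ltd."}))
# -> {'id': 42, 'name': 'Acme Ltd.'}
```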