Informatik
This work presents a disconnected transaction model able to cope with the increased complexity of long-living, hierarchically structured, and disconnected transactions. We combine an Open and Closed Nested Transaction Model with Optimistic Concurrency Control and interrelate flat transactions with the aforementioned complex nature. Despite temporary inconsistencies during a transaction’s execution, our model ensures consistency.
Knowledge transfer is very important to our knowledge-based society and many approaches have been proposed to describe this transfer. However, these approaches take a rather abstract view of knowledge transfer, which makes implementation difficult. In order to address this issue, we introduce a layered model for knowledge transfer that structures the individual steps of knowledge transfer in more detail. This paper gives a description of the process and also an example of the application of the layered model for knowledge transfer. The example is located in the area of business process modelling. Business processes contain the important knowledge describing the procedures of the company to produce products and services. Knowledge transfer is the fundamental basis in the modelling and usage of business processes, which makes it an interesting use case for the layered model for knowledge transfer.
Recent years, and especially the Internet, have changed the ways in which data is stored. It is now common to store data in the form of transactions, together with its creation time-stamp. These transactions can often be attributed to logical units, e.g., all transactions that belong to one customer. These groups, which we refer to as data sequences, have a more complex structure than tuple-based data. This makes it more difficult to find discriminatory patterns for classification purposes. However, the complex structure potentially enables us to track behaviour and its change over the course of time. This is quite interesting, especially in the e-commerce area, in which classification of a sequence of customer actions is still a challenging task for data miners. However, before standard algorithms such as Decision Trees, Neural Nets, Naive Bayes or Bayesian Belief Networks can be applied to sequential data, preparations are required in order to capture the information stored within the sequences. Therefore, this work presents a systematic approach on how to reveal sequence patterns among data and how to construct powerful features out of the primitive sequence attributes. This is achieved by sequence aggregation and the incorporation of the time dimension into the feature construction step. The proposed algorithm is described in detail and applied to a real-life data set, which demonstrates the ability of the proposed algorithm to boost the classification performance of well-known data mining algorithms for binary classification tasks.
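As a rough illustration of sequence aggregation with a time dimension, the sketch below groups time-stamped transactions by customer and derives a few aggregate features per sequence. The record layout, the function name `build_features`, and the chosen features are assumptions made for illustration; the abstract does not specify the proposed algorithm at this level of detail.

```python
from collections import defaultdict
from datetime import datetime


def build_features(transactions):
    """Aggregate (customer_id, timestamp, amount) tuples into per-customer features.

    Hypothetical feature set: the concrete features are illustrative assumptions.
    """
    # Group the flat transaction list into one sequence per logical unit (customer).
    sequences = defaultdict(list)
    for customer_id, timestamp, amount in transactions:
        sequences[customer_id].append((timestamp, amount))

    features = {}
    for customer_id, seq in sequences.items():
        seq.sort(key=lambda item: item[0])  # order each sequence by time
        amounts = [amount for _, amount in seq]
        # Time dimension: gaps in seconds between consecutive transactions.
        gaps = [(b[0] - a[0]).total_seconds() for a, b in zip(seq, seq[1:])]
        features[customer_id] = {
            "count": len(seq),
            "total_amount": sum(amounts),
            "mean_gap_seconds": sum(gaps) / len(gaps) if gaps else None,
            "last_minus_first_amount": amounts[-1] - amounts[0],
        }
    return features


example = [
    ("a", datetime(2020, 1, 1, 0, 0, 0), 10.0),
    ("a", datetime(2020, 1, 1, 0, 1, 0), 30.0),
    ("b", datetime(2020, 1, 1, 0, 0, 0), 5.0),
]
example_features = build_features(example)
```

Each resulting feature dictionary is a fixed-width tuple, so it could be passed to a tuple-based classifier such as a decision tree.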
Data Integration of heterogeneous data sources relies either on periodically transferring large amounts of data to a physical Data Warehouse or retrieving data from the sources on request only. The latter results in the creation of what is referred to as a virtual Data Warehouse, which is preferable when the use of the latest data is paramount. However, the downside is that it adds network traffic and suffers from performance degradation when the amount of data is high. In this paper, we propose the use of a readCheck validator to ensure the timeliness of the queried data and reduced data traffic. It is further shown that the readCheck allows transactions to update data in the data sources obeying full Atomicity, Consistency, Isolation, and Durability (ACID) properties.
This paper reviews suggestions for changes to database technology coming from the work of many researchers, particularly those working with evolving big data. We discuss new approaches to remote data access and standards that better provide for durability and auditability in settings including business and scientific computing. We propose ways in which the language standards could evolve, with proof-of-concept implementations on GitHub.
In this presentation the audience will be: (a) introduced to the aims and objectives of the DBTechNet initiative, (b) briefed on the DBTech EXT virtual laboratory workshops (VLW), i.e. the educational and training (E&T) content which is freely available over the internet and includes vendor-neutral hands-on laboratory training sessions on key database technology topics, and (c) informed on some of the practical problems encountered and the way they have been addressed. Last but not least, the audience will be invited to consider incorporating some or all of the DBTech EXT VLW content into their higher education (HE), vocational education and training (VET), and/or lifelong learning/training type course curricula. This will come at no cost and no commitment on behalf of the teacher/trainer; the latter is only expected to provide his/her feedback on the pedagogical value and the quality of the E&T content received/used.
"Learning by doing" in Higher Education in technical disciplines is mostly realized by hands-on labs. It challenges the exploratory aptitude and curiosity of a person. But, exploratory learning is hindered by technical situations that are not easy to establish and to verify. Technical skills are, however, mandatory for employees in this area. On the other side, theoretical concepts are often compromised by commercial products. The challenge is to contrast and reconcile theory with practice. Another challenge is to implement a self-assessment and grading scheme that keeps up with the scalability of e-learning courses. In addition, it should allow the use of different commercial products in the labs and still grade the assignment results automatically in a uniform way. In two European Union funded projects we designed, implemented, and evaluated a unique e-learning reference model, which realizes a modularized teaching concept that provides easily reproducible virtual hands-on labs. The novelty of the approach is to use software products of industrial relevance to compare with theory and to contrast different implementations. In a sample case study, we demonstrate the automated assessment for the creative database modeling and design task. Pilot applications in several European countries demonstrated that the participants gained highly sustainable competences that improved their attractiveness for employment.
Recent years, and especially the Internet, have changed the way data is stored. We now often store data together with its creation time-stamp. These data sequences potentially enable us to track the change of data over time. This is quite interesting, especially in the e-commerce area, in which classification of a sequence of customer actions is still a challenging task for data miners. However, before standard algorithms such as Decision Trees, Neural Nets, Naive Bayes or Bayesian Belief Networks can be applied to sequential data, preparations are needed in order to capture the information stored within the sequences. Therefore, this work presents a systematic approach on how to reveal sequence patterns among data and how to construct powerful features out of the primitive sequence attributes. This is achieved by sequence aggregation and the incorporation of the time dimension into the feature construction step. The proposed algorithm is described in detail and applied to a real-life data set, which demonstrates the ability of the proposed algorithm to boost the classification performance of well-known data mining algorithms for classification tasks.
Online credit card fraud presents a significant challenge in the field of eCommerce. In 2012 alone, the total loss due to credit card fraud in the US amounted to $54 billion. Online games merchants in particular have difficulties applying standard fraud detection algorithms to achieve timely and accurate detection. This paper describes the special constraints of this domain and highlights the reasons why conventional algorithms are not very effective in dealing with this problem. Our suggested solution combines feature construction with temporal sequence data mining. We present feature construction techniques that are able to create discriminative features based on a sequence of transactions and to incorporate time into the classification process. In addition, a framework is presented that allows for an automated and adaptive change of features in case the underlying pattern changes.
Recent work on database application development platforms has sought to include a declarative formulation of a conceptual data model in the application code, using annotations or attributes. Some recent work has used metadata to include the details of such formulations in the physical database, and this approach brings significant advantages in that the model can be enforced across a range of applications for a single database. In previous work, we have discussed the advantages for enterprise integration of typed graph data models (TGM), which can play a similar role in graphical databases, leveraging the existing support for the unified modelling language UML. Ideally, the integration of systems designed with different models, for example, graphical and relational database, should also be supported. In this work, we implement this approach, using metadata in a relational database management system (DBMS).
Modern web-based applications are often built as a multi-tier architecture using persistence middleware. Middleware technology providers recommend the use of an Optimistic Concurrency Control (OCC) mechanism to avoid the risk of blocked resources. However, most vendors of relational database management systems implement only locking schemes for concurrency control. As a consequence, a kind of OCC has to be implemented on the client or middleware side.
A simple Row Version Verification (RVV) mechanism has been proposed to implement an OCC at client side. For performance reasons the middleware uses buffers (cache) of its own to avoid network traffic and possible disk I/O. This caching however complicates the use of RVV because the data in the middleware cache may be stale (outdated). We investigate various data access technologies, including the new Java Persistence API (JPA) and Microsoft’s LINQ technologies for their ability to use the RVV programming discipline.
The use of persistence middleware that tries to relieve the programmer of low-level transaction programming turns out to complicate the situation even further in some cases. Programmed examples show how to use SQL data access patterns to solve the problem.
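The idea behind Row Version Verification can be sketched with plain SQL issued from a client. The example below is a minimal sketch using Python's `sqlite3` module; the table `account`, the version column `rv`, and the helper `update_balance_rvv` are hypothetical names chosen for illustration and are not taken from the paper.

```python
import sqlite3


def update_balance_rvv(conn, account_id, new_balance, expected_version):
    """Write only if the row version is unchanged since it was read.

    Returns True if the update was applied, False if the row was stale.
    """
    cur = conn.execute(
        "UPDATE account SET balance = ?, rv = rv + 1 WHERE id = ? AND rv = ?",
        (new_balance, account_id, expected_version),
    )
    conn.commit()
    # rowcount is 0 when another writer has already incremented rv.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER, rv INTEGER)"
)
conn.execute("INSERT INTO account VALUES (1, 100, 0)")
conn.commit()

# Two clients read the same row and remember the version they saw.
balance_a, rv_a = conn.execute(
    "SELECT balance, rv FROM account WHERE id = 1"
).fetchone()
balance_b, rv_b = conn.execute(
    "SELECT balance, rv FROM account WHERE id = 1"
).fetchone()

first_ok = update_balance_rvv(conn, 1, balance_a - 30, rv_a)   # version matches
second_ok = update_balance_rvv(conn, 1, balance_b - 50, rv_b)  # version is stale
```

The second writer has to re-read the row and retry. If a middleware cache serves the version, the cached value may itself be stale, which is the complication described above.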
Business processes are important knowledge resources of a company. The knowledge contained in business processes imparts procedures used to create products and services. However, modelling and application of business processes are affected by problems connected to knowledge transfer. This paper presents and implements a layered model to improve the knowledge transfer. Thus modelling and understanding of business process models is supported. An evaluation of the approach is presented and results and other areas of application are discussed.
Schema and data integration have been a challenge for more than 40 years. While data warehouse technologies are quite a success story, there is still a lack of information integration methods, especially if the data sources are based on different data models or do not have a schema. Enterprise Information Integration has to deal with heterogeneous data sources and requires up-to-date high-quality information to provide a reliable basis for analysis and decision-making. The paper proposes virtual integration using the Typed Graph Model to support schema mediation. The integration process first converts the structure of each source into a typed graph schema, which is then matched to the mediated schema. Mapping rules define transformations between the schemata to reconcile semantics. The mapping can be visually validated by experts. It provides indicators and rules to achieve a consistent schema mapping, which leads to high data integrity and quality.
Learning and teaching requires the transfer of knowledge from one person to another. Due to the relevance of knowledge many models have been developed for knowledge transfer. However, the process of knowledge transfer has not yet been described completely and the approaches are too vague to facilitate its implementation. This paper contributes to a better understanding of knowledge transfer to support knowledge transfer in teaching. To address this challenge, we depict a layered model for knowledge transfer. The model structures the transfer in several steps and thus identifies major influencing factors. The paper describes the knowledge transfer from one person to another step by step. An example in the area of teaching business process management illuminates the process. The main contribution of this paper is the development of a layered model and its application in teaching.
This paper presents a concurrency control mechanism that does not follow a ‘one concurrency control mechanism fits all needs’ strategy. With the presented mechanism a transaction runs under several concurrency control mechanisms and the appropriate one is chosen based on the accessed data. For this purpose, the data is divided into four classes based on its access type and usage (semantics). Class O (the optimistic class) implements a first-committer-wins strategy, class R (the reconciliation class) implements a first-n-committers-win strategy, class P (the pessimistic class) implements a first-reader-wins strategy, and class E (the escrow class) implements a first-n-readers-win strategy. Accordingly, the model is called O|R|P|E. Under this model the TPC-C benchmark outperforms other CC mechanisms like optimistic Snapshot Isolation.
This paper presents a concurrency control mechanism that does not follow a "one concurrency control mechanism fits all needs" strategy. With the presented mechanism a transaction runs under several concurrency control mechanisms and the appropriate one is chosen based on the accessed data. For this purpose, the data is divided into four classes based on its access type and usage (semantics). Class O (the optimistic class) implements a first-committer-wins strategy, class R (the reconciliation class) implements a first-n-committers-win strategy, class P (the pessimistic class) implements a first-reader-wins strategy, and class E (the escrow class) implements a first-n-readers-win strategy. Accordingly, the model is called O|R|P|E. The selected concurrency control mechanism may be automatically adapted at run-time according to the current load or a known usage profile. This run-time adaptation allows O|R|P|E to balance the commit rate and the response time even under changing conditions. O|R|P|E outperforms the Snapshot Isolation concurrency control in terms of response time by a factor of approximately 4.5 under heavy transactional load (4000 concurrent transactions). As a consequence, the degree of concurrency is 3.2 times higher.
At DBKDA 2019, we demonstrated that StrongDBMS, with simple but rigorous optimistic algorithms, provides better performance in situations of high concurrency than major commercial database management systems (DBMS). The demonstration was convincing but the reasons for its success were not fully analysed. There is a brief account of the results below. In this short contribution, we wish to discuss the reasons for the results. The analysis leads to a strong criticism of all DBMS algorithms based on locking, and based on these results, it is not fanciful to suggest that it is time to re-engineer existing DBMS.
When forecasting sales figures, not only the sales history but also the future price of a product will influence the sales quantity. At first sight, multivariate time series seem to be the appropriate model for this task. Nonetheless, in real life history is not always repeatable, i.e., in the case of sales history there is only one price for a product at a given time. This complicates the design of a multivariate time series. However, for some seasonal or perishable products the price is a function of the expiration date rather than of the sales history. This additional information can help to design a more accurate and causal time series model. The proposed solution uses a univariate time series model but takes the price of a product as a parameter that systematically influences the prediction. The price influence is computed based on historical sales data using correlation analysis and adjustable price ranges to identify products with comparable history. Compared to other techniques, this novel approach is easy to compute and allows the price parameter to be preset for predictions and simulations. Tests with data from the Data Mining Cup 2012 demonstrate better results than established sophisticated time series methods.
The typed graph model
(2020)
In recent years, the Graph Model has become increasingly popular, especially in the application domain of social networks. The model has been semantically augmented with properties and labels attached to the graph elements. It is difficult to ensure data quality for the properties and the data structure because the model does not need a schema. In this paper, we propose a schema-bound Typed Graph Model with properties and labels. These enhancements improve not only data quality but also the quality of graph analysis. The power of this model is provided by using hyper-nodes and hyper-edges, which allow a data structure to be presented on different abstraction levels. We demonstrate by example the superiority of this model over the property graph data model of Hidders and other prevalent data models, namely the relational, object-oriented, and XML model.
In recent years, the Graph Model has become increasingly popular, especially in the application domain of social networks. The model has been semantically augmented with properties and labels attached to the graph elements. It is difficult to ensure data quality for the properties and the data structure because the model does not need a schema. In this paper, we propose a schema-bound Typed Graph Model with properties and labels. These enhancements improve not only data quality but also the quality of graph analysis. The power of this model is provided by using hyper-nodes and hyper-edges, which allow data structures to be presented on different abstraction levels. We prove that the model is at least equivalent in expressive power to most popular data models. Therefore, it can be used as a supermodel for model management and data integration. We illustrate by example the superiority of this model over the property graph data model of Hidders and other prevalent data models, namely the relational, object-oriented, XML model, and RDF Schema.
When forecasting sales figures, not only the sales history but also the future price of a product will influence the sales quantity. At first sight, multivariate time series seem to be the appropriate model for this task. Nonetheless, in real life history is not always repeatable, i.e., in the case of sales history there is only one price for a product at a given time. This complicates the design of a multivariate time series. However, for some seasonal or perishable products the price is a function of the expiration date rather than of the sales history. This additional information can help to design a more accurate and causal time series model. The proposed solution uses a univariate time series model but takes the price of a product as a parameter that systematically influences the prediction based on a calculated periodicity. The price influence is computed based on historical sales data using correlation analysis and adjustable price ranges to identify products with comparable history. The periodicity is calculated using a novel approach based on data folding and Pearson Correlation. Compared to other techniques, this approach is easy to compute and allows the price parameter to be preset for predictions and simulations. Tests with data from the Data Mining Cup 2012 as well as artificial data demonstrate better results than established sophisticated time series methods.
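One plausible reading of "data folding and Pearson Correlation" is sketched below: the series is cut into consecutive segments of a candidate period length, and the period whose neighbouring segments correlate best is selected. This is an assumption about how such a method could work, not the authors' algorithm; the function names `pearson` and `estimate_period` and the tie-breaking rule (prefer the shortest period) are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0
    return cov / (sd_x * sd_y)


def estimate_period(series, min_period=2, max_period=None):
    """Return (period, score) with the best mean correlation between adjacent folds."""
    if max_period is None:
        max_period = len(series) // 2
    best_period, best_score = None, -2.0
    for period in range(min_period, max_period + 1):
        # Fold the series into complete, non-overlapping segments of this length.
        folds = [
            series[i:i + period]
            for i in range(0, len(series) - period + 1, period)
        ]
        if len(folds) < 2:
            continue
        scores = [pearson(a, b) for a, b in zip(folds, folds[1:])]
        score = sum(scores) / len(scores)
        # The small tolerance keeps the shortest period when multiples tie.
        if score > best_score + 1e-9:
            best_period, best_score = period, score
    return best_period, best_score


sales = [1, 3, 5, 2] * 4  # artificial series that repeats every 4 steps
period, score = estimate_period(sales)
```

Very short candidate periods deserve caution, because any two segments of length 2 correlate at exactly +1 or -1.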
Transaction processing is of growing importance for mobile computing. Booking tickets, flight reservation, banking, ePayment, and booking holiday arrangements are just a few examples of mobile transactions. Due to temporarily disconnected situations, synchronisation and consistent transaction processing are key issues. Serializability is too strong a criterion for correctness when the semantics of a transaction is known. We introduce a transaction model that allows higher concurrency for a certain class of transactions defined by its semantics. The transaction results are “escrow serializable” and the synchronisation mechanism is non-blocking. An experimental implementation showed higher concurrency, higher transaction throughput, and fewer resources used than common locking or optimistic protocols.
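The general escrow idea that the term alludes to can be illustrated as follows: when updates are commutative deltas on a bounded quantity, a commit needs to check only that the bound still holds, not that the value is unchanged since it was read. The class `EscrowCounter` and its `commit` method are hypothetical names for this sketch, which is not the transaction model or protocol of the paper.

```python
class EscrowCounter:
    """A quantity updated by commutative deltas under a lower bound."""

    def __init__(self, value, minimum=0):
        self.value = value
        self.minimum = minimum

    def commit(self, delta):
        """Apply a delta computed while disconnected.

        The delta is rejected only if the bound would be violated; a changed
        value alone is no reason to abort, unlike in a first-committer-wins scheme.
        """
        if self.value + delta < self.minimum:
            return False
        self.value += delta
        return True


seats = EscrowCounter(value=10)
# Three disconnected clients reserve seats without holding any lock.
results = [seats.commit(-4), seats.commit(-5), seats.commit(-3)]
```

The first two reservations succeed in either order because decrements commute; the third fails only because it would push the count below zero.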
A sequence of transactions represents a complex and multi-dimensional type of data. Feature construction can be used to reduce the data’s dimensionality to find behavioural patterns within such sequences. The patterns can be expressed using the blueprints of the constructed relevant features. These blueprints can then be used for real-time classification on other sequences.