In BestPeer++, peers are organized by BATON, a structured P2P overlay protocol based on a balanced binary tree.
The BestPeer++ network consists of peers owned by organizations and is facilitated by a bootstrap server, which provides performance monitoring, database configuration, and authentication. Both external users, who are outside the system, and internal users, who belong to participant organizations, can access the network to issue queries.
Just-in-time (JIT) is an inventory strategy implemented to improve the return on investment of a business by reducing in-process inventory and its associated carrying costs. We use this term to emphasize that indexes in BestPeer++ are selectively built to reduce maintenance cost.
In business applications, fully indexing every database in the system is impractical because of the high maintenance cost. At the same time, experience from search engines shows that user queries are highly skewed. BestPeer++ therefore uses a special indexing technique, just-in-time indexing: only data that are frequently queried are indexed. In other words, the index in the network is adaptively tuned based on the query distribution, data distribution, update rates, and other factors.
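The idea of indexing only frequently queried data can be illustrated with a minimal Python sketch. This is not BestPeer++ code: the class name `JustInTimeIndex`, the `threshold` parameter, and the use of a plain query counter are assumptions made for illustration, and the sketch considers only query frequency, ignoring the data distribution and update rates that the real system also takes into account.

```python
from collections import Counter


class JustInTimeIndex:
    """Hypothetical sketch: index a value only after it has been queried often enough."""

    def __init__(self, rows, key, threshold=3):
        self.rows = rows                # list of dicts standing in for a local table
        self.key = key                  # column that queries look up
        self.threshold = threshold      # assumed query count that triggers indexing
        self.query_counts = Counter()   # how often each value has been requested
        self.index = {}                 # value -> list of matching rows

    def lookup(self, value):
        self.query_counts[value] += 1
        if value in self.index:
            # Hot value: answered from the index without scanning the table.
            return self.index[value]
        # Cold value: fall back to a full scan.
        matches = [row for row in self.rows if row[self.key] == value]
        if self.query_counts[value] >= self.threshold:
            # The value has become popular enough to justify an index entry.
            self.index[value] = matches
        return matches
```

With `threshold=2`, for instance, the first lookup of a value scans the table, the second lookup scans and then creates the index entry, and later lookups are served from the index. Values that are rarely queried never incur any index maintenance cost. A fuller version would also have to invalidate or refresh entries when the underlying rows change.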
BestPeer++ employs a hybrid design to achieve high-performance query processing. The major workload of a corporate network consists of simple, low-overhead queries. Such queries typically involve only a very small number of business partners and can be processed in a short time. BestPeer++ is mainly optimized for these queries. For infrequent, time-consuming analytical tasks, we provide an interface for exporting data from BestPeer++ to Hadoop, allowing users to analyze those data using MapReduce.
The messages sent between nodes in BestPeer++ are encrypted to increase the security of the system. Furthermore, access to the data shared in a BestPeer++ corporate network is controlled by a distributed role-based access control scheme, which protects the local data of each node from malicious users. Additionally, BestPeer++ supports multi-granularity access control on data tables: access privileges on a table can be defined at the row level and at the column level.
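Row-level and column-level privileges attached to a role can be sketched as follows. This is a simplified, single-node illustration and not the BestPeer++ scheme itself: the `Privilege` class, the `read_table` function, and the representation of a row-level rule as a Python predicate are all assumptions, and the distributed aspects of the real scheme are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

Row = Dict[str, object]


@dataclass
class Privilege:
    """Hypothetical privilege record for one role on one table."""
    columns: Set[str]                  # column-level rule: columns the role may read
    row_filter: Callable[[Row], bool]  # row-level rule: rows the role may read


def read_table(rows: List[Row], role: str, privileges: Dict[str, Privilege]) -> List[Row]:
    """Return only the rows and columns that the given role is allowed to see."""
    privilege = privileges.get(role)
    if privilege is None:
        return []  # a role with no privilege on the table sees nothing
    return [
        {col: val for col, val in row.items() if col in privilege.columns}
        for row in rows
        if privilege.row_filter(row)
    ]
```

For example, a role granted the columns `id` and `partner` with a row filter that matches a single partner would receive that partner's rows only, with any other column, such as a price, stripped from the result.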
E3 is a programming framework for simplifying scalable processing of heterogeneous data on large clusters. By introducing an Actor-like concurrent programming model, E3 achieves easy programmability and excellent performance. Using E3, analysts simply run data analysis programs inside a set of actors and coordinate them for parallel execution by message passing. Currently, we have a working E3 core runtime system and MapReduce extensions for running MapReduce programs inside actors.
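The general pattern of running analysis code inside actors that coordinate by message passing can be illustrated with Python's standard `threading` and `queue` modules. This sketch does not use or reproduce the E3 API, which is not described here: the `Actor` class, its `send` method, the `None` stop message, and the `run_word_count` helper are hypothetical names chosen for illustration.

```python
import queue
import threading
from collections import Counter


class Actor(threading.Thread):
    """Hypothetical actor: owns a mailbox and handles one message at a time."""

    def __init__(self, handler, downstream=None):
        super().__init__(daemon=True)
        self.mailbox = queue.Queue()
        self.handler = handler         # function applied to each incoming message
        self.downstream = downstream   # optional actor that receives the results

    def send(self, message):
        self.mailbox.put(message)

    def run(self):
        while True:
            message = self.mailbox.get()
            if message is None:
                # Stop message: pass it on so downstream actors also finish.
                if self.downstream is not None:
                    self.downstream.send(None)
                break
            result = self.handler(message)
            if self.downstream is not None:
                self.downstream.send(result)


def run_word_count(lines):
    """Map-style actor counts words per line; reduce-style actor merges the counts."""
    totals = Counter()
    reducer = Actor(totals.update)
    mapper = Actor(lambda line: Counter(line.split()), downstream=reducer)
    reducer.start()
    mapper.start()
    for line in lines:
        mapper.send(line)
    mapper.send(None)   # no more input
    mapper.join()
    reducer.join()
    return totals
```

Calling `run_word_count(["a b a", "b c"])` returns a counter with `a` and `b` counted twice and `c` once. The two actors share no state except through their mailboxes, and only the reducer thread modifies `totals` while the actors are running, which is the property that makes the model easy to reason about.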