Textbook in PDF format
With data being collected at an unprecedented rate in almost all fields of human endeavor, there is an emerging economic and scientific need to extract useful information from it. For example, many companies already have data warehouses in the terabyte range (e.g., FedEx, Walmart). The World Wide Web has an estimated 800 million web pages. Similarly, scientific data is reaching gigantic proportions (e.g., NASA space missions, Human Genome Project). High-performance, scalable, parallel, and distributed computing is crucial for ensuring system scalability and interactivity as datasets continue to grow in size and complexity.
To address this need we organized the workshop on Large-Scale Parallel KDD Systems, which was held in conjunction with the 5th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining on August 15, 1999, in San Diego, California. The goal of this workshop was to bring researchers and practitioners together in a setting where they could discuss the design, implementation, and deployment of large-scale parallel knowledge discovery (PKD) systems, which can manipulate data taken from very large enterprise or scientific databases, regardless of whether the data is located centrally or is globally distributed. Relevant topics identified for the workshop included:
How to develop a rapid-response, scalable, and parallel knowledge discovery system that supports global organizations with terabytes of data.
How to address some of the challenges facing current state-of-the-art data mining tools. These challenges include freeing the user from time- and volume-constrained tool sets, evolving knowledge stores effectively as new knowledge arrives, acquiring data elements from heterogeneous sources such as the Web or other repositories, and enhancing the PKD process by incrementally updating the knowledge stores.
How to leverage high-performance parallel and distributed techniques in all the phases of KDD, such as initial data selection, cleaning and preprocessing, transformation, selection and application of the data mining task and algorithm, pattern evaluation, management of discovered knowledge, and providing tight coupling between the mining engine and the database or file server.
How to facilitate user interaction and usability, allow the representation of domain knowledge, and maximize understanding during and after the process. That is, how to build an adaptable knowledge engine that supports business decisions, product creation and evolution, and turns information into usable or actionable knowledge.
Parallel and Distributed Data Mining: An Introduction
Mining Frameworks
The Integrated Delivery of Large-Scale Data Mining: The ACSys Data Mining Project
A High Performance Implementation of the Data Space Transfer Protocol (DSTP)
Active Mining in a Distributed Setting
Associations and Sequences
Efficient Parallel Algorithms for Mining Associations
Parallel Branch-and-Bound Graph Search for Correlated Association Rules
Parallel Generalized Association Rule Mining on Large Scale PC Cluster
Parallel Sequence Mining on Shared-Memory Machines
Classification
Parallel Predictor Generation
Efficient Parallel Classification Using Dimensional Aggregates
Learning Rules from Distributed Data
Clustering
Collective, Hierarchical Clustering from Distributed, Heterogeneous Data
A Data-Clustering Algorithm On Distributed Memory Multiprocessors