Bloom filter-based joins speed up distributed data processing in MapReduce
The article explores how to speed up joins between data sets in large distributed systems using a Bloom filter, a compact data structure that quickly tests whether an item might belong to a set. The authors propose three strategies for building the Bloom filter over large data sets within the MapReduce framework. They also present two algorithms for joining two data sets and one for joining multiple data sets at once. Their experiments showed that these methods can significantly speed up joins. By analyzing the costs of these algorithms, the authors also identified ways to further improve both two-way and multi-way joins.
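
To make the general idea concrete, here is a minimal single-process sketch of a Bloom filter used to prune records before a two-way join. It is not the authors' algorithm: the summary does not describe their three construction strategies or their join algorithms in detail. The names `BloomFilter`, `bloom_join`, `num_bits`, and `num_hashes` are hypothetical, and the "map" and "reduce" steps are simulated with an in-memory dictionary rather than a real MapReduce cluster.

```python
import hashlib
from collections import defaultdict


class BloomFilter:
    """Fixed-size Bloom filter: may give false positives, never false negatives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        # num_bits and num_hashes are illustrative defaults, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key)
        )


def bloom_join(small, large, num_bits=1024, num_hashes=3):
    """Join two lists of (key, value) pairs, pruning `large` with a Bloom filter."""
    # Step 1: build the filter from the join keys of the smaller input.
    bloom = BloomFilter(num_bits, num_hashes)
    for key, _ in small:
        bloom.add(key)

    # Step 2 (simulated map and shuffle): group values by key and source.
    # Records from `large` that the filter rejects are dropped here, which is
    # where a distributed job would save network and sorting work.
    grouped = defaultdict(lambda: ([], []))
    for key, value in small:
        grouped[key][0].append(value)
    for key, value in large:
        if key in bloom:
            grouped[key][1].append(value)

    # Step 3 (simulated reduce): pair up values that share a key. A false
    # positive from the filter has no matching left value, so it emits nothing.
    return sorted(
        (key, left, right)
        for key, (lefts, rights) in grouped.items()
        for left in lefts
        for right in rights
    )
```

For example, calling `bloom_join([(1, "ann"), (2, "bob")], [(1, "book"), (3, "pen"), (2, "lamp")])` joins the records with keys 1 and 2, while the record with key 3 is normally discarded by the filter before the grouping step. Because a Bloom filter can return false positives but never false negatives, the join result stays correct; a false positive only costs some wasted work.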