Massive data warehouses now accessible to all with efficient join synopsis maintenance
In the study, researchers developed a method called SJoin that efficiently maintains a small, representative sample of the results of complex join queries over a dynamic database. The goal was to reduce the cost of analyzing large datasets, such as those generated by business operations and IoT sensors, particularly when a query joins multiple tables. Using a weighted join graph index, SJoin can quickly update the sample as new data arrives, and the sample can then be used to build histograms or train models. In tests on different types of join queries, SJoin outperformed the other methods it was compared against, which supports its effectiveness at keeping such samples up to date.
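To make the idea of sampling from a join through a weighted index more concrete, here is a minimal sketch in Python. It is not the SJoin algorithm or its join graph index: it assumes only two hypothetical tables, R and S, joined on a single key, and the class and method names are invented for illustration. Each key's weight is the number of join results it contributes, so a join result can be drawn uniformly at random without ever computing the full join, and each insert updates the counts in constant time.

```python
import random
from collections import defaultdict


class TwoTableJoinSampler:
    """Toy weighted index for R JOIN S ON key (illustrative, not SJoin itself)."""

    def __init__(self, seed=None):
        self.r_rows = defaultdict(list)  # key -> rows of R with that key
        self.s_rows = defaultdict(list)  # key -> rows of S with that key
        self.join_size = 0               # total number of join results so far
        self.rng = random.Random(seed)

    def insert_r(self, key, row):
        # A new R row joins with every S row that already shares its key.
        self.join_size += len(self.s_rows[key])
        self.r_rows[key].append(row)

    def insert_s(self, key, row):
        # Symmetric case: a new S row joins with every matching R row.
        self.join_size += len(self.r_rows[key])
        self.s_rows[key].append(row)

    def weight(self, key):
        # Number of join results produced by this key.
        return len(self.r_rows.get(key, ())) * len(self.s_rows.get(key, ()))

    def sample(self):
        """Draw one join result uniformly at random, or None if the join is empty."""
        if self.join_size == 0:
            return None
        keys = [k for k in self.r_rows if self.weight(k) > 0]
        weights = [self.weight(k) for k in keys]
        # Pick a key in proportion to its weight, then one row from each side.
        key = self.rng.choices(keys, weights=weights, k=1)[0]
        return (self.rng.choice(self.r_rows[key]), key, self.rng.choice(self.s_rows[key]))


idx = TwoTableJoinSampler(seed=0)
idx.insert_r("a", "r1")
idx.insert_r("a", "r2")
idx.insert_s("a", "s1")
idx.insert_r("b", "r3")  # no matching S row, so it adds no join results
idx.insert_s("c", "s2")  # no matching R row either
# idx.join_size is now 2: (r1, a, s1) and (r2, a, s1)
print(idx.join_size, idx.sample())
```

This sketch scans all keys on every draw, which is fine for a toy example but would not scale; a production design would keep the weights in a structure that supports faster weighted selection and would handle joins across more than two tables.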