Scalable Computation of High-Order Optimization Queries
Overview with simple words
The paper proposes a solution for finding a set of tuples that collectively satisfy certain group constraints. The set of tuples could be filtered initially using the traditional DBMS predicates. The group constraints limits the aggregate value of a group of tuples. It finds the solution by turning the question into an optimization problem and solved by a black-box solver. Additionally, the paper simplifies the optimization problem by finding out a few representative tuples of the base relation, and first construct a solution based on them. It then iteratively substitute the representative tuples by the real tuples in a smaller optimization problem.
Details
Construction of a package query
- Basic package query: select relevant attributes from the desired relation
- Repetition constraints: limits the number of times one tuple appears in the package
- Predicates
- Base predicates: individual predicate on each tuple, filtered by the DBMS
- Global predicates: constraints on the group of tuples, solved by ILP solver
- Objective Clause: ranking of packages, either minimize or maximize
Query→ILP using Direct
Each tuple in the base relation is assigned a variable , that indicates the number of times it could appear in the package. for tuples that does not satisfy the base predicates. It then expresses the global predicates as inequalities. Finally, we minimize/maximize the objective clause under the previous constraints.
SketchRefine
Direct is too slow and is not scalable, also, the optimization problem is hard to solve, need to decompose into smaller problems.
- SketchRefine
- Offline partitioning
- Partition the base relation into groups, and find the representative tuple for each group
- restricts the group size; larger → larger group size → less groups
- defines the greatest distance between any two tuples in a group on attribute ; smaller → better approximation
- Partitioning dataset on the union of all attributes
- Sketch
- Solve a Sketch Query where the base relation is the group of representative tuples generated in offline partitioning
- Produces the sketch package
- Refine
- Refine the sketch package one group at a time
- Eliminate the representative tuple from one group
- Solve a refine query that finds the real tuples to substitute for the same group
- Greedy Backtracking if unsolvable
- Offline partitioning
Inspiration
- K-Means Clustering or Expectation-Maximization algorithm for clustering
- Maybe use the representative samples to approximate query answers
- Similar to index, build partitioning on certain attributes in advance
Pros
- PaQL: an extension of SQL that answers high-order query, easy to understand
- SketchRefine decomposes optimization problem into smaller subproblems
- Empoy Greedy Backtracking to modify answer to be feasible
- Mostly executed under 10s
Cons
May have false infeasible query answer