UCC PAC verifier #738
Draft
p-senichenkov wants to merge 52 commits into
974dde8 to 58eacd4
3cab755 to a782fbe
b4dcaee to 330a13c
Change `using Destructor = std::function<void(std::byte*)>` to `using Destructor = std::function<void(std::byte const*)>` in model::Type, because a) all currently implemented destructors work with const pointers; b) GetDestructor() is not used anywhere except in implementations of pac::model::IDomain; c) implementations of pac::model::IDomain work with const pointers.
Implement TupleType class to calculate distances between fixed-size tuples of values. Implement IDomain class to represent an ordered domain in a metric space of values. Implement default domain classes: Ball and Parallelepiped. Implement PAC and DomainPAC classes.
Add option names and descriptions required by PAC verifier
Implement base class for PAC-Man based PAC verifiers
Implement verifier for Domain Probabilistic Approximate Constraints
Implement unit tests for Domain PAC verifier.
Implement Python bindings for Domain PAC verifier
Implement Python examples for Domain PAC verifier
Move "validation mode" from FindEpsilonDelta into a separate function
330a13c to 54f0918
14ab028 to e7a6aed
Move the min_delta and max_delta options to the concrete PAC type verifiers, since some algorithms need only the min_delta option, whereas others need max_delta
Implement verifier for Functional Dependency Probabilistic Approximate Constraints
e7a6aed to 44ba569
Implement verification algorithm for Unique Column Combination Probabilistic Approximate Constraints
Implement tests for UCC PAC verification algorithm
44ba569 to d3fb8d5
A Probabilistic Approximate Constraint (PAC) is a primitive introduced in "Checks and Balances: Monitoring Data Quality Problems in Network Traffic Databases" by Flip Korn et al. A PAC with parameters epsilon and delta means that the probability that the approximate constraint with error=epsilon holds is greater than delta.
Implement UCC PAC verifier.
A UCC PAC on column set X means that Pr(dist(t_i[A_l], t_j[A_l]) <= eps) <= delta
(note that this definition differs a little from the one given in "Checks and Balances")
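To make the definition above concrete, here is a minimal brute-force sketch that estimates the probability as the fraction of unordered tuple pairs lying within eps of each other. It is not the PAC-Man based algorithm implemented in this PR and does not use this PR's API: the function names `close_pair_fraction` and `holds_ucc_pac` are hypothetical, and the sketch assumes numeric columns, Euclidean distance over the projection onto X, and pairs with i < j.

```python
from itertools import combinations
from math import dist


def close_pair_fraction(rows, eps):
    """Fraction of unordered row pairs whose Euclidean distance is <= eps.

    `rows` is a list of equal-length numeric tuples (the projection onto X).
    This is an illustrative O(n^2) estimate, not the PR's algorithm.
    """
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    close = sum(1 for a, b in pairs if dist(a, b) <= eps)
    return close / len(pairs)


def holds_ucc_pac(rows, eps, delta):
    """Check Pr(dist(t_i, t_j) <= eps) <= delta on the given rows."""
    return close_pair_fraction(rows, eps) <= delta


if __name__ == "__main__":
    rows = [(0.0,), (1.0,), (5.0,), (9.0,)]
    # Pairwise distances are 1, 5, 9, 4, 8, 4.
    print(close_pair_fraction(rows, 1.5))
    print(holds_ucc_pac(rows, 1.5, 0.2))
    print(holds_ucc_pac(rows, 4.5, 0.2))
```

Under these assumptions, a small eps with a small delta expresses "almost no two rows are nearly equal on X", which is the approximate analogue of uniqueness.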
Add UCC PAC verifier bindings and examples.
A script to build the ECDF and visually validate the results given by the algorithm (and, maybe, better understand the PAC-Man algorithm) can be found here (note: the links differ across the three PAC PRs).
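The linked script is not reproduced here. As a rough illustration of the idea, the sketch below builds the step points of an empirical CDF, assuming the ECDF is taken over pairwise Euclidean distances between rows (an assumption, since the script itself is not shown). The function name `ecdf_points` is hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import dist


def ecdf_points(rows):
    """Return [(d, F(d)), ...] where F(d) is the fraction of row pairs
    whose Euclidean distance is <= d, for each distinct distance d."""
    distances = [dist(a, b) for a, b in combinations(rows, 2)]
    total = len(distances)
    points = []
    seen = 0
    # Walk distinct distances in ascending order, accumulating counts.
    for value, count in sorted(Counter(distances).items()):
        seen += count
        points.append((value, seen / total))
    return points


if __name__ == "__main__":
    rows = [(0.0,), (1.0,), (5.0,), (9.0,)]
    for d, f in ecdf_points(rows):
        print(f"{d:g}\t{f:.3f}")
```

Plotting these points as a step function gives a curve on which an (eps, delta) pair can be read off directly: the curve's height at eps is the estimated probability.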