Commit d3fb8d5

Add UCC PAC examples
1 parent f0ebc08 commit d3fb8d5

4 files changed

Lines changed: 162 additions & 0 deletions

Lines changed: 66 additions & 0 deletions
@@ -0,0 +1,66 @@
+import desbordante
+
+from util import *
+
+WILDFIRE_SENSORS = "examples/datasets/verifying_pac/wildfire_sensors.csv"
+
+print(
+    f"""This example illustrates the use of Unique Column Combination Probabilistic Approximate
+Constraints (UCC PACs). Given a column set X, a UCC PAC with parameters ε and δ states that it is
+unlikely for two distinct tuples to have approximately the same key:
+{BOLD}Pr(dist(t₁[Aᵢ], t₂[Aᵢ]) ≤ ε for each Aᵢ ∈ X) ≤ δ{ENDC}
+For more information, consult "Checks and Balances: Monitoring Data Quality Problems in Network
+Traffic Databases" by Flip Korn et al. (Proceedings of the 29th VLDB Conference, Berlin, 2003).
+
+Consider a wildfire monitoring system based on a number of temperature sensors. The following table
+contains the coordinates of sensors that recorded elevated temperatures over a given period of time:"""
+)
+print(f"{BOLD}{csv_to_str(WILDFIRE_SENSORS)}{ENDC}")
+
+print(f"""
+When a single sensor raises an alarm, it does not necessarily indicate a wildfire. A sensor may have
+been heated by sunlight or a campfire, or there may have been polling or software errors. Instead,
+we should focus on situations in which several sensors raise alarms within a small geographic area.
+In addition, a small probability threshold should be allowed, since nearby sensors may have been
+affected by the same non-wildfire heat source.
+
+A good initial approximation is a distance threshold of 1 meter and a probability threshold of 10%.
+In UCC PAC terms, this means that the following UCC PAC should hold:
+{BOLD}Pr(|t₁[Latitude] - t₂[Latitude]| ≤ 1 and |t₁[Longitude] - t₂[Longitude]| ≤ 1) ≤ 0.1{ENDC}
+
+Let's run the UCC PAC Verifier with the following parameters: column_indices={BLUE}[0, 1]{ENDC}.""")
+
+algo = desbordante.pac_verification.algorithms.UCCPACVerifier()
+algo.load_data(table=(WILDFIRE_SENSORS, ",", True), column_indices=[0, 1])
+algo.execute()
+
+pac = algo.get_pac()
+print(f"Algorithm result: {GREEN}{pac}{ENDC}.")
+
+print(f"""
+This PAC indicates that only {pac.delta * 100:.2f}% of pairs of alarmed sensors are relatively close
+to each other (distance no more than {pac.epsilon:.0f}). Therefore, it is unlikely that there is a wildfire.
+
+We can gain additional insight by inspecting outliers (also called highlights) -- pairs of tuples
+for which the PAC does not hold for a given ε.
+
+To determine which alarmed sensors are located particularly close to one another, let's examine the
+outliers for ε ∈ {BLUE}(0, {pac.epsilon:.0f}]{ENDC}.
+""")
+
+# get_highlights takes two arguments, eps_1 and eps_2, and returns highlights in (eps_1, eps_2].
+# The default values are 0 and pac.epsilon.
+highlights = algo.get_highlights()
+print(f"Outliers in (0, {pac.epsilon:.0f}]:")
+for tp in highlights.string_data:
+    print(f"\t{tp[0]} {tp[1]}")
+
+print(f"""
+These are pairs of tuples whose distances fall within the interval {BLUE}(0, {pac.epsilon:.0f}]{ENDC}.
+You can find additional examples of outlier analysis in the Domain PAC examples. Note that, unlike
+Domain PACs, UCC PAC outliers are pairs of tuples rather than individual tuples. In practice,
+clustering these pairs before further processing may be useful.
+
+Now that you are familiar with the basics of UCC PACs, you can continue with the second UCC PAC
+example: {CYAN}examples/basic/verifying_pac/verifying_domain_pac2.py{ENDC}. This example demonstrates
+the insights that can be gained by examining how δ depends on ε.""")
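The UCC PAC definition printed by the script above can be checked by brute force on small tables. The sketch below is not Desbordante's implementation. It assumes that a pair of tuples counts as close when every column in X differs by at most ε (equivalently, when the maximum per-column difference is at most ε), and that the probability is taken over unordered pairs of distinct tuples. The names `pair_distance`, `ucc_pac_delta`, and `ucc_pac_holds` are hypothetical. The sample coordinates are the rows of `wildfire_sensors.csv` added in this commit.

```python
from itertools import combinations
from typing import Sequence

Row = Sequence[float]


def pair_distance(t1: Row, t2: Row) -> float:
    """Largest per-column absolute difference (Chebyshev distance).

    Assumption: a pair is close when |t1[A_i] - t2[A_i]| <= eps for every
    column A_i, which is equivalent to this maximum being <= eps.
    """
    return max(abs(a - b) for a, b in zip(t1, t2))


def ucc_pac_delta(rows: Sequence[Row], eps: float) -> float:
    """Fraction of unordered pairs of distinct tuples that are within eps."""
    pairs = list(combinations(rows, 2))
    if not pairs:
        return 0.0
    close = sum(1 for t1, t2 in pairs if pair_distance(t1, t2) <= eps)
    return close / len(pairs)


def ucc_pac_holds(rows: Sequence[Row], eps: float, delta: float) -> bool:
    """Check Pr(pair is within eps) <= delta on the given rows."""
    return ucc_pac_delta(rows, eps) <= delta


# Rows of wildfire_sensors.csv: (Latitude, Longitude).
sensors = [(1, 2), (1, 5), (3, 6), (3, 1), (0, 5), (5, 4),
           (7, 2), (2, 8), (5, 3), (8, 4), (6, 0), (1, -1)]

print(f"delta at eps=1: {ucc_pac_delta(sensors, 1):.4f}")  # 2 of 66 pairs
print("PAC with eps=1, delta=0.1 holds:", ucc_pac_holds(sensors, 1, 0.1))
```

Under these assumptions only two of the 66 sensor pairs are within ε = 1, so the 10% threshold from the example is met. The verifier may use a different distance or pair-counting convention, so its reported δ is not guaranteed to match.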
Lines changed: 24 additions & 0 deletions
@@ -0,0 +1,24 @@
+# The example itself is in Colab. This file holds only the logic, so we don't have to wait for the algorithm to be merged.
+# If you see this file in the repository, it means the world has come to an end, and nothing is true anymore
+
+from matplotlib import pyplot as plt
+
+import desbordante
+
+TABLE = "examples/datasets/verifying_pac/laptop_score.csv"
+
+algo = desbordante.pac_verification.algorithms.UCCPACVerifier()
+algo.load_data(table=(TABLE, ",", True), column_indices=[1, 2], delta_steps=int(1e6))
+
+DELTA_STEPS = int(1e5)
+delta_step = 1 / DELTA_STEPS
+
+deltas = [i * delta_step for i in range(DELTA_STEPS)]
+epsilons: list[float] = []
+for delta in deltas:
+    algo.execute(max_epsilon=0, min_delta=delta)
+    pac = algo.get_pac()
+    epsilons.append(pac.epsilon)
+
+plt.plot(epsilons, deltas)
+plt.show()
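The script above traces δ as a function of ε by calling `execute()` once for each of 100,000 δ values. Under the same assumptions as a brute-force check (per-column absolute differences combined by taking the maximum, unordered pairs of distinct tuples), that curve is just the empirical cumulative distribution of pairwise distances, so it can be computed from a single sorted list. This is a hypothetical sketch, not how `UCCPACVerifier` works internally; `delta_curve` is an illustrative name, and the sample rows are the first three (Price, Performance score) entries of `laptop_score.csv`.

```python
from bisect import bisect_right
from itertools import combinations
from typing import List, Sequence, Tuple

Row = Sequence[float]


def pair_distance(t1: Row, t2: Row) -> float:
    # Assumption: maximum per-column absolute difference (Chebyshev distance).
    return max(abs(a - b) for a, b in zip(t1, t2))


def delta_curve(rows: Sequence[Row]) -> List[Tuple[float, float]]:
    """Return (eps, delta) points, where delta is the share of pairs within eps."""
    dists = sorted(pair_distance(t1, t2) for t1, t2 in combinations(rows, 2))
    if not dists:
        return []
    total = len(dists)
    # delta only changes at observed distances, so evaluate it there.
    # bisect_right gives the number of sorted distances that are <= eps.
    return [(eps, bisect_right(dists, eps) / total) for eps in sorted(set(dists))]


# First three (Price, Performance score) rows of laptop_score.csv.
sample = [(505, 2280), (515, 2410), (528, 2330)]
for eps, delta in delta_curve(sample):
    print(f"eps={eps}: delta={delta:.3f}")
```

To get a plot like the script's, pass the ε values as x and the δ values as y to `plt.plot`. The list of pairwise distances grows quadratically with the number of rows, so this brute-force version only suits small tables.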
Lines changed: 59 additions & 0 deletions
@@ -0,0 +1,59 @@
+Model,Price,Performance score
+Novus Flex 14 ,505,2280
+HexaTech NeoBook 15 ,515,2410
+Velora Lite 13 ,528,2330
+HexaTech Spark 14 ,535,2520
+Novus Home 15 ,548,2440
+Velora Slim 14 ,555,2590
+Velora Lite 14 ,565,2490
+Velora Go 15 ,575,2680
+HexaTech Spark 15,588,2610
+Velora Go 14 ,598,2750
+Novus Air 15 ,522,2470
+Velora Lite 15 ,538,2380
+HexaTech CoreBook 15,548,2560
+HexaTech NeoBook 14 ,562,2480
+Novus Air 14 ,572,2670
+Novus Home 14 ,585,2580
+Novus Air 13 ,598,2760
+HexaTech NeoBook 13 ,612,2690
+Velora Edge 14 ,875,5250
+Novus Creator 15 ,888,5480
+Velora Vision 14 ,902,5360
+Aster Book Pro 14 ,915,5640
+Novus Studio 14 ,928,5510
+Aster Flow 14 ,942,5790
+Velora Fusion 15,955,5670
+Aster Core X14 ,968,5960
+Novus Ultra 14 ,982,5830
+Aster Flow 15 ,995,6120
+Aster Book S14 ,890,5570
+Novus Studio 15 ,905,5440
+Velora Edge 15 ,920,5730
+Velora Motion 16 ,935,5610
+Novus Pro 15 ,950,5900
+Velora Vision 15 ,965,5770
+Novus Pro 14 ,980,6060
+Aster Book S15 ,995,5930
+Velora Quantum Pro ,1375,9250
+HexaTech Predator 16 ,1390,9620
+Velora Infinity 18 ,1405,9410
+Velora Prime Studio ,1420,9890
+Titan Apex 17 ,1435,9670
+Aster Forge 16 ,1450,10140
+Titan Vector X17 ,1465,9920
+HexaTech Vulcan Pro,1480,10390
+Titan Forge X15 ,1495,10180
+Aster Elite X16 ,1510,10650
+HexaTech Vulcan X ,1388,9810
+Titan Forge X17 ,1402,9590
+Velora Prime 16 ,1418,10060
+Velora Quantum X ,1432,9840
+Titan Apex 18 ,1448,10320
+Aster Forge 17 ,1462,10110
+Titan Vector X18 ,1478,10580
+Aster Elite X14 ,1492,10370
+HexaTech Predator 17 ,1508,10840
+Aster TitanBook 18 ,1522,10630
+Titan Forge Ultra 15,1538,11110
+Titan Forge Ultra 16,1552,10890
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+Latitude,Longitude
+1,2
+1,5
+3,6
+3,1
+0,5
+5,4
+7,2
+2,8
+5,3
+8,4
+6,0
+1,-1
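The outlier (highlight) step of the first example can be sketched in the same brute-force way on the wildfire coordinates above. The `highlights` helper below is hypothetical: it mirrors the behaviour described in the script's comment for `get_highlights(eps_1, eps_2)` by returning pairs whose distance lies in (eps_1, eps_2], and it again assumes the maximum per-column absolute difference as the distance between two tuples.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

Row = Tuple[float, ...]


def pair_distance(t1: Row, t2: Row) -> float:
    # Assumption: maximum per-column absolute difference (Chebyshev distance).
    return max(abs(a - b) for a, b in zip(t1, t2))


def highlights(rows: Sequence[Row], eps_1: float, eps_2: float) -> List[Tuple[Row, Row]]:
    """Hypothetical helper: pairs whose distance lies in (eps_1, eps_2]."""
    return [
        (t1, t2)
        for t1, t2 in combinations(rows, 2)
        if eps_1 < pair_distance(t1, t2) <= eps_2
    ]


# Rows of wildfire_sensors.csv: (Latitude, Longitude).
sensors = [(1, 2), (1, 5), (3, 6), (3, 1), (0, 5), (5, 4),
           (7, 2), (2, 8), (5, 3), (8, 4), (6, 0), (1, -1)]

for t1, t2 in highlights(sensors, 0, 1):
    print(t1, t2)
```

Under this assumption the loop prints the two close pairs, `(1, 5) (0, 5)` and `(5, 4) (5, 3)`. The verifier's own `string_data` output may be formatted differently.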
