The Egyptian Plover: Rhymes with "over". An African waterbird that maintains a (dubious) symbiotic relationship with crocodiles, feeding on decaying meat lodged between their teeth.
Automatically find, explain, and fix errors without rules.
Your organization's performance is limited by the quality of the data that underlies its learning and decision making. Redpoll Plover is the only AI data platform that bolts onto your data system to automatically find, explain, and mitigate bad data to help your organization perform at its peak potential.
What Plover can do:
Automatically identify erroneous values
Protect your dataset from future erroneous values
Identify the best data to collect for learning
How it works
1. Plover builds a model of your data
```python
from plover.source import ImmutableSqlSource
from plover.store import AwsS3Store
from plover.backend import AwsBackend
from plover.engine import DatabaseEngine

# connect_to_sql_database() is a placeholder for your own connection logic
conn = connect_to_sql_database()

db_engine = (
    DatabaseEngine(
        source=ImmutableSqlSource(conn, "SATELLITES_WH"),
        store=AwsS3Store("myorg-plover", "SATELLITES_WH"),
        backend=AwsBackend("myorg-plover"),
    )
    .fit()
    .persist()
    .metalearn()
    .persist()
)
```
Holistic modeling and metalearning
Plover builds a holistic model of the world as defined by your data, then learns a metamodel to further refine its understanding of the information in your data.
2. Identify Likely Errors
```python
top_5_errors = (
    db_engine
    .metrics
    .errorness
    .sort(by=['Confusion'], descending=True)
    .head(5)
)
```
Row | Column | Confusion | Observed | Predicted |
---|---|---|---|---|
Intelsat 903 | Eccentricity | 10.514456 | 0.793069999999… | 0.000335826120… |
Intelsat 902 | Inclination_ra… | 10.036763 | 25.06467339 | 0.002274957131… |
Intelsat 903 | Apogee_km | 7.853968 | 358802 | 35792.92656399… |
DSP 20 (USA 14… | Period_minutes… | 6.695287 | 142.08 | 1436.095724064… |
SDS III-6 (Sat… | Source_Used_fo… | 4.93515 | JM/5_11 | ZARYA |
Inconsistency is key
Plover finds errors by identifying data that are inconsistent with its model of your data or that cause confusion.
Plover also shows you the observed value and its predicted value, which you may use to overwrite erroneous values.
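As a minimal sketch of that overwrite step, assume the error report has been exported to plain Python records. The `apply_corrections` helper, the record layout, the row keys, and the threshold below are all hypothetical illustrations and are not part of Plover's API.

```python
import copy


def apply_corrections(table, report, min_confusion):
    """Return a corrected copy of `table` plus the list of cells changed.

    `table` maps row name -> {column: value}. `report` is a list of dicts
    with keys row, column, confusion, observed, predicted (hypothetical
    layout mirroring the columns shown in the report above).
    """
    corrected = copy.deepcopy(table)  # never mutate the caller's data
    applied = []
    for err in report:
        if err["confusion"] < min_confusion:
            continue  # not confusing enough to overwrite automatically
        row, col = err["row"], err["column"]
        # Only overwrite if the stored value is still the one that was flagged.
        if corrected.get(row, {}).get(col) == err["observed"]:
            corrected[row][col] = err["predicted"]
            applied.append((row, col))
    return corrected, applied


# Illustrative data only; row names and scores are made up.
table = {
    "sat_a": {"Users": "Commecial"},
    "sat_b": {"Users": "Government"},
}
report = [
    {"row": "sat_a", "column": "Users", "confusion": 9.1,
     "observed": "Commecial", "predicted": "Commercial"},
    {"row": "sat_b", "column": "Users", "confusion": 0.4,
     "observed": "Government", "predicted": "Military"},
]
corrected, applied = apply_corrections(table, report, min_confusion=4.0)
```

Keeping the threshold explicit means low-confidence predictions are left for a human to review rather than written back automatically.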
3. Find similar errors
```python
errs_like = db_engine.errors_like(
    "DSP 20 (USA 149) (Defense Support Program)",
    "Period_minutes"
)
```
row | rowsim | Observed | Predicted |
---|---|---|---|
SDS III-7 (Sat… | 0.988281 | 23.94 | 1436.088006 |
SDS III-6 (Sat… | 0.964844 | 14.36 | 1436.113453 |
Advanced Orion… | 0.9453125 | 23.94 | 1436.105354 |
Metareasoning
After identifying an error, Plover can identify similar errors by finding data that are inconsistent or confusing in similar ways.
4. Identify errors in incoming data
```python
err = db_engine.detect(new_satellite_record)
err[0]
```
Row | Column | Incon Quantile | Observed | Predicted |
---|---|---|---|---|
Satmex 8 | Users | 0.99998 | Commecial | Commercial |
Protect Prod
Compute how confusing or inconsistent data are before they make it into the database to protect production systems.
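One way to wire such a check in front of a database write is sketched below. It assumes a detector callable that returns a list of findings per record; the `screen_record` helper, the finding layout, the cutoff, and the stub detector are hypothetical and do not describe Plover's actual interface.

```python
def screen_record(record, detect, max_quantile=0.999):
    """Decide whether `record` may be written to the database.

    `detect` is any callable returning a list of findings, each a dict with
    keys column, quantile, observed, predicted (hypothetical layout).
    Returns (ok, blocking_findings).
    """
    findings = detect(record)
    blocking = [f for f in findings if f["quantile"] >= max_quantile]
    return len(blocking) == 0, blocking


def stub_detect(record):
    # Stand-in for a real detector: flags the misspelling from the table above.
    if record.get("Users") == "Commecial":
        return [{"column": "Users", "quantile": 0.99998,
                 "observed": "Commecial", "predicted": "Commercial"}]
    return []


ok_bad, blocking = screen_record({"Users": "Commecial"}, stub_detect)
ok_good, _ = screen_record({"Users": "Commercial"}, stub_detect)
```

A record that fails the screen can be rejected, queued for review, or written with the predicted value, depending on how much automation the pipeline tolerates.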
5. Fill knowledge gaps
```python
to_fill = db_engine.find_missing_to_fill(
    to_help_predict="Purpose"
)
to_fill[0].show()
```
Not all (missing) data are created equal
Plover can identify missing fields that are most likely to reduce uncertainty in specific predictions if filled in.
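The idea of ranking fields by how much they reduce uncertainty can be illustrated with classical information gain over categorical columns. This is a stand-in technique chosen for the sketch, not a description of how Plover computes its ranking; the helper names and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, feature, target):
    """Reduction in entropy of `target` from knowing `feature`.

    Only rows where both fields are present are used.
    """
    complete = [r for r in rows
                if r.get(feature) is not None and r.get(target) is not None]
    if not complete:
        return 0.0
    groups = defaultdict(list)
    for r in complete:
        groups[r[feature]].append(r[target])
    conditional = sum(len(g) / len(complete) * entropy(g) for g in groups.values())
    return entropy([r[target] for r in complete]) - conditional


def rank_fields_to_fill(rows, candidates, target):
    """Sort candidate fields from most to least informative about `target`."""
    gains = [(f, information_gain(rows, f, target)) for f in candidates]
    return sorted(gains, key=lambda item: item[1], reverse=True)


# Toy rows for illustration only.
rows = [
    {"Users": "Commercial", "Orbit": "GEO", "Purpose": "Communications"},
    {"Users": "Commercial", "Orbit": "LEO", "Purpose": "Communications"},
    {"Users": "Military", "Orbit": "GEO", "Purpose": "Surveillance"},
    {"Users": "Military", "Orbit": "LEO", "Purpose": "Surveillance"},
]
ranking = rank_fields_to_fill(rows, ["Orbit", "Users"], target="Purpose")
```

In this toy data `Users` fully determines `Purpose` while `Orbit` says nothing about it, so a missing `Users` value would be the more valuable one to collect.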
Machine learning + human learning + engineering
Baxter Eaves
CEO
Baxter is a US Navy veteran and holds a PhD in Experimental Psychology from the University of Louisville where he developed computational models of human trust and social learning. He has led a number of DARPA projects and brings 13 years of experience deploying human-inspired AI tech in high-risk industries.
Patrick Shafto
Scientist at large
Patrick is a program manager at DARPA under the Information Innovation Office (I2O) and professor of Data Sciences at Rutgers University - Newark. He has led a number of projects for agencies including DARPA, DOD, and NSF, and his publications have appeared in top journals of machine and human learning.
Michael Schmidt
Principal ML Engineer
Michael has 14 years of research and engineering experience. He has built production models for healthcare, agronomy, finance, and law, and has conducted research in the areas of high-energy physics, differential geometry, plasma physics, and high-performance computing.
For information cultivation
What it is not
There are many data quality and observability platforms out there, and they all do one or more of the following:
- focus on mechanical failures of your data architecture that can be completely avoided with solid database architecture i.e. you can do it yourself;
- do anomaly/outlier detection by looking for distributional changes in single columns, which requires that a certain amount of bad data make it into prod in order to make the comparison and ignores the context of the data;
- use error between predictions made by your machine learning models and the observed data as a measure of anomaly, which requires you to go through the effort of building an ML model that is more accurate than your data;
- flatten your database, throwing out your carefully designed relational structure and biasing learning toward the entities that interact most with the database.
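The second point, that single-column checks ignore context, can be seen in a small sketch. It scores one orbital period against the whole column and then against only the rows that share its orbit class, using a median/MAD score as a generic stand-in for an outlier test. The orbit-class grouping and the reference periods are illustrative assumptions; only the 142.08 value comes from the report shown earlier.

```python
from statistics import median


def mad_score(value, reference):
    """Distance of `value` from the median of `reference`, in MAD units."""
    reference = list(reference)
    center = median(reference)
    spread = median(abs(x - center) for x in reference)
    if spread == 0:
        return 0.0 if value == center else float("inf")
    return abs(value - center) / spread


# Illustrative orbital periods in minutes, grouped by a hypothetical orbit class.
periods = {
    "LEO": [95.0, 97.5, 101.0, 127.0],
    "GEO": [1435.9, 1436.0, 1436.1],
}
candidate_class, candidate_period = "GEO", 142.08

# Single-column view: compare against every period regardless of orbit class.
all_periods = [p for group in periods.values() for p in group]
marginal = mad_score(candidate_period, all_periods)

# Contextual view: compare only against satellites in the same orbit class.
contextual = mad_score(candidate_period, periods[candidate_class])
```

Against the full column the value looks unremarkable, because periods of that size do occur. Against other geostationary satellites it is wildly off.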
What it is
Plover focuses solely on improving the veracity of the values in your databases, which improves the information they hold and thus your ability to learn. By creating a model of your entire database (without flattening it), we are able to evaluate the erroneousness of each datum individually. This allows us to pluck out (or impute) individual bad data and protect prod without lag.
Get in touch
Provide your information here and we will reach out to you promptly.