Data Distribution Shifts & Monitoring
Causes of ML system failures
operational expectation violation & ML performance expectation violation
Software system failures
- dependency failure
- deployment failure
- hardware failure
- downtime / crashing
ML-Specific failures
- production data + training data are not of the same distribution
- edge cases
- degenerate feedback loops : can arise from exposure bias, when the model's own predictions shape the user feedback that becomes its next training data. Can cause the system's outputs to become more homogeneous over time. To correct (see the sketch below) :
- use randomization
- use positional features
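A minimal sketch of the positional-feature idea, assuming logged recommendation data in a pandas DataFrame with hypothetical column names (`position`, `clicked`):

```python
import pandas as pd

# Hypothetical logged impressions from a recommender: which item was shown,
# at which rank, and whether the user clicked it.
logs = pd.DataFrame({
    "item_id":  [101, 102, 103, 101, 104],
    "position": [1, 2, 3, 1, 2],        # rank at which the item was displayed
    "clicked":  [1, 0, 0, 1, 1],
})

# Encode the display position as an explicit feature so the model can learn
# how much of the click signal is explained by position (exposure bias)
# rather than by the item itself.
logs["is_top_position"] = (logs["position"] == 1).astype(int)

features = logs[["item_id", "is_top_position"]]
labels = logs["clicked"]

# At inference time, set is_top_position to a constant (e.g. 0) for every
# candidate item so the ranking no longer depends on past display positions.
```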
Data Distribution Shifts
data distribution changes over time : training data comes from the source distribution, inference data from the target distribution
according to Bayes : $P(X, Y) = P(Y \mid X)P(X) = P(X \mid Y)P(Y)$
- covariate shift : $P(X)$ changes but not $P(Y \mid X)$ (input dist changes but not the input-to-label relationship)
- label shift : $P(Y)$ changes but not $P(X \mid Y)$
- concept drift : $P(Y \mid X)$ changes but not $P(X)$ (same input, different expected output)
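A small illustrative sketch (synthetic data, not from the source) of covariate shift: $P(X)$ changes between source and target while $P(Y \mid X)$ stays fixed:

```python
import numpy as np

rng = np.random.default_rng(0)

def label(x, rng):
    # Fixed relationship P(Y | X): probability of Y=1 grows with x.
    return rng.binomial(1, p=1 / (1 + np.exp(-x)))

# Source (training) inputs and target (inference) inputs come from
# different input distributions P(X) ...
x_source = rng.normal(loc=0.0, scale=1.0, size=10_000)
x_target = rng.normal(loc=2.0, scale=1.0, size=10_000)   # shifted mean

# ... but labels are generated by the same conditional P(Y | X),
# which is exactly the definition of covariate shift.
y_source = label(x_source, rng)
y_target = label(x_target, rng)

print(x_source.mean(), x_target.mean())   # P(X) differs
print(y_source.mean(), y_target.mean())   # P(Y) differs only through P(X)
```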
other general data distribution drifts:
- feature change
- label schema change
data shift is an issue only if it causes the model's performance to degrade; to detect it, you have to :
- monitor accuracy related metrics
- monitor $P(X)$, $P(Y)$ and the conditional distributions $P(X \mid Y)$ and $P(Y \mid X)$
- use statistics such as min, max, var, mean, etc. of the distribution, but they are not enough -> two-sample test
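A minimal sketch of a two-sample test on one feature, using scipy's Kolmogorov-Smirnov test (the feature values here are synthetic):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(42)

# Feature values observed during training vs. in production (synthetic here).
train_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
prod_feature = rng.normal(loc=0.3, scale=1.0, size=5_000)

# Two-sample Kolmogorov-Smirnov test: a small p-value means the two samples
# are unlikely to come from the same distribution, i.e. a possible shift.
stat, p_value = ks_2samp(train_feature, prod_feature)
print(f"KS statistic={stat:.3f}, p-value={p_value:.3g}")
if p_value < 0.01:
    print("Distribution shift suspected on this feature")
```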
shifts can happen across two dimensions : spatial or temporal -> to detect temporal shifts, a common approach is to treat input data as time-series data, cumulative vs sliding statistics
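A sketch of cumulative vs. sliding statistics on input data treated as a time series, using pandas (values are made up):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Hourly mean of some input feature; the level jumps halfway through
# to mimic a temporal shift.
values = np.concatenate([rng.normal(0, 1, 240), rng.normal(1.5, 1, 240)])
ts = pd.Series(values, index=pd.date_range("2024-01-01", periods=480, freq="h"))

cumulative_mean = ts.expanding().mean()        # averages over everything seen so far
sliding_mean = ts.rolling(window=24).mean()    # averages over the last 24 hours only

# The sliding statistic reacts quickly to the shift, while the cumulative one
# dilutes it: this is why sliding windows are preferred for detecting shifts.
print(cumulative_mean.iloc[-1], sliding_mean.iloc[-1])
```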
Monitoring & Observability
monitoring : act of tracking, measuring and logging different metrics
observability : setting up the system in a way that gives visibility into its internal state, so it can be monitored and investigated when something goes wrong
operational metrics (SYSTEM):
- network
- machine
- application
Service Level Objectives (SLOs) or Service Level Agreements (SLAs), e.g. the service is up if median latency < 200 ms & 99th percentile < 2 s
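A tiny sketch of checking such an SLO from logged request latencies (numbers assumed):

```python
import numpy as np

# Request latencies in milliseconds over the last evaluation window (synthetic).
latencies_ms = np.random.default_rng(7).lognormal(mean=4.5, sigma=0.5, size=10_000)

median = np.percentile(latencies_ms, 50)
p99 = np.percentile(latencies_ms, 99)

# Example SLO: the service is "up" if median latency < 200 ms and p99 < 2 s.
slo_met = (median < 200) and (p99 < 2_000)
print(f"median={median:.0f} ms, p99={p99:.0f} ms, SLO met: {slo_met}")
```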
ML Specific Metrics
- raw inputs
- features : check expectations to detect shifts in distribution
- prediction : can detect shifts (since low-dim -> two sample test)
- accuracy : accuracy-related metrics, often derived from (possibly delayed) user feedback
Great Expectations & Deequ (by AWS) for feature monitoring
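A sketch of feature expectations with Great Expectations, assuming its classic `ge.from_pandas` pandas API (newer releases organize this differently) and hypothetical column names:

```python
import great_expectations as ge
import pandas as pd

# A batch of production features (column names are hypothetical).
batch = pd.DataFrame({
    "age": [23, 35, 41, 29],
    "country": ["FR", "US", "DE", "FR"],
})

# Wrap the DataFrame so expectation methods become available.
ge_batch = ge.from_pandas(batch)

# Expectations encode what we believe about the feature distribution;
# failures on production batches are a signal of a possible shift.
ge_batch.expect_column_values_to_not_be_null("age")
result = ge_batch.expect_column_values_to_be_between("age", min_value=0, max_value=120)
print(result.success)   # False would flag out-of-range ages in this batch
```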
Can also check data shift with two-sample test
Monitoring Toolbox
- metrics, logs, traces : monitoring from the dev POV
- logs, dashboards, alerts : monitoring from the users' POV
- for microservice architecture : ==distributed tracing==, where each process is given a unique ID in the logs + all relevant metadata.
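A minimal sketch of the idea (not any specific tracing library): attach one trace ID to every log line a request produces, and forward it to downstream services:

```python
import logging
import uuid

logging.basicConfig(format="%(asctime)s trace_id=%(trace_id)s %(message)s",
                    level=logging.INFO)
logger = logging.getLogger("service")

def handle_request(payload, trace_id=None):
    # Generate a trace ID at the entry point, or reuse the one passed in
    # by the upstream service, so all logs for this request share it.
    trace_id = trace_id or uuid.uuid4().hex
    extra = {"trace_id": trace_id}

    logger.info("received request", extra=extra)
    # ... call downstream services, forwarding trace_id in headers/metadata ...
    logger.info("request handled", extra=extra)
    return trace_id

handle_request({"user_id": 42})
```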
KSQL or FlinkSQL for streaming data