Data Distribution Shifts & Monitoring


Causes of ML system failures

operational expectation violation & ML performance expectation violation

Software system failures

decomposing the joint distribution: $P(X, Y) = P(Y \mid X)P(X) = P(X \mid Y)P(Y)$
- covariate shift : $P(X)$ changes but not $P(Y \mid X)$ (input dist changes, but not the input-to-label relationship)
- label shift : $P(Y)$ changes but not $P(X \mid Y)$
- concept drift : $P(Y \mid X)$ changes but not $P(X)$ (same inputs, different expected outputs; see the toy simulation below)
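A minimal sketch with synthetic data (all names and numbers are illustrative, not from the source) contrasting covariate shift and concept drift on a toy 1-D problem:

```python
import numpy as np

rng = np.random.default_rng(0)

def label_fn(x):
    # Fixed P(Y | X): probability of y=1 grows with x (logistic rule).
    return (rng.random(x.shape) < 1 / (1 + np.exp(-x))).astype(int)

# Training-time data.
x_train = rng.normal(loc=0.0, scale=1.0, size=10_000)
y_train = label_fn(x_train)

# Covariate shift: P(X) moves (mean 0 -> 2) but P(Y | X) is unchanged.
x_cov = rng.normal(loc=2.0, scale=1.0, size=10_000)
y_cov = label_fn(x_cov)

# Concept drift: P(X) is unchanged but P(Y | X) flips (same inputs, new rule).
x_drift = rng.normal(loc=0.0, scale=1.0, size=10_000)
y_drift = 1 - label_fn(x_drift)

print("train             P(y=1) =", y_train.mean())
print("covariate-shifted P(y=1) =", y_cov.mean())    # changes because P(X) moved
print("concept-drifted   P(y=1) =", y_drift.mean())  # changes because P(Y|X) moved
```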

other general data distribution shifts:

data shift is an issue only if it causes the model’s performance to degrade, so the first step is to figure out whether (and how much) performance has actually degraded

shifts can happen across two dimensions : spatial or temporal -> to detect temporal shifts, a common approach is to treat input data as time-series data and compare cumulative vs sliding statistics (sketch below)
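A sketch (hypothetical feature stream, window size chosen arbitrarily) of cumulative vs sliding statistics over incoming feature values:

```python
import numpy as np

rng = np.random.default_rng(1)
# First 500 points from the old distribution, then the mean shifts upward.
stream = np.concatenate([rng.normal(0, 1, 500), rng.normal(3, 1, 500)])

window = 100
for t in range(window, len(stream) + 1, 100):
    cumulative_mean = stream[:t].mean()          # averages everything seen so far
    sliding_mean = stream[t - window:t].mean()   # averages only the last `window` points
    print(f"t={t:4d}  cumulative={cumulative_mean:+.2f}  sliding={sliding_mean:+.2f}")

# The sliding mean reacts quickly to the shift at t=500; the cumulative mean
# smears it out, which can hide a recent drift.
```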

Monitoring & Observability

monitoring : act of tracking, measuring, and logging different metrics
observability : setting up the system in a way that gives visibility into it, so it can be monitored and debugged

operational metrics (SYSTEM):

Great Expectations & Deequ (by AWS) for feature validation & monitoring
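A hedged sketch of feature validation with Great Expectations' older pandas-style API (the exact API differs across versions; the DataFrame and column names are made up):

```python
import pandas as pd
import great_expectations as ge

batch = pd.DataFrame({
    "user_id": [1, 2, 3, None],
    "age": [25, 31, 47, 230],  # 230 should be flagged as out of range
})

ge_batch = ge.from_pandas(batch)
ge_batch.expect_column_values_to_not_be_null("user_id")
ge_batch.expect_column_values_to_be_between("age", min_value=0, max_value=120)

results = ge_batch.validate()
print(results)  # summary of which expectations passed/failed
```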

Can also check for data shift with a two-sample test (e.g., Kolmogorov-Smirnov); sketch below
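A sketch of a two-sample Kolmogorov-Smirnov test with scipy.stats.ks_2samp, comparing training-time feature values against recent serving values (data and threshold are illustrative):

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(2)
reference = rng.normal(loc=0.0, scale=1.0, size=5_000)  # e.g. training-time feature values
current = rng.normal(loc=0.5, scale=1.0, size=5_000)    # e.g. last day of serving traffic

stat, p_value = ks_2samp(reference, current)
if p_value < 0.01:
    print(f"possible shift detected (KS statistic={stat:.3f}, p={p_value:.2e})")
else:
    print("no significant shift detected")
```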

Monitoring Toolbox

KSQL or FlinkSQL for streaming data