A software engineer website

Homelab: Monitoring Withings

Gautier DI FOLCO September 24, 2023 [ops] #ops #development #monitoring #lgtm-stack

I have a simple morning routine starting with:

Each Sunday, I take additional measurements, and I log them, along with my average weight for the week, in a multi-year LibreOffice Calc spreadsheet (don't judge...).

For the record, I was using Fitbit until Google acquired it (and degraded it: 1, 2, 3, etc.).

The thing is, there's no way to get the weekly average weight, so, after a few months of computing averages by hand, I pushed a quick and dirty project: withings-weights.

To illustrate my journey through monitoring (especially alerting), I would like to add some metrics and alerts (as I would do for business systems).

I have found prometheus-client on Hackage, which is a great fit (along with prometheus-metrics-ghc, servant-prometheus, and wai-middleware-prometheus).

Let's start with generic code:

import Data.Default (def)
import Data.Proxy (Proxy (..))
import Network.Wai.Handler.Warp (run)
import Network.Wai.Middleware.Prometheus
import Prometheus
import Prometheus.Metric.GHC
import Prometheus.Servant

main :: IO ()
main = do
  -- ...
  register ghcMetrics

  let servantPMW = prometheusMiddleware defaultMetrics $ Proxy @API
  run serverEnv.serverPort $ prometheus def $ servantPMW $ app serverEnv oauthEnv info

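We can highlight three parts in the above snippet: register ghcMetrics publishes the GHC runtime metrics, prometheusMiddleware records per-endpoint metrics for the Servant API, and prometheus def exposes everything on the /metrics endpoint.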

Then we can define a few metrics:

data WithingsMetrics = WithingsMetrics
  { lastChecked :: Vector Text Gauge,
    lastWeight :: Vector Text Gauge,
    users :: Counter
  }

There are a few kinds of metrics (Counter, Gauge, Histogram, and Summary), all of them being floating point numbers (Double).

Vector a m is a metric (m) partitioned by labels (a).
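To make the partitioning concrete, here is a minimal sketch using prometheus-client. The metric name demo_last_weight and the two usernames are made up for the example, and labelledSamples is a hypothetical helper, not part of withings-weights.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main, labelledSamples) where

import Data.List (sort)
import Data.Text (Text)
import Prometheus

-- Registers a gauge partitioned by a single "username" label,
-- sets one value per label, and reads all partitions back.
-- The metric name and label values are assumptions for this sketch.
labelledSamples :: IO [(Text, Double)]
labelledSamples = do
  lastWeight <-
    register $
      vector ("username" :: Text) $
        gauge (Info "demo_last_weight" "Last weight per user")

  -- Each distinct label value gets its own independent gauge.
  withLabel lastWeight "alice" $ \g -> setGauge g 70.2
  withLabel lastWeight "bob" $ \g -> setGauge g 81.5

  -- Collect every partition as (label, value) pairs, sorted by label.
  sort <$> getVectorWith lastWeight getGauge

main :: IO ()
main = labelledSamples >>= print
```

Each label value shows up as its own time series on the Prometheus side, which is why the username label is a good fit for per-user gauges.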

Let's instantiate them:

main :: IO ()
main = do
  -- ...
  register ghcMetrics
  metrics <-
    WithingsMetrics
      <$> register (vector "username" $ gauge (Info "withings_last_checked" "Last time a User checked his/her stats"))
      <*> register (vector "username" $ gauge (Info "withings_last_weight" "Last User weight"))
      <*> register (counter (Info "withings_users" "Users count"))

  addCounter metrics.users . fromIntegral . length =<< runHandler (listUsers info)

  let servantPMW = prometheusMiddleware defaultMetrics $ Proxy @API
  run serverEnv.serverPort $ prometheus def $ servantPMW $ app serverEnv oauthEnv info metrics

We can notice:

Finally, we used addCounter to pre-populate the user count at startup.
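The pre-population step can be tried in isolation. The sketch below assumes a made-up metric name (demo_users) and a hypothetical helper, seedAndIncrement; it seeds a counter with an existing user count, then increments it once as a sign-up would.

```haskell
{-# LANGUAGE OverloadedStrings #-}

module Main (main, seedAndIncrement) where

import Control.Monad (void)
import Prometheus

-- Seeds a fresh counter with the number of already-known users,
-- bumps it once (as a new sign-up would), and returns its value.
-- "demo_users" is an assumed name for this sketch.
seedAndIncrement :: Int -> IO Double
seedAndIncrement existingUsers = do
  users <- register $ counter (Info "demo_users" "Users count")
  -- The result of addCounter is discarded here.
  void $ addCounter users (fromIntegral existingUsers)
  incCounter users
  getCounter users

main :: IO ()
main = seedAndIncrement 3 >>= print
```

A counter can only go up, so this works as long as users are never removed; if they could be, a gauge would probably be the safer choice.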

We can continue with endpoint instrumentation:

retrieveOauthHandler :: OauthEnv -> UsersInfo -> WithingsMetrics -> Text -> Text -> Handler Text
retrieveOauthHandler env info metrics code state = do
  user <- fetchOauthTokens env info code state
  liftIO $ incCounter metrics.users
  -- ...

statsHandler :: OauthEnv -> UsersInfo -> WithingsMetrics -> UserName -> Handler Text
statsHandler env info metrics user = do
  groupedWeights <- fetchStats $ withOauthBearer env info user
  let fromFixed :: (Fractional a, HasResolution b) => Fixed b -> a
      fromFixed fv@(MkFixed v) = (fromIntegral v) / (fromIntegral $ resolution fv)
  nowGaugeValue <- fromFixed . nominalDiffTimeToSeconds . utcTimeToPOSIXSeconds <$> liftIO getCurrentTime
  liftIO $
    withLabel metrics.lastChecked user.getUserName $ \metric ->
      setGauge metric nowGaugeValue
  -- ...
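The fromFixed helper can be checked on its own with only base and time. The sketch below reuses the same definition and also shows a possible shortcut (an alternative, not what withings-weights does): POSIXTime has a Real instance, so realToFrac converts it to a Double directly.

```haskell
module Main (main, fromFixed) where

import Data.Fixed (E3, Fixed (MkFixed), HasResolution, resolution)
import Data.Time.Clock.POSIX (getPOSIXTime)

-- Same conversion as in the handler: the raw integer divided by the resolution.
fromFixed :: (Fractional a, HasResolution b) => Fixed b -> a
fromFixed fv@(MkFixed v) = fromIntegral v / fromIntegral (resolution fv)

main :: IO ()
main = do
  -- 1500 units at a resolution of 1000 is 1.5.
  print (fromFixed (MkFixed 1500 :: Fixed E3) :: Double)
  -- Alternative: convert the current POSIX time to seconds as a Double.
  now <- realToFrac <$> getPOSIXTime :: IO Double
  -- Sanity check: we are well past September 2020 (1.6e9 seconds).
  print (now > 1.6e9)
```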

So far so good. If we curl the /metrics endpoint, we get:

# HELP withings_users Users count
# TYPE withings_users counter
withings_users 1.0
# HELP withings_last_checked Last time a User checked his/her stats
# TYPE withings_last_checked gauge
withings_last_checked{username="Gautier"} 1.6955709670325263e9
# HELP withings_last_weight Last User weight
# TYPE withings_last_weight gauge
withings_last_weight{username="Gautier"} 67.34

Finally, we can set up some alerting rules:

{
  alert = "WeightLogForgotten";
  for = "0m";
  expr = ''time() - withings_last_checked{username="Gautier"} > 691200'';
  labels.severity = "info";
  annotations.summary = ''Info: logging weights should be done (last time was > 8 days ago)'';
}
{
  alert = "WeightUnder";
  for = "0m";
  expr = ''withings_last_weight{username="Gautier"} < 66.5'';
  labels.severity = "warning";
  annotations.summary = ''Warning: you have lost too much weight (< 66.5 kg), call your nutritionist'';
}
{
  alert = "WeightSignup";
  for = "0m";
  expr = ''withings_users > 1'';
  labels.severity = "critical";
  annotations.summary = ''Critical: someone signed up, you have been breached'';
}
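The 691200 threshold in the first rule is eight days expressed in seconds. As a sanity check, here is a pure Haskell mirror of that expression; eightDays and weightLogForgotten are hypothetical names for this sketch, and Prometheus itself evaluates the real rule.

```haskell
module Main (main, eightDays, weightLogForgotten) where

-- Seconds in eight days, the threshold used by WeightLogForgotten.
eightDays :: Double
eightDays = 8 * 24 * 60 * 60

-- Pure mirror of: time() - withings_last_checked > 691200
weightLogForgotten :: Double -> Double -> Bool
weightLogForgotten now lastChecked = now - lastChecked > eightDays

main :: IO ()
main = do
  print eightDays -- 691200.0
  -- Timestamp taken from the /metrics output above.
  let lastChecked = 1.6955709670325263e9
  -- Seven days later: the alert stays quiet.
  print (weightLogForgotten (lastChecked + 7 * 86400) lastChecked)
  -- Nine days later: the alert fires.
  print (weightLogForgotten (lastChecked + 9 * 86400) lastChecked)
```

Since for is set to "0m", the real alert fires as soon as the expression becomes true, with no grace period.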

So far so good. On a real project, I would create an associated dashboard and (probably) share it with stakeholders.
