[cf-dev] Richer health-checks for CF apps: request for use cases

Eric Malm
Dear CF Community,

CF has long had a notion of health-checking app instances as they start up to determine whether they're in a functional state, on top of the process simply having started. On the DEAs, the health-check behavior is coupled to whether the app has routes mapped to it, whereas for apps targeting the Diego backend, the health-check specification is independent of the routing configuration on the app. On Diego cells, the health check also runs periodically[1] after the app has started, to verify the health of the instance continually.

With that independence, we now have more flexibility to specify richer health checks for CF app instances. We on the CAPI and Diego teams would like to know what kinds of health checks you would find useful for your apps (whether they serve web traffic or do background work). The two types of health check currently available are 'port', which checks that a TCP connection can be made to the app instance on the port specified by the PORT env var, and 'none', which despite the name does continually verify that the process invoked in the container is still running.
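
To make the 'port' check described above concrete, here is a minimal Python sketch of the idea: try to open a TCP connection to the port named by the PORT env var and report success or failure. The function name `port_check`, its parameters, and the 1-second default timeout are assumptions made for illustration; this is not the Diego implementation.

```python
import os
import socket
from typing import Optional


def port_check(host: str = "127.0.0.1",
               port: Optional[int] = None,
               timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    If no port is given, fall back to the PORT environment variable,
    mirroring the 'port' check described in the message above.
    """
    if port is None:
        port = int(os.environ["PORT"])
    try:
        # create_connection raises OSError on refusal or timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Note that a check like this only shows that something is accepting connections on the port; it says nothing about whether the app can actually serve a request.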

As a starting point, on a recent cf-dev thread[2], we identified that for an HTTP-based health check, it would be useful to specify an endpoint to hit, an acceptable response status code or codes, and a timeout to apply to the request. Sensible defaults could be "/", 200 OK, and 1 second, respectively.
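
As a rough illustration of the HTTP check outlined above, the following Python sketch takes an endpoint, a set of acceptable status codes, and a timeout, with the suggested defaults of "/", 200, and 1 second. The function name `http_check` and its signature are hypothetical and only meant to show the shape of such a check, not a proposed CF interface.

```python
import http.client
from typing import Iterable


def http_check(host: str,
               port: int,
               path: str = "/",
               ok_statuses: Iterable[int] = (200,),
               timeout: float = 1.0) -> bool:
    """Return True if GET path answers with one of the accepted status codes.

    The timeout applies to each socket operation (connect, send, receive),
    not to the request as a whole. Redirects are not followed.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        return conn.getresponse().status in tuple(ok_statuses)
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, or a malformed response: unhealthy.
        return False
    finally:
        conn.close()
```

For example, `http_check("127.0.0.1", 8080, path="/healthz", ok_statuses=(200, 204))` would accept either status from a hypothetical `/healthz` endpoint.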

In any case, please comment here with your health-check use cases, and we intend to use them as input to a proposal soon.

Thanks very much,
Eric, CF Runtime Diego PM

Re: [cf-dev] Richer health-checks for CF apps: request for use cases

Just as a reference you could look at some of the connection tests that Monit allows:


Obviously there are quite a few there so it might go well beyond what's reasonable for container health checking.

I think the HTTP check already mentioned would be sufficient to meet our use cases. To add to it, though, I could imagine that it might be useful to be able to specify a regular expression to search for in the returned HTML, instead of or in addition to the status code.
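
A body-matching check along these lines could be sketched in Python as follows. Everything here is an assumption for illustration: the name `http_body_check`, the convention that `ok_statuses=None` means "ignore the status code and match on the body only", and the `max_bytes` cap on how much of the response is read.

```python
import http.client
import re
from typing import Iterable, Optional


def http_body_check(host: str,
                    port: int,
                    pattern: str,
                    path: str = "/",
                    ok_statuses: Optional[Iterable[int]] = (200,),
                    timeout: float = 1.0,
                    max_bytes: int = 65536) -> bool:
    """Return True if the response body contains a match for pattern.

    When ok_statuses is not None, the status code must also be accepted
    ("in addition to"); when it is None, only the body is examined
    ("instead of").
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        if ok_statuses is not None and resp.status not in tuple(ok_statuses):
            return False
        # Read a bounded amount so a huge page cannot stall the check.
        body = resp.read(max_bytes).decode("utf-8", errors="replace")
        return re.search(pattern, body) is not None
    except (OSError, http.client.HTTPException):
        return False
    finally:
        conn.close()
```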

Also, since you guys are expanding into offering TCP routing for containers, a generic TCP monitor that looks for a specific regular expression in the returned data might be useful. That might also require specifying data to send in order to trigger a response.
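
A send/expect TCP monitor of this kind might look roughly like the Python sketch below. The name `tcp_expect_check`, its parameters, and the `max_bytes` limit are hypothetical choices for illustration; the pattern is a bytes regular expression because the returned data need not be text.

```python
import re
import socket


def tcp_expect_check(host: str,
                     port: int,
                     expect: bytes,
                     send: bytes = b"",
                     timeout: float = 1.0,
                     max_bytes: int = 4096) -> bool:
    """Connect, optionally send data, and look for a regex in the reply.

    Returns True as soon as the bytes received so far contain a match for
    expect. Returns False if the peer closes the connection, max_bytes
    have been read without a match, or any socket operation fails or
    times out (the timeout applies per operation).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            if send:
                sock.sendall(send)
            received = b""
            while len(received) < max_bytes:
                chunk = sock.recv(max_bytes - len(received))
                if not chunk:  # peer closed the connection
                    break
                received += chunk
                if re.search(expect, received):
                    return True
            return False
    except OSError:
        return False
```

For instance, against a service that answers `PING` with `+PONG`, the call would be `tcp_expect_check(host, port, rb"\+PONG", send=b"PING\r\n")`.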