So we have Java microservices written with Spring-Boot, using Consul for service discovery and config management and running in Docker containers. All of it is working, but when a container dies or a service restarts the old service-id never goes away in Consul and the service forever after shows as "Failing" in the Consul UI, even though the new container has registered and shows all Green.
We are not using heartbeat - but I cannot find much documentation on what the difference between heartbeat and healthcheck are for Consul.
Here's my bootstrp.yml
spring:
application:
name: my-service
cloud:
config:
enabled: false
consul:
host: ${discovery.host:localhost}
port: ${discovery.port:8500}
config:
watch:
wait-time: 30
delay: 10000
profile-separator: "-"
format: FILES
discovery:
prefer-ip-address: true
instanceId: ${spring.application.name}:${spring.application.instance_id:${random.value}}
There are other settings to enable heartbeat, but the docs say something about this putting more stress on the Consul cluster.
Has anyone managed to get Consul and Spring Boot/Docker services to actually de-register automatically? It actually doesn't cause any real problems, but it makes the Consul UI pretty useless to actually monitor for up/down services.
You have mentioned you are using a docker container to run the microservice. Are you trapping the SIGTERM in your entrypoint script in docker container ? If a SIGTERM is sent, the boot application will get it and you will see the below log showing that the microservice is deregistering with Consul.
2017-04-27 09:20:19.854 INFO 6852 --- [on(6)-127.0.0.1] inMXBeanRegistrar$SpringApplicationAdmin : Application shutdown requested.
2017-04-27 09:20:19.857 INFO 6852 --- [on(6)-127.0.0.1] ationConfigEmbeddedWebApplicationContext : Closing org.springframework.boot.context.embedded.AnnotationConfigEmbeddedWebApplicationContext@afb5821: startup date [Thu Apr 27 09:20:00 EDT 2017]; parent: org.springframework.context.annotation.AnnotationConfigApplicationContext@130c12b7
2017-04-27 09:20:19.859 INFO 6852 --- [on(6)-127.0.0.1] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 2147483647
2017-04-27 09:20:19.863 INFO 6852 --- [on(6)-127.0.0.1] o.s.c.support.DefaultLifecycleProcessor : Stopping beans in phase 0
2017-04-27 09:20:19.863 INFO 6852 --- [on(6)-127.0.0.1] o.s.c.c.s.ConsulServiceRegistry : Deregistering service with consul: xxxxxxxxxxxxx
Consul doesn't automatically deregister services.
See https://groups.google.com/forum/#!topic/consul-tool/slV5xfWRpEE for the hint about the same question. According to that thread you need to either update the config or perform an Agent API call. Since the agent is the source of truth, you shouldn't try to update via Catalog API. See GitHub for details. They also mention at the Google group that you don't necessarily have to deregister services if the node goes down gracefully, but that doesn't seem to be your use case.
Please have a look at Consul not deregistering zombie services for hints about automating the service de-registration using either the api or tools like registrator.