connect only to primary host for load balancer scenario #999 #1001
Conversation
...gic-client-api/src/main/java/com/marklogic/client/datamovement/HostAvailabilityListener.java
@ehennum , so in the case where a MarkLogic host is down, will the user still have to remove HostAvailabilityListener and NoResponseListener and add a custom listener based on the exception they get from their load balancer?
@srinathgit , no, it should no longer be necessary to remove HostAvailabilityListener or NoResponseListener; at least, that's the intent of the change to the getHostUnavailableExceptions() method. The opposite, in fact: to detect a failure of the load balancer itself, adopters will need to add a custom listener for the specific errors generated by their preferred load balancer. Possibly, as a failsafe in the primary host case, the batcher should keep a count of the retry attempts on the job and fail if it exceeds a limit.
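For illustration, a minimal sketch of that retry-count failsafe, assuming an existing DataMovementManager; the helper name, the limit, and the decision to stop the job are illustrative choices, not API defaults:

```java
import java.util.concurrent.atomic.AtomicInteger;

import com.marklogic.client.datamovement.DataMovementManager;
import com.marklogic.client.datamovement.WriteBatcher;

public class RetryLimitExample {
    // Failsafe sketch: cap the total number of retries on the job so that a failed
    // load balancer cannot cause endless retrying in the primary-host case.
    public static WriteBatcher newBatcherWithRetryLimit(DataMovementManager dataMovementManager,
                                                        int maxJobRetries) {
        AtomicInteger jobRetries = new AtomicInteger();
        WriteBatcher batcher = dataMovementManager.newWriteBatcher();
        batcher.onBatchFailure((batch, failure) -> {
            if (jobRetries.incrementAndGet() > maxJobRetries) {
                // Too many retries for one job: stop it instead of retrying forever.
                dataMovementManager.stopJob(batcher);
            } else {
                // Otherwise retry the failed batch in the current thread.
                batcher.retry(batch);
            }
        });
        return batcher;
    }
}
```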
@ehennum , in a 3-node AWS cluster behind an ALB, I ran a WriteBatcher job. If a MarkLogic instance is shut down during the job (to force a forest failover) and a request is sent to that host, a FailedRequestException is thrown, which is not part of the hostUnavailableExceptions list. There is a withHostUnavailableExceptions method for specifying the list of host-unavailable exceptions, but FailedRequestException can't be added to it because it is too generic and could be thrown for other, valid reasons. So in this scenario the job is neither shut down nor are the failed batches retried.
In an on-prem 3-node MarkLogic cluster, shutting down a MarkLogic instance during the same job throws java.net.SocketException, which is part of the hostUnavailableExceptions list.
So, if a FailedRequestException is going to be thrown, shouldn't a custom HostAvailabilityListener be used that checks both for FailedRequestException and for the exception message, so that legitimately failed requests are not retried?
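For illustration, a hedged sketch of such a listener; the class name is hypothetical, and the "Bad Gateway" message check stands in for whatever ALB-specific detection turns out to be reliable:

```java
import com.marklogic.client.FailedRequestException;
import com.marklogic.client.datamovement.WriteBatch;
import com.marklogic.client.datamovement.WriteBatcher;
import com.marklogic.client.datamovement.WriteFailureListener;

// Sketch of a listener that retries only when a FailedRequestException looks like a
// load-balancer "host down" response, so legitimately failed requests are not retried.
public class GatewayFailureListener implements WriteFailureListener {
    private final WriteBatcher batcher;

    public GatewayFailureListener(WriteBatcher batcher) {
        this.batcher = batcher;
    }

    @Override
    public void processFailure(WriteBatch batch, Throwable failure) {
        // The message check is a placeholder for detecting the ALB's 502 Bad Gateway response.
        if (failure instanceof FailedRequestException
                && failure.getMessage() != null
                && failure.getMessage().contains("Bad Gateway")) {
            // Treat as a transient host-availability problem and retry the batch.
            batcher.retry(batch);
        }
        // Any other failure is left to the remaining registered listeners.
    }
}
```

It would be registered alongside the built-in listeners with batcher.onBatchFailure(new GatewayFailureListener(batcher)).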
@srinathgit , thanks for doing the comparison. It sounds like the Java Client API receives a different HTTP response in the ALB scenario. If so, can you report the difference in the HTTP response that causes the same server event to throw a FailedRequestException instead of a SocketException? The only change in the HostAvailabilityListener behavior in the primary host case is that it skips the host refresh that culminates in the invalidation of retry (line 299 in 431c1a2).
@ehennum , I printed the HTTP response for the ALB scenario. After the instance is stopped, the following is the response I get:
Response{protocol=http/1.1, code=502, message=Bad Gateway, url=http://srinath-q-elasticl-a1rzued8pwa4-882617851.us-east-2.elb.amazonaws.com:8008/v1/documents}
Response body:
@ehennum , I spoke with @mattsunsjf ,
@srinathgit , thanks for digging deeper. It's useful to know that no pass-through configuration is possible but that redirection from a downed instance might be. Does the load balancer send 502 errors only when the Security db is unavailable, or in all cases? That is, if all forests have failed over correctly, does the load balancer send different errors that would reliably indicate a retry condition for the client?
To create a DataMovementManager for use with a load balancer, the factory call should be:
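Something along these lines, assuming the DatabaseClient.ConnectionType.GATEWAY connection type this change introduces, with placeholder host, port, and credentials:

```java
import com.marklogic.client.DatabaseClient;
import com.marklogic.client.DatabaseClientFactory;
import com.marklogic.client.datamovement.DataMovementManager;

// Connect through the load balancer as a gateway rather than to individual cluster hosts.
// The host, port, and credentials below are placeholders.
DatabaseClient client = DatabaseClientFactory.newClient(
    "alb.example.com", 8008,
    new DatabaseClientFactory.DigestAuthContext("myUser", "myPassword"),
    DatabaseClient.ConnectionType.GATEWAY);

DataMovementManager dataMovementManager = client.newDataMovementManager();
```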
FilteredForestConfiguration cannot be used when connecting only to the primary host, so no whitelist should be possible or needed. The built-in HostAvailabilityListener and NoResponseListener should work in failover situations.
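If the built-in retry behavior needs tuning (for example, a longer suspend time while a forest fails over), a hedged sketch assuming the HostAvailabilityListener builder methods, with batcher and dataMovementManager taken from the sketches above; the 30-second suspend time and minimum of one host are illustrative choices:

```java
import java.time.Duration;

import com.marklogic.client.datamovement.HostAvailabilityListener;

// Replace the default batch-failure listeners with a tuned HostAvailabilityListener:
// wait 30 seconds before retrying after a host-unavailable error, and treat the single
// gateway connection as the minimum host count.
batcher.setBatchFailureListeners(
    new HostAvailabilityListener(dataMovementManager)
        .withSuspendTimeForHostUnavailable(Duration.ofSeconds(30))
        .withMinHosts(1));
```

Because setBatchFailureListeners replaces all currently registered failure listeners, any others still wanted would need to be re-registered.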
Please consider whether the design changes are usable, and test them in an ALB environment.
If the changes are good, please merge into 4.1.1-develop.