Description
A. The following test was run on a 3-node cluster (rh7v-intel64-90-test-4/5/6.marklogic.com), with a database that has forests on each of the nodes.
B. The DMSDK job is started from a client machine. While loading is in progress, the client machine is disconnected from the servers (the VPN is disconnected), and after a while the VPN is reconnected.
C. Chronologically, the events are as follows:
At 20:44:18.082, [pool-1-thread-1] blacklists "rh7v-intel64-90-test-5.marklogic.com"
At 20:44:18.097, [main] blacklists "rh7v-intel64-90-test-6.marklogic.com"
Any forestConfig obtained after this time shouldn't contain either "rh7v-intel64-90-test-5.marklogic.com" or "rh7v-intel64-90-test-6.marklogic.com". But
At 20:44:18.113, [pool-1-thread-1] uses hosts [rh7v-intel64-90-test-6.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] with forests for "WriteHostBatcher"
and
At 20:44:18.150, [main] uses hosts [rh7v-intel64-90-test-5.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] with forests for "WriteHostBatcher"
D. 20:44:39.245 [pool-1-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-5.marklogic.com] hosts with forests for "WriteHostBatcher"
At 20:44:39.245, when [pool-1-thread-1] blacklists "rh7v-intel64-90-test-4", the job should have stopped, because the other hosts were already blacklisted and the number of available hosts at this point is 0. Instead, [pool-1-thread-1] appears to filter host "rh7v-intel64-90-test-4" out of the latest forest config obtained by the [main] thread at 20:44:18.150 ([rh7v-intel64-90-test-5.marklogic.com, rh7v-intel64-90-test-4.marklogic.com]), and the job continues.
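To make the expected behavior concrete, here is a minimal sketch of the bookkeeping this report assumes. It is not the DMSDK implementation: the class BlacklistSketch and its method are hypothetical, and the stop rule (fail once fewer than minHosts remain) is taken from the expectation described above, not from the library source. The point is that every blacklisting is applied to one shared set, and the available hosts are always recomputed from the full host list, so a stale forest config held by another thread cannot bring a blacklisted host back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names come from the DMSDK.
public class BlacklistSketch {
    private final List<String> allHosts;
    private final int minHosts;
    // Single shared set of blacklisted hosts, guarded by the object's monitor.
    private final Set<String> blacklisted = new LinkedHashSet<>();

    public BlacklistSketch(List<String> allHosts, int minHosts) {
        this.allHosts = new ArrayList<>(allHosts);
        this.minHosts = minHosts;
    }

    /**
     * Blacklists a host and returns the hosts that are still usable.
     * The result is derived from the full host list minus every host
     * blacklisted so far, never from a previously returned snapshot.
     * Throws if fewer than minHosts remain (assumed stop condition).
     */
    public synchronized List<String> blacklist(String host) {
        blacklisted.add(host);
        List<String> available = new ArrayList<>(allHosts);
        available.removeAll(blacklisted);
        if (available.size() < minHosts) {
            throw new IllegalStateException(
                "Only " + available.size() + " host(s) available; the job should stop");
        }
        return available;
    }

    public static void main(String[] args) {
        BlacklistSketch sketch = new BlacklistSketch(
            Arrays.asList("test-4", "test-5", "test-6"), 1);
        System.out.println(sketch.blacklist("test-5"));
        System.out.println(sketch.blacklist("test-6"));
        try {
            sketch.blacklist("test-4");
        } catch (IllegalStateException e) {
            System.out.println("Stopped: " + e.getMessage());
        }
    }
}
```

With withMinHosts(1) and three hosts, blacklisting the third host under this rule leaves 0 available hosts, which is the point at which this report expects the job to stop.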
A log snippet is below; the complete log is available in exception.txt.
20:44:06.444 [main] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-5.marklogic.com" to the rotation
20:44:06.819 [main] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-6.marklogic.com" to the rotation
20:44:06.819 [main] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-4.marklogic.com" to the rotation
20:44:06.179 [main] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-5.marklogic.com, rh7v-intel64-90-test-6.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] hosts with forests for "WriteHostBatcher"
20:44:18.082 [pool-1-thread-1] ERROR c.m.c.d.HostAvailabilityListener - ERROR: host unavailable "rh7v-intel64-90-test-5.marklogic.com", black-listing it for PT15S
20:44:18.097 [main] ERROR c.m.c.d.HostAvailabilityListener - ERROR: host unavailable "rh7v-intel64-90-test-6.marklogic.com", black-listing it for PT15S
20:44:18.113 [pool-1-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-6.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] hosts with forests for "WriteHostBatcher"
20:44:18.150 [main] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-5.marklogic.com" to the rotation
20:44:18.150 [main] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-5.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] hosts with forests for "WriteHostBatcher"
20:44:39.245 [pool-1-thread-1] ERROR c.m.c.d.HostAvailabilityListener - ERROR: host unavailable "rh7v-intel64-90-test-4", black-listing it for PT15S
20:44:39.245 [pool-1-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-5.marklogic.com] hosts with forests for "WriteHostBatcher"
20:44:54.422 [pool-4-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - (withForestConfig) Using [rh7v-intel64-90-test-5.marklogic.com, rh7v-intel64-90-test-6.marklogic.com, rh7v-intel64-90-test-4.marklogic.com] hosts with forests for "WriteHostBatcher"
20:44:54.422 [pool-4-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-6.marklogic.com" to the rotation
20:44:54.422 [pool-4-thread-1] INFO c.m.c.d.impl.WriteBatcherImpl - Adding DatabaseClient on port 8000 for host "rh7v-intel64-90-test-4.marklogic.com" to the rotation
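The invariant stated in C (a forest config obtained after a host is blacklisted should not contain that host) can be checked mechanically against a log like the one above. The following is a rough sketch, not part of the DMSDK: the class name BlacklistLogCheck is made up, the regular expressions assume the exact line layout shown in this snippet, and it assumes that a blacklisting expires after the logged duration (PT15S here). Host names are compared literally, so a short name such as "rh7v-intel64-90-test-4" will not match its fully qualified form.

```java
import java.time.Duration;
import java.time.LocalTime;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for inspecting the log; not part of the DMSDK.
public class BlacklistLogCheck {
    // e.g. 20:44:18.082 ... host unavailable "host", black-listing it for PT15S
    private static final Pattern BLACKLIST = Pattern.compile(
        "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) .*host unavailable \"([^\"]+)\", black-listing it for (\\S+)");
    // e.g. 20:44:18.113 ... Using [host1, host2] hosts with forests
    private static final Pattern USING = Pattern.compile(
        "^(\\d{2}:\\d{2}:\\d{2}\\.\\d{3}) .*Using \\[([^\\]]*)\\] hosts with forests");

    /** Returns one message per "Using [...]" entry that names a host still blacklisted. */
    public static List<String> findViolations(List<String> lines) {
        Map<String, LocalTime> blacklistedUntil = new HashMap<>();
        List<String> violations = new ArrayList<>();
        for (String line : lines) {
            Matcher b = BLACKLIST.matcher(line);
            if (b.find()) {
                LocalTime at = LocalTime.parse(b.group(1));
                // Assumption: the blacklisting ends after the logged ISO-8601 duration.
                blacklistedUntil.put(b.group(2), at.plus(Duration.parse(b.group(3))));
                continue;
            }
            Matcher u = USING.matcher(line);
            if (u.find()) {
                LocalTime at = LocalTime.parse(u.group(1));
                for (String host : u.group(2).split(",\\s*")) {
                    LocalTime until = blacklistedUntil.get(host);
                    if (until != null && at.isBefore(until)) {
                        violations.add(u.group(1) + " uses blacklisted host " + host);
                    }
                }
            }
        }
        return violations;
    }
}
```

Passing the lines of exception.txt to findViolations (for example via java.nio.file.Files.readAllLines) lists each forest config that was applied while one of its hosts was still within its blacklist window.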
Test:
@Test
public void testFailOver() throws Exception {
    try {
        final String query1 = "fn:count(fn:doc())";

        final AtomicInteger successCount = new AtomicInteger(0);
        final MutableBoolean failState = new MutableBoolean(false);
        final AtomicInteger failCount = new AtomicInteger(0);

        WriteBatcher ihb2 = dmManager.newWriteBatcher();
        ihb2.withBatchSize(20);
        //ihb2.withThreadCount(120);
        dmManager.startJob(ihb2);

        ihb2.setBatchFailureListeners(
            new HostAvailabilityListener(dmManager)
                .withSuspendTimeForHostUnavailable(Duration.ofSeconds(15))
                .withMinHosts(1)
        );

        ihb2.onBatchSuccess(
            (client, batch) -> {
                successCount.addAndGet(batch.getItems().length);
                System.out.println("Success Host: " + client.getHost());
                System.out.println("Success batch number: " + batch.getJobBatchNumber());
                System.out.println("Success Job writes so far: " + batch.getJobWritesSoFar());
            }
        )
        .onBatchFailure(
            (client, batch, throwable) -> {
                System.out.println("Failed batch number: " + batch.getJobBatchNumber());
                /*try{
                    System.out.println("Retrying batch: "+ batch.getJobBatchNumber());
                    ihb2.retry(batch);
                }
                catch(Exception e){
                    System.out.println("Retry of batch "+ batch.getJobBatchNumber()+ " failed");
                    e.printStackTrace();
                }*/
                throwable.printStackTrace();
                failState.setTrue();
                failCount.addAndGet(batch.getItems().length);
            });

        for (int j = 0; j < 20000; j++) {
            String uri = "/local/ABC-" + j;
            ihb2.add(uri, stringHandle);
        }

        ihb2.flushAndWait();

        System.out.println("Fail : " + failCount.intValue());
        System.out.println("Success : " + successCount.intValue());
        System.out.println("Count : " + dbClient.newServerEval().xquery(query1).eval().next().getNumber().intValue());

        Assert.assertTrue(dbClient.newServerEval().xquery(query1).eval().next().getNumber().intValue() == 20000);
    }
    catch (Exception e) {
        e.printStackTrace();
    }
}