Hazelcast Simulator is a high-performance, production-grade testing framework for running performance, stress, and latency tests on Hazelcast clusters. It allows you to simulate complex workloads and evaluate distributed systems in a realistic and reproducible manner.
Hazelcast Simulator is designed for:
- Validating new features.
- Detecting regressions.
- Measuring throughput and latency under varying loads.
- Simulating real-world failures (such as network latency and node crashes).
- Running tests against cloud-deployed Hazelcast clusters.
Key capabilities
- Supports both throughput and latency-oriented testing modes.
- Orchestrates tests across multiple clients and members, with configurable topologies.
- Provides out-of-the-box support for static infrastructure.
- Supports automatic provisioning of cloud infrastructure (currently AWS only).
- Integrates with monitoring tools and profilers for system-level analysis.
- Enables custom test logic with a flexible Java-based test API.
Supported test types
Simulator enables the execution of a variety of performance test types:
- Load Tests: Combined throughput and latency under realistic load.
- Max Throughput Tests: Identify system saturation points under increasing concurrency.
- Latency Tests: Measure operation response time at fixed request rates.
- Spike and Soak Tests: Evaluate short bursts and long-term stability.
- Stress Tests: Push the system until failure to observe limits.
Performance testing strategy
Before executing tests:
- Use a test plan to define test scope, goals, and configurations.
- Choose appropriate machine types, network configuration, and topology.
- Consider persistence, CPU, memory, and network bandwidth requirements.
Executing tests:
- Start with a small cluster and progressively scale load by adjusting parameters such as threadCount and ratePerSecond.
- Continuously monitor throughput (TPS), latency percentiles, CPU utilization, memory consumption, and other relevant system metrics.
- Use Hazelcast Simulator’s latency measurement and ramp-up utilities to apply load in a controlled and reproducible manner.
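As an illustration of scaling load progressively, the following minimal Python sketch generates an evenly spaced ramp plan. It is not part of Hazelcast Simulator: the `LoadStep` class and `ramp_plan` function are hypothetical helpers, and only the names threadCount and ratePerSecond come from the text above. Each generated step would still have to be applied to a test configuration by hand or by your own tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoadStep:
    """One step of a ramp-up plan (hypothetical helper, not a Simulator class)."""
    thread_count: int
    rate_per_second: int


def ramp_plan(start_rate, max_rate, steps, thread_count):
    """Return `steps` evenly spaced load steps from start_rate up to max_rate."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if start_rate <= 0 or max_rate < start_rate:
        raise ValueError("need 0 < start_rate <= max_rate")
    span = max_rate - start_rate
    return [
        LoadStep(thread_count, start_rate + span * i // (steps - 1))
        for i in range(steps)
    ]


# Example: five steps from 1,000 to 5,000 operations per second with 20 threads.
plan = ramp_plan(start_rate=1_000, max_rate=5_000, steps=5, thread_count=20)
for step in plan:
    print(f"threadCount={step.thread_count} ratePerSecond={step.rate_per_second}")
```

Keeping the thread count fixed while raising the rate isolates the effect of request rate; a similar plan could vary the thread count instead when looking for a saturation point.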
After executing tests:
- Analyze the collected metrics and logs to detect bottlenecks, stability issues, and scaling thresholds.
- Compare observed results against the goals defined in the test plan and record any deviations from the planned outcomes.
- Conclude by shutting down or cleaning up all cluster instances and related resources to avoid interference with subsequent runs.
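Comparing results against test-plan goals can be sketched as a percentile check. The Python example below is an assumption-laden illustration, not Simulator's reporting code: `percentile` and `check_goals` are hypothetical helpers, the latency samples are synthetic, and the nearest-rank definition of a percentile is one common choice among several.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in 1..100")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rounding surprises.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


def check_goals(latencies_ms, goals_ms):
    """Map each percentile goal to (observed value, whether the goal was met)."""
    report = {}
    for pct, limit in goals_ms.items():
        observed = percentile(latencies_ms, pct)
        report[pct] = (observed, observed <= limit)
    return report


# Synthetic latency samples in milliseconds (illustrative values only).
latencies_ms = [1.2, 0.9, 1.1, 1.4, 0.8, 5.0, 1.0, 1.3, 1.1, 12.0]
# Hypothetical test-plan goals: p50 under 2 ms, p99 under 10 ms.
goals_ms = {50: 2.0, 99: 10.0}

report = check_goals(latencies_ms, goals_ms)
for pct, (observed, met) in report.items():
    print(f"p{pct}: {observed} ms, goal met: {met}")
```

With these samples the median meets its goal while the 99th percentile does not, which is the kind of deviation worth recording alongside the test plan.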
Advanced features
- Network Latency Simulation: Inject delays between groups of machines.
- CP Subsystem Testing: Configure CP member priorities using cp_priorities.
- Flight Recorder Integration: Enable JFR for profiling with member_args.
- Warmup/Cooldown: Configure warmup and cooldown periods to improve report accuracy.
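To see why excluding warmup and cooldown periods can improve report accuracy, consider the following Python sketch. It does not reproduce how Simulator implements these periods: `trim_window` and `mean_throughput` are hypothetical helpers, and the per-second throughput samples are synthetic.

```python
def trim_window(samples, duration_s, warmup_s, cooldown_s):
    """Keep (timestamp_s, value) samples inside [warmup_s, duration_s - cooldown_s)."""
    end = duration_s - cooldown_s
    if warmup_s < 0 or cooldown_s < 0 or warmup_s >= end:
        raise ValueError("warmup and cooldown leave no measurement window")
    return [(t, v) for t, v in samples if warmup_s <= t < end]


def mean_throughput(samples):
    """Average the throughput values of (timestamp_s, value) samples."""
    values = [v for _, v in samples]
    return sum(values) / len(values)


# Synthetic per-second throughput: slow start, steady state, slow finish.
samples = [
    (0, 200), (1, 600),
    (2, 1000), (3, 1000), (4, 1000), (5, 1000),
    (6, 500), (7, 100),
]

steady = trim_window(samples, duration_s=8, warmup_s=2, cooldown_s=2)
print("full run mean:", mean_throughput(samples))
print("steady-state mean:", mean_throughput(steady))
```

In this synthetic run the full-run mean is 675 operations per second, while the steady-state window reports 1000, so the untrimmed figure understates what the cluster sustained once it was warmed up.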