As a member of the Post-Sales team working on storage deployments and knowledge-transfer sessions, I regularly run into an often-ignored conversation with storage and virtualization admins: how, and what is the best way, to calculate LUN queue depths between host initiators and storage targets?
Before going further, let's understand what exactly a LUN is. In block-based storage protocols such as iSCSI, FC, and FCoE, a LUN (Logical Unit Number) is a unique identifier used to address an individual hard disk drive, or a collection of drives, via SCSI commands. Essentially, it is a logical drive; you can think of it as a soft partition within a RAID unit, disk pool, or disk volume. Different storage vendors have different layers and design requirements to work through before you can create and provision a LUN to a host (or hosts). A LUN can reference an entire RAID set, a single disk or partition, or multiple disks or partitions. Each LUN has its own limitations depending on the host initiator used to connect.
Next, what is queue depth? It is the number of I/O operations (SCSI commands) that can run in parallel on a LUN. In other words, it specifies the number of outstanding requests per LUN: the number of I/O requests the initiator can have in flight per LUN at any time. Each I/O request from the host's initiator HBA to the storage target adapter consumes a queue entry. Typically, a higher queue depth equates to better performance. However, if the storage controller's maximum queue depth is reached, the controller rejects incoming I/O requests by returning a QUEUE FULL response to the initiator, causing bottlenecks and latency issues.
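To make the QUEUE FULL behavior concrete, here is a minimal toy model in Python. It is purely an illustration of the mechanism described above, not vendor code: the class name, method names, and the string statuses are my own inventions standing in for the SCSI TASK SET FULL response.

```python
class TargetPortQueue:
    """Toy model of a storage target port's command queue (illustration only)."""

    def __init__(self, max_queue_depth: int):
        self.max_queue_depth = max_queue_depth
        self.outstanding = 0  # I/O requests currently in flight

    def submit(self) -> str:
        # Once every queue entry is consumed, the target cannot accept
        # another command; it returns QUEUE FULL to the initiator instead.
        if self.outstanding >= self.max_queue_depth:
            return "QUEUE_FULL"
        self.outstanding += 1
        return "ACCEPTED"

    def complete(self) -> None:
        # A completed I/O frees its queue entry for the next request.
        self.outstanding -= 1
```

The point of the sketch: queue entries are a finite, shared resource, and an initiator that drives more outstanding I/Os than the target has entries will see rejections rather than throughput.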
I have seen some storage admins fail to recognize and plan ahead for the Fan-In (host port to target port) and Fan-Out (storage port to host port) ratios. To plan these performance-affecting relationships, one needs to find out which queue depth values are currently configured and what their limits are. Also, what is the purpose of the LUN, meaning what type of workload will it serve? Small-block (random) I/O, or larger-block (sequential) I/O, where bandwidth is critical and the fan-out ratio needs to be low? Your application's performance and block-size requirements should also be considered when choosing the right ratio.
Fine-tuning queue depth values end to end is important for handling the performance and I/O requests between initiators and targets; this is where performance jams often occur. Understanding queue depth is not at all difficult. You just have to remember some simple math. Let me try to explain. Generally, the number of hosts an array can adequately support depends on the following three factors:
1. Available queue depth per physical storage port
2. Available storage ports per controller
3. Available bandwidth per controller
So what do our IntelliFlash array controllers currently support?
1. 2048 I/O queues per physical port (QLogic HBA)
2. 4096 I/O queues total per controller (two ports, when round-robin multipathing is achievable)
3. Up to 8 Gb FC bandwidth per physical port
The number of outstanding I/Os per physical storage port has a direct impact on performance and scalability. Storage ports within arrays have varying queue depths: 512, 1024, or 2048 queues per port. The number of initiators a single storage port can support correlates with the port's available queues. For instance, a port with 2048 queues and a typical LUN queue depth of 64 delivers the following result:
2048 / 64 = 32, meaning 32 LUNs on one initiator (HBA), or 32 initiators (HBAs) with one LUN each, or any combination that does not exceed this number. The fan-out ratio is 32:1, i.e., 32 initiators per storage port.
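The math above can be sketched as a small Python helper. The function name and parameters are my own; it simply encodes the division shown in the example, with an optional LUNs-per-initiator factor for the "any combination" case.

```python
def max_initiators(port_queues: int,
                   lun_queue_depth: int,
                   luns_per_initiator: int = 1) -> int:
    """Max initiators a storage port can support without oversubscription.

    Each LUN on each initiator can have up to `lun_queue_depth` commands
    outstanding, and the port's queue pool must cover all of them.
    """
    return port_queues // (lun_queue_depth * luns_per_initiator)
```

For the example in the text, `max_initiators(2048, 64)` gives 32; if each host instead maps four LUNs, `max_initiators(2048, 64, luns_per_initiator=4)` drops the fan-out to 8 initiators per port.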
Performance degradation starts quickly once you exceed the queue depth on the target: it starts backing off the queues, which hits your response times hard. You also cannot add further hosts or present more LUNs from that target port. If you have only a few hosts and LUNs, the defaults are fine. If you have the luxury of dedicating ports to high-performance applications, make sure the queue depth is configured appropriately so you get the best performance.
Best practice is to keep the HBA vendor uniform across all hosts participating in a cluster. If you are using a mixed environment (for example, QLogic and Emulex), the recommendation is to keep the queue depth value the same across all hosts in the cluster. Every HBA vendor provides directions on how to change the default queue depth. As a general tuning guideline, use an HBA queue depth of 32 for small to mid-size systems and 128 for large systems.
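A quick way to sanity-check the guidance above is a small validation helper: flag a cluster whose hosts have mismatched queue depths, and flag a plan that oversubscribes the target port's queues. This is a hypothetical sketch; the function and its warnings are my own illustration of the two checks, not a vendor tool.

```python
def validate_cluster(host_queue_depths: list[int],
                     luns_per_host: int,
                     port_queues: int) -> list[str]:
    """Return a list of warnings for a planned cluster layout."""
    warnings = []
    # Check 1: all hosts in the cluster should use the same queue depth.
    if len(set(host_queue_depths)) > 1:
        warnings.append("mixed queue depths across cluster hosts")
    # Check 2: total possible outstanding I/Os must fit the port's queues.
    total = sum(qd * luns_per_host for qd in host_queue_depths)
    if total > port_queues:
        warnings.append(f"target port oversubscribed: {total} > {port_queues}")
    return warnings
```

For example, two hosts at queue depth 64 with 16 LUNs each exactly fill a 2048-queue port (no warnings), while adding a third such host would trip the oversubscription check.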
As your storage and virtualization setup evolves and you continue to provision new LUNs and/or introduce new hosts, follow the math above and make sure the storage target port queue depth is not exceeded, to avoid performance degradation. We all understand that, given the dynamics of new technologies and the constant demands of end users and business-application priorities, perfect I/O traffic is very difficult to achieve; there will always be give-and-take when planning your queue depth ratios. Understanding and knowing your I/O request configuration between initiators and targets will help you manage and troubleshoot performance holdups and latency.