T H E   S T R O N G E S T   K N O W L E D G E

EMC and HP Storage Freaks

EMC VNX – RAID groups vs Storage Pools

During EMC training I learned some general guidelines that should be followed to get good performance out of a VNX Unified storage system.

Drive types

When you choose a specific drive type, base your choice on the expected workload.

  • FLASH – Extreme Performance Tier – great for extreme performance.
  • SAS – Performance Tier – perfect for general performance. Usually 10k RPM or 15k RPM.
  • NL-SAS – Capacity Tier – most cost-effective; suited to streaming, aging data, archives and backups.

RAID groups

Before we can provision storage on our drives, we have to create either a RAID group or a Storage Pool. Let’s review RAID groups first.

RAID Levels

On EMC VNX, a RAID group can contain at most 16 drives (which is strange, since this rule does not apply to NetApp at all!).
Choosing the correct drive type is only one step; you also have to choose the correct RAID level.

  • RAID 1/0 – this RAID level is appropriate for heavy transactional workloads with a high rate of random writes (let’s say greater than 25-30%).
  • RAID 5 – I would say this is the most common one. Works best for medium to high performance, general-purpose and sequential workloads. It is also much more space-efficient, since blocks are not mirrored and parity takes only one disk’s worth of capacity (distributed across all disks in the RAID group).
  • RAID 6 – Most often used for NL-SAS. Works best with read-based workloads such as archiving and backup to disk. It provides additional RAID protection (two disks may fail in one RAID group, instead of one with RAID 5).

As I mentioned before, RAID groups can have a maximum count of 16 drives (not with NetApp). For parity RAID, the higher the drive count, the higher the capacity utilization (since RAID5 always spends one disk’s worth of capacity on parity and RAID6 two, regardless of the RAID group size).
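To make that arithmetic concrete, here is a minimal sketch in Python. The function name is made up for illustration, and it only models raw capacity (it ignores hot spares, formatting overhead and anything else the array reserves).

```python
def usable_fraction(total_drives: int, parity_drives: int) -> float:
    """Fraction of raw capacity left for data in a parity RAID group.

    parity_drives is 1 for RAID5 and 2 for RAID6, whatever the group size.
    """
    if parity_drives < 1 or total_drives <= parity_drives:
        raise ValueError("a parity RAID group needs more drives than parity")
    if total_drives > 16:
        raise ValueError("a VNX RAID group holds at most 16 drives")
    return (total_drives - parity_drives) / total_drives


# Same parity cost, different group sizes.
for label, total, parity in [
    ("RAID5, 5 drives", 5, 1),    # 4 of 5 drives hold data
    ("RAID5, 16 drives", 16, 1),  # 15 of 16 drives hold data
    ("RAID6, 16 drives", 16, 2),  # 14 of 16 drives hold data
]:
    print(f"{label}: {usable_fraction(total, parity):.1%} usable")
```

The larger group wastes proportionally less on parity, but keep in mind that a bigger group also means more drives involved in every rebuild.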

Preferred RAID configuration

When large-block sequential operations predominate, the following configurations are most effective:

  • RAID5 has a preference of 4+1 or 8+1
  • RAID6 has a preference of 8+2 or 14+2
  • RAID1/0 has a preference of 4+4

One more tip – when creating RAID groups, select drives from the same bus if possible to boost performance (always prefer horizontal over vertical selection).

Storage Pools

A storage pool is somewhat analogous to a RAID group. In short, it’s a physical collection of disks on which logical units (LUNs) are created. Pools are dedicated for use by pool (thin or thick) LUNs. Where a RAID group can contain only up to 16 disks, a pool can contain hundreds. Because of that, pool-based provisioning spreads workloads over many resources while requiring minimal planning and management effort.

Of course, a pool has the same level of protection against disk failure as RAID groups, simply because RAID groups are used to build the pool. In fact, during pool creation you have to choose which RAID configuration you prefer.

Pools can be homogeneous or heterogeneous. Homogeneous pools have a single drive type (e.g. SAS or NL-SAS), whereas heterogeneous pools contain different drive types. Why does EMC allow us to mix different drive types in one pool? Actually it’s pretty clever; let me go a little deeper into that:

Homogeneous pools

Pools with a single drive type are recommended for applications with similar and predictable performance requirements. Only one drive type (Flash, SAS, or NL-SAS) is available for selection during pool creation.

[Figure: Homogeneous Storage Pools]

The picture above shows three storage pools: one created from Flash drives (extreme performance tier), another created from SAS drives (performance tier), and a third created from NL-SAS drives (capacity tier).

Heterogeneous pools

As mentioned previously, heterogeneous pools can consist of different types of drives. VNX supports Flash, SAS, and NL-SAS drives in one pool. Heterogeneous pools provide the infrastructure for FAST VP (Fully Automated Storage Tiering for Virtual Pools).

[Figure: Heterogeneous Storage Pools]

FAST VP facilitates automatic data movement to appropriate drive tiers depending on the I/O activity for that data. The most frequently accessed data is moved to the highest tier (Flash drives) in the pool for faster access. Medium activity data is moved to SAS drives, and low activity data is moved to the lowest tier (NL-SAS drives).
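The idea can be sketched in a few lines of Python. This is only an illustration of "busiest data goes to the fastest tier", not EMC's actual relocation algorithm: the Slice class, the io_count counter and the per-tier capacities are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Slice:
    name: str
    io_count: int  # hypothetical activity counter for this chunk of data


def place_slices(slices, tier_capacity):
    """Rank slices by activity and fill tiers from fastest to slowest.

    tier_capacity maps tier name -> number of slices it can hold and must be
    ordered from fastest to slowest (dicts preserve insertion order).
    """
    if len(slices) > sum(tier_capacity.values()):
        raise ValueError("the pool is too small for these slices")
    ranked = sorted(slices, key=lambda s: s.io_count, reverse=True)
    placement, start = {}, 0
    for tier, capacity in tier_capacity.items():
        placement[tier] = [s.name for s in ranked[start:start + capacity]]
        start += capacity
    return placement


slices = [Slice("a", 900), Slice("b", 20), Slice("c", 400),
          Slice("d", 5), Slice("e", 150)]
tiers = {"Flash": 1, "SAS": 2, "NL-SAS": 3}
print(place_slices(slices, tiers))
# {'Flash': ['a'], 'SAS': ['c', 'e'], 'NL-SAS': ['b', 'd']}
```

The hottest slice lands on Flash, the next two on SAS, and the rest on NL-SAS, which matches the behavior described above.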

Understanding NPIV and NPV

Two technologies that seem to have come to the fore recently are NPIV (N_Port ID Virtualization) and NPV (N_Port Virtualization). Judging just by the names, you might think that these two technologies are the same thing. While they are related in some aspects and can be used in a complementary way, they are quite different. What I’d like to do in this post is help explain these two technologies, how they are different, and how they can be used. I hope to follow up in future posts with some hands-on examples of configuring these technologies on various types of equipment.

First, though, I need to cover some basics. This is unnecessary for those of you that are Fibre Channel experts, but for the rest of the world it might be useful:

  • N_Port: An N_Port is an end node port on the Fibre Channel fabric. This could be an HBA (Host Bus Adapter) in a server or a target port on a storage array.

  • F_Port: An F_Port is a port on a Fibre Channel switch that is connected to an N_Port. So, the port into which a server’s HBA or a storage array’s target port is connected is an F_Port.

  • E_Port: An E_Port is a port on a Fibre Channel switch that is connected to another Fibre Channel switch. The connection between two E_Ports forms an Inter-Switch Link (ISL).

There are other types of ports as well—NL_Port, FL_Port, G_Port, TE_Port—but for the purposes of this discussion these three will get us started. With these definitions in mind, I’ll start by discussing N_Port ID Virtualization (NPIV).

N_Port ID Virtualization (NPIV)

Normally, an N_Port would have a single N_Port_ID associated with it; this N_Port_ID is a 24-bit address assigned by the Fibre Channel switch during the FLOGI process. The N_Port_ID is not the same as the World Wide Port Name (WWPN), although there is typically a one-to-one relationship between WWPN and N_Port_ID. Thus, for any given physical N_Port, there would be exactly one WWPN and one N_Port_ID associated with it.
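To make the 24-bit address a little more tangible, here is a small Python sketch that splits an N_Port_ID into its three bytes. It assumes the usual switched-fabric layout (domain, area, port, one byte each); the function and class names are made up for illustration.

```python
from typing import NamedTuple


class NPortId(NamedTuple):
    domain: int  # identifies the switch that assigned the address
    area: int
    port: int


def split_n_port_id(n_port_id: int) -> NPortId:
    """Split a 24-bit N_Port_ID into its domain, area and port bytes."""
    if not 0 <= n_port_id <= 0xFFFFFF:
        raise ValueError("an N_Port_ID must fit in 24 bits")
    return NPortId(
        domain=(n_port_id >> 16) & 0xFF,
        area=(n_port_id >> 8) & 0xFF,
        port=n_port_id & 0xFF,
    )


fcid = split_n_port_id(0x0A1B2C)
print(f"domain=0x{fcid.domain:02x} area=0x{fcid.area:02x} port=0x{fcid.port:02x}")
# domain=0x0a area=0x1b port=0x2c
```

The top byte is the domain ID of the switch, which is why the number of domain IDs matters later in this post.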

What NPIV does is allow a single physical N_Port to have multiple WWPNs, and therefore multiple N_Port_IDs, associated with it. After the normal FLOGI process, an NPIV-enabled physical N_Port can subsequently issue additional commands (FDISC requests) to register more WWPNs and receive more N_Port_IDs (one for each WWPN). The Fibre Channel switch must also support NPIV, as the F_Port on the other end of the link would “see” multiple WWPNs and multiple N_Port_IDs coming from the host and must know how to handle this behavior.

Once all the applicable WWPNs have been registered, each of these WWPNs can be used for SAN zoning or LUN presentation. There is no distinction between the physical WWPN and the virtual WWPNs; they all behave in exactly the same fashion and you can use them in exactly the same ways.

So why might this functionality be useful? Consider a virtualized environment, where you would like to be able to present a LUN via Fibre Channel to a specific virtual machine only:

  • Without NPIV, it’s not possible because the N_Port on the physical host would have only a single WWPN (and N_Port_ID). Any LUNs would have to be zoned and presented to this single WWPN. Because all VMs would be sharing the same WWPN on the one single physical N_Port, any LUNs zoned to this WWPN would be visible to all VMs on that host because all VMs are using the same physical N_Port, same WWPN, and same N_Port_ID.

  • With NPIV, the physical N_Port can register additional WWPNs (and N_Port_IDs). Each VM can have its own WWPN. When you build SAN zones and present LUNs using the VM-specific WWPN, then the LUNs will only be visible to that VM and not to any other VMs.
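The difference between the two cases can be shown with a toy model in Python. Everything here is hypothetical: the WWPNs are invented, and "LUN masking" is reduced to a dictionary keyed by WWPN.

```python
def luns_seen_by(vm, vm_to_wwpn, lun_masking):
    """Return the LUNs presented to the WWPN that a VM's I/O goes out on."""
    return sorted(lun_masking.get(vm_to_wwpn[vm], []))


HOST_WWPN = "10:00:00:00:c9:00:00:01"  # invented physical HBA WWPN
VM1_WWPN = "28:00:00:00:00:00:00:01"   # invented virtual WWPNs
VM2_WWPN = "28:00:00:00:00:00:00:02"

# Without NPIV: every VM uses the host's single WWPN.
shared = {"vm1": HOST_WWPN, "vm2": HOST_WWPN}
masking_shared = {HOST_WWPN: ["lun_for_vm1"]}
print(luns_seen_by("vm2", shared, masking_shared))  # ['lun_for_vm1']

# With NPIV: each VM has its own WWPN, so masking can target one VM.
per_vm = {"vm1": VM1_WWPN, "vm2": VM2_WWPN}
masking_npiv = {VM1_WWPN: ["lun_for_vm1"]}
print(luns_seen_by("vm1", per_vm, masking_npiv))    # ['lun_for_vm1']
print(luns_seen_by("vm2", per_vm, masking_npiv))    # []
```

In the shared case vm2 can see a LUN that was meant for vm1; with per-VM WWPNs it cannot.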

Virtualization is not the only use case for NPIV, although it is certainly one of the easiest to understand.

<aside>As an aside, it’s interesting to me that VMotion works and is supported with NPIV as long as the RDMs and all associated VMDKs are in the same datastore. Looking at how the physical N_Port has the additional WWPNs and N_Port_IDs associated with it, you’d think that VMotion wouldn’t work. I wonder: does the HBA on the destination ESX/ESXi host have to “re-register” the WWPNs and N_Port_IDs on that physical N_Port as part of the VMotion process?</aside>

Now that I’ve discussed NPIV, I’d like to turn the discussion to N_Port Virtualization (NPV).

N_Port Virtualization (NPV)

While NPIV is primarily a host-based solution, NPV is primarily a switch-based technology. It is designed to reduce switch management and overhead in larger SAN deployments. Consider that every Fibre Channel switch in a fabric needs a different domain ID, and that the total number of domain IDs in a fabric is limited. In some cases, this limit can be fairly low depending upon the devices attached to the fabric. The problem, though, is that you often need to add Fibre Channel switches in order to scale the size of your fabric. There is therefore an inherent conflict between trying to reduce the overall number of switches in order to keep the domain ID count low while also needing to add switches in order to have a sufficiently high port count. NPV is intended to help address this problem.

NPV introduces a new type of Fibre Channel port, the NP_Port. The NP_Port connects to an F_Port and acts as a proxy for other N_Ports on the NPV-enabled switch. Essentially, the NP_Port “looks” like an NPIV-enabled host to the F_Port on the other end. An NPV-enabled switch will register additional WWPNs (and receive additional N_Port_IDs) via NPIV on behalf of the N_Ports connected to it. The physical N_Ports don’t have any knowledge this is occurring and don’t need any support for it; it’s all handled by the NPV-enabled switch.

Obviously, this means that the upstream Fibre Channel switch must support NPIV, since the NP_Port “looks” and “acts” like an NPIV-enabled host to the upstream F_Port. Additionally, because the NPV-enabled switch now looks like an end host, it no longer needs a domain ID to participate in the Fibre Channel fabric. Using NPV, you can add switches and ports to your fabric without adding domain IDs.
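Here is a toy Python model of that relationship. It is a sketch under loud assumptions: the class names are invented, and the way addresses are handed out (area byte = F_Port number, low byte = a simple counter) is a simplification chosen for illustration, since real switches allocate addresses in their own ways.

```python
class CoreSwitch:
    """Toy NPIV-capable switch: owns one domain ID and assigns N_Port_IDs."""

    def __init__(self, domain_id: int) -> None:
        self.domain_id = domain_id
        self.logins = {}       # WWPN -> N_Port_ID
        self._next_index = {}  # F_Port number -> next free low byte

    def login(self, f_port: int, wwpn: str) -> int:
        """Register a WWPN seen on an F_Port (first login or an NPIV login)."""
        index = self._next_index.get(f_port, 0)
        if not 0 <= f_port <= 0xFF or index > 0xFF:
            raise ValueError("no address left on this F_Port in the toy model")
        self._next_index[f_port] = index + 1
        n_port_id = (self.domain_id << 16) | (f_port << 8) | index
        self.logins[wwpn] = n_port_id
        return n_port_id


class NpvSwitch:
    """Toy NPV edge switch: has no domain ID and proxies host logins upstream."""

    def __init__(self, upstream: CoreSwitch, upstream_f_port: int,
                 np_port_wwpn: str) -> None:
        self.upstream = upstream
        self.upstream_f_port = upstream_f_port
        # The NP_Port logs in first, just as an NPIV-enabled host would.
        upstream.login(upstream_f_port, np_port_wwpn)

    def host_login(self, wwpn: str) -> int:
        """Register a locally attached N_Port's WWPN via the NP_Port."""
        return self.upstream.login(self.upstream_f_port, wwpn)


core = CoreSwitch(domain_id=0x0A)
edge = NpvSwitch(core, upstream_f_port=3, np_port_wwpn="20:00:00:00:00:00:00:aa")
host_ids = [edge.host_login(w) for w in ("21:00:00:00:00:00:00:01",
                                         "21:00:00:00:00:00:00:02")]
domains = {n_port_id >> 16 for n_port_id in core.logins.values()}
print([hex(i) for i in host_ids], "domains in use:", len(domains))
```

Every address handed out carries the core switch's domain ID, so the edge switch adds ports to the fabric while the number of domains stays at one.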

So why is this functionality useful? There is the immediate benefit of being able to scale your Fibre Channel fabric without having to add domain IDs, yes, but in what sorts of environments might this be particularly useful? Consider a blade server environment, like an HP c7000 chassis, where there are Fibre Channel switches in the back of the chassis. By using NPV on these switches, you can add them to your fabric without having to assign a domain ID to each and every one of them.

Here’s another example. Consider an environment where you are mixing different types of Fibre Channel switches and are concerned about interoperability. As long as there is NPIV support, you can enable NPV on one set of switches. The NPV-enabled switches will then act like NPIV-enabled hosts, and you won’t have to worry about connecting E_Ports and creating ISLs between different brands of Fibre Channel switches.

I hope you’ve found this explanation of NPIV and NPV helpful and accurate. In the future, I hope to follow up with some additional posts—including diagrams—that show how these can be used in action. Until then, feel free to post any questions, thoughts, or corrections in the comments below. Your feedback is always welcome!

Disclosure: Some industry contacts at Cisco Systems provided me with information regarding NPV and its operation and behavior, but this post is neither sponsored nor endorsed by anyone.