iSCSI performance is one of the most misunderstood aspects of the protocol. Looking at it purely from a bandwidth perspective, Fibre Channel at 2/4Gbit certainly appears much faster than iSCSI at 1Gbit. However, there are two important terms that need to be defined: Bandwidth and Throughput
Bandwidth: The amount of data transferred over a specific time period. This is measured in KB/s, MB/s, GB/s
Throughput: The amount of work accomplished by the system over a specific time period. This is measured in IOPS (I/Os per second), TPS (transactions per second)
There is a significant difference between the two in that Throughput has varying I/O sizes which have a direct effect on Bandwidth. Consider an application that requires 5000 IOPS at a 4k block size. That translates to a bandwidth of 20MB/s. Now consider the same application but at a 64k size. That's a bandwidth of 320MB/s.
Naturally, as the I/O size increases the interconnect with the smaller bandwidth will become a bottleneck sooner than the interconnect with the larger one (iSCSI vs FC).
At small block random I/O (4k-8k) both protocols perform equally well with similar IOPs and latencies but as the I/O size increases one (iSCSI) gets affected more than the other (FC)
TOEs and iSCSI HBAs do not guarantee higher performance and that's not the reason to deploy them. In fact, for a lot of workloads the native SW intiators outperform the iSCSI HBAs. The reason these cards came to fruition was to offload TCP and iSCSI processing overhead from the CPU. In an already underutilized server they provide no benefit unless you want to boot and even then you still don't need them given that there are NICs out there that support IP SAN boot using native iscsi stacks.