
 sk1939Premium join:2010-10-23 Washington, DC kudos:9 Reviews:
·T-Mobile US
1 edit | Disk Transfer Speeds RAID 1 with 2 Seagate ST3500418AS drives |  QNAP with Hitachi Hitachi HDS723020BLA642 |  120GB Samsung 830 Series |
I had someone ask me in person the other day about the performance difference between a local disk and network storage like NFS or iSCSI, so I thought I would share a bit of what I found here.
First off, let me say that while NFS is slightly faster and has less overhead, iSCSI is FAR friendlier to set up in a Windows environment. For a server environment, however, note the performance overhead if you don't have a NIC that supports TOE and iSCSI offload (disabled for this test), such as a Broadcom NetXtreme II 5709. Also note the CPU overhead for iSCSI, which is around 4% (this PC idles between 1 and 4%, so use that as a baseline).
Obviously your transfer speed depends on your NAS appliance and drive, which in my case are a single-drive QNAP TS-119 PII+ and a Hitachi HDS723020BLA642 (thanks again Koitsu), respectively. As a result, speeds are going to be higher with something like an EMC VNXe 3100, a dedicated storage server, or even a higher-end NAS. | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 1 edit |  Intel 510 SSD |
Thank you for posting this -- this is more or less what I was hoping to see/get insights to.
I'm a little surprised by the iSCSI results -- I expected something higher (more like 70MBytes/sec or so). But iSCSI may not be the source of the issue -- do you see similar rates to the QNAP using other protocols (CIFS/SMB, NFS, and FTP). Actually, strike that -- exclude NFS from the test list. It's a pain in the ass and varies horribly depending on both the server and client OSes, as well as massive numbers of tuning parameters, protocol versions (2 vs. 3 vs. 4), and whether or not you're using TCP or UDP. So yeah, stuff NFS for these tests. 
The reason I mention FTP, BTW, despite not being a network filesystem protocol: it has significantly less overhead than other protocols. For example on my home LAN (gigE), from a Windows XP Pro system I can send/receive 97-98MBytes/second *repeatedly* to/from my FreeBSD box (filesystem is a ZFS mirror consisting of 2 disks, but I got this performance even with 1 disk, as well as with an empty ZFS ARC (applies to receiving only)), while CIFS/SMB is about 70MBytes/sec (using Samba + use of AIO + very specific tunings in smb.conf). In my environment, 25MByte/sec hit is pretty major (that's 1/4th the total bandwidth available), so that's why I ask about FTP.
Also just an observation in passing, nothing to worry about or get too concerned over: the SSD results you posted look very erratic for read speeds. They appear to drop then recover every 2GBytes or so. How much space is free on that SSD presently? If very little, then it's wear-levelling. If a lot, then I wonder if there's a firmware update for it that improves things. I'd expect something a little more "linear". Or do you have HD Tune Pro's Benchmark -> Test/speed accuracy slider set all the way up at top (Fast)? I tend to drop it to the notch below that.
Attached is a read benchmark screenshot from my Intel 510 (120GB) SSD for comparison, running on Windows XP Pro (and has always been used on that OS -- read: NO TRIM SUPPORT). You can see the used disk space in the Intel tool. -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | | |
|  sk1939Premium join:2010-10-23 Washington, DC kudos:9 Reviews:
·T-Mobile US
|  512k iSCSI |  512k SSD |
said by koitsu:Thank you for posting this -- this is more or less what I was hoping to see/get insights to.
I'm a little surprised by the iSCSI results -- I expected something higher (more like 70MBytes/sec or so). But iSCSI may not be the source of the issue -- do you see similar rates to the QNAP using other protocols (CIFS/SMB, NFS, and FTP). Actually, strike that -- exclude NFS from the test list. It's a pain in the ass and varies horribly depending on both the server and client OSes, as well as massive numbers of tuning parameters, protocol versions (2 vs. 3 vs. 4), and whether or not you're using TCP or UDP. So yeah, stuff NFS for these tests. 
You're welcome, glad someone was able to benefit from it.
It's a little slow, but it's in keeping with the file copy speeds that I've seen so far over the network. Samba/AFP/CIFS nets me somewhere between 25 and 40MB/s, depending. However, that is based on Windows Explorer, which is notorious for lying. Interface traffic reports it as peaking at around 430Mbps though, which works out to around 54MB/s. The other thing to remember is that the QNAP uses a Marvell ARMADA 300 ARM CPU running at 2.0GHz to process everything, and that the iSCSI stack is layered on top of the EXT3 filesystem on the disk.
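The conversion from link rate to byte rate above is just a division by 8. A minimal Python sketch, assuming decimal units (1 Mbps = 1,000,000 bits/s, 1 MB = 1,000,000 bytes) and ignoring protocol overhead; the function name is only illustrative:

```python
def mbps_to_mbytes(mbps: float) -> float:
    """Convert megabits per second to megabytes per second (decimal units)."""
    return mbps / 8.0


# Interface rate reported for the QNAP transfer
print(mbps_to_mbytes(430))   # 53.75, i.e. roughly 54 MB/s
# Raw ceiling of a gigabit Ethernet link, before any protocol overhead
print(mbps_to_mbytes(1000))  # 125.0
```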
said by koitsu:The reason I mention FTP, BTW, despite not being a network filesystem protocol: it has significantly less overhead than other protocols. For example on my home LAN (gigE), from a Windows XP Pro system I can send/receive 97-98MBytes/second *repeatedly* to/from my FreeBSD box (filesystem is a ZFS mirror consisting of 2 disks, but I got this performance even with 1 disk, as well as with an empty ZFS ARC (applies to receiving only)), while CIFS/SMB is about 70MBytes/sec (using Samba + use of AIO + very specific tunings in smb.conf). In my environment, 25MByte/sec hit is pretty major (that's 1/4th the total bandwidth available), so that's why I ask about FTP.
I haven't tried FTP, admittedly, but I don't expect performance to be significantly better in this particular case because the NAS is CPU limited. I do expect that FTP would be faster on a higher-powered NAS/SAN due to less overhead. However, many enterprise-grade SAN/NAS devices don't support FTP, if I recall correctly, and at that point we get into other, lower-overhead technologies like Fibre Channel and FCoE. There are some benefits to iSCSI over FTP, in that Windows maps it as a local disk with all the benefits involved, and you can boot from iSCSI as if it were a local disk (again using certain network controllers).
said by koitsu:Also just an observation in passing, nothing to worry about or get too concerned over: the SSD results you posted look very erratic for read speeds. They appear to drop then recover every 2GBytes or so. How much space is free on that SSD presently? If very little, then it's wear-levelling. If a lot, then I wonder if there's a firmware update for it that improves things. I'd expect something a little more "linear". Or do you have HD Tune Pro's Benchmark -> Test/speed accuracy slider set all the way up at top (Fast)? I tend to drop it to the notch below that.
Attached is a read benchmark screenshot from my Intel 510 (120GB) SSD for comparison, running on Windows XP Pro (and has always been used on that OS -- read: NO TRIM SUPPORT). You can see the used disk space in the Intel tool. That is due to the use of 64kb sectors instead of the standard 512k when running the benchmark. The Benchmark setting is set all the way down to "Accuracy". Attached is a benchmark using standard 512k sectors. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 1 edit | When I get a 4th drive for my qnap I'll post some similar pics.
as mine is running on an Atom, not an ARM (TS-469-Pro). I also plan to get a dual or quad Intel NIC for my computer and set up teaming (my QNAP also has dual NICs on it).
Slightly OT, but any idea how to get Windows to expand a 2TB volume on a 4TB disk?
On my QNAP I started with 2x 2TB drives in RAID 1, and then after I finished copying everything from my computer's 2TB drive to the iSCSI disk I moved its 2TB drive over to the QNAP and converted it to RAID 5 (took about 8 hours).
Then on the QNAP I expanded the iSCSI LUN to 4TB, and Windows sees a 4TB disk,
but I can't get it to expand (I really don't want to have to get a drive to copy everything to and reformat the iSCSI drive.) | |  Reviews:
·Speakeasy
2 edits | reply to sk1939
Here is a run of my iSCSI disk at home. This disk is backed by a Linux box running ZFSonLinux, with a 7 drive array in RaidZ2 (6 drives + 1 hotspare, with two parity drives) RaidZ2 is sort of similar to RAID6. They are all Hitachi 5k3000 2TB drives. I am using a ZVOL as the iSCSI target, and then my windows 2008R2 server box acts as the initiator, and thus there is an NTFS volume on top of the ZFS ZVOL. I get just under 7TB of usable space in windows. I am losing space to file systems twice as I am running NTFS on top of ZFS. The windows box has 18GB of RAM and the Linux box has 24GB of RAM. The iSCSI runs over a dedicated gigabit ethernet link between the two boxes. Both machines also have another gigabit ethernet connection to my LAN. I use this space for mass storage so speed is not super important to me. All of the data on here is also backed up in other places. 
NOTE1: CPU is 100% as the machine runs folding@home
NOTE2: Seek times in the first pic are very low because of caching. | |  sk1939Premium join:2010-10-23 Washington, DC kudos:9 | What are the stats on the Linux box? Disk transfer speed seems somewhat low given the amount of hardware involved. Maybe due to the software RAID, but still... | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | Guesses for the performance hit:
* Use of 6 disks in a single ZFS vdev that makes up the entire raidz2 pool. I forget the rule of thumb here, but I believe multiple vdevs of 2-disk mirrors (thus "multiple mirrors which are striped") would perform better; raidzX doesn't perform that great, and added parity (raidz2 and raidz3) results in an even bigger performance hit
* With ZFS (and I imagine any RAID or RAID-like implementation, for that matter!) the speed of the I/O transaction is often limited to the speed of the slowest disk. I.e. if you have a single disk that is performing like crap (much worse than the others), it's going to influence I/O on the entire pool. This has been proven time and time again on the FreeBSD lists, where a person has one disk that's performing like total crap (excessive ECC being done by the drive, etc.) and their I/O rates are abysmal. "zpool iostat -v 1" can help track this down, or "gstat -I500ms" on FreeBSD (I'd love to know what Linux has that's like gstat).
* Use of ZFSonLinux -- is this the fuse implementation or the kernel implementation? If the former then that explains it; if the latter then possibly all the internal/kernel-level I/O isn't optimised for speed? I'm grasping at straws on this one (if kernel-level)
* The extremely large number of abstraction layers between client I/O layer and physical disk layer. Take a look: Windows client I/O -> NTFS filesystem -> iSCSI client -> Ethernet -> iSCSI server -> ZFS on Linux -> physical disk. As an *IX admin who heavily applies the KISS principle in every way/shape/form -- yuck!
We can safely rule out improper alignment due to 4096-byte sectors because the 5K3000 2TB model drives use 512-byte sectors.
P.S. -- PhReE5 , seek times are extremely low not because of "caching" but because of use of iSCSI. Seek times can't be measured reliably this way given the use of Ethernet (see sk1939 's iSCSI results too -- same thing); you have to look at seek times on the actual machine that has the physical disks. So yes, you can safely ignore the seek times in the benchmark, but it's not due to "caching".  -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  sk1939Premium join:2010-10-23 Washington, DC kudos:9 Reviews:
·T-Mobile US
| Makes sense.
To expand on the I/O in my post a bit further:
- Further testing shows that transfers FROM the NAS are faster than transfers to it, peaking at around 680Mbps (85MB/s) with a single large file (not a directory).
- FTP on the NAS nets me around 5-10MB/s more than iSCSI; however, file navigation isn't as nice.
- NFS is a PITA, and with the basic (default) configuration (which you can't change without root access), performance is about the same, depending on file type. Plays well with Linux though.
- Dual port NICs are useless if you don't have a switch that can do port channels/teaming, and have it enabled (oops). Doesn't affect performance though. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 | reply to koitsu  With Write Cache on the QNap off
 With Write Cache on the QNap on
any ideas to improve the starting speeds?
This is on a QNAP TS-469-Pro with 3x 2TB drives all with 64mb cache on the drives. | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | You're using HD Tune 2.55. Why? Those benchmarks are read benchmarks -- HD Tune 2.55 (free) doesn't do write benchmarks! HD Tune's home page even states that quite boldly:
quote: 12 February 2008: HD Tune Pro released!
HD Tune Pro is an extended version of HD Tune which includes many new features such as: write benchmark, secure erasing, AAM setting, folder usage view, disk monitor, command line parameters and file benchmark.
So write caching isn't going to have any effect on a read benchmark.
You will need to talk to QNAP about why the NAS does not get good read speeds until the ~40% point (which is 40% of 2200GBytes). Only they can explain this behaviour.
Talking about write caching:
The term "write caching" is also too vague in this context, I'm sorry to say. Individual disk drives have write caching (and it can be toggled), so possibly that setting you've adjusted does that. I don't know. But NAS units which have actual RAM used for I/O caching (think: hardware RAID controllers with RAM) also have a form of write caching, where they actually store data written to the array in RAM and flush it to the physical disk when the NAS firmware deems it convenient/best. That form of write caching is 100% independent of disk write caching. So possibly the setting adjusts that -- but again, I don't know.
Like I've told you in PM, to work out things like this, you really need to rely 100% on the vendor (QNAP). They are the only ones who know how their product functions behind the scenes. -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 | I used the free one because I had installed the trial before and it's timed out so I'd have to pay for it.
The caching is related to the EXT4 filesystem; I'll get a screenshot of the page with the setting. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 | Well, I just tried HD Tune Pro (to see if the timeout thing had stopped) and it says I need to remove all partitions to run the write test. | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 2 edits | said by DarkLogix:Well just tried HD tune pro (to see if the timeout thing had stopped) and it says I need to remove all partitions to run the write test. Yes, and it's correct -- it's a device benchmark test, not a file benchmarking test. What do you think a write benchmark actually does when benchmarking a device? It writes a bunch of data to the device directly. And what do you think that's going to do to a filesystem that's on the device?
A file-based benchmark is not going to provide the same information as a device-based benchmark. You have an entire filesystem abstraction layer in the way of the former, which has its own layer of caching as well as many other caveats. There are other utilities which use file-based tests if that's what interests you, but I do not care about those -- with a file-based benchmark, you can't do LBA benchmarking because there's no way to guarantee what LBA you get; you simply say "open a file, write X number of bytes to it" and how the filesystem layer chooses to organise those blocks of bytes is purely up to it. They may be linearly stored on the disk, and then again they may not be. Welcome to filesystem fragmentation!
TL;DR -- yes, that's correct; and to do device-level write benchmarks, you need to remove all filesystems from the device. -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | reply to DarkLogix said by DarkLogix:The cacheing is related to the EXT4 filesystem, I'll get a screenshot of the page with the setting. Shame on QNAP for putting this at the "Hardware" level of the configuration interface -- this is not a hardware feature, and has nothing to do with either disk write caching or "hardware RAID" write caching. Vendors today, sigh...
What's being described there is a filesystem feature of ext4 called "delayed allocation". Rather than explain it, I'll just link folks to the details and they can read it themselves:
»ext4.wiki.kernel.org/index.php/E···location
And folks using ext4 natively on QNAP products should absolutely read this:
»en.wikipedia.org/wiki/Ext4#Delay···ata_loss
According to all I can find online, QNAP products (some of them anyway) appear to be Linux-based. Thus I would ask QNAP what exact Linux kernel version they're using. If 2.6.30 or later, then great. If 2.6.29 or 2.6.28, then I would ask them if they've backported the aforementioned ext4 delayed allocation patch. If not, then folks should turn that feature off else risk data loss when power is lost (unless the device is on a UPS and you trust the UPS fully, of course).
Use of that feature only applies if the NAS itself is using ext4 as a filesystem. If iSCSI can export a "volume" which actually gets written to the NAS filesystem as a single file, and that filesystem is ext4, then you'd be susceptible to this problem. If iSCSI only exports "volumes" that correlate directly (1:1) with a RAID or RAID-like series of devices (disks), and lets the iSCSI client choose to format the volume as whatever filesystem it wants (e.g. NTFS, ext3, ext4, etc.) then delayed allocation doesn't apply (at the NAS level -- instead, if your iSCSI client system used ext4, you may want to disable the feature there, see above paragraph of course).
Hope that explains things a bit more. This is just further proof, in my opinion, of how/why storage devices and solutions in general are not as simple as companies (and people) want to make them out to be. They are never, ever simple. -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 | Well my understanding of what they do on the Qnap is it does a software raid on the drives and formats them directly with EXT4, then creates a lun on that Filesystem for iSCSI to target.
So then on Windows I have the iSCSI disk formatted with NTFS (as a GPT drive). I'm still not sure why Windows won't let me extend the volume, as I've already done the extension on the QNAP and Windows sees the full 4TB; I just can't extend the 2TB partition to take the whole drive. | |  DarkLogixTexan and ProudPremium join:2008-10-23 Baytown, TX kudos:3 | reply to koitsu OK, with a kernel of 2.6.33.2 it should be safe to turn on delayed allocation, right? | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 | First paragraph and 2nd-to-last: »en.wikipedia.org/wiki/Ext4#Delay···ata_loss
 -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  Reviews:
·Speakeasy
2 edits | reply to koitsu -6 disks in RaidZ2 works well (you want 2^n data disks, so 1,2,4,... and I have 4)
-The disks are all identical, and all the same age
-This is the kernel modules implementation
-The reason I bothered with the iSCSI business at all instead of just using samba is because samba just sucks and is too slow. Using iSCSI and a windows box I get significantly better performance over SMB than samba alone.
-You can't rule out the 4k alignment issue, except that I am fairly certain I aligned them correctly. These drives internally DO use 4k sectors, only exposing 512b as emulation.
-Also I say the seek times are very low because I ran the benchmark twice and the first time I got seek times like what you would expect for 5400rpm drives, and then the second run the seek times were flat. That's caching 
In any case, yeah the performance is a bit disappointing. I haven't really bothered to look into it too much though as it is fine for my purposes. I wouldn't be surprised if I have the disks aligned wrong, as that could easily cause performance like I am seeing.
EDIT: The average read speed during a ZPOOL scrub is ~95MB/sec, still pretty weak. I wonder if folding on the box slows down ZFS much any. That could be part of it too.
EDIT2: Based on the output of zpool iostat the bottleneck is in the network. I see a big write of about 140MB to the array (evenly distributed across all 6 drives) and then about 4 seconds of nothing, then another big write, then 4 seconds of nothing. I could try messing with jumbo packets, and also putting better NIC's in the boxes for the machine-to-machine link. It might actually be a lot easier to make this fast than I thought... | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 1 edit | 1. I didn't say 6 disks in raidz2 didn't work. I said performance-wise you're going to get better performance with a set of striped vdevs, where each vdev is a mirror consisting of 2 disks. This is akin to RAID 1+0, and as I'm sure you know, RAID 1+0 speed-wise (reads and writes) beats the living pants off RAID-6. This same design applies to ZFS as well.
raidz1 is faster than raidz2, and raidz2 is faster than raidz3. Thus, WRT point #4 below and the scrub "benchmark" you give, I am not surprised in the least, but #5 may also play a role.
2. First you say you went to iSCSI because Samba is too slow, and that using iSCSI and a Windows machine gets you good performance over SMB. I'm confused. You just said that you went with iSCSI rather than CIFS/SMB...?
3. The Hitachi 5K3000 2TB models use physical 512 byte sectors. I can assure you because 1) I sold one of these drives to sk1939 , 2) Hitachi's own documentation says so, and 3) smartctl also confirms HGST's documentation even for their 3TB models. It's fine to 4096-byte align a drive of this sort, of course! I'm just saying that alignment isn't a factor in this case. The 3TB disks use 4096-byte sectors; the highest capacity disk you can provide with 512-byte sectors is 2TB. You can't go any larger without 4096-byte, due to LBA addressing limitations. Edit: Nope, I'm completely wrong on this part. In fact I'm not even sure what I was thinking when I said that, to be honest. Maximum capacity with LBA48 = 144PB:
2^48 = 281474976710656 LBAs
281474976710656 * 512 byte sector = 144,115,188,075,855,872 byte capacity
281474976710656 * 4096 byte sector = 1,152,921,504,606,846,976 byte capacity
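The arithmetic above can be reproduced with a few lines of Python. This is only a sketch: the helper name is made up for illustration, and "PB" here means decimal petabytes (10^15 bytes).

```python
def max_capacity_bytes(lba_bits: int, sector_bytes: int) -> int:
    """Largest addressable capacity for a given LBA width and sector size."""
    return (2 ** lba_bits) * sector_bytes


# LBA48 with the two sector sizes discussed in this thread
for sector in (512, 4096):
    cap = max_capacity_bytes(48, sector)
    print(f"LBA48, {sector}-byte sectors: {cap:,} bytes (~{cap / 10**15:.0f} PB)")
```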
4. A ZFS scrub is a read-only operation unless anomalies are found. So from this we can tell that reading from the pool is around 95MBytes/second (barring anything #5 causes), which is almost 2x higher than what iSCSI is putting out.
But as I said above -- and I am happy to show you benchmarks if you want to see them -- at least with ZFS+Samba on my setup (testing read speeds from the pool only, pool = a single mirror of 2 disks), I can get ~70MBytes/second via gigE, but with ZFS+FTP I can get 97-98MBytes/second. This is why I say that 95MBytes/second using a raidz2 pool sounds about right.
5. As you know ZFS is CPU-bound, so yes, I imagine your Folding client does result in a hit on performance. How much should be easy to discern: shut off the Folding client, re-run the tests, voilà. :-)
6. You cannot use "zpool iostat" SOLELY to discern where the bottleneck is. ZFS reads/writes to the disks in "bursts" (specifically it's the TXG timeout -- I'm talking about tunable vfs.zfs.txg.timeout. DO NOT LET THE WORD "TIMEOUT" MAKE YOU THINK IT'S SOME KIND OF FAILURE TIMEOUT), so the behaviour will always look like that. I'm familiar with this tunable because on FreeBSD it used to default to 30, which was way too long; it's now 5.
What you need is something like gstat -I500ms on FreeBSD or iostat -x 1 on Solaris. What you need to see is device I/O speed being performed on a per-device basis, and I'm talking about at the I/O layer (e.g. libata), not at the filesystem layer. I don't know enough about Linux to know what it has that can do this.
What I'm trying to say here: I do not believe the problem is necessarily a network bottleneck. What you're describing, especially the "every 4 seconds" thing, is absolutely how ZFS behaves (on Solaris and on FreeBSD, and it sounds like on Linux as well). That's just how it works.
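To illustrate why bursty disk writes alone don't point at the network, here is a toy Python model (not ZFS code) of a writer that buffers incoming data and flushes it on a fixed interval. The 28MB/s inflow and 5-second interval are assumptions chosen to resemble the "140MB, then about 4 seconds of nothing" pattern described earlier; real ZFS transaction groups can also be flushed for other reasons, which this sketch ignores.

```python
def simulate_flushes(inflow_mb_per_s: int, flush_interval_s: int, seconds: int) -> list:
    """Return the MB written to disk in each one-second slot.

    Data arrives at a steady rate but is only written out every
    flush_interval_s seconds, so the disks see periodic bursts.
    """
    buffered = 0
    written = []
    for t in range(1, seconds + 1):
        buffered += inflow_mb_per_s
        if t % flush_interval_s == 0:
            written.append(buffered)  # flush everything buffered so far
            buffered = 0
        else:
            written.append(0)         # nothing reaches the disks this second
    return written


print(simulate_flushes(28, 5, 10))  # [0, 0, 0, 0, 140, 0, 0, 0, 0, 140]
```

In this model a perfectly steady 28MB/s stream shows up at the disks as one 140MB burst every five seconds, so the burst pattern by itself says nothing about whether the network or the pool is the limit.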
-- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |  koitsuPremium,MVM join:2002-07-16 Mountain View, CA kudos:19 1 edit | reply to sk1939
I'll toss in some of my own benchmarks just for the hell of it.
Client:
- Windows XP Pro SP3 32-bit, 16GB RAM (only 4GB usable, obviously)
- Intel i7-2600K
- Gigabyte GA-P67A-UD3-B3
- AHCI enabled, NCQ enabled
- Intel RST driver set 11.2.0.1006
- Realtek 8111E NIC @ gigE
- Realtek driver set 5.800.0719.2012
- Drive D: = WD10EFRX (WD Red, 1TB); attached to SATA600 port
- Drive Y: = CIFS/SMB mount to server \\icarus\CD_Images
Client network stack tunings:
Windows Registry Editor Version 5.00
; Decimal equivalents
; ------------------------------------------------------------------
; DefaultTTL = 64
; EnablePMTUBHDetect = 1 (OS default, entry is for clarity)
; EnablePMTUDiscovery = 1 (OS default, entry is for clarity)
; SackOpts = 1
; Tcp1323Opts = 1 (window scaling enabled, TS disabled)
; TcpMaxDupAcks = 2 (OS default, entry is for clarity)
; TcpWindowSize = 496400 ((85*1460) * (2^2)) -- 85 is adjustable
;
; NOTE: For TcpWindowSize, the Tweak Tester on DSLR recommends a
; value between 470120 and 1251220 for our connection.
; ------------------------------------------------------------------
;
; RFC1323 timestamps are disabled "as a precaution" WRT this issue:
; http://www.dslreports.com/forum/r26290497-DSLR-timeouts-from-Linux-land
;
; ------------------------------------------------------------------
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters]
"DefaultTTL"=dword:00000040
"EnablePMTUBHDetect"=dword:00000001
"EnablePMTUDiscovery"=dword:00000001
"SackOpts"=dword:00000001
"Tcp1323Opts"=dword:00000001
"TcpMaxDupAcks"=dword:00000002
"TcpWindowSize"=dword:00079310
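The TcpWindowSize value and its dword encoding can be cross-checked with a short Python sketch. It simply redoes the arithmetic from the comment block above (85 segments of 1460 bytes, scaled by 2^2); the variable names are illustrative only.

```python
MSS = 1460        # bytes per segment, as used in the comment block
SEGMENTS = 85     # the adjustable multiplier from the comment block
SCALE_SHIFT = 2   # the 2^2 scale factor from the comment block

window = (SEGMENTS * MSS) * (2 ** SCALE_SHIFT)
print(window)                 # 496400
print(f"dword:{window:08x}")  # dword:00079310

# The Tweak Tester range quoted in the comments
print(470120 <= window <= 1251220)  # True
```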
Server:
- FreeBSD 9.1-PRERELEASE, 64-bit, 8GB RAM, kernel build date: Fri Oct 12 04:37:26 PDT 2012
- Intel Q9550
- Supermicro X7SBA
- AHCI enabled, NCQ enabled
- Intel 82573E NIC @ gigE
- FreeBSD em0 (Intel NIC) driver version 7.3.2
- ZFS v28 used
- Samba 3.6.7 with AIO enabled
- FTP enabled
- Disk ada1: WD10EFRX (WD Red, 1TB); attached to SATA300 port
- Disk ada2: WD10EFRX (WD Red, 1TB); attached to SATA300 port
- Disks ada1 and ada2 are a ZFS mirror (effectively RAID-1)
- /storage/CD_Images exported via Samba
Server ZFS pool details:
root@icarus:/root # zpool status data
  pool: data
 state: ONLINE
  scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        data        ONLINE       0     0     0
          mirror-0  ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0

errors: No known data errors
root@icarus:/root # df -k
Filesystem 1024-blocks Used Avail Capacity Mounted on
...
data/storage 943827908 426242276 517585632 45% /storage
Server tunings that play a role in throughput (network or disk or other features):
/boot/loader.conf --
# We use Samba built with AIO support for increased network I/O speed
#
aio_load="yes"
/etc/sysctl.conf --
# Increase send/receive buffer maximums from 256KB to 16MB.
# FreeBSD 7.x and later will auto-tune the size, but only up to the max.
net.inet.tcp.sendbuf_max=16777216
net.inet.tcp.recvbuf_max=16777216
# Double send/receive TCP datagram memory allocation. This defines the
# amount of memory taken up by default *per socket*.
net.inet.tcp.sendspace=65536
net.inet.tcp.recvspace=65536
Server Samba tuning parameters:
#
# The below options increase throughput substantially. Be aware
# that AIO support requires the aio.ko kernel module loaded,
# and Samba to be built with AIO enabled. Important notes:
#
# http://www.samba.org/samba/docs/man/manpages-3/smb.conf.5.html
#
# 1) If any of the path directories reside on ZFS, disabling sendfile
# support is a wise choice. There are known problems with sendfile
# and mmap on ZFS, including resulting in 2x the amount of memory
# used on the machine (VM cache + ZFS cache). For further details,
# see freebsd-fs or freebsd-stable thread, subject "8.1-STABLE:
# zfs and sendfile: problem still exists".
#
# 2) (2011/10/03) "socket options" SO_SNDBUF and SO_RCVBUF do not
# appear to matter on FreeBSD, or our /etc/sysctl.conf adjustments
# somehow take care of this (or maybe AIO?). The performance is the
# same with or without these two socket options on 8.2-STABLE and
# newer. TCP_NODELAY is also default since Samba 2.x.
#
# 3) (2011/10/03) My previously-mentioned "aio write behind" option
# is incorrect; see the official smb.conf(5) man page for the syntax.
# It's not a yes/no toggle, thus serves no purpose.
#
# 4) "aio {read,write} size" variables define the minimum number of
# bytes before using AIO. We set these to 1024, although many websites
# recommend a value of 1 (which seems a bit aggressive).
#
# 5) "read raw" is enabled by default, but has no real bearing on
# speed when disabled. Thus we leave it enabled.
#
use sendfile = no
aio read size = 1024
aio write size = 1024
Relevant Samba server config for CD_Images export:
[CD_Images]
path = /storage/CD_Images
writable = yes
guest ok = yes
guest only = yes
valid users = @storage
force user = storage
force group = storage
Files involved in testing:
-rwxr--r-- 1 storage storage 668362752 Sep 22 03:11 /storage/CD_Images/FreeBSD/9.1-RC/FreeBSD-9.1-RC1-amd64-disc1.iso
-rwxr--r-- 1 storage storage 714375168 Sep 22 03:12 /storage/CD_Images/FreeBSD/9.1-RC/FreeBSD-9.1-RC1-amd64-memstick.img
Server rebooted to clear ZFS ARC cache. This ensures that the data being read off disk by ZFS has to be done right then and there, e.g. file is not being served from RAM. Verification using top (note ARC size is 15MBytes):
ARC: 15M Total, 4824K MRU, 9858K MFU, 16K Anon, 146K Header, 472K Other
Test #1: FTP: client fetches FreeBSD-9.1-RC1-amd64-memstick.img from server
D:\>ftp icarus.home.lan
Connected to icarus.home.lan.
220 icarus.home.lan FTP server (Version 6.00LS) ready.
User (icarus.home.lan:(none)): jdc
331 Password required for jdc.
Password:
230 User jdc logged in.
ftp> bi
200 Type set to I.
ftp> cd /storage/CD_Images/FreeBSD/9.1-RC
250 CWD command successful.
ftp> get FreeBSD-9.1-RC1-amd64-memstick.img
200 PORT command successful.
150 Opening BINARY mode data connection for 'FreeBSD-9.1-RC1-amd64-memstick.img' (714375168 bytes).
226 Transfer complete.
ftp: 714375168 bytes received in 7.52Seconds 95047.25Kbytes/sec.
ftp> quit
221 Goodbye.
Results: 95MBytes/second
Checking ZFS ARC size after test #1, since that file's contents should now be in the ARC:
ARC: 700M Total, 683M MRU, 13M MFU, 16K Anon, 1869K Header, 1685K Other
Test #2: Samba: client fetches FreeBSD-9.1-RC1-amd64-disc1.iso from server
Note: the reason I'm using FreeBSD-9.1-RC1-amd64-disc1.iso (different file from test #1) is because if I used the file from test #1, data would be served from the ZFS ARC (i.e. straight out of RAM) and I wanted to make sure that I was transferring data that wasn't already in the ARC.
Note: Since Windows copy.exe doesn't provide throughput/speed indicators, and I didn't want to deal with GUI crap, I resorted to using netstat -i -b -n 1 on the FreeBSD box, which indicates network throughput, then issue a copy command in Windows. It's the best I can do; sorry.
D:\>copy Y:\FreeBSD\9.1-RC\FreeBSD-9.1-RC1-amd64-disc1.iso
1 file(s) copied.
root@icarus:/root # netstat -inb 1
input (Total) output
packets errs idrops bytes packets errs bytes colls
1 0 0 60 1 0 90 0
1 0 0 60 1 0 90 0
1 0 0 60 1 0 90 0
2 0 0 150 2 0 180 0
1177 0 0 87688 442 0 13743784 0
3995 0 0 293736 950 0 47911412 0
4607 0 0 338778 1096 0 55280760 0
4471 0 0 328794 1063 0 53659096 0
4531 0 0 333306 1078 0 54500648 0
4458 0 0 327900 1061 0 53515502 0
4449 0 0 327132 1057 0 53392154 0
4269 0 0 313938 1016 0 51236930 0
4403 0 0 323859 1048 0 52899581 0
4516 0 0 332631 1085 0 54193717 0
4487 0 0 330207 1076 0 53762549 0
4493 0 0 330935 1080 0 53639864 0
4635 0 0 341031 1107 0 55179296 0
1435 0 0 106198 356 0 17016818 0
17 0 0 1784 17 0 1370 0
15 0 0 1644 16 0 1276 0
16 0 0 1714 15 0 1182 0
^C
Results: about 55MBytes/second.
Checking ZFS ARC size after test #2, since that file's contents should now also be in the ARC:
ARC: 1341M Total, 1320M MRU, 15M MFU, 16K Anon, 3482K Header, 2826K Other
Yup, it is.
Test #3: same as test #2, except since the file contents are now in the ARC, there's basically no physical disk I/O needed for reading that file (it's completely in RAM), but obviously read(2) calls still have to be used to read from the filesystem:
D:\>del FreeBSD-9.1-RC1-amd64-disc1.iso
D:\>copy Y:\FreeBSD\9.1-RC\FreeBSD-9.1-RC1-amd64-disc1.iso
1 file(s) copied.
root@icarus:/root # netstat -inb 1
input (Total) output
packets errs idrops bytes packets errs bytes colls
1 0 0 60 1 0 154 0
1 0 0 60 1 0 122 0
667 0 0 49906 294 0 7366712 0
3325 0 0 244587 791 0 39938494 0
3705 0 0 272403 881 0 44405301 0
3406 0 0 250416 808 0 40702159 0
3770 0 0 277386 899 0 45386522 0
4699 0 0 345495 1116 0 56323601 0
3336 0 0 245304 793 0 39938611 0
4662 0 0 342819 1109 0 55966313 0
4652 0 0 342162 1106 0 55859700 0
4639 0 0 341154 1106 0 55609528 0
4671 0 0 344180 1121 0 56102596 0
4484 0 0 330385 1080 0 53701382 0
4666 0 0 343756 1122 0 55979771 0
4664 0 0 343636 1121 0 55918136 0
582 0 0 43447 155 0 6732742 0
16 0 0 1714 17 0 1370 0
16 0 0 1714 15 0 1198 0
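As a rough cross-check of the Samba figure from test #2, the per-second output byte counts in that netstat listing can be averaged. The Python sketch below assumes decimal megabytes and uses only the steady-state rows (the ramp-up and ramp-down rows are left out, which is a judgment call); the names are illustrative.

```python
# Output "bytes" column from the steady-state rows of the test #2 netstat run.
# These counts are interface-level, so they include protocol overhead.
TEST2_BYTES_PER_SEC = [
    47911412, 55280760, 53659096, 54500648, 53515502, 53392154,
    51236930, 52899581, 54193717, 53762549, 53639864, 55179296,
]


def mean_mbytes_per_sec(samples):
    """Average a list of bytes-per-second samples, in decimal MB/s."""
    return sum(samples) / len(samples) / 1_000_000


print(f"{mean_mbytes_per_sec(TEST2_BYTES_PER_SEC):.1f} MB/s")
```

The mean comes out at roughly 53MB/s, a little under the "about 55MBytes/second" read off the listing by eye, and still far below the FTP result.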
Bottom line: you can see from all the above evidence that physical disk I/O speed is not the bottleneck here (else FTP would have shown the same speed as CIFS/SMB), nor is the physical network. That leaves two possibilities: 1) Samba's file reading I/O model might not be very optimised (e.g. read(2) using 16KByte buffers rather than maybe 64KBytes), or 2) CIFS/SMB protocol or protocol implementation model. Hard to say which without looking at source code or using ktrace/truss (strace on Linux). I do not use iSCSI (nor will I -- no need), so I can't do those tests as a comparison. Welcome to all the nuances of doing "benchmarks" when combined with a network interface and filesystem-related protocols, and the fact that most of this crap (including Samba, as well as those NAS units -- I'm left with the impression the QNAP device actually uses Samba) is still "black box". -- Making life hard for others since 1977. I speak for myself and not my employer/affiliates of my employer. | |
|