head	1.1;
access;
symbols
	cjep_sun2x:1.1.0.18
	cjep_sun2x-base:1.1
	cjep_staticlib_x-base1:1.1
	cjep_staticlib_x:1.1.0.16
	cjep_staticlib_x-base:1.1
	phil-wifi-20200421:1.1
	phil-wifi-20200411:1.1
	phil-wifi-20200406:1.1
	pgoyette-compat-merge-20190127:1.1
	pgoyette-compat-20190127:1.1
	pgoyette-compat-20190118:1.1
	pgoyette-compat-1226:1.1
	pgoyette-compat-1126:1.1
	pgoyette-compat-1020:1.1
	pgoyette-compat-0930:1.1
	pgoyette-compat-0906:1.1
	pgoyette-compat-0728:1.1
	pgoyette-compat-0625:1.1
	pgoyette-compat-0521:1.1
	pgoyette-compat-0502:1.1
	pgoyette-compat-0422:1.1
	pgoyette-compat-0415:1.1
	pgoyette-compat-0407:1.1
	pgoyette-compat-0330:1.1
	pgoyette-compat-0322:1.1
	pgoyette-compat-0315:1.1
	pgoyette-compat:1.1.0.14
	pgoyette-compat-base:1.1
	prg-localcount2-base3:1.1
	prg-localcount2-base2:1.1
	prg-localcount2-base1:1.1
	prg-localcount2:1.1.0.12
	prg-localcount2-base:1.1
	pgoyette-localcount-20170426:1.1
	bouyer-socketcan-base1:1.1
	pgoyette-localcount-20170320:1.1
	bouyer-socketcan:1.1.0.10
	bouyer-socketcan-base:1.1
	pgoyette-localcount-20170107:1.1
	pgoyette-localcount-20161104:1.1
	localcount-20160914:1.1
	pgoyette-localcount-20160806:1.1
	pgoyette-localcount-20160726:1.1
	pgoyette-localcount:1.1.0.8
	pgoyette-localcount-base:1.1
	yamt-pagecache-base9:1.1
	tls-earlyentropy:1.1.0.4
	tls-earlyentropy-base:1.1
	riastradh-xf86-video-intel-2-7-1-pre-2-21-15:1.1
	riastradh-drm2-base3:1.1
	agc-symver:1.1.0.6
	agc-symver-base:1.1
	tls-maxphys-base:1.1
	yamt-pagecache-base8:1.1
	yamt-pagecache-base7:1.1
	yamt-pagecache-base6:1.1
	tls-maxphys:1.1.0.2;
locks; strict;
comment	@# @;


1.1
date	2012.09.12.06.15.31;	author tls;	state dead;
branches
	1.1.2.1;
next	;

1.1.2.1
date	2012.09.12.06.15.31;	author tls;	state Exp;
branches;
next	;


desc
@@


1.1
log
@file MAXPHYS-NOTES was initially added on branch tls-maxphys.
@
text
@@


1.1.2.1
log
@Initial snapshot of work to eliminate the 64K MAXPHYS limit.

Basically works for physio (I/O to raw devices); more work is needed to get
it going with the filesystems, but it shouldn't damage data.  All work so
far has been done on amd64; it should not be hard to add support for other
ports.

If others want to pitch in, one very helpful thing would be to sort out when
and how IDE disks can do 128K or larger transfers, and to adjust the various
PCI IDE (or at least ahcisata) drivers and wd.c accordingly -- it would make
testing much easier.  Another very helpful thing would be to implement a
smart minphys() for RAIDframe along the lines detailed in the MAXPHYS-NOTES
file.
@
text
@a0 76
Notes on eliminating the fixed (usually 64K) MAXPHYS, for more efficient
operation both with single disk drives/SSDs (transfers in the 128K-256K
range are advantageous for many workloads) and particularly with RAID sets.
Consider a typical 12-disk chassis of 2.5" SAS drives set up as an entirely
ordinary P+Q parity RAID array with a single hot spare: feeding 64K
transfers to each of the resulting 8 data disks requires 512K transfers fed
to the RAID controller -- is it any wonder NetBSD performs so poorly with
such hardware for many workloads?

The basic approach taken here:

1) Propagate the maximum transfer size down the device tree at autoconf
   time.  Drivers take the more restrictive of their own transfer-size
   limit and their parent's, apply that in their minphys() routines (if
   they are disk drivers), and propagate it down to their children.

2) This is just about sufficient for physio: once you've got the disk, you
   can find its minphys routine, and *that* can get at the device
   instance's softc, which holds the size determined by autoconf.  (A rough
   sketch of such a clamping minphys appears after these notes.)
3) For filesystem I/O, however, we need to be able to find that maximum
   transfer size starting not with a device_t but with a disk driver name
   (or major number) and unit number.  The "disk" interface within the
   kernel is extended to let us fish out the dkdevice's minphys routine
   starting from the data we've got.  We then feed a fake, huge buffer to
   that minphys and see what we get back.  The result is stashed in the
   mount point's data structure and is then available to the filesystem and
   pager code via vp->v_mount any time you've got a filesystem-backed
   vnode.  (A sketch of this mount-time probe appears after these notes.)

The rest is a "simple" matter of making the necessary MD adjustments and
figuring out where the rest of the hidden 64K bottlenecks are....

MAXPHYS is retained and is used as a default.  A new MACHINE_MAXPHYS must
be defined; it is the largest transfer any hardware for a given port can
actually do, or that the portmaster considers appropriate.  MACHINE_MAXPHYS
is used to size some on-stack arrays in the pager code, so don't go too
crazy with it.

==== STATUS ====

All work so far has been done on amd64; it should not be hard to get it
going on other ports.  Every top-level bus attachment will need code to
clamp transfer sizes appropriately; see the PCI or ISA code here, or, for
an unfortunate example of having to clamp more than you'd like, the pnpbios
code.

Access through physio: done?  Disk drivers other than sd, cd, and wd will
need their minphys functions adjusted as those were, and will be limited to
MAXPHYS per transfer until they are.

A notable exception is RAIDframe.  It could benefit immediately, but needs
something a little more sophisticated done to its minphys: per unit, it
needs to sum up the maxphys values of the unit's data (not parity!)
components and return that value.  (A sketch of such a minphys also appears
after these notes.)

Access through filesystems: for read, this is controlled by the uvm
readahead code.  We can stash the readahead max size in the readahead
context -- we can get it from v_mount in the vnode (the uobj!) *if* we put
it into struct mount.  Then we only have to do the awful
walk-the-device-list crap at mount time.  This likely wins!

Unfortunately, there is still a bottleneck, probably in the pager code (the
genfs I/O code).  The genfs read/getpages code is repellent and huge; I
haven't even started on it yet.  I have attacked the genfs write path
already, but though my printfs show the appropriate maxpages value
propagating down, the resulting stream of I/O requests is still 64K.  This
needs further investigation: with maxcontig now gone from the FFS code,
where on earth are we still clamping the I/O size?
@
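
A rough sketch of the per-driver clamping minphys described in items 1) and
2) above.  This is not code from the branch: the "mydsk" driver name, its
softc layout, sc_maxxfer, and MYDSK_HW_MAXXFER are invented for
illustration; only device_lookup_private(), DISKUNIT(), MIN(), and the
minphys calling convention are stock kernel interfaces.

	#include <sys/param.h>
	#include <sys/buf.h>
	#include <sys/device.h>
	#include <sys/disklabel.h>

	#define MYDSK_HW_MAXXFER	(256 * 1024)	/* invented hardware limit */

	extern struct cfdriver mydsk_cd;	/* hypothetical disk driver */

	struct mydsk_softc {
		device_t	sc_dev;
		u_long		sc_maxxfer;	/* per-instance limit from autoconf */
	};

	/*
	 * Called from the (hypothetical) attach routine; parent_maxxfer is
	 * the limit the parent bus handed down.  Keep the more restrictive
	 * of the two.
	 */
	void
	mydsk_set_maxxfer(struct mydsk_softc *sc, u_long parent_maxxfer)
	{
		sc->sc_maxxfer = MIN((u_long)MYDSK_HW_MAXXFER, parent_maxxfer);
	}

	/* Clamp each transfer to the limit discovered at autoconf time. */
	void
	mydskminphys(struct buf *bp)
	{
		struct mydsk_softc *sc;

		sc = device_lookup_private(&mydsk_cd, DISKUNIT(bp->b_dev));
		if (sc != NULL) {
			if (bp->b_bcount > sc->sc_maxxfer)
				bp->b_bcount = sc->sc_maxxfer;
		} else if (bp->b_bcount > MAXPHYS) {
			bp->b_bcount = MAXPHYS;	/* fall back to the old default */
		}
	}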
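
A rough sketch of the mount-time probe described in item 3) above.  Not
code from the branch: disk_find_minphys() stands in for whatever lookup the
extended "disk" interface actually provides, and MACHINE_MAXPHYS is the
per-port ceiling introduced in these notes.  The caller would stash the
returned value in struct mount at mount time.

	#include <sys/param.h>
	#include <sys/systm.h>
	#include <sys/buf.h>

	typedef void (*dev_minphys_t)(struct buf *);

	/* Hypothetical: look up the backing disk's minphys from a dev_t. */
	dev_minphys_t disk_find_minphys(dev_t dev);

	u_long
	mount_probe_maxphys(dev_t dev)
	{
		dev_minphys_t dminphys;
		struct buf bp;

		dminphys = disk_find_minphys(dev);
		if (dminphys == NULL)
			return MAXPHYS;		/* old conservative default */

		/* Hand the driver a fake, huge request and see how far it
		 * gets clamped. */
		memset(&bp, 0, sizeof(bp));
		bp.b_dev = dev;
		bp.b_bcount = MACHINE_MAXPHYS;
		(*dminphys)(&bp);

		return bp.b_bcount;		/* what the device can really do */
	}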
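
A rough sketch of the smarter RAIDframe minphys the STATUS section asks
for.  Not code from the branch: the softc layout, sc_ndata, and
sc_col_maxxfer[] are invented, and raidsketch_cd stands in for the real
RAIDframe driver, which keeps its component information in its own
configuration structures.

	#include <sys/param.h>
	#include <sys/buf.h>
	#include <sys/device.h>
	#include <sys/disklabel.h>

	#define RF_SKETCH_MAXCOL	32	/* arbitrary for the sketch */

	extern struct cfdriver raidsketch_cd;	/* hypothetical */

	struct raidsketch_softc {
		int	sc_ndata;		/* data (not parity) components */
		u_long	sc_col_maxxfer[RF_SKETCH_MAXCOL]; /* per-component limits */
	};

	void
	raidsketchminphys(struct buf *bp)
	{
		struct raidsketch_softc *sc;
		u_long maxxfer;
		int i;

		sc = device_lookup_private(&raidsketch_cd, DISKUNIT(bp->b_dev));
		if (sc == NULL) {
			if (bp->b_bcount > MAXPHYS)
				bp->b_bcount = MAXPHYS;
			return;
		}

		/*
		 * A full-stripe transfer touches every data component once,
		 * so the set as a whole can absorb the sum of their
		 * individual limits; parity components add no capacity.
		 */
		maxxfer = 0;
		for (i = 0; i < sc->sc_ndata; i++)
			maxxfer += sc->sc_col_maxxfer[i];
		if (maxxfer == 0)
			maxxfer = MAXPHYS;	/* unconfigured set: stay conservative */

		if (bp->b_bcount > maxxfer)
			bp->b_bcount = maxxfer;
	}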