The C10K problem (http://www.kegel.com/c10k.html)
It's time for web servers to handle ten thousand clients simultaneously, don't you think? After all, the web is a big place now.
And computers are big, too. You can buy a 1000MHz machine with 2 gigabytes of RAM and a 1000Mbit/sec Ethernet card for $1200 or so. Let's see - at 20000 clients, that's 50KHz, 100Kbytes, and 50Kbits/sec per client. It shouldn't take any more horsepower than that to take four kilobytes from the disk and send them to the network once a second for each of twenty thousand clients. (That works out to $0.08 per client, by the way. Those $100/client licensing fees some operating systems charge are starting to look a little heavy!) So hardware is no longer the bottleneck.
In 1999 one of the busiest ftp sites, cdrom.com, actually handled 10000 clients simultaneously through a Gigabit Ethernet pipe. As of 2001, that same speed is now being offered by several ISPs, who expect it to become increasingly popular with large business customers.
And the thin client model of computing appears to be coming back in style -- this time with the server out on the Internet, serving thousands of clients.
With that in mind, here are a few notes on how to configure operating systems and write code to support thousands of clients. The discussion centers around Unix-like operating systems, as that's my personal area of interest, but Windows is also covered a bit.
Contents

- The C10K problem
- Related Sites
- Book to Read First
- I/O frameworks
- I/O Strategies
  - Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
    - The traditional select()
    - The traditional poll()
    - /dev/poll (Solaris 2.7+)
    - kqueue (FreeBSD, NetBSD)
  - Serve many clients with each thread, and use nonblocking I/O and readiness change notification
    - kqueue (FreeBSD, NetBSD)
    - epoll (Linux 2.6+)
    - Polyakov's kevent (Linux 2.6+)
    - Drepper's New Network Interface (proposal for Linux 2.6+)
    - Realtime Signals (Linux 2.4+)
    - Signal-per-fd
  - Serve many clients with each thread, and use asynchronous I/O and completion notification
  - Serve one client with each server thread
    - LinuxThreads (Linux 2.0+)
    - NGPT (Linux 2.4+)
    - NPTL (Linux 2.6, Red Hat 9)
    - FreeBSD threading support
    - NetBSD threading support
    - Solaris threading support
    - Java threading support in JDK 1.3.x and earlier
    - Note: 1:1 threading vs. M:N threading
  - Build the server code into the kernel
  - Bring the TCP stack into userspace
- Comments
- Limits on open filehandles
- Limits on threads
- Java issues [Updated 27 May 2001]
- Other tips
  - Zero-Copy
  - The sendfile() system call can implement zero-copy networking.
  - Avoid small frames by using writev (or TCP_CORK)
  - Some programs can benefit from using non-Posix threads.
  - Caching your own data can sometimes be a win.
- Other limits
- Kernel Issues
- Measuring Server Performance
- Examples
  - Interesting select()-based servers
  - Interesting /dev/poll-based servers
  - Interesting epoll-based servers
  - Interesting kqueue()-based servers
  - Interesting realtime signal-based servers
  - Interesting thread-based servers
  - Interesting in-kernel servers
- Other interesting links
Related Sites
See Nick Black's excellent Fast UNIX Servers page for a circa-2009 look at the situation.
In October 2003, Felix von Leitner put together an excellent web page and presentation about network scalability, complete with benchmarks comparing various networking system calls and operating systems. One of his observations is that the 2.6 Linux kernel really does beat the 2.4 kernel, but there are many, many good graphs that will give the OS developers food for thought for some time. (See also the Slashdot comments; it'll be interesting to see whether anyone does followup benchmarks improving on Felix's results.)
Book to Read First
If you haven't read it already, go out and get a copy of Unix Network Programming : Networking Apis: Sockets and Xti (Volume 1) by the late W. Richard Stevens. It describes many of the I/O strategies and pitfalls related to writing high-performance servers. It even talks about the 'thundering herd' problem. And while you're at it, go read Jeff Darcy's notes on high-performance server design.
(Another book which might be more helpful for those who are using rather than writing a web server is Building Scalable Web Sites by Cal Henderson.)
I/O frameworks
Prepackaged libraries are available that abstract some of the techniques presented below, insulating your code from the operating system and making it more portable.
- ACE, a heavyweight C++ I/O framework, contains object-oriented implementations of some of these I/O strategies and many other useful things. In particular, its Reactor is an OO way of doing nonblocking I/O, and Proactor is an OO way of doing asynchronous I/O.

- ASIO is a C++ I/O framework which is becoming part of the Boost library. It's like ACE updated for the STL era.

- libevent is a lightweight C I/O framework by Niels Provos. It supports kqueue and select, and soon will support poll and epoll. It's level-triggered only, I think, which has both good and bad sides. Niels has a nice graph of time to handle one event as a function of the number of connections. It shows kqueue and sys_epoll as clear winners.

- My own attempts at lightweight frameworks (sadly, not kept up to date):

  - Poller is a lightweight C++ I/O framework that implements a level-triggered readiness API using whatever underlying readiness API you want (poll, select, /dev/poll, kqueue, or sigio). It's useful for benchmarks that compare the performance of the various APIs. This document links to Poller subclasses below to illustrate how each of the readiness APIs can be used.
  - rn is a lightweight C I/O framework that was my second try after Poller. It's LGPL (so it's easier to use in commercial apps) and C (so it's easier to use in non-C++ apps). It was used in some commercial products.

- Matt Welsh wrote a paper in April 2000 about how to balance the use of worker thread and event-driven techniques when building scalable servers. The paper describes part of his Sandstorm I/O framework.

- Cory Nelson's Scale! library - an async socket, file, and pipe I/O library for Windows
I/O Strategies
Designers of networking software have many options. Here are a few:
- Whether and how to issue multiple I/O calls from a single thread
  - Don't; use blocking/synchronous calls throughout, and possibly use multiple threads or processes to achieve concurrency
  - Use nonblocking calls (e.g. write() on a socket set to O_NONBLOCK) to start I/O, and readiness notification (e.g. poll() or /dev/poll) to know when it's OK to start the next I/O on that channel. Generally only usable with network I/O, not disk I/O.
  - Use asynchronous calls (e.g. aio_write()) to start I/O, and completion notification (e.g. signals or completion ports) to know when the I/O finishes. Good for both network and disk I/O.

- How to control the code servicing each client
  - one process for each client (classic Unix approach, used since 1980 or so)
  - one OS-level thread handles many clients; each client is controlled by:
    - a user-level thread (e.g. GNU state threads, classic Java with green threads)
    - a state machine (a bit esoteric, but popular in some circles; my favorite)
    - a continuation (a bit esoteric, but popular in some circles)
  - one OS-level thread for each client (e.g. classic Java with native threads)
  - one OS-level thread for each active client (e.g. Tomcat with apache front end; NT completion ports; thread pools)

- Whether to use standard O/S services, or put some code into the kernel (e.g. in a custom driver, kernel module, or VxD)

The following five combinations seem to be popular:

1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification
3. Serve many clients with each server thread, and use asynchronous I/O
4. Serve one client with each server thread, and use blocking I/O
5. Build the server code into the kernel
1. Serve many clients with each thread, and use nonblocking I/O and level-triggered readiness notification
... set nonblocking mode on all network handles, and use select() or poll() to tell which network handle has data waiting. This is the traditional favorite. With this scheme, the kernel tells you whether a file descriptor is ready, whether or not you've done anything with that file descriptor since the last time the kernel told you about it. (The name 'level triggered' comes from computer hardware design; it's the opposite of 'edge triggered'. Jonathon Lemon introduced the terms in his BSDCON 2000 paper on kqueue().)
Note: it's particularly important to remember that readiness notification from the kernel is only a hint; the file descriptor might not be ready anymore when you try to read from it. That's why it's important to use nonblocking mode when using readiness notification.
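To make that advice concrete, here is a minimal sketch (not from the original text) of the usual fcntl() idiom for putting a socket into nonblocking mode; the helper name is just illustrative:

    #include <fcntl.h>

    /* Sketch: put an already-open socket into nonblocking mode.
       'fd' is assumed to be a connected or listening socket. */
    int set_nonblocking(int fd)
    {
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags == -1)
            return -1;
        return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
    }

The event-loop sketches later in this document assume their sockets have been prepared this way.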
An important bottleneck in this method is that read() or sendfile() from disk blocks if the page is not in core at the moment; setting nonblocking mode on a disk file handle has no effect. Same thing goes for memory-mapped disk files. The first time a server needs disk I/O, its process blocks, all clients must wait, and that raw nonthreaded performance goes to waste.
This is what asynchronous I/O is for, but on systems that lack AIO, worker threads or processes that do the disk I/O can also get around this bottleneck. One approach is to use memory-mapped files, and if mincore() indicates I/O is needed, ask a worker to do the I/O, and continue handling network traffic. Jef Poskanzer mentions that Pai, Druschel, and Zwaenepoel's 1999 Flash web server uses this trick; they gave a talk at Usenix '99 on it. It looks like mincore() is available in BSD-derived Unixes like FreeBSD and Solaris, but is not part of the Single Unix Specification. It's available as part of Linux as of kernel 2.3.51, thanks to Chuck Lever.
But in November 2003 on the freebsd-hackers list, Vivek Pai et al reported very good results using system-wide profiling of their Flash web server to attack bottlenecks. One bottleneck they found was mincore (guess that wasn't such a good idea after all). Another was the fact that sendfile blocks on disk access; they improved performance by introducing a modified sendfile() that returns something like EWOULDBLOCK when the disk page it's fetching is not yet in core. (Not sure how you tell the user the page is now resident... seems to me what's really needed here is aio_sendfile().) The end result of their optimizations is a SpecWeb99 score of about 800 on a 1GHz/1GB FreeBSD box, which is better than anything on file at spec.org.
There are several ways for a single thread to tell which of a set of nonblocking sockets are ready for I/O:
- The traditional select()

  Unfortunately, select() is limited to FD_SETSIZE handles. This limit is compiled in to the standard library and user programs. (Some versions of the C library let you raise this limit at user app compile time.)

  See Poller_select (cc, h) for an example of how to use select() interchangeably with other readiness notification schemes. (A minimal select()-based event loop is also sketched at the end of this list.)
- The traditional poll()

  There is no hardcoded limit to the number of file descriptors poll() can handle, but it does get slow at about a few thousand, since most of the file descriptors are idle at any one time, and scanning through thousands of file descriptors takes time.

  Some OS's (e.g. Solaris 8) speed up poll() et al by use of techniques like poll hinting, which was implemented and benchmarked by Niels Provos for Linux in 1999.

  See Poller_poll (cc, h, benchmarks) for an example of how to use poll() interchangeably with other readiness notification schemes.
- /dev/poll (Solaris 2.7+)

  This is the recommended poll replacement for Solaris.

  The idea behind /dev/poll is to take advantage of the fact that often poll() is called many times with the same arguments. With /dev/poll, you get an open handle to /dev/poll, and tell the OS just once what files you're interested in by writing to that handle; from then on, you just read the set of currently ready file descriptors from that handle.

  It appeared quietly in Solaris 7 (see patchid 106541) but its first public appearance was in Solaris 8; according to Sun, at 750 clients, this has 10% of the overhead of poll().

  Various implementations of /dev/poll were tried on Linux, but none of them perform as well as epoll, and were never really completed. /dev/poll use on Linux is not recommended.

  See Poller_devpoll (cc, h, benchmarks) for an example of how to use /dev/poll interchangeably with many other readiness notification schemes. (Caution - the example is for Linux /dev/poll, might not work right on Solaris.)
- kqueue()

  This is the recommended poll replacement for FreeBSD (and, soon, NetBSD).

  See below. kqueue() can specify either edge triggering or level triggering.
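As promised above, here is a minimal sketch of a level-triggered, select()-based event loop in the style this section describes. It is only an illustration, not code from the original page: listen_fd is assumed to be a nonblocking, listening socket prepared by the caller, the server just echoes bytes back, and error handling is abbreviated.

    #include <sys/select.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch: serve many clients from one thread with select(). */
    void select_loop(int listen_fd)
    {
        fd_set watched;
        FD_ZERO(&watched);
        FD_SET(listen_fd, &watched);
        int maxfd = listen_fd;

        for (;;) {
            fd_set readable = watched;          /* select() overwrites its argument */
            if (select(maxfd + 1, &readable, NULL, NULL, NULL) < 0)
                continue;                       /* e.g. EINTR; real code would check errno */

            for (int fd = 0; fd <= maxfd; fd++) {
                if (!FD_ISSET(fd, &readable))
                    continue;
                if (fd == listen_fd) {          /* new connection */
                    int client = accept(listen_fd, NULL, NULL);
                    if (client >= 0 && client < FD_SETSIZE) {
                        /* put 'client' in nonblocking mode as shown earlier */
                        FD_SET(client, &watched);
                        if (client > maxfd)
                            maxfd = client;
                    } else if (client >= 0) {
                        close(client);          /* beyond FD_SETSIZE; can't watch it */
                    }
                } else {                        /* level-triggered: data is waiting now */
                    char buf[4096];
                    ssize_t n = read(fd, buf, sizeof buf);
                    if (n <= 0) {               /* EOF or error: drop the client */
                        close(fd);
                        FD_CLR(fd, &watched);
                    } else {
                        write(fd, buf, n);      /* naive echo; may short-write */
                    }
                }
            }
        }
    }

The same shape of loop works with poll() or /dev/poll in place of select(); only the registration and wait calls differ, which is what the Poller classes mentioned above abstract away.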
2. Serve many clients with each thread, and use nonblocking I/O and readiness change notification
Readiness change notification (or edge-triggered readiness notification) means you give the kernel a file descriptor, and later, when that descriptor transitions from not ready to ready, the kernel notifies you somehow. It then assumes you know the file descriptor is ready, and will not send any more readiness notifications of that type for that file descriptor until you do something that causes the file descriptor to no longer be ready (e.g. until you receive the EWOULDBLOCK error on a send, recv, or accept call, or a send or recv transfers less than the requested number of bytes).
When you use readiness change notification, you must be prepared for spurious events, since one common implementation is to signal readiness whenever any packets are received, regardless of whether the file descriptor was already ready.
This is the opposite of "level-triggered" readiness notification. It's a bit less forgiving of programming mistakes, since if you miss just one event, the connection that event was for gets stuck forever. Nevertheless, I have found that edge-triggered readiness notification made programming nonblocking clients with OpenSSL easier, so it's worth trying.
[Banga, Mogul, Drusha '99] described this kind of scheme in 1999.
There are several APIs which let the application retrieve 'file descriptor became ready' notifications:
- kqueue()

  This is the recommended edge-triggered poll replacement for FreeBSD (and, soon, NetBSD).

  FreeBSD 4.3 and later, and NetBSD-current as of Oct 2002, support a generalized alternative to poll() called kqueue()/kevent(); it supports both edge-triggering and level-triggering. (See also Jonathan Lemon's page and his BSDCon 2000 paper on kqueue().)

  Like /dev/poll, you allocate a listening object, but rather than opening the file /dev/poll, you call kqueue() to allocate one. To change the events you are listening for, or to get the list of current events, you call kevent() on the descriptor returned by kqueue(). It can listen not just for socket readiness, but also for plain file readiness, signals, and even for I/O completion.

  Note: as of October 2000, the threading library on FreeBSD does not interact well with kqueue(); evidently, when kqueue() blocks, the entire process blocks, not just the calling thread.

  See Poller_kqueue (cc, h, benchmarks) for an example of how to use kqueue() interchangeably with many other readiness notification schemes.

  Examples and libraries using kqueue():

  - PyKQueue -- a Python binding for kqueue()
  - Ronald F. Guilmette's example echo server; see also his 28 Sept 2000 post on freebsd.questions.
- epoll

  This is the recommended edge-triggered poll replacement for the 2.6 Linux kernel. (A minimal edge-triggered epoll loop is sketched at the end of this list.)

  On 11 July 2001, Davide Libenzi proposed an alternative to realtime signals; his patch provides what he now calls /dev/epoll (www.xmailserver.org/linux-patches/nio-improve.html). This is just like the realtime signal readiness notification, but it coalesces redundant events, and has a more efficient scheme for bulk event retrieval.

  Epoll was merged into the 2.5 kernel tree as of 2.5.46 after its interface was changed from a special file in /dev to a system call, sys_epoll. A patch for the older version of epoll is available for the 2.4 kernel.

  There was a lengthy debate about unifying epoll, aio, and other event sources on the linux-kernel mailing list around Halloween 2002. It may yet happen, but Davide is concentrating on firming up epoll in general first.
- Polyakov's kevent (Linux 2.6+)

  News flash: On 9 Feb 2006, and again on 9 July 2006, Evgeniy Polyakov posted patches which seem to unify epoll and aio; his goal is to support network AIO. See:
- Drepper's New Network Interface (proposal for Linux 2.6+)

  At OLS 2006, Ulrich Drepper proposed a new high-speed asynchronous networking API. See:
- Realtime Signals

  This is the recommended edge-triggered poll replacement for the 2.4 Linux kernel.

  The 2.4 linux kernel can deliver socket readiness events via a particular realtime signal. Here's how to turn this behavior on:

      /* Mask off SIGIO and the signal you want to use. */
      sigemptyset(&sigset);
      sigaddset(&sigset, signum);
      sigaddset(&sigset, SIGIO);
      sigprocmask(SIG_BLOCK, &sigset, NULL);
      /* For each file descriptor, invoke F_SETOWN, F_SETSIG, and set O_ASYNC. */
      fcntl(fd, F_SETOWN, (int) getpid());
      fcntl(fd, F_SETSIG, signum);
      flags = fcntl(fd, F_GETFL);
      flags |= O_NONBLOCK|O_ASYNC;
      fcntl(fd, F_SETFL, flags);

  This sends that signal when a normal I/O function like read() or write() completes. To use this, write a normal poll() outer loop, and inside it, after you've handled all the fd's noticed by poll(), you loop calling sigwaitinfo().

  If sigwaitinfo or sigtimedwait returns your realtime signal, siginfo.si_fd and siginfo.si_band give almost the same information as pollfd.fd and pollfd.revents would after a call to poll(), so you handle the i/o, and continue calling sigwaitinfo().

  If sigwaitinfo returns a traditional SIGIO, the signal queue overflowed, so you flush the signal queue by temporarily changing the signal handler to SIG_DFL, and break back to the outer poll() loop.

  See Poller_sigio (cc, h) for an example of how to use rtsignals interchangeably with many other readiness notification schemes.

  See Zach Brown's phhttpd for example code that uses this feature directly. (Or don't; phhttpd is a bit hard to figure out...)

  [Provos, Lever, and Tweedie 2000] describes a recent benchmark of phhttpd using a variant of sigtimedwait(), sigtimedwait4(), that lets you retrieve multiple signals with one call. Interestingly, the chief benefit of sigtimedwait4() for them seemed to be it allowed the app to gauge system overload (so it could behave appropriately). (Note that poll() provides the same measure of system overload.)
- Signal-per-fd

  Chandra and Mosberger proposed a modification to the realtime signal approach called "signal-per-fd" which reduces or eliminates realtime signal queue overflow by coalescing redundant events. It doesn't outperform epoll, though. Their paper (www.hpl.hp.com/techreports/2000/HPL-2000-174.html) compares performance of this scheme with select() and /dev/poll.

  Vitaly Luban announced a patch implementing this scheme on 18 May 2001; his patch lives at www.luban.org/GPL/gpl.html. (Note: as of Sept 2001, there may still be stability problems with this patch under heavy load. dkftpbench at about 4500 users may be able to trigger an oops.)

  See Poller_sigfd (cc, h) for an example of how to use signal-per-fd interchangeably with many other readiness notification schemes.
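As promised in the epoll entry above, here is a minimal sketch of an edge-triggered epoll event loop (again illustrative only, not code from the original page). listen_fd is assumed to be a nonblocking, listening socket; note how each readiness event is drained until EAGAIN, as the discussion at the start of this section requires. Error handling is abbreviated.

    #include <errno.h>
    #include <sys/epoll.h>
    #include <sys/socket.h>
    #include <unistd.h>

    /* Sketch: serve many clients from one thread with edge-triggered epoll. */
    void epoll_loop(int listen_fd)
    {
        int epfd = epoll_create(256);               /* size is only a hint on modern kernels */
        struct epoll_event ev, events[64];

        ev.events = EPOLLIN;                        /* the listener can stay level-triggered */
        ev.data.fd = listen_fd;
        epoll_ctl(epfd, EPOLL_CTL_ADD, listen_fd, &ev);

        for (;;) {
            int n = epoll_wait(epfd, events, 64, -1);
            for (int i = 0; i < n; i++) {
                int fd = events[i].data.fd;
                if (fd == listen_fd) {
                    int client;
                    while ((client = accept(listen_fd, NULL, NULL)) >= 0) {
                        /* put 'client' in nonblocking mode as shown earlier */
                        ev.events = EPOLLIN | EPOLLET;   /* edge-triggered for clients */
                        ev.data.fd = client;
                        epoll_ctl(epfd, EPOLL_CTL_ADD, client, &ev);
                    }                                /* accept() fails with EAGAIN once drained */
                } else {
                    char buf[4096];
                    for (;;) {                       /* edge-triggered: read until EAGAIN */
                        ssize_t r = read(fd, buf, sizeof buf);
                        if (r > 0) {
                            write(fd, buf, r);       /* naive echo; may short-write */
                        } else if (r == 0 || errno != EAGAIN) {
                            close(fd);               /* EOF or real error; close also deregisters it */
                            break;
                        } else {
                            break;                   /* EAGAIN: nothing more for now */
                        }
                    }
                }
            }
        }
    }

A kqueue()-based version looks much the same: registration and waiting go through EV_SET()/kevent() instead of epoll_ctl()/epoll_wait(), with EV_CLEAR playing the role of EPOLLET.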
3. Serve many clients with each server thread, and use asynchronous I/O
This has not yet become popular in Unix, probably because few operating systems support asynchronous I/O, also possibly because it (like nonblocking I/O) requires rethinking your application. Under standard Unix, asynchronous I/O is provided by the aio_ interface (scroll down from that link to "Asynchronous input and output"), which associates a signal and value with each I/O operation. Signals and their values are queued and delivered efficiently to the user process. This is from the POSIX 1003.1b realtime extensions, and is also in the Single Unix Specification, version 2.
AIO is normally used with edge-triggered completion notification, i.e. a signal is queued when the operation is complete. (It can also be used with level triggered completion notification by calling aio_suspend(), though I suspect few people do this.)
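As a small, illustrative sketch of the aio_ interface described above (not code from the original page), the following issues one asynchronous read and then waits for it with aio_suspend(), the level-triggered style just mentioned; a real server would more likely request a signal or other completion notification through aio_sigevent. The file name is a placeholder, and on older glibc you may need to link with -lrt.

    #include <aio.h>
    #include <fcntl.h>
    #include <signal.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        static char buf[4096];
        int fd = open("somefile", O_RDONLY);        /* placeholder file name */
        if (fd < 0) { perror("open"); return 1; }

        struct aiocb cb;
        memset(&cb, 0, sizeof cb);
        cb.aio_fildes = fd;
        cb.aio_buf = buf;
        cb.aio_nbytes = sizeof buf;
        cb.aio_offset = 0;
        cb.aio_sigevent.sigev_notify = SIGEV_NONE;  /* wait explicitly instead of taking a signal */

        if (aio_read(&cb) < 0) { perror("aio_read"); return 1; }

        /* ... the server could go handle network traffic here ... */

        const struct aiocb *list[1] = { &cb };
        aio_suspend(list, 1, NULL);                 /* block until the request completes */

        if (aio_error(&cb) == 0)
            printf("read %zd bytes asynchronously\n", aio_return(&cb));
        else
            perror("aio read failed");
        close(fd);
        return 0;
    }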
glibc 2.1 and later provide a generic implementation written for standards compliance rather than performance.
Ben LaHaise's implementation for Linux AIO was merged into the main Linux kernel as of 2.5.32. It doesn't use kernel threads, and has a very efficient underlying api, but (as of 2.6.0-test2) doesn't yet support sockets. (There is also an AIO patch for the 2.4 kernels, but the 2.5/2.6 implementation is somewhat different.) More info:
- The page "Kernel Asynchronous I/O (AIO) Support for Linux" which tries to tie together all info about the 2.6 kernel's implementation of AIO (posted 16 Sept 2003)
- Round 3: aio vs /dev/epoll by Benjamin C.R. LaHaise (presented at 2002 OLS)
- Asynchronous I/O Support in Linux 2.5, by Bhattacharya, Pratt, Pulaverty, and Morgan, IBM; presented at OLS '2003
- Design Notes on Asynchronous I/O (aio) for Linux by Suparna Bhattacharya -- compares Ben's AIO with SGI's KAIO and a few other AIO projects
- Linux AIO home page - Ben's preliminary patches, mailing list, etc.
- linux-aio mailing list archives
- libaio-oracle - library implementing standard Posix AIO on top of libaio. First mentioned by Joel Becker on 18 Apr 2003.
Suparna also suggests having a look at the DAFS API's approach to AIO.
Red Hat AS and Suse SLES both provide a high-performance implementation on the 2.4 kernel; it is related to, but not completely identical to, the 2.6 kernel implementation.
In February 2006, a new attempt is being made to provide network AIO; see the note above about Evgeniy Polyakov's kevent-based AIO.
In 1999, SGI implemented high-speed AIO for Linux. As of version 1.1, it's said to work well with both disk I/O and sockets. It seems to use kernel threads. It is still useful for people who can't wait for Ben's AIO to support sockets.
The O'Reilly book POSIX.4: Programming for the Real World is said to include a good introduction to aio.
A tutorial for the earlier, nonstandard, aio implementation on Solaris is online at Sunsite. It's probably worth a look, but keep in mind you'll need to mentally convert "aioread" to "aio_read", etc.
Note that AIO doesn't provide a way to open files without blocking for disk I/O; if you care about the sleep caused by opening a disk file, Linus suggests you should simply do the open() in a different thread rather than wishing for an aio_open() system call.
Under Windows, asynchronous I/O is associated with the terms "Overlapped I/O" and IOCP or "I/O Completion Port". Microsoft's IOCP combines techniques from the prior art like asynchronous I/O (like aio_write) and queued completion notification (like when using the aio_sigevent field with aio_write) with a new idea of holding back some requests to try to keep the number of running threads associated with a single IOCP constant. For more information, see Inside I/O Completion Ports by Mark Russinovich at sysinternals.com, Jeffrey Richter's book "Programming Server-Side Applications for Microsoft Windows 2000" (Amazon, MSPress), U.S. patent #06223207, or MSDN.
4. Serve one client with each server thread
... and let read() and write() block. Has the disadvantage of using a whole stack frame for each client, which costs memory. Many OS's also have trouble handling more than a few hundred threads. If each thread gets a 2MB stack (not an uncommon default value), you run out of virtual memory at (2^30 / 2^21) = 512 threads on a 32 bit machine with 1GB user-accessible VM (like, say, Linux as normally shipped on x86). You can work around this by giving each thread a smaller stack, but since most thread libraries don't allow growing thread stacks once created, doing this means designing your program to minimize stack use. You can also work around this by moving to a 64 bit processor.
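For comparison with the event-driven sketches earlier, here is a minimal illustration (not from the original page) of the one-thread-per-client model this section describes: accept() in a loop, handing each connection to a detached thread that simply lets read() and write() block. The port number is a placeholder, the 64 KB stack follows the stack-shrinking advice above, and you would compile with -lpthread.

    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <pthread.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <unistd.h>

    static void *serve_client(void *arg)
    {
        int fd = (int)(intptr_t)arg;
        char buf[4096];
        ssize_t n;
        while ((n = read(fd, buf, sizeof buf)) > 0) /* blocking read */
            write(fd, buf, n);                      /* blocking write; naive echo */
        close(fd);
        return NULL;
    }

    int main(void)
    {
        int listen_fd = socket(AF_INET, SOCK_STREAM, 0);
        struct sockaddr_in addr;
        memset(&addr, 0, sizeof addr);
        addr.sin_family = AF_INET;
        addr.sin_addr.s_addr = htonl(INADDR_ANY);
        addr.sin_port = htons(8000);                /* placeholder port */
        bind(listen_fd, (struct sockaddr *)&addr, sizeof addr);
        listen(listen_fd, 128);

        pthread_attr_t attr;
        pthread_attr_init(&attr);
        pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);
        pthread_attr_setstacksize(&attr, 64 * 1024);    /* small per-thread stacks, as discussed */

        for (;;) {
            int client = accept(listen_fd, NULL, NULL);
            if (client < 0)
                continue;
            pthread_t tid;
            if (pthread_create(&tid, &attr, serve_client, (void *)(intptr_t)client) != 0)
                close(client);                      /* couldn't spawn a thread; drop the client */
        }
    }

At a few hundred clients this is perfectly reasonable; the paragraphs above explain why it gets painful near 10000.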
The thread support in Linux, FreeBSD, and Solaris is improving, and 64 bit processors are just around the corner even for mainstream users. Perhaps in the not-too-distant future, those who prefer using one thread per client will be able to use that paradigm even for 10000 clients. Nevertheless, at the current time, if you actually want to support that many clients, you're probably better off using some other paradigm.
For an unabashedly pro-thread viewpoint, see Why Events Are A Bad Idea (for High-concurrency Servers) by von Behren, Condit, and Brewer, UCB, presented at HotOS IX. Anyone from the anti-thread camp care to point out a paper that rebuts this one? :-)
LinuxThreads

LinuxThreads is the name for the standard Linux thread library. It is integrated into glibc since glibc 2.0, and is mostly Posix-compliant, but with less than stellar performance and signal support.
NGPT: Next Generation Posix Threads for Linux
NGPT is a project started by IBM to bring good Posix-compliant thread support to Linux. It's at stable version 2.2 now, and works well... but the NGPT team has announced that they are putting the NGPT codebase into support-only mode because they feel it's "the best way to support the community for the long term". The NGPT team will continue working to improve Linux thread support, but now focused on improving NPTL. (Kudos to the NGPT team for their good work and the graceful way they conceded to NPTL.)
NPTL: Native Posix Thread Library for Linux
NPTL is a project by Ulrich Drepper (the benevolent dict^H^H^H^Hmaintainer of glibc) and Ingo Molnar to bring world-class Posix threading support to Linux.
As of 5 October 2003, NPTL is now merged into the glibc cvs tree as an add-on directory (just like linuxthreads), so it will almost certainly be released along with the next release of glibc.
The first major distribution to include an early snapshot of NPTL was Red Hat 9. (This was a bit inconvenient for some users, but somebody had to break the ice...)
NPTL links:
- Mailing list for NPTL discussion
- NPTL source code
- Initial announcement for NPTL
- Original whitepaper describing the goals for NPTL
- Revised whitepaper describing the final design of NPTL
- Ingo Molnar's first benchmark showing it could handle 10^6 threads
- Ulrich's benchmark comparing performance of LinuxThreads, NPTL, and IBM's NGPT. It seems to show NPTL is much faster than NGPT.
Here's my try at describing the history of NPTL (see also Jerry Cooperstein's article):
In March 2002, Bill Abt of the NGPT team, the glibc maintainer Ulrich Drepper, and others met to figure out what to do about LinuxThreads. One idea that came out of the meeting was to improve mutex performance; Rusty Russell et al subsequently implemented fast userspace mutexes (futexes), which are now used by both NGPT and NPTL. Most of the attendees figured NGPT should be merged into glibc.
Ulrich Drepper, though, didn't like NGPT, and figured he could do better. (For those who have ever tried to contribute a patch to glibc, this may not come as a big surprise :-) Over the next few months, Ulrich Drepper, Ingo Molnar, and others contributed glibc and kernel changes that make up something called the Native Posix Threads Library (NPTL). NPTL uses all the kernel enhancements designed for NGPT, and takes advantage of a few new ones. Ingo Molnar described the kernel enhancements as follows:
While NPTL uses the three kernel features introduced by NGPT: getpid() returns PID, CLONE_THREAD and futexes; NPTL also uses (and relies on) a much wider set of new kernel features, developed as part of this project.

Some of the items NGPT introduced into the kernel around 2.5.8 got modified, cleaned up and extended, such as thread group handling (CLONE_THREAD). [the CLONE_THREAD changes which impacted NGPT's compatibility got synced with the NGPT folks, to make sure NGPT does not break in any unacceptable way.]

The kernel features developed for and used by NPTL are described in the design whitepaper, http://people.redhat.com/drepper/nptl-design.pdf ...

A short list: TLS support, various clone extensions (CLONE_SETTLS, CLONE_SETTID, CLONE_CLEARTID), POSIX thread-signal handling, sys_exit() extension (release TID futex upon VM-release), the sys_exit_group() system-call, sys_execve() enhancements and support for detached threads.

There was also work put into extending the PID space - eg. procfs crashed due to 64K PID assumptions, max_pid, and pid allocation scalability work. Plus a number of performance-only improvements were done as well.

In essence the new features are a no-compromises approach to 1:1 threading - the kernel now helps in everything where it can improve threading, and we precisely do the minimally necessary set of context switches and kernel calls for every basic threading primitive.
One big difference between the two is that NPTL is a 1:1 threading model, whereas NGPT is an M:N threading model (see below). In spite of this, Ulrich's initial benchmarks seem to show that NPTL is indeed much faster than NGPT. (The NGPT team is looking forward to seeing Ulrich's benchmark code to verify the result.)
FreeBSD threading support

FreeBSD supports both LinuxThreads and a userspace threading library. Also, an M:N implementation called KSE was introduced in FreeBSD 5.0. For one overview, see www.unobvious.com/bsd/freebsd-threads.html.
On 25 Mar 2003, Jeff Roberson posted on freebsd-arch:
... Thanks to the foundation provided by Julian, David Xu, Mini, Dan Eischen, and everyone else who has participated with KSE and libpthread development Mini and I have developed a 1:1 threading implementation. This code works in parallel with KSE and does not break it in any way. It actually helps bring M:N threading closer by testing out shared bits. ...
And in July 2006, Robert Watson proposed that the 1:1 threading implementation become the default in FreeBsd 7.x:
I know this has been discussed in the past, but I figured with 7.x trundling forward, it was time to think about it again. In benchmarks for many common applications and scenarios, libthr demonstrates significantly better performance over libpthread... libthr is also implemented across a larger number of our platforms, and is already libpthread on several. The first recommendation we make to MySQL and other heavy thread users is "Switch to libthr", which is suggestive, also! ... So the strawman proposal is: make libthr the default threading library on 7.x.
NetBSD threading support

According to a note from Noriyuki Soda:

Kernel supported M:N thread library based on the Scheduler Activations model is merged into NetBSD-current on Jan 18 2003.

For details, see An Implementation of Scheduler Activations on the NetBSD Operating System by Nathan J. Williams, Wasabi Systems, Inc., presented at FREENIX '02.
Solaris threading support

The thread support in Solaris is evolving... from Solaris 2 to Solaris 8, the default threading library used an M:N model, but Solaris 9 defaults to 1:1 model thread support. See Sun's multithreaded programming guide and Sun's note about Java and Solaris threading.
Java threading support in JDK 1.3.x and earlier

As is well known, Java up to JDK 1.3.x did not support any method of handling network connections other than one thread per client. Volanomark is a good microbenchmark which measures throughput in messages per second at various numbers of simultaneous connections. As of May 2003, JDK 1.3 implementations from various vendors are in fact able to handle ten thousand simultaneous connections -- albeit with significant performance degradation. See Table 4 for an idea of which JVMs can handle 10000 connections, and how performance suffers as the number of connections increases.
Note: 1:1 threading vs. M:N threading

There is a choice when implementing a threading library: you can either put all the threading support in the kernel (this is called the 1:1 threading model), or you can move a fair bit of it into userspace (this is called the M:N threading model). At one point, M:N was thought to be higher performance, but it's so complex that it's hard to get right, and most people are moving away from it.

- Why Ingo Molnar prefers 1:1 over M:N
- Sun is moving to 1:1 threads
- NGPT is an M:N threading library for Linux.
- Although Ulrich Drepper planned to use M:N threads in the new glibc threading library, he has since switched to the 1:1 threading model.
- MacOSX appears to use 1:1 threading.
- FreeBSD and NetBSD appear to still believe in M:N threading... The lone holdouts? Looks like freebsd 7.0 might switch to 1:1 threading (see above), so perhaps M:N threading's believers have finally been proven wrong everywhere.
5. Build the server code into the kernel

Novell and Microsoft are both said to have done this at various times, at least one NFS implementation does this, khttpd does this for Linux and static web pages, and "TUX" (Threaded linUX webserver) is a blindingly fast and flexible kernel-space HTTP server by Ingo Molnar for Linux. Ingo's September 1, 2000 announcement says an alpha version of TUX can be downloaded from ftp://ftp.redhat.com/pub/redhat/tux, and explains how to join a mailing list for more info.

The linux-kernel list has been discussing the pros and cons of this approach, and the consensus seems to be instead of moving web servers into the kernel, the kernel should have the smallest possible hooks added to improve web server performance. That way, other kinds of servers can benefit. See e.g. Zach Brown's remarks about userland vs. kernel http servers. It appears that the 2.4 linux kernel provides sufficient power to user programs, as the X15 server runs about as fast as Tux, but doesn't use any kernel modifications.

Bring the TCP stack into userspace

See for instance the netmap packet I/O framework, and the Sandstorm proof-of-concept web server based on it.
Comments

Richard Gooch has written a paper discussing I/O options.
In 2001, Tim Brecht and Michal Ostrowski measured various strategies for simple select-based servers. Their data is worth a look.
In 2003, Tim Brecht posted source code for userver, a small web server put together from several servers written by Abhishek Chandra, David Mosberger, David Pariag, and Michal Ostrowski. It can use select(), poll(), epoll(), or sigio.
Back in March 1999, Dean Gaudet posted:
I keep getting asked "why don't you guys use a select/event based model like Zeus? It's clearly the fastest." ...
His reasons boiled down to "it's really hard, and the payoff isn't clear". Within a few months, though, it became clear that people were willing to work on it.
Mark Russinovich wrote an editorial and an article discussing I/O strategy issues in the 2.2 Linux kernel. Worth reading, even if he seems misinformed on some points. In particular, he seems to think that Linux 2.2's asynchronous I/O (see F_SETSIG above) doesn't notify the user process when data is ready, only when new connections arrive. This seems like a bizarre misunderstanding. See also comments on an earlier draft, Ingo Molnar's rebuttal of 30 April 1999, Russinovich's comments of 2 May 1999, a rebuttal from Alan Cox, and various posts to linux-kernel. I suspect he was trying to say that Linux doesn't support asynchronous disk I/O, which used to be true, but now that SGI has implemented KAIO, it's not so true anymore.
See these pages at sysinternals.com and MSDN for information on "completion ports", which he said were unique to NT; in a nutshell, win32's "overlapped I/O" turned out to be too low level to be convenient, and a "completion port" is a wrapper that provides a queue of completion events, plus scheduling magic that tries to keep the number of running threads constant by allowing more threads to pick up completion events if other threads that had picked up completion events from this port are sleeping (perhaps doing blocking I/O).
See also OS/400's support for I/O completion ports.
There was an interesting discussion on linux-kernel in September 1999 titled "> 15,000 Simultaneous Connections" (and the second week of the thread). Highlights:
- Ed Hall posted a few notes on his experiences; he's achieved >1000 connects/second on a UP P2/333 running Solaris. His code used a small pool of threads (1 or 2 per CPU) each managing a large number of clients using "an event-based model".
- Mike Jagdis posted an analysis of poll/select overhead, and said "The current select/poll implementation can be improved significantly, especially in the blocking case, but the overhead will still increase with the number of descriptors because select/poll does not, and cannot, remember what descriptors are interesting. This would be easy to fix with a new API. Suggestions are welcome..."
- Mike posted about his work on improving select() and poll().
- Mike posted a bit about a possible API to replace poll()/select(): "How about a 'device like' API where you write 'pollfd like' structs, the 'device' listens for events and delivers 'pollfd like' structs representing them when you read it? ... "
- Rogier Wolff suggested using "the API that the digital guys suggested", http://www.cs.rice.edu/~gaurav/papers/usenix99.ps
- Joerg Pommnitz pointed out that any new API along these lines should be able to wait for not just file descriptor events, but also signals and maybe SYSV-IPC. Our synchronization primitives should certainly be able to do what Win32's WaitForMultipleObjects can, at least.
- Stephen Tweedie asserted that the combination of F_SETSIG, queued realtime signals, and sigwaitinfo() was a superset of the API proposed in http://www.cs.rice.edu/~gaurav/papers/usenix99.ps. He also mentions that you keep the signal blocked at all times if you're interested in performance; instead of the signal being delivered asynchronously, the process grabs the next one from the queue with sigwaitinfo().
- Jayson Nordwick compared completion ports with the F_SETSIG synchronous event model, and concluded they're pretty similar.
- Alan Cox noted that an older rev of SCT's SIGIO patch is included in 2.3.18ac.
- Jordan Mendelson posted some example code showing how to use F_SETSIG.
- Stephen C. Tweedie continued the comparison of completion ports and F_SETSIG, and noted: "With a signal dequeuing mechanism, your application is going to get signals destined for various library components if libraries are using the same mechanism," but the library can set up its own signal handler, so this shouldn't affect the program (much).
- Doug Royer noted that he'd gotten 100,000 connections on Solaris 2.6 while he was working on the Sun calendar server. Others chimed in with estimates of how much RAM that would require on Linux, and what bottlenecks would be hit.
Interesting reading!
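To make the F_SETSIG approach from that thread (and from Jordan Mendelson's example code) concrete, here is a rough sketch of the pattern Stephen Tweedie describes: pick a queued realtime signal, keep it blocked, and dequeue readiness events with sigwaitinfo(). The signal number chosen and the overflow fallback are choices of this sketch, not anything taken from the thread.

/* Sketch of F_SETSIG + queued realtime signals + sigwaitinfo() (Linux 2.4+).
   Socket setup and the actual read/write loop are omitted. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <signal.h>
#include <unistd.h>

#define IO_SIG (SIGRTMIN + 1)   /* any queued realtime signal will do */

void arm_fd(int fd)
{
    int flags = fcntl(fd, F_GETFL);
    fcntl(fd, F_SETOWN, getpid());      /* deliver the signal to this process   */
    fcntl(fd, F_SETSIG, IO_SIG);        /* ...as a queued realtime signal       */
    fcntl(fd, F_SETFL, flags | O_ASYNC | O_NONBLOCK);
}

int main(void)
{
    sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, IO_SIG);
    sigaddset(&set, SIGIO);             /* delivered if the realtime queue overflows */
    sigprocmask(SIG_BLOCK, &set, NULL); /* keep blocked; we dequeue synchronously    */

    /* arm_fd(listen_fd), and arm_fd() each accepted fd (socket code not shown) */

    for (;;) {
        siginfo_t si;
        if (sigwaitinfo(&set, &si) < 0)
            continue;
        if (si.si_signo == SIGIO) {
            /* queue overflowed: fall back to a full poll() over all fds (not shown) */
            continue;
        }
        int fd = si.si_fd;              /* which descriptor is ready                   */
        /* si.si_band holds the poll()-style event bits (POLLIN, POLLOUT, ...);
           do nonblocking read()/write() on fd until EAGAIN. */
        (void)fd;
    }
}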
Limits on open filehandles
-
Any Unix: the limits set by ulimit or setrlimit. (See the setrlimit() sketch at the end of this list.)
-
Solaris: see the Solaris FAQ, question 3.46 (or thereabouts; they renumber the questions periodically).
-
FreeBSD:
Edit /boot/loader.conf, add the line
set kern.maxfiles=XXXX
where XXXX is the desired system limit on file descriptors, and reboot. Thanks to an anonymous reader, who wrote in to say he'd achieved far more than 10000 connections on FreeBSD 4.3, and says
"FWIW: You can't actually tune the maximum number of connections in FreeBSD trivially, via sysctl.... You have to do it in the /boot/loader.conf file.
The reason for this is that the zalloci() calls for initializing the sockets and tcpcb structures zones occurs very early in system startup, in order that the zone be both type stable and that it be swappable.
You will also need to set the number of mbufs much higher, since you will (on an unmodified kernel) chew up one mbuf per connection for tcptempl structures, which are used to implement keepalive."
Another reader says
"As of FreeBSD 4.4, the tcptempl structure is no longer allocated; you no longer have to worry about one mbuf being chewed up per connection."
See also:
- the FreeBSD handbook
- SYSCTL TUNING, LOADER TUNABLES, and KERNEL CONFIG TUNING in 'man tuning'
- The Effects of Tuning a FreeBSD 4.3 Box for High Performance, Daemon News, Aug 2001
- postfix.org tuning notes, covering FreeBSD 4.2 and 4.4
- the Measurement Factory's notes, circa FreeBSD 4.3
-
OpenBSD: A reader says
"In OpenBSD, an additional tweak is required to increase the number of open filehandles available per process: the openfiles-cur parameter in /etc/login.conf needs to be increased. You can change kern.maxfiles either with sysctl -w or in sysctl.conf but it has no effect. This matters because as shipped, the login.conf limits are a quite low 64 for nonprivileged processes, 128 for privileged."
-
Linux: See Bodo Bauer's /proc documentation. On 2.4 kernels:
echo 32768 > /proc/sys/fs/file-max
increases the system limit on open files, and
ulimit -n 32768
increases the current process' limit.
On 2.2.x kernels,
echo 32768 > /proc/sys/fs/file-max
echo 65536 > /proc/sys/fs/inode-max
increases the system limit on open files, and
ulimit -n 32768
increases the current process' limit.
I verified that a process on Red Hat 6.0 (2.2.5 or so plus patches) can open at least 31000 file descriptors this way. Another fellow has verified that a process on 2.2.12 can open at least 90000 file descriptors this way (with appropriate limits). The upper bound seems to be available memory.
Stephen C. Tweedie posted about how to set ulimit limits globally or per-user at boot time using initscript and pam_limit.
In older 2.2 kernels, though, the number of open files per process is still limited to 1024, even with the above changes.
See also Oskar's 1998 post, which talks about the per-process and system-wide limits on file descriptors in the 2.0.36 kernel.
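As promised in the "Any Unix" item above, here is a minimal setrlimit() sketch for raising a process's own descriptor limit up to the hard limit from inside the server (raising the hard limit itself still needs root and/or the system tuning described above); the code is just an illustration:

/* Raise this process's open-file limit as far as the hard limit allows. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) < 0) {
        perror("getrlimit");
        return 1;
    }
    rl.rlim_cur = rl.rlim_max;          /* soft limit up to the hard limit      */
    if (setrlimit(RLIMIT_NOFILE, &rl) < 0)
        perror("setrlimit");            /* raising the hard limit requires root */
    getrlimit(RLIMIT_NOFILE, &rl);
    printf("open file limit: %ld (max %ld)\n",
           (long)rl.rlim_cur, (long)rl.rlim_max);
    return 0;
}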
Limits on threads
On any architecture, you may need to reduce the amount of stack space allocated for each thread to avoid running out of virtual memory. You can set this at runtime with pthread_attr_init() if you're using pthreads.
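For instance, something along these lines with pthreads; the 64 KB figure is purely illustrative, and it must be at least PTHREAD_STACK_MIN and large enough for your deepest call chain:

/* Create threads with a small stack so more of them fit in virtual memory. */
#include <limits.h>
#include <pthread.h>

static void *client_thread(void *arg)
{
    /* ... serve one client ... */
    return NULL;
}

int spawn_client_thread(void *client_state)
{
    pthread_attr_t attr;
    pthread_t tid;
    size_t stacksize = 64 * 1024;       /* arbitrary illustrative value */

    if (stacksize < PTHREAD_STACK_MIN)
        stacksize = PTHREAD_STACK_MIN;

    pthread_attr_init(&attr);
    pthread_attr_setstacksize(&attr, stacksize);
    pthread_attr_setdetachstate(&attr, PTHREAD_CREATE_DETACHED);

    int rc = pthread_create(&tid, &attr, client_thread, client_state);
    pthread_attr_destroy(&attr);
    return rc;                          /* 0 on success */
}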
-
Solaris: it supports as many threads as will fit in memory, I hear.
-
Linux 2.6 kernels with NPTL: /proc/sys/vm/max_map_count may need to be increased to go above 32000 or so threads. (You'll need to use very small stack threads to get anywhere near that number of threads, though, unless you're on a 64 bit processor.) See the NPTL mailing list, e.g. the thread with subject "Cannot create more than 32K threads?", for more info.
-
Linux 2.4: /proc/sys/kernel/threads-max is the max number of threads; it defaults to 2047 on my Red Hat 8 system. You can increase this as usual by echoing new values into that file, e.g. "echo 4000 > /proc/sys/kernel/threads-max"
-
Linux 2.2: Even the 2.2.13 kernel limits the number of threads, at least on Intel. I don't know what the limits are on other architectures. Mingo posted a patch for 2.1.131 on Intel that removed this limit. It appears to be integrated into 2.3.20.
See also Volano's detailed instructions for raising file, thread, and FD_SET limits in the 2.2 kernel. Wow. This document steps you through a lot of stuff that would be hard to figure out yourself, but is somewhat dated.
-
Java: See Volano's detailed benchmark info, plus their info on how to tune various systems to handle lots of threads.
Java issues
Up through JDK 1.3, Java's standard networking libraries mostly offered the one-thread-per-client model. There was a way to do nonblocking reads, but no way to do nonblocking writes.
In May 2001, JDK 1.4 introduced the package java.nio to provide full support for nonblocking I/O (and some other goodies). See the release notes for some caveats. Try it out and give Sun feedback!
HP's java also includes a Thread Polling API.
In 2000, Matt Welsh implemented nonblocking sockets for Java; his performance benchmarks show that they have advantages over blocking sockets in servers handling many (up to 10000) connections. His class library is called java-nbio; it's part of the Sandstorm project. Benchmarks showing performance with 10000 connections are available.
See also Dean Gaudet's essay on the subject of Java, network I/O, and threads, and the paper by Matt Welsh on events vs. worker threads.
Before NIO, there were several proposals for improving Java's networking APIs:
- Matt Welsh's Jaguar system proposes preserialized objects, new Java bytecodes, and memory management changes to allow the use of asynchronous I/O with Java.
- Interfacing Java to the Virtual Interface Architecture, by C-C. Chang and T. von Eicken, proposes memory management changes to allow the use of asynchronous I/O with Java.
- JSR-51 was the Sun project that came up with the java.nio package. Matt Welsh participated (who says Sun doesn't listen?).
Other tips
-
Zero-Copy
Normally, data gets copied many times on its way from here to there. Any scheme that eliminates these copies to the bare physical minimum is called "zero-copy".
-
Thomas Ogrisegg's zero-copy send patch for mmaped files under Linux 2.4.17-2.4.20. Claims it's faster than sendfile().
-
IO-Lite is a proposal for a set of I/O primitives that gets rid of the need for many copies.
-
Alan Cox noted that zero-copy is sometimes not worth the trouble back in 1999. (He did like sendfile(), though.)
-
Ingo implemented a form of zero-copy TCP in the 2.4 kernel for TUX 1.0 in July 2000, and says he'll make it available to userspace soon.
-
Drew Gallatin and Robert Picco have added some zero-copy features to FreeBSD; the idea seems to be that if you call write() or read() on a socket, the pointer is page-aligned, and the amount of data transferred is at least a page, and you don't immediately reuse the buffer, memory management tricks will be used to avoid copies. But see followups to this message on linux-kernel for people's misgivings about the speed of those memory management tricks.
According to a note from Noriyuki Soda:
"Sending side zero-copy is supported since NetBSD-1.6 release by specifying "SOSEND_LOAN" kernel option. This option is now default on NetBSD-current (you can disable this feature by specifying "SOSEND_NO_LOAN" in the kernel option on NetBSD_current). With this feature, zero-copy is automatically enabled, if data more than 4096 bytes are specified as data to be sent."
- The sendfile() system call can implement zero-copy networking.
The sendfile() function in Linux and FreeBSD lets you tell the kernel to send part or all of a file. This lets the OS do it as efficiently as possible. It can be used equally well in servers using threads or servers using nonblocking I/O. (In Linux, it's poorly documented at the moment; use _syscall4 to call it. Andi Kleen is writing new man pages that cover this. See also Exploring The sendfile System Call by Jeff Tranter in Linux Gazette issue 91.) Rumor has it, ftp.cdrom.com benefitted noticeably from sendfile().
A zero-copy implementation of sendfile() is on its way for the 2.4 kernel. See LWN Jan 25 2001.
One developer using sendfile() with Freebsd reports that using POLLWRBAND instead of POLLOUT makes a big difference.
Solaris 8 (as of the July 2001 update) has a new system call 'sendfilev'. A copy of the man page is here. The Solaris 8 7/01 release notes also mention it. I suspect that this will be most useful when sending to a socket in blocking mode; it'd be a bit of a pain to use with a nonblocking socket.
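To make the Linux calling convention concrete, here is a minimal sketch (modern glibc exposes sendfile() directly, so the _syscall4 trick mentioned above is only needed on old systems; FreeBSD's sendfile() has a different signature). The send_whole_file() helper is hypothetical, and it assumes a blocking socket; with a nonblocking socket you would loop on partial sends and EAGAIN.

/* Send an entire file to a connected socket with Linux sendfile(). */
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <unistd.h>

int send_whole_file(int sock, const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    fstat(fd, &st);

    off_t offset = 0;
    while (offset < st.st_size) {
        ssize_t sent = sendfile(sock, fd, &offset, st.st_size - offset);
        if (sent <= 0) {                /* error (or EAGAIN on a nonblocking socket) */
            close(fd);
            return -1;
        }
        /* sendfile() advances 'offset' by the number of bytes it sent */
    }
    close(fd);
    return 0;
}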
-
Avoid small frames by using writev (or TCP_CORK)
A new socket option under Linux, TCP_CORK, tells the kernel to avoid sending partial frames, which helps a bit e.g. when there are lots of little write() calls you can't bundle together for some reason. Unsetting the option flushes the buffer. Better to use writev(), though...
See LWN Jan 25 2001 for a summary of some very interesting discussions on linux-kernel about TCP_CORK and a possible alternative MSG_MORE.
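A short sketch of both approaches; the two helper functions and the header/body split are just illustrations, and TCP_CORK is Linux-specific:

/* Avoid sending a tiny frame for the header and another for the body:
   gather them into one writev(), or cork the socket around the writes. */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

void send_response_writev(int sock, const char *hdr, const char *body, size_t bodylen)
{
    struct iovec iov[2];
    iov[0].iov_base = (void *)hdr;  iov[0].iov_len = strlen(hdr);
    iov[1].iov_base = (void *)body; iov[1].iov_len = bodylen;
    writev(sock, iov, 2);           /* one syscall, at most one partial frame */
}

void send_response_cork(int sock, const char *hdr, const char *body, size_t bodylen)
{
    int on = 1, off = 0;
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &on, sizeof on);   /* hold partial frames */
    write(sock, hdr, strlen(hdr));
    write(sock, body, bodylen);     /* could just as well be sendfile()    */
    setsockopt(sock, IPPROTO_TCP, TCP_CORK, &off, sizeof off); /* uncork: flush */
}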
-
Behave sensibly on overload.
[Provos, Lever, and Tweedie 2000] notes that dropping incoming connections when the server is overloaded improved the shape of the performance curve, and reduced the overall error rate. They used a smoothed version of "number of clients with I/O ready" as a measure of overload. This technique should be easily applicable to servers written with select, poll, or any system call that returns a count of readiness events per call (e.g. /dev/poll or sigtimedwait4()).
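One possible way to apply that idea in a poll()-based loop is sketched below; the smoothing factor and OVERLOAD_THRESHOLD are made-up numbers, not values taken from the paper.

/* Overload shedding driven by a smoothed "ready descriptors per event-loop
   pass" measure, loosely in the spirit of [Provos, Lever, and Tweedie 2000]. */

#define OVERLOAD_THRESHOLD 900.0        /* assumed; tune per machine and workload */

static double smoothed_ready;

/* Call once per loop pass with the count returned by poll()/select()/etc. */
int overloaded(int nready)
{
    /* exponentially weighted moving average of readiness */
    smoothed_ready = 0.9 * smoothed_ready + 0.1 * (double)nready;
    return smoothed_ready > OVERLOAD_THRESHOLD;
}

/* In the accept path, while overloaded() is true, accept() and immediately
   close() new connections rather than letting every client's latency degrade. */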
-
Some programs can benefit from using non-Posix threads.
Not all threads are created equal. The clone() function in Linux (and its friends in other operating systems) lets you create a thread that has its own current working directory, for instance, which can be very helpful when implementing an ftp server. See Hoser FTPd for an example of the use of native threads rather than pthreads.
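A hedged sketch of what using clone() directly looks like; the flag set and the ftp directory are illustrative choices, not taken from Hoser FTPd.

/* A clone()-based "thread" that has its own current working directory,
   because CLONE_FS is deliberately NOT passed (Linux, _GNU_SOURCE). */
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

static int session(void *dir)
{
    chdir((const char *)dir);           /* affects only this task's cwd */
    /* ... serve this client relative to its own cwd ... */
    return 0;
}

int main(void)
{
    const size_t stack_size = 64 * 1024;    /* arbitrary; see stack notes above */
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return 1;

    /* Share memory and the fd table with the parent, but not filesystem
       context (cwd, root, umask). SIGCHLD lets waitpid() reap the child. */
    int pid = clone(session, stack + stack_size,    /* stack grows downward here */
                    CLONE_VM | CLONE_FILES | SIGCHLD,
                    (void *)"/var/ftp/pub");        /* illustrative path */
    if (pid == -1) {
        perror("clone");
        return 1;
    }
    waitpid(pid, NULL, 0);
    free(stack);
    return 0;
}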
-
Caching your own data can sometimes be a win.
"Re: fix for hybrid server problems" by Vivek Sadananda Pai (vivek@cs.rice.edu) on new-httpd, May 9th, states:
"I've compared the raw performance of a select-based server with a multiple-process server on both FreeBSD and Solaris/x86. On microbenchmarks, there's only a marginal difference in performance stemming from the software architecture. The big performance win for select-based servers stems from doing application-level caching. While multiple-process servers can do it at a higher cost, it's harder to get the same benefits on real workloads (vs microbenchmarks). I'll be presenting those measurements as part of a paper that'll appear at the next Usenix conference. If you've got postscript, the paper is available at http://www.cs.rice.edu/~vivek/flash99/"
Other limits
- Old system libraries might use 16 bit variables to hold file handles, which causes trouble above 32767 handles. glibc2.1 should be ok.
- Many systems use 16 bit variables to hold process or thread id's. It would be interesting to port the Volano scalability benchmark to C, and see what the upper limit on number of threads is for the various operating systems.
- Too much thread-local memory is preallocated by some operating systems; if each thread gets 1MB, and total VM space is 2GB, that creates an upper limit of 2000 threads.
- Look at the performance comparison graph at the bottom of http://www.acme.com/software/thttpd/benchmarks.html. Notice how various servers have trouble above 128 connections, even on Solaris 2.6? Anyone who figures out why, let me know.
Note: if the TCP stack has a bug that causes a short (200ms) delay at SYN or FIN time, as Linux 2.2.0-2.2.6 had, and the OS or http daemon has a hard limit on the number of connections open, you would expect exactly this behavior. There may be other causes.
Kernel Issues
For Linux, it looks like kernel bottlenecks are being fixed constantly. See Linux Weekly News, Kernel Traffic, the Linux-Kernel mailing list, and my Mindcraft Redux page.
In March 1999, Microsoft sponsored a benchmark comparing NT to Linux at serving large numbers of http and smb clients, in which they failed to see good results from Linux. See also my article on Mindcraft's April 1999 Benchmarks for more info.
See also The Linux Scalability Project. They're doing interesting work, including Niels Provos' hinting poll patch, and some work on the thundering herd problem.
See also Mike Jagdis' work on improving select() and poll(); here's Mike's post about it.
Measuring Server Performance
Two tests in particular are simple, interesting, and hard:
- raw connections per second (how many 512 byte files per second can you serve?)
- total transfer rate on large files with many slow clients (how many 28.8k modem clients can simultaneously download from your server before performance goes to pot?)
Jef Poskanzer has published benchmarks comparing many web servers. See http://www.acme.com/software/thttpd/benchmarks.html for his results.
I also have a few old notes about comparing thttpd to Apache that may be of interest to beginners.
Chuck Lever keeps reminding us about Banga and Druschel's paper on web server benchmarking. It's worth a read.
IBM has an excellent paper titled Java server benchmarks [Baylor et al, 2000]. It's worth a read.
Examples
Nginx is a web server that uses whatever high-efficiency network event mechanism is available on the target OS. It's getting popular; there are even books about it (and since this page was originally written, many more, including a fourth edition of that book.)
Interesting select()-based servers
- thttpd Very simple. Uses a single process. It has good performance, but doesn't scale with the number of CPU's. Can also use kqueue.
- mathopd. Similar to thttpd.
- fhttpd
- boa
- Roxen
- Zeus, a commercial server that tries to be the absolute fastest. See their tuning guide.
- The other non-Java servers listed at http://www.acme.com/software/thttpd/benchmarks.html
- BetaFTPd
- Flash-Lite - web server using IO-Lite.
- Flash: An efficient and portable Web server -- uses select(), mmap(), mincore()
- The Flash web server as of 2003 -- uses select(), modified sendfile(), async open()
- xitami - uses select() to implement its own thread abstraction for portability to systems without threads.
- Medusa - a server-writing toolkit in Python that tries to deliver very high performance.
- userver - a small http server that can use select, poll, epoll, or sigio
Interesting /dev/poll-based servers
- N. Provos, C. Lever, "Scalable Network I/O in Linux," May, 2000. [FREENIX track, Proc. USENIX 2000, San Diego, California (June, 2000).] Describes a version of thttpd modified to support /dev/poll. Performance is compared with phhttpd.
Interesting epoll-based servers
- ribs2
- cmogstored - uses epoll/kqueue for most networking, threads for disk and accept4
Interesting kqueue()-based servers
- thttpd (as of version 2.21?)
- Adrian Chadd says "I'm doing a lot of work to make squid actually LIKE a kqueue IO system"; it's an official Squid subproject; see http://squid.sourceforge.net/projects.html#commloops. (This is apparently newer than Benno's patch.)
Interesting realtime signal-based servers
- Chromium's X15. This uses the 2.4 kernel's SIGIO feature together with sendfile() and TCP_CORK, and reportedly achieves higher speed than even TUX. The source is available under a community source (not open source) license. See the original announcement by Fabio Riccardi.
- Zach Brown's phhttpd - "a quick web server that was written to showcase the sigio/siginfo event model. consider this code highly experimental and yourself highly mental if you try and use it in a production environment." Uses the siginfo features of 2.3.21 or later, and includes the needed patches for earlier kernels. Rumored to be even faster than khttpd. See his post of 31 May 1999 for some notes.
Interesting thread-based servers
- Hoser FTPD. See their benchmark page.
- Peter Eriksson's phttpd and
- pftpd
- The Java-based servers listed at http://www.acme.com/software/thttpd/benchmarks.html
- Sun's Java Web Server (which has been reported to handle 500 simultaneous clients)
Interesting in-kernel servers
- khttpd
- "TUX" (Threaded linUX webserver) by Ingo Molnar et al. For 2.4 kernel.
Other interesting links
- Jeff Darcy's notes on high-performance server design
- Ericsson's ARIES project -- benchmark results for Apache 1 vs. Apache 2 vs. Tomcat on 1 to 12 processors
- Prof. Peter Ladkin's Web Server Performance page.
- Novell's FastCache -- claims 10000 hits per second. Quite the pretty performance graph.
- Rik van Riel's Linux Performance Tuning site
Translations
Belorussian translation provided by Patric Conrad at Ucallweconn
Changelog
2011/07/21
Added nginx.org
$Log: c10k.html,v $
Revision 1.212 2006/09/02 14:52:13 dank
added asio
Revision 1.211 2006/07/27 10:28:58 dank
Link to Cal Henderson's book.
Revision 1.210 2006/07/27 10:18:58 dank
Listify polyakov links, add Drepper's new proposal, note that FreeBSD 7 might move to 1:1
Revision 1.209 2006/07/13 15:07:03 dank
link to Scale! library, updated Polyakov links
Revision 1.208 2006/07/13 14:50:29 dank
Link to Polyakov's patches
Revision 1.207 2003/11/03 08:09:39 dank
Link to Linus's message deprecating the idea of aio_open
Revision 1.206 2003/11/03 07:44:34 dank
link to userver
Revision 1.205 2003/11/03 06:55:26 dank
Link to Vivek Pei's new Flash paper, mention great specweb99 score
Copyright 1999-2018 Dan Kegel
dank@kegel.com
Last updated: 5 February 2014
(minor correction: 28 May 2019)
[Return to www.kegel.com]