Archive for the Performance Category

Speed Up Your Mail Server

Posted by Filed Under Performance with Comments Off

One of the most important factors in mail server speed is the ability to query DNS as quickly as possible.  Two things make DNS queries faster.  The first is to list a DNS server in /etc/resolv.conf that is local and that provides recursive lookups.  Proximity means speed for DNS: sending queries over long distances only adds delay, so use a DNS server that is close to the mail server.  Also list at least two nameservers in /etc/resolv.conf.

nameserver 12.32.34.32
nameserver 192.168.4.1

Note that in this example one DNS server is local and the other is outside the local network.  By providing two DNS servers, the mail server can still function if one DNS server is not available.
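A quick way to confirm that a resolv.conf really lists at least two nameservers is to parse it.  The following is a minimal Python sketch; the function name list_nameservers is hypothetical, and the sample text simply repeats the two addresses from the example above.

```python
def list_nameservers(resolv_conf_text):
    """Return the addresses found on 'nameserver' lines, in file order."""
    servers = []
    for line in resolv_conf_text.splitlines():
        stripped = line.strip()
        # Skip blank lines and comment lines (resolv.conf uses '#' or ';').
        if not stripped or stripped[0] in "#;":
            continue
        fields = stripped.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


# Sample input: the two entries from the example above.
sample = """\
nameserver 12.32.34.32
nameserver 192.168.4.1
"""

servers = list_nameservers(sample)
print(servers)
if len(servers) < 2:
    print("warning: fewer than two nameservers listed")
```

On a real server the same function could be fed the contents of /etc/resolv.conf instead of the sample string.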

The second requirement is to make sure that the DNS server you use will allow the mail server to make recursive requests, not just iterative requests.  When a machine is allowed to make recursive requests of a DNS server, that DNS server is required to find a definitive answer to any query.  In other words, the DNS server must come up with “the answer” itself, asking other servers as needed.  If a request is only iterative, the DNS server can provide its best answer from what it already knows, typically a referral to another server; it is not required to do the research for a definitive answer.  Below is an options block from a DNS server's configuration that gives a subnet, the localhost and a single IP address permission to make recursive requests.

options {
allow-recursion { 192.168.4.0/24; localhost;  192.168.3.2; };
};
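To see which clients an allow-recursion list like this would admit, the matching can be imitated with Python's ipaddress module.  This is only an illustration of the address matching, not how named itself is implemented; the function name may_recurse is hypothetical, and "localhost" is approximated here by the loopback addresses.

```python
import ipaddress

# Entries taken from the allow-recursion line above. "localhost" is
# approximated as the loopback addresses only (an assumption made for
# this sketch; named's built-in localhost ACL is broader than that).
ALLOW_RECURSION = ["192.168.4.0/24", "127.0.0.1", "::1", "192.168.3.2"]


def may_recurse(client_ip, acl=ALLOW_RECURSION):
    """Return True if client_ip falls inside any entry of the ACL."""
    client = ipaddress.ip_address(client_ip)
    for entry in acl:
        network = ipaddress.ip_network(entry, strict=False)
        # Only compare addresses of the same IP version.
        if client.version == network.version and client in network:
            return True
    return False


print(may_recurse("192.168.4.25"))  # inside the /24 subnet
print(may_recurse("192.168.3.3"))   # not listed
```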

If a separate DNS server is not fast enough, a caching nameserver can be installed on the mail server itself.  A caching nameserver that is either located on the Postfix mail server or very close to it on the network is one of the most significant improvements you can make.  Because mail is closely tied to DNS, the faster you can resolve domains the more efficient everything will be.  The cache matters because once a domain is in the cache the lookup is almost instant.

yum install -y caching-nameserver
cd /etc
cp named.caching-nameserver.conf named.conf
chown root:named named.conf
service named start

Note that the file copied to named.conf allows the localhost (the mail server) to make recursive queries and to use the cache.

options {
listen-on port 53 { 127.0.0.1; };
listen-on-v6 port 53 { ::1; };
directory       "/var/named";
dump-file       "/var/named/data/cache_dump.db";
statistics-file "/var/named/data/named_stats.txt";
memstatistics-file "/var/named/data/named_mem_stats.txt";

allow-query     { localhost; };
allow-query-cache { localhost; };
};
logging {
channel default_debug {
file "data/named.run";
severity dynamic;
};
};
view localhost_resolver {
match-clients      { localhost; };
match-destinations { localhost; };
recursion yes;
include "/etc/named.rfc1912.zones";
};

Edit /etc/resolv.conf and make sure the first nameserver is the localhost.
nameserver 127.0.0.1

You can add a second and third nameserver if you want redundancy.

Test your caching nameserver by installing bind-utils so you can run some tests.

yum install -y bind-utils

After you have installed the caching nameserver correctly, perform a query for a domain and note the time it takes (the “Query time” line).  Then perform it again and note how much it has changed, as the second query comes from the cache.

dig google.com

; <<>> DiG 9.3.6-P1-RedHat-9.3.6-4.P1.el5_5.3 <<>> google.com
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49530
;; flags: qr rd ra; QUERY: 1, ANSWER: 6, AUTHORITY: 4, ADDITIONAL: 0

;; QUESTION SECTION:
;google.com.            IN    A

;; ANSWER SECTION:
google.com.        300    IN    A    209.85.225.103
google.com.        300    IN    A    209.85.225.104
google.com.        300    IN    A    209.85.225.105
google.com.        300    IN    A    209.85.225.106
google.com.        300    IN    A    209.85.225.147
google.com.        300    IN    A    209.85.225.99

;; AUTHORITY SECTION:
google.com.        172800    IN    NS    ns4.google.com.
google.com.        172800    IN    NS    ns1.google.com.
google.com.        172800    IN    NS    ns2.google.com.
google.com.        172800    IN    NS    ns3.google.com.

;; Query time: 144 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Feb 12 12:27:37 2011
;; MSG SIZE  rcvd: 196

dig google.com
—cut—
;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Sat Feb 12 12:28:07 2011
;; MSG SIZE  rcvd: 196
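If you want to compare the two runs without reading the output by eye, the “Query time” line can be extracted with a few lines of Python.  This is a minimal sketch; the function name query_time_ms is hypothetical and the sample strings are copied from the two dig runs above.

```python
import re


def query_time_ms(dig_output):
    """Return the 'Query time' value from dig output in msec, or None."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    return int(match.group(1)) if match else None


# Tail of the first (uncached) and second (cached) runs shown above.
first_run = ";; Query time: 144 msec\n;; SERVER: 127.0.0.1#53(127.0.0.1)\n"
second_run = ";; Query time: 0 msec\n;; SERVER: 127.0.0.1#53(127.0.0.1)\n"

print("uncached:", query_time_ms(first_run), "msec")
print("cached:  ", query_time_ms(second_run), "msec")
```

On a live system the input could come from running dig through subprocess and passing its standard output to the function.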

Manage Mail Server Connections


One aspect of managing mail server connections is managing Keep-Alives.  Managing Keep-Alives with TCP connections may increase reliability of connections or save resources on the server.

Once a connection is made with a mail server, TCP does not require that data be exchanged in order to maintain the connection.  It is possible for a connection to remain open for a long period of time without exchanging data.  Keep-Alive helps the server determine whether the other end of the connection is still available, as there is no point in holding resources for a connection that is gone.

Resource Management
Here is an example of a client connected to a mail server.  Note how many connections are made to the secure IMAP port (993).  Depending on how many folders are in your IMAP account and how many accounts you have, you will have multiple connections to manage.

tcp        0      0 192.168.3.4:49215     192.168.3.69:993        ESTABLISHED
tcp        0      0 192.168.3.4:49216     192.168.3.69:993        ESTABLISHED
tcp        0      0 192.168.3.4:44262     192.168.3.69:993        ESTABLISHED
tcp        0      0 192.168.3.4:44226     192.168.3.69:993        ESTABLISHED
tcp        0      0 192.168.3.4:44263     192.168.3.69:993        ESTABLISHED

The problem with so many connections is managing the mail server's resources when you have a lot of users, each with many connections.  Keep-Alives are one aspect of managing server resources.

By managing Keep-Alive settings you can either save resources that are being wasted or increase the Keep-Alive settings to ensure more stable connections.

Keep-Alive Settings
There are three variables that refer to keepalives.

This setting is the interval between subsequent keepalive tests.  It applies regardless of what is happening on the connection in the meantime.
/proc/sys/net/ipv4/tcp_keepalive_intvl

This setting is the interval between the last data packet sent and the first keepalive test.  Once the connection has been marked as needing keepalive, this counter is no longer used.  Note that plain ACKs are not considered data.
/proc/sys/net/ipv4/tcp_keepalive_time

This setting is the number of unacknowledged tests to send before considering the connection dead and notifying the application layer.
/proc/sys/net/ipv4/tcp_keepalive_probes

Here are default settings.
cat /proc/sys/net/ipv4/tcp_keepalive_intvl
75
cat /proc/sys/net/ipv4/tcp_keepalive_time
7200
cat /proc/sys/net/ipv4/tcp_keepalive_probes
9

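With these settings, an idle connection whose peer has stopped responding is dropped after about 2 hours and 11 minutes: 7200 seconds of idle time plus 9 unanswered tests sent 75 seconds apart (7200 + 9 x 75 = 7875 seconds).  Adjusting these settings can allow for longer connection times, or for shorter connection times to save on system resources.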
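The time before a dead connection is dropped follows directly from the three values: the idle time before the first test, plus one interval for each unanswered test.  Here is a small Python sketch of that arithmetic; the function names are hypothetical.

```python
def seconds_until_dropped(keepalive_time, keepalive_intvl, keepalive_probes):
    """Idle seconds before an unresponsive peer is declared dead."""
    return keepalive_time + keepalive_intvl * keepalive_probes


def as_hms(total_seconds):
    """Format a number of seconds as hours, minutes and seconds."""
    hours, rest = divmod(total_seconds, 3600)
    minutes, seconds = divmod(rest, 60)
    return "%dh %dm %ds" % (hours, minutes, seconds)


# Default values shown above: time=7200, intvl=75, probes=9
default = seconds_until_dropped(7200, 75, 9)
print(default, as_hms(default))  # 7875 2h 11m 15s
```

Plugging in a candidate value before changing it on the server shows how much the total would move.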

Changing Keep-Alive Settings
For testing purposes the best thing to do is to echo a new value over the current setting.  The change will go away on restart.  For example, if your connections are not as reliable as you need and clients complain about dropped connections, then increase your Keep-Alive settings.

echo 15 > /proc/sys/net/ipv4/tcp_keepalive_probes

If you were more interested in saving resources on the mail server, then decrease the time for Keep-Alive.

echo 6000 > /proc/sys/net/ipv4/tcp_keepalive_time

Whatever you do, test and listen to clients to verify your settings.

Postfix Stress Test with smtp-source and top


Testing Load with smtp-source and top
In order to evaluate the load on your box you can run smtp-source and combine that with snapshots from top.  Open two terminals; in one run the smtp-source command and in the other take the snapshots with top.

Terminal #1
# time /usr/sbin/smtp-source -s 40 -l 10120 -m 500 -c -f test@example.com -t mike@example.com localhost:25

This example shows 40 parallel sessions (-s 40), almost 10KB sized messages (-l 10120), 500 messages sent (-m 500), counter display (-c), envelope sender and receiver (-f test@example.com -t  mike@example.com) and connection on port 25 of the localhost (localhost:25).

Terminal #2
top -b -n10 -d7 > top.txt
This command with top will give you 10 snapshots (-n10) at 7 second intervals (-d7) and create a file called top.txt.

As you evaluate the sample data there are several fields to pay close attention to.  The first is the Cpu wa value, the “amount of time the CPU has been waiting for I/O to complete.”  At any sustained level this will dramatically decrease the speed of your mail server.  Here is the wa from several snapshots; you can see that the load presented by smtp-source is not sustainable.  Spikes in wa are not a problem; it is just that your mail server will not be able to sustain anything over 10%, maybe even less.

17.5%wa – 21.5%wa – 20.7%wa – 28.6%wa – 47.9%wa

When you evaluate your I/O be sure that you also evaluate the additional resource load from scanning for viruses and spam.  When you add both of these on top of your mail server the whole process can slow down even more.  This is hard to nail down as a science, but at least this kind of test will provide you with data that you can compare across multiple mail servers or use as a starting point for evaluation.

top - 06:57:35 up 25 min,  2 users,  load average: 0.02, 0.21, 0.17
Tasks: 107 total,   1 running, 106 sleeping,   0 stopped,   0 zombie
Cpu(s):  5.6%us,  7.8%sy,  0.0%ni, 66.9%id, 17.5%wa,  0.1%hi,  2.0%si,  0.0%st
Mem:    254368k total,   199080k used,    55288k free,    18384k buffers
Swap:   761848k total,        0k used,   761848k free,   110884k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
4354 syslog    20   0  1936  684  532 S  1.1  0.3   0:01.41 syslogd
4684 postfix   20   0  6300 2728 1468 S  0.9  1.1   0:02.41 qmgr

top - 06:57:42 up 25 min,  2 users,  load average: 0.02, 0.20, 0.17
Tasks: 113 total,   2 running, 111 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.0%us, 10.7%sy,  0.0%ni, 55.1%id, 21.5%wa,  0.4%hi,  3.3%si,  0.0%st
Mem:    254368k total,   204260k used,    50108k free,    19536k buffers
Swap:   761848k total,        0k used,   761848k free,   112776k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
5485 postfix   20   0  5900 3056 2444 R  2.1  1.2   0:00.15 smtpd
4354 syslog    20   0  1936  684  532 S  1.6  0.3   0:01.52 syslogd
4684 postfix   20   0  6300 2728 1468 S  1.3  1.1   0:02.50 qmgr
5495 postfix   20   0  5476 1800 1460 S  1.3  0.7   0:00.09 cleanup

top - 06:57:49 up 26 min,  2 users,  load average: 0.02, 0.20, 0.17
Tasks: 112 total,   1 running, 111 sleeping,   0 stopped,   0 zombie
Cpu(s):  8.3%us, 13.0%sy,  0.0%ni, 54.5%id, 20.7%wa,  0.3%hi,  3.3%si,  0.0%st
Mem:    254368k total,   207408k used,    46960k free,    20836k buffers
Swap:   761848k total,        0k used,   761848k free,   114900k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
4354 syslog    20   0  1936  684  532 S  1.9  0.3   0:01.65 syslogd
4684 postfix   20   0  6420 2760 1468 S  1.4  1.1   0:02.60 qmgr
4677 root      20   0  5396 1736 1408 S  1.0  0.7   0:00.91 master
5500 postfix   20   0  5412 1688 1372 S  0.9  0.7   0:00.06 trivial-rewrite

top - 06:57:56 up 26 min,  2 users,  load average: 1.62, 0.52, 0.27
Tasks: 155 total,   3 running, 152 sleeping,   0 stopped,   0 zombie
Cpu(s): 12.8%us, 16.4%sy,  0.0%ni, 40.5%id, 28.6%wa,  0.4%hi,  1.3%si,  0.0%st
Mem:    254368k total,   228268k used,    26100k free,    21600k buffers
Swap:   761848k total,        0k used,   761848k free,   117700k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
5501 postfix   20   0  5900 3060 2444 S  4.0  1.2   0:00.29 smtpd
4684 postfix   20   0  6800 3164 1468 R  3.9  1.2   0:02.87 qmgr
5510 postfix   20   0  5476 1792 1460 D  2.6  0.7   0:00.18 cleanup
4354 syslog    20   0  1936  684  532 S  1.6  0.3   0:01.76 syslogd
4677 root      20   0  5396 1736 1408 S  1.4  0.7   0:01.01 master

top - 06:58:03 up 26 min,  2 users,  load average: 2.29, 0.68, 0.33
Tasks: 155 total,   1 running, 154 sleeping,   0 stopped,   0 zombie
Cpu(s):  9.7%us, 16.8%sy,  0.0%ni, 25.1%id, 47.9%wa,  0.0%hi,  0.6%si,  0.0%st
Mem:    254368k total,   233476k used,    20892k free,    22192k buffers
Swap:   761848k total,        0k used,   761848k free,   120992k cached

PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
4684 postfix   20   0  6800 3164 1468 S  6.0  1.2   0:03.29 qmgr
4677 root      20   0  5396 1736 1408 S  1.8  0.7   0:01.14 master
4354 syslog    20   0  1936  684  532 S  1.3  0.3   0:01.85 syslogd
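The wa values can be pulled out of top.txt automatically rather than copied by hand.  The Python sketch below assumes the "Cpu(s):" line format shown in these snapshots (other versions of top may print it differently); the function name iowait_samples is hypothetical, and the 10% threshold is the rough limit discussed above.

```python
import re


def iowait_samples(top_text):
    """Return every %wa value found on the 'Cpu(s):' lines of top -b output."""
    samples = []
    for line in top_text.splitlines():
        if line.startswith("Cpu(s):"):
            match = re.search(r"([\d.]+)%wa", line)
            if match:
                samples.append(float(match.group(1)))
    return samples


# The Cpu(s) lines from the five snapshots above.
sample = """\
Cpu(s):  5.6%us,  7.8%sy,  0.0%ni, 66.9%id, 17.5%wa,  0.1%hi,  2.0%si,  0.0%st
Cpu(s):  9.0%us, 10.7%sy,  0.0%ni, 55.1%id, 21.5%wa,  0.4%hi,  3.3%si,  0.0%st
Cpu(s):  8.3%us, 13.0%sy,  0.0%ni, 54.5%id, 20.7%wa,  0.3%hi,  3.3%si,  0.0%st
Cpu(s): 12.8%us, 16.4%sy,  0.0%ni, 40.5%id, 28.6%wa,  0.4%hi,  1.3%si,  0.0%st
Cpu(s):  9.7%us, 16.8%sy,  0.0%ni, 25.1%id, 47.9%wa,  0.0%hi,  0.6%si,  0.0%st
"""

values = iowait_samples(sample)
over_limit = [v for v in values if v > 10.0]
print(values)
print("%d of %d snapshots above 10%% wa" % (len(over_limit), len(values)))
```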

Solution for I/O Issues
Solutions may include increasing the speed or number of CPUs available on the mail server and also increasing the read/write capacity of the disks.  Read/write performance can be increased by choosing SCSI, using RAID or even using LVM striping.

SCSI
If you choose SCSI you have the advantage of the server being able to read and write to multiple disks at the same time.  When selecting SCSI drives be sure to select a model with a large cache and the fastest speeds you can afford.  Once you move to SCSI you can use hardware RAID or software RAID and use RAID 0 to increase read and write speed.  The biggest problem with RAID 0 is that it does not provide redundancy.

LVM Striping
Striped logical volumes lay data down across a number of drives, speeding up the I/O process.  The size of each stripe cannot exceed the size of the extent.  Striping starts with the first physical volume, and each subsequent stripe is placed on the next physical volume.
LVM cannot tell whether several physical volumes are on the same drive, so if you use striping on one disk with several physical volumes it will actually slow down performance instead of enhancing it.
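The round-robin placement can be pictured with a tiny Python model.  This is only an illustration of the ordering, not LVM code; the function name stripe_layout and the volume names pv0 to pv2 are hypothetical.

```python
def stripe_layout(num_stripes, physical_volumes):
    """Map each stripe, in order, to a physical volume in round-robin fashion."""
    return [physical_volumes[i % len(physical_volumes)]
            for i in range(num_stripes)]


# Six stripes spread over three physical volumes.
print(stripe_layout(6, ["pv0", "pv1", "pv2"]))
```

If all three volumes in this model sat on the same disk, every stripe would still land on that one disk, which is why striping across physical volumes on a single drive brings no gain.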

Postfix Stress Test


One of the questions you will want to answer is how much mail traffic your hardware can handle.  People often overbuild hardware because they just are not sure, and no one wants to build a mail server and then rebuild it in 3 months.  Fortunately, Postfix comes with a couple of programs that you can use to stress your server and get a general idea of what you need.

Hardware Considerations
When you are building a mail server, especially one that will host multiple domains, it is very difficult to determine the necessary hardware because growth is unknown in two areas.  First, growth in terms of new domains or new accounts is tough to predict because business can change and staff may change dramatically in a 6 month period.  When you build the mail server you want to build something that will potentially provide 3 years of service, maybe 5, so you must allow for the growth of your business.  Second, growth is difficult to estimate based on the amount of spam your server must be capable of managing.  This is a very frustrating aspect of mail servers: spam could potentially triple in 3 months and it would have very little to do with how you are managing your mail server, so you must also prepare for these kinds of issues.  Spam is especially hard on resources because you will be running a program like SpamAssassin and an anti-virus program like ClamAV on each email that hits your system.

Stress Test
The program smtp-source connects to port 25 to simulate mail arriving at your mail server.  In this simulation you can send messages one at a time or in parallel.  There are several settings that you can modify to help determine the load at which your server performs best.
In order to run the test you may have to comment out a few lines in your smtpd restrictions.
smtpd_recipient_restrictions =
    warn_if_reject reject_non_fqdn_recipient
#   reject_non_fqdn_sender
#   reject_unknown_sender_domain
    reject_unknown_recipient_domain
    permit_mynetworks
    reject_unauth_destination
    reject_non_fqdn_hostname
    reject_invalid_hostname
#   check_helo_access pcre:/etc/postfix/helo_checks
    check_sender_mx_access cidr:/etc/postfix/bogus_mx
    reject_rbl_client sbl-xbl.spamhaus.org
    reject_unverified_sender
    permit

Parallel sessions – This will indicate the number of concurrent sessions or maxprocesses that your server will be running.

Message size – You can test various message sizes to simulate the mail that you typically will receive on your server.

Total messages – You can set the total number of messages that the test will send.

Display counter – This will just show a counter as the messages are received while the command is running.

# time /usr/sbin/smtp-source -s 20 -l 5120 -m 100 -c -f test@example.com -t mike@example.com localhost:25
100

real    0m2.664s
user    0m0.020s
sys    0m0.100s

This example shows 20 parallel sessions (-s 20), 5KB sized messages (-l 5120), 100 messages sent (-m 100), counter display (-c), envelope sender and receiver (-f test@example.com -t mike@example.com) and connection on port 25 of the localhost (localhost:25).
The “100” indicates the total messages sent.  The real time (0m2.664s) is the time the injection took.
Here is an additional test on the same server that indicates an increase to 40 sessions, 10 KB mail size and 500 messages.  This gives you a way to evaluate the additional load on the server in terms of a comparison time.

# time /usr/sbin/smtp-source -s 40 -l 10120 -m 500 -c -f test@example.com -t mike@example.com localhost:25
500

real    0m29.795s
user    0m0.200s
sys    0m0.530s
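One way to compare the two runs is to turn each into a messages-per-second figure.  The Python sketch below parses the "real" line printed by time and divides the message count by it; the function names are hypothetical and the sample strings are copied from the two runs above.

```python
import re


def real_seconds(time_output):
    """Convert the 'real XmY.YYYs' line printed by time into seconds."""
    match = re.search(r"^real\s+(\d+)m([\d.]+)s", time_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'real' line found")
    return int(match.group(1)) * 60 + float(match.group(2))


def messages_per_second(message_count, time_output):
    """Injection rate for a run that sent message_count messages."""
    return message_count / real_seconds(time_output)


# Timing output from the two runs above.
run_20_sessions = "real    0m2.664s\nuser    0m0.020s\nsys    0m0.100s\n"
run_40_sessions = "real    0m29.795s\nuser    0m0.200s\nsys    0m0.530s\n"

print(round(messages_per_second(100, run_20_sessions), 1))  # 37.5
print(round(messages_per_second(500, run_40_sessions), 1))  # 16.8
```

On this server the heavier run injects fewer than half as many messages per second, which gives a single number to track as you change hardware or settings.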