Mailing List Archive
tlug.jp Mailing List tlug archive tlug Mailing List Archive
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][tlug] Bogus load data
- Date: Tue, 12 Sep 2006 16:55:31 +0900
- From: "Josh Glover" <jmglov@example.com>
- Subject: [tlug] Bogus load data
Today I came across a host that was reporting bogus load data: : jglover@example.com; top 00:39:48 up 24 days, 5:16, 8 users, load average: 71.25, 71.17, 71.03 720 processes: 719 sleeping, 1 running, 0 zombie, 0 stopped CPU states: cpu user nice system irq softirq iowait idle total 2.8% 0.8% 1.1% 0.0% 0.2% 0.1% 94.5% cpu00 5.1% 0.9% 1.5% 0.0% 0.5% 0.1% 91.4% cpu01 0.5% 0.7% 0.7% 0.0% 0.0% 0.1% 97.6% Mem: 8209748k av, 8085384k used, 124364k free, 0k shrd, 195128k buff 3144408k actv, 3564156k in_d, 129840k in_c Swap: 2040212k av, 50420k used, 1989792k free 5901868k cached PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND 8171 xxxxxx 15 0 47120 46M 12972 S 1.4 0.5 0:13 0 XxXxXxXxXxX 16297 xxxxxx 25 10 30672 29M 3284 S N 0.6 0.3 0:36 0 perl 14272 xxxxxx 0 -20 42288 40M 1004 S < 0.4 0.5 149:51 0 rvd [...] The root cause turned out to be: : jglover@example.com; tail /var/log/kern.log Sep 32 25:20:02 xxxxxxxxxxx kernel: nfs: server yyyyyyyyyyyyyyy not responding, still trying Sep 32 25:20:02 xxxxxxxxxxx kernel: nfs: server yyyyyyyyyyyyyyy not responding, still trying But while I was trying to figure it out, I whipped up a C program to test the system response to see if the load was really off. You can check it out at: http://www.rafb.net/paste/results/19YhpB15.html I will also include it inline so people can comment: #include <stdio.h> #include <stdlib.h> #include <unistd.h> #include <sys/time.h> int main(void) { struct timeval tv_start, tv_stop; unsigned int usec = 20000; // fastest response that I've seen on Linux while (1) { if (gettimeofday(&tv_start, NULL) == -1) { fprintf(stderr, "gettimeofday() failed to set tv_start\n"); exit(1); } if (usleep(usec) == -1) { fprintf(stderr, "usleep() failed to sleep for %u microseconds\n", usec); exit(2); } if (gettimeofday(&tv_stop, NULL) == -1) { fprintf(stderr, "gettimeofday() failed to set tv_stop\n"); exit(1); } if (tv_stop.tv_usec < tv_start.tv_usec) tv_stop.tv_usec += 1000000; fprintf(stderr, "Tried to sleep for %u microseconds, actually slept " "for %u microseconds\n", usec, (unsigned int)(tv_stop.tv_usec - tv_start.tv_usec)); usec += 20000; if (usec == 1000000) break; sleep(1); } return 0; } // main() So, thoughts? This is extremely unscientific, but do you think it might give some indication of a problem on a system that is *really* heavily loaded? My idea was that if the system was busy, this would report trying to sleep for a short time, but actually sleeping for a much longer one, due to slow context switches. The "sleep(1);" line at the end of the main loop is to ensure at least one context switch occurs between usleep() syscalls.From test runs, I found data like the following on an unloaded machine:: jglover@example.com; gcc -Wall -o usleep usleep.c : jglover@example.com; ./usleep Tried to sleep for 20000 microseconds, actually slept for 27528 microseconds Tried to sleep for 40000 microseconds, actually slept for 50047 microseconds Tried to sleep for 60000 microseconds, actually slept for 69989 microseconds Tried to sleep for 80000 microseconds, actually slept for 90004 microseconds Tried to sleep for 100000 microseconds, actually slept for 109777 microseconds [...] Tried to sleep for 500000 microseconds, actually slept for 510023 microseconds Tried to sleep for 520000 microseconds, actually slept for 530119 microseconds Tried to sleep for 540000 microseconds, actually slept for 550026 microseconds Tried to sleep for 560000 microseconds, actually slept for 570025 microseconds Tried to sleep for 580000 microseconds, actually slept for 590027 microseconds [...] Tried to sleep for 900000 microseconds, actually slept for 910042 microseconds Tried to sleep for 920000 microseconds, actually slept for 930042 microseconds Tried to sleep for 940000 microseconds, actually slept for 950062 microseconds Tried to sleep for 960000 microseconds, actually slept for 970053 microseconds Tried to sleep for 980000 microseconds, actually slept for 990054 microseconds Cheers, Josh
- Follow-Ups:
- Re: [tlug] Bogus load data
- From: Junichi Uekawa
Home | Main Index | Thread Index
- Prev by Date: Re: [tlug] OT Posting
- Next by Date: Re: [tlug] [ANN] TLUG Special Nomikai feat. Heinz-M. Graesing
- Previous by thread: Re: [tlug] OT Posting
- Next by thread: Re: [tlug] Bogus load data
- Index(es):
Home Page Mailing List Linux and Japan TLUG Members Links