07.01.2016, 22:24

Nagios has been acting up for a few days

Hi folks,

For a good 2 weeks now my Nagios4 has been acting up. And it's not just Nagios: apparently it only misbehaves because the whole server is acting up. DNS lookups are slow and logging in takes ages. Nagios drives all CPU cores to 100%.

When Nagios is at 100%, journalctl shows this:

an 05 18:52:54 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:52:54 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:52:54 itmgmt nagios[180]: wproc: Core Worker 190: job 257638800 with pid 4269 reaped at timeout. timeouts=122; started=18538
Jan 05 18:52:54 itmgmt nagios[180]: wproc: Core Worker 190: job 18536 (pid=4276) timed out. Killing it
Jan 05 18:52:54 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18536 from worker Core Worker 190 is a non-check helper but exited with return code 3
Jan 05 18:52:54 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:52:54 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:52:54 itmgmt nagios[180]: wproc: Core Worker 190: job 257656160 with pid 4276 reaped at timeout. timeouts=123; started=18538
Jan 05 18:52:54 itmgmt nagios[180]: wproc: Core Worker 190: kill(-4237, SIGKILL) failed: Operation not permitted
Jan 05 18:52:54 itmgmt nagios[180]: wproc: Core Worker 190: job 18533 (pid=4237): Dormant child reaped
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 192: job 18541 (pid=4347) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 192: kill(-4347, SIGKILL) failed: Operation not permitted
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 188: job 18541 (pid=4351) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 188: kill(-4351, SIGKILL) failed: Operation not permitted
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 191: job 18542 (pid=4360) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 191: kill(-4360, SIGKILL) failed: Operation not permitted
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 187: job 18542 (pid=4352) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 187: kill(-4352, SIGKILL) failed: Operation not permitted
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 190: job 18542 (pid=4359) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 190: kill(-4359, SIGKILL) failed: Operation not permitted
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 189: job 18542 (pid=4367) timed out. Killing it
Jan 05 18:52:59 itmgmt nagios[180]: wproc: Core Worker 189: kill(-4367, SIGKILL) failed: Operation not permitted
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18541 from worker Core Worker 192 timed out after 25.13s
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18542 (pid=4361) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: HOST PERFDATA job 18542 from worker Core Worker 192 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 0 with pid 4361 reaped at timeout. timeouts=116; started=18546
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18543 (pid=4372) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: kill(-4372, SIGKILL) failed: Operation not permitted
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18543 from worker Core Worker 192 timed out after 24.94s
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18544 (pid=4379) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18544 from worker Core Worker 192 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 0 with pid 4379 reaped at timeout. timeouts=117; started=18546
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18545 (pid=4387) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: HOST PERFDATA job 18545 from worker Core Worker 192 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job -284925024 with pid 4387 reaped at timeout. timeouts=118; started=18546
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: kill(-4347, SIGKILL) failed: Operation not permitted
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18541 (pid=4347): Dormant child reaped
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: kill(-4372, SIGKILL) failed: Operation not permitted
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 192: job 18543 (pid=4372): Dormant child reaped
Jan 05 18:53:19 itmgmt nagios[180]: wproc: HOST PERFDATA job 18541 from worker Core Worker 188 timed out after 25.13s
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 188: job 18542 (pid=4368) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18542 from worker Core Worker 188 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 188: job 1592714656 with pid 4368 reaped at timeout. timeouts=116; started=18545
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 188: job 18543 (pid=4374) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18543 from worker Core Worker 188 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 188: job 1592713904 with pid 4374 reaped at timeout. timeouts=117; started=18545
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 188: job 18544 (pid=4380) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18542 from worker Core Worker 187 timed out after 25.13s
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=1; exited_ok=0; wait_status=0; error_code=62;
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 05 18:53:19 itmgmt nagios[180]: wproc: Core Worker 187: job 18543 (pid=4369) timed out. Killing it
Jan 05 18:53:19 itmgmt nagios[180]: wproc: SERVICE PERFDATA job 18543 from worker Core Worker 187 is a non-check helper but exited with return code 3
Jan 05 18:53:19 itmgmt nagios[180]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_co

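To see how the timeouts are spread across the Core Workers, the journal can be tallied with a short script. This is a minimal sketch that assumes the messages are worded exactly like the wproc lines above (other Nagios versions may phrase them differently); the function name `summarize_wproc` is hypothetical. It counts "timed out. Killing it" events and failed `kill(..., SIGKILL)` calls per worker, plus perfdata-helper exits per job type and return code.

```python
import re
from collections import Counter

# Patterns copied from the wproc messages in the excerpt above (assumed format).
TIMED_OUT = re.compile(r"wproc: Core Worker (\d+): job -?\d+ \(pid=\d+\) timed out")
KILL_FAILED = re.compile(r"wproc: Core Worker (\d+): kill\(-?\d+, SIGKILL\) failed")
HELPER_RC = re.compile(
    r"wproc: (SERVICE|HOST) PERFDATA job \d+ from worker Core Worker (\d+) "
    r"is a non-check helper but exited with return code (\d+)"
)


def summarize_wproc(lines):
    """Count timeouts and failed kills per worker, and helper exits per (type, rc)."""
    timeouts = Counter()      # worker id -> "timed out. Killing it" events
    failed_kills = Counter()  # worker id -> kill(..., SIGKILL) failures
    helper_exits = Counter()  # ("SERVICE"/"HOST", return code) -> count
    for line in lines:
        if m := TIMED_OUT.search(line):
            timeouts[int(m.group(1))] += 1
        elif m := KILL_FAILED.search(line):
            failed_kills[int(m.group(1))] += 1
        elif m := HELPER_RC.search(line):
            helper_exits[(m.group(1), int(m.group(3)))] += 1
    return timeouts, failed_kills, helper_exits
```

To use it, save the journal to a file, e.g. with `journalctl -t nagios > nagios.log` (assuming the syslog identifier is `nagios`, as in the lines above), and pass the opened file to `summarize_wproc`.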

If I stop Nagios, the messages go away for the time being. When Nagios runs normally (which only happens after a reboot), it still floods the log, which it never used to do:

Jan 07 22:20:11 itmgmt sudo[25421]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:11 itmgmt sudo[25421]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:11 itmgmt systemd[1]: Started Session c201 of user root.
Jan 07 22:20:11 itmgmt sudo[25421]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:11 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 66 from worker Core Worker 24487 is a non-check helper but exited with return code 3
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:11 itmgmt sudo[25425]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:11 itmgmt sudo[25425]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:11 itmgmt systemd[1]: Started Session c202 of user root.
Jan 07 22:20:11 itmgmt sudo[25425]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:11 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 66 from worker Core Worker 24486 is a non-check helper but exited with return code 3
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:11 itmgmt sudo[25427]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:11 itmgmt sudo[25427]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:11 itmgmt systemd[1]: Started Session c203 of user root.
Jan 07 22:20:11 itmgmt sudo[25427]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:11 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 67 from worker Core Worker 24483 is a non-check helper but exited with return code 3
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:11 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:12 itmgmt sudo[25432]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:12 itmgmt sudo[25432]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:12 itmgmt systemd[1]: Started Session c204 of user root.
Jan 07 22:20:12 itmgmt sudo[25432]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:12 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 67 from worker Core Worker 24484 is a non-check helper but exited with return code 3
Jan 07 22:20:12 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:12 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:14 itmgmt sudo[25439]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:14 itmgmt sudo[25439]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:14 itmgmt sudo[25440]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:14 itmgmt sudo[25440]: pam_unix(sudo:session): session opened for user root by (uid=0)
Jan 07 22:20:14 itmgmt systemd[1]: Started Session c205 of user root.
Jan 07 22:20:14 itmgmt systemd[1]: Started Session c206 of user root.
Jan 07 22:20:14 itmgmt sudo[25440]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:14 itmgmt sudo[25439]: pam_unix(sudo:session): session closed for user root
Jan 07 22:20:14 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 68 from worker Core Worker 24484 is a non-check helper but exited with return code 3
Jan 07 22:20:14 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:14 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:14 itmgmt nagios[24482]: wproc: SERVICE PERFDATA job 68 from worker Core Worker 24485 is a non-check helper but exited with return code 3
Jan 07 22:20:14 itmgmt nagios[24482]: wproc:   early_timeout=0; exited_ok=1; wait_status=768; error_code=0;
Jan 07 22:20:14 itmgmt nagios[24482]: wproc:   stdout line 01: check_dhcp: Invalid hostname/address -
Jan 07 22:20:15 itmgmt sudo[25450]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s
Jan 07 22:20:15 itmgmt sudo[25450]: pam_unix(sudo:session): session opened for user root by (uid=0)
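In these lines every sudo call runs `check_dhcp -u -s` with nothing visible after `-s`, and the plugin answers `Invalid hostname/address -` with an empty value after the dash. Since `-s` is check_dhcp's server-IP option, one possible reading (not confirmed here) is that the address argument expands to an empty string. The jobs are also reported as SERVICE/HOST PERFDATA "non-check helpers". The sketch below flags sudo `COMMAND=` entries where `-s` has no value; it assumes the sudo log format shown above, and the helper names are hypothetical.

```python
import re
import shlex

# Matches sudo audit lines like the ones above, e.g.
# "sudo[25421]:   nagios : TTY=unknown ; PWD=/tmp ; USER=root ; COMMAND=/usr/lib/nagios/plugins/check_dhcp -u -s"
SUDO_CMD = re.compile(r"sudo\[\d+\]:\s+(\S+) : .*?COMMAND=(.*)$")


def options_missing_value(command, flags_needing_value=("-s",)):
    """Return the flags in `command` that are not followed by a value (hypothetical helper)."""
    tokens = shlex.split(command)
    missing = []
    for i, tok in enumerate(tokens):
        if tok in flags_needing_value:
            # No next token, or the next token is another option: the value is missing.
            if i + 1 >= len(tokens) or tokens[i + 1].startswith("-"):
                missing.append(tok)
    return missing


def flag_suspicious_sudo(lines):
    """Collect (invoking user, command, missing flags) for check_dhcp calls lacking -s values."""
    hits = []
    for line in lines:
        m = SUDO_CMD.search(line.rstrip("\n"))
        if m and "check_dhcp" in m.group(2):
            command = m.group(2).strip()
            missing = options_missing_value(command)
            if missing:
                hits.append((m.group(1), command, missing))
    return hits
```

If it reports hits, the Nagios command definition that builds this call would be the place to check which value is supposed to follow `-s`.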

and this, too, hundreds of thousands of times over:

Jan 06 16:54:16 itmgmt sudo[1353]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1352]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1351]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1334]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1350]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1339]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1333]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1332]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1307]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:16 itmgmt sudo[1320]: pam_systemd(sudo:session): Failed to create session: Unterbrechung während des Betriebssystemaufrufs
Jan 06 16:54:13 itmgmt sudo[1253]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
Jan 06 16:54:13 itmgmt sudo[1254]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
Jan 06 16:54:13 itmgmt sudo[1238]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
Jan 06 16:54:13 itmgmt sudo[1255]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
Jan 06 16:54:13 itmgmt sudo[1251]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
Jan 06 16:54:13 itmgmt sudo[1256]: pam_systemd(sudo:session): Failed to create session: Activation of org.freedesktop.login1 timed out
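The two failure reasons here can be tallied the same way. (`Unterbrechung während des Betriebssystemaufrufs` is the German-locale text for EINTR, "Interrupted system call".) In the previous excerpt each sudo call is also followed by `systemd[1]: Started Session cNNN of user root`, so counting those per second gives a rough idea of how many logind sessions the checks open. A minimal sketch, assuming the journal line format shown above; the function names are hypothetical.

```python
import re
from collections import Counter

# "Failed to create session: <reason>" as logged by pam_systemd above.
PAM_FAIL = re.compile(r"pam_systemd\(sudo:session\): Failed to create session: (.+)$")
# "Jan 07 22:20:14 itmgmt systemd[1]: Started Session c205 of user root."
SESSION_STARTED = re.compile(
    r"^(\w{3} \d{2} \d{2}:\d{2}:\d{2}) \S+ systemd\[1\]: Started Session \S+ of user (\S+)\.$"
)


def count_session_failures(lines):
    """Tally pam_systemd session-creation failures by their reason string."""
    reasons = Counter()
    for line in lines:
        m = PAM_FAIL.search(line.rstrip("\n"))
        if m:
            reasons[m.group(1)] += 1
    return reasons


def sessions_per_second(lines):
    """Count 'Started Session' messages from systemd[1] per timestamp (one-second buckets)."""
    per_second = Counter()
    for line in lines:
        m = SESSION_STARTED.match(line.rstrip("\n"))
        if m:
            per_second[m.group(1)] += 1
    return per_second
```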


The only other thing I tried today was restarting the "systemd-logind" service: http://serverfault.com/questions/707377/…ogin1-timed-out
And sure enough, everything works normally again. The question is for how long. Something seems to be pretty broken.

Cheers,
boospy
Gentoo Can Do!

Wiki at: http://deepdoc.at

This post has been edited 1 time, last by »boospy« (07.01.2016, 22:59)