Modify

Opened 4 years ago

Closed 4 years ago

#14227 closed defect (fixed)

Procd system log: does not use the log_size option from /etc/config/system

Reported by: hnyman <hannu.nyman@…> Owned by: developers
Priority: normal Milestone: Chaos Calmer 15.05
Component: base system Version: Trunk
Keywords: procd systemlog Cc: hannu.nyman@…

Description

Earlier syslog implementations enabled the user to select the system log size. Default was 16 kB, but it could be increased with a log_size option in /etc/config/system

r36003 removed that as procd replaced old syslogd:
https://dev.openwrt.org/changeset/36003/#file31

As far as I can determine, the current implementation of procd syslog does apprently not use any options for the log size, but sets it always to 16 kB.

See lines 36, 46 and 270 here:
http://nbd.name/gitweb.cgi?p=luci2/procd.git;a=blob;f=syslog.c;h=01b6e7ec8472f4b003f40acf1e06b4e0c318799b;hb=f51f9cc5738d340423e44e678829402b367cf866#l36

E.g. with WNDR3700 the boot process produces so many log messages that the whole boot process is not any more visible in the 16 kB log after the boot.

Fix procd and add back to option to set a larger system log size.


Ps.
additionally, line 234 in syslog.c uses wrong variable in the error message. There should be "size" instead of "log_size", as "size" is used in the previous malloc.
http://nbd.name/gitweb.cgi?p=luci2/procd.git;a=blob;f=syslog.c;h=01b6e7ec8472f4b003f40acf1e06b4e0c318799b;hb=f51f9cc5738d340423e44e678829402b367cf866#l231

Attachments (0)

Change History (18)

comment:2 follow-up: Changed 4 years ago by hnyman <hannu.nyman@…>

As far as I understand, the 'logread' init code(?) that Jow pointed to, does not have any effect on the underlying procd log size.

Logging is initialised by state.c calling log_init() without any arguments, and then log_init in syslog.c initialises the log using the hardcoded 16 kB size to which the log_size variable is initialised (lines 270, 46 and 36).

I tested changing the default buffer size in syslog.c in procd source to 64 kB and I am again able to see the whole system log after boot.

+--- a/syslog.c
++++ b/syslog.c
+@@ -33,7 +33,7 @@
+ #include "procd.h"
+ #include "syslog.h"
+ 
+-#define LOG_DEFAULT_SIZE	(16 * 1024)
++#define LOG_DEFAULT_SIZE	(64 * 1024)
+ #define LOG_DEFAULT_SOCKET	"/dev/log"
+ #define LOG_LINE_LEN		256
+ #define SYSLOG_PADDING		16

comment:3 in reply to: ↑ 2 Changed 4 years ago by hnyman <hannu.nyman@…>

Replying to hnyman <hannu.nyman@…>:

I tested changing the default buffer size in syslog.c in procd source to 64 kB and I am again able to see the whole system log after boot.

Sadly that produced a nasty effect:
After some 15-20 hours uptime, the "logread" command stops responding and produces empty output after a 3 sec timeout.

The timeout made me wonder, and I decided to expriment with logread command line options. And to my surprise, the system log is there, even if it looks empty with plain "logread". By limiting the number of shown lines with "logread -l" option, practically the whole log can be brought visible. So, this is apparently related to reading from the log, and not to writing into the log.

Interestingly, each failing read attempt seems to cause the existing ubus instance to be killed, the effect of which can be seen in the system log:

Wed Oct 2 20:52:01 2013 kern.emerg Killing existing ubus instance, pid=408

Wed Oct 2 20:52:20 2013 kern.emerg Killing existing ubus instance, pid=2953

The 3s timeout probably comes out of logread.c in procd:
http://nbd.name/gitweb.cgi?p=luci2/procd.git;a=blob;f=logread.c;h=bd8c69710bb50671e225a0783444a5c5a1b0296b;hb=HEAD#l346

I have gone through procd sources (and also ubus to some extent) and I can't figure out why the increased log buffer size would cause trouble for logread.

comment:4 Changed 4 years ago by hnyman <hannu.nyman@…>

System log seems to remain readable with a 32 kB buffer, but a 48 or 64 kB caused the logread command to misbehave after 15-20 hours. 32 kB buffer has now run without problems over 8 days. I still haven't figured out, why a larger buffer would have that impact.

(One possible explanation is that somewhere in the code is a signed short integer with 16bit size and 32 kB would be its max value. Haven't found such variable, but still the best explanation so far.)

comment:5 Changed 4 years ago by hnyman <hannu.nyman@…>

Looks like it is not that straightforward:
my secondary router experienced yesterday a short interval of non-working syslog, but the functionality returned without any action a bit later... After 7.5 days of uptime, the router experienced a period of a non-working syslog, while after a few hours the functionality returned by itself. (Right now it is 8d 5h uptime and syslog works...)
During the period, when I noticed syslog not working, syslog showed "typical" ubus hangs...

Wed Oct 23 23:22:41 2013 daemon.notice netifd: Interface 'lan6' is now up
Wed Oct 23 23:23:11 2013 daemon.info dnsmasq[1371]: reading /tmp/resolv.conf.auto
Wed Oct 23 23:23:11 2013 daemon.info dnsmasq[1371]: using nameserver 2001:14b8:119::1#53
Wed Oct 23 23:23:11 2013 daemon.info dnsmasq[1371]: using nameserver 192.168.1.1#53
Wed Oct 23 23:23:11 2013 daemon.info dnsmasq[1371]: using local addresses only for domain lan
Wed Oct 23 23:26:05 2013 kern.emerg Killing existing ubus instance, pid=410

Wed Oct 23 23:26:05 2013 kern.emerg Killing existing ubus instance, pid=10370

Wed Oct 23 23:26:13 2013 kern.emerg Killing existing ubus instance, pid=10371

Wed Oct 23 23:26:24 2013 kern.emerg Killing existing ubus instance, pid=10390

Wed Oct 23 23:27:00 2013 kern.emerg Killing existing ubus instance, pid=10397

Wed Oct 23 23:52:40 2013 daemon.notice odhcp6c[1097]: Sending RENEW (timeout 1080s)
Wed Oct 23 23:52:40 2013 daemon.notice odhcp6c[1097]: Got a valid reply after 1ms
Wed Oct 23 23:52:40 2013 daemon.notice odhcp6c[1097]: Sending <POLL> (timeout 1800s)
Wed Oct 23 23:59:29 2013 authpriv.info dropbear[10523]: Child connection from 192.168.1.188:58741
Wed Oct 23 23:59:30 2013 authpriv.notice dropbear[10523]: Pubkey auth succeeded for 'root' with key md5 f5:54:a2:37:b2:0d:01:af:70:ac:3a:ac:9b:45:91:f9 from 192.168.1.188:58741

This behaviour make me to believe that something is wrong either with ubus or with procd.

comment:6 Changed 4 years ago by Anton van Bohemen <avbohemen@…>

I just found out the same. At first I thought it was because for some reason I still had syslogd and klogd enabled in busybox, but I just tested a new build without those and the logsize is still 16kB. I have set log_size to 256 kB in /etc/config/system, because a few vpn server connects also flood the log quite extensively (now even more with only 16kB). Any news on the investigation?

comment:7 Changed 4 years ago by hnyman <hannu.nyman@…>

I am still patching my build to get a larger log size. Meanwhile the code has been moved from procd to ubox, but I don't think the config option is used for anything in the new context, either. As far as I see, the size is still hardcoded into log daemon sources and the config option is not passed to the daemon.

--- trunk/package/system/ubox/patches/010-increase-log-size (revision 0)
+++ trunk/package/system/ubox/patches/010-increase-log-size (working copy)
@@ -0,0 +1,11 @@
+--- a/log/syslog.c
++++ b/log/syslog.c
+@@ -32,7 +32,7 @@
+ 
+ #include "syslog.h"
+ 
+-#define LOG_DEFAULT_SIZE  (16 * 1024)
++#define LOG_DEFAULT_SIZE  (32 * 1024)
+ #define LOG_DEFAULT_SOCKET    "/dev/log"
+ #define LOG_LINE_LEN      256
+ #define SYSLOG_PADDING        16

comment:8 Changed 4 years ago by Anton van Bohemen <avbohemen@…>

I see. Just did a build with size=256k. Here are my results:

  • After firstboot, logd was running, but the current syslog was empty.
  • Logread returned empty after about 2 seconds wait.
  • "logger testmessage" did not log the message.
  • Did a restart of logd, log showed up normal.
  • After filling the log with some data (> 32kB, don't know exact size), the log was empty again.
  • Had to restart logd to make it show anything again.

Created another build with size=32k. Log is still working fine, don't know for how long :) This is r38995 btw.

There still seems to be an issue with bigger logsizes...

comment:9 Changed 4 years ago by Anton van Bohemen <avbohemen@…>

Is r39333 supposed to fix this bug?

comment:10 Changed 4 years ago by hnyman <hannu.nyman@…>

Probably yes regarding the first/original part of this bug (size option not having any effect).

But it does not fix the second observation (that logread will often hang for 3 seconds and produce empty output with larger log sizes). However, the patch will likely make that bug more noticeable, as more users will be able to use larger log sizes without patching & recompiling ubox.

During the error condition the log is not empty, but instead logread fails to receive the log contents from ubox. So, even if "logread" fails, the command "logread -l 20" will work and will show correctly the last log items. I did some debugging during the weekend and it looked like the log data gets collected ok inside the "logd" daemon, but the data does not get passed back to logread which then returns empty after the 3 sec timeout. (In the error situation the callback function "read_cb" never gets called.)

comment:11 Changed 4 years ago by hnyman <hannu.nyman@…>

Like said in the previous comment, I have made debugging during the past few days. Debugging this is rather complicated, as logread sends the log request via ubus to logd daemon, which compiles the required data and sends it back to logread's callback function via ubus. Data is passed as blobmsg.

I have added debug code to both logread and logd. It looks like logd is able to always collect the log data correctly, but a large log output gets lost in the return journey via ubus and never reaches the callback function.

I added code to logread that prints a line when the request is sent via ubus. And another line is printed when the callback function read_cb gets called.

I added code to logd's read_log function's while loop that writes the collected log lines to a file "log.bin" while collecting the data and forming the blobmsg. And although the "logread -l 440" fails to print the 440 lines (no "read_cb start" visible - the callback function is never triggered), the log lines are correctly visible in log.bin, so the data is ok inside the logging system. Same for "logread" producing 581 lines that never reach the main program.

I make from this the conclusion that the problem is somewhere in the "return path". Somehow the data never gets back via ubus to the callback function in logread.

Below is some debug output trying to proof this. Explanation:

First two successful tries with 400 and 420 lines requested. Line "Logread read_cb start" indicates that the callback function gets called. And the logread output gets passed to tail showing the last 5 lines. The 10 last lines of log.bin contain the same messages, quite as expected. wc finds 401 & 421 lines in log.bin, matching the expectation.

Then two attempts with 440 and full log (of 581 lines) fail. Callback function never gets called, so no output goes to tail. But log.bin still contains the quite correct last lines and also the line count matches the expectation. So, the log data got collected ok.

SUCCESS
root@OpenWrt:/tmp/packages# logger "test 1" ; logread -l 400 | tail -n 5; tail -n 10 log.bin ; wc log.bin
ubus_invoke bhead:0x512168 readcb:0x40174c
Logread read_cb start
Mon Jan 20 20:19:21 2014 daemon.emerg sysinit: read_log log_list result first l-id:0
Mon Jan 20 20:19:21 2014 daemon.emerg sysinit: read_log blobmsg b:0x412618 lines:0x1234567
Mon Jan 20 20:19:21 2014 daemon.emerg sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
Mon Jan 20 20:19:21 2014 daemon.emerg sysinit: read_log done
Mon Jan 20 20:21:54 2014 user.notice root: test 1
t:1390241946 m:sysinit: read_log log_list result first l-id:112
t:1390241946 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390241946 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:72960
t:1390241946 m:sysinit: read_log done
t:1390241961 m:sysinit: read_log log_list call count:0 lines:0x1234567
t:1390241961 m:sysinit: read_log log_list result first l-id:0
t:1390241961 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390241961 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390241961 m:sysinit: read_log done
t:1390242114 m:root: test 1
      401      2556     27033 log.bin

SUCCESS
root@OpenWrt:/tmp/packages# logger "test 2" ; logread -l 420 | tail -n 5; tail -n 10 log.bin ; wc log.bin
ubus_invoke bhead:0x93f168 readcb:0x40174c
Logread read_cb start
Mon Jan 20 20:21:54 2014 daemon.emerg sysinit: read_log log_list result first l-id:163
Mon Jan 20 20:21:54 2014 daemon.emerg sysinit: read_log blobmsg b:0x412618 lines:0x1234567
Mon Jan 20 20:21:54 2014 daemon.emerg sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
Mon Jan 20 20:21:54 2014 daemon.emerg sysinit: read_log done
Mon Jan 20 20:22:05 2014 user.notice root: test 2
t:1390241961 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390241961 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390241961 m:sysinit: read_log done
t:1390242114 m:root: test 1
t:1390242114 m:sysinit: read_log log_list call count:400 lines:0x1234567
t:1390242114 m:sysinit: read_log log_list result first l-id:163
t:1390242114 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390242114 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242114 m:sysinit: read_log done
t:1390242125 m:root: test 2
      421      2680     28315 log.bin

FAIL
root@OpenWrt:/tmp/packages# logger "test 3" ; logread -l 440 | tail -n 5; tail -n 10 log.bin ; wc log.bin
ubus_invoke bhead:0x947168 readcb:0x40174c
t:1390242114 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390242114 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242114 m:sysinit: read_log done
t:1390242125 m:root: test 2
t:1390242125 m:sysinit: read_log log_list call count:420 lines:0x1234567
t:1390242125 m:sysinit: read_log log_list result first l-id:149
t:1390242125 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390242125 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242125 m:sysinit: read_log done
t:1390242136 m:root: test 3
      441      2808     29657 log.bin

FAIL
root@OpenWrt:/tmp/packages# logger "test 4" ; logread  | tail -n 5; tail -n 15 log.bin ; wc log.bin
ubus_invoke bhead:0x95d168 readcb:0x40174c
t:1390242114 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242114 m:sysinit: read_log done
t:1390242125 m:root: test 2
t:1390242125 m:sysinit: read_log log_list call count:420 lines:0x1234567
t:1390242125 m:sysinit: read_log log_list result first l-id:149
t:1390242125 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390242125 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242125 m:sysinit: read_log done
t:1390242136 m:root: test 3
t:1390242136 m:sysinit: read_log log_list call count:440 lines:0x1234567
t:1390242136 m:sysinit: read_log log_list result first l-id:135
t:1390242136 m:sysinit: read_log blobmsg b:0x412618 lines:0x1234567
t:1390242136 m:sysinit: read_log ubussend ctx:0x9293e8 req:0x7fdf6470 bhead:0x9524c0 buflen:83712
t:1390242136 m:sysinit: read_log done
t:1390242151 m:root: test 4
      582      3708     39109 log.bin

Possibly I should create a new bug with a better title, as this is not anymore about procd, nor it is about the log_size option :-(

comment:12 Changed 4 years ago by hnyman

This bug should probably be closed as "fixed". The original problem, log size option handling, has now been enabled. And the still present logread silent crash is better discussed in #14838.

The logread crash is apparently due to a 64 kB ubus message maximum size. As logd adds quite a lot cleartext headers to each log line in the blobmsg returned via ubus to logread, the 64 kB limit can be easily exceeded. With 48 kB log size, the blob being returned can easily be over 90 kB.

As discussed in #14838, the max. log size should be restricted to 32 kB, unless the amount of log headers is strongly reduced in the logd-logread messaging.

comment:13 Changed 4 years ago by hnyman <hannu.nyman@…>

Well, looks like I was too hasty in my previous comment, as r39366 reverted the settings fix.
So, even the original bug is now still open :-(

comment:14 Changed 4 years ago by hnyman

r39898 re-enabled the log size setting, although the uci parameter name has changed to 'log_buffer_size'. (Log data is passed via a file, so the ubus message size limitation is bypassed.)

At the first glance it seems to work, but so far all log items have time stamp of 1970:

Sat Jan 17 05:24:18 1970 daemon.info dnsmasq[2119]: reading /tmp/resolv.conf.auto
Sat Jan 17 05:24:18 1970 daemon.info dnsmasq[2119]: using nameserver 62.241.198.246#53
Sat Jan 17 05:24:18 1970 daemon.info dnsmasq[2119]: using nameserver 62.241.198.245#53
Sat Jan 17 05:24:18 1970 daemon.info dnsmasq[2119]: using local addresses only for domain lan

A quick calculation makes me believe that the shown timestamp value (difference to 1.1.1970) is 1/1000 of the real one.

comment:15 follow-up: Changed 4 years ago by anonymous

Apart from the timestamps, things are not as bright as they seem yet. Luci and logread do not display the entire log. I have set log_buffer_size to 256, and both logread and luci display about 45 kB of data (incl timestamp chars), while there is more data in the log.

About timestamps: "logread -f" does show the correct timestamp when a new entry is logged:
Sun Mar 16 18:49:48 2014 user.notice root: testmessage

But since I already have more data in the log than logread can show, a display of the entire log does not display the new entry. So, the config option is working again, but the syslog itself still has some bugs...

comment:16 in reply to: ↑ 15 Changed 4 years ago by hnyman

Replying to anonymous:

Luci and logread do not display the entire log. I have set log_buffer_size to 256, and both logread and luci display about 45 kB of data (incl timestamp chars), while there is more data in the log.

I have now run into this issue with rev.40004 and have filed a new bug #15357 to separate this issue from the old bug.

comment:17 Changed 4 years ago by hnyman

This has been fixed by r40295.
Both bug #14227 and #15357 can be closed.

comment:18 Changed 4 years ago by blogic

  • Resolution set to fixed
  • Status changed from new to closed

Add Comment

Modify Ticket

Action
as closed .
The resolution will be deleted. Next status will be 'reopened'.
Author


E-mail address and user name can be saved in the Preferences.

 
Note: See TracTickets for help on using tickets.