This thread is for discussion of development topics between me and the other developers.
We'll see how it works out.
Comments on: V52 vs. V48 + my patches
1) OHCI vs. UHCI modules
No need to grey out the checkboxes.
There's no harm in loading both. If the corresponding hardware is not there, that driver will refuse to load. All that happens is that you'll get a message in syslog. Why not let this automatically happen? That way, you don't need special code to enable/disable them specific to each model.
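To illustrate the idea, here's a minimal shell sketch (not the actual rc code) of loading both host-controller modules and letting the kernel sort out which one matches the hardware. The module names usb-ohci/usb-uhci are the K2.4-era names and are assumed here:

```shell
# Sketch: try both USB host-controller drivers; the one whose hardware
# isn't present will simply refuse to load.
load_usb_host_modules() {
    for mod in usb-ohci usb-uhci; do
        # If the controller isn't there, modprobe fails; the only side
        # effect is a message in syslog, so the failure is ignored.
        modprobe "$mod" 2>/dev/null || echo "$mod: driver refused to load"
    done
    return 0
}
```

No per-model enable/disable logic is needed this way - the hardware probe itself is the check.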
2) I see that you moved start_jffs2() up. I think it should be after start_syslog(), so that you'll see any messages. Moving it back down where I had it shouldn't cause any problems.
3) Good job on the console_main. I didn't think of that!
New dnsmasq options???
1) Why all-servers? This forces it to query all the configured servers.
BTW, strict-order does not override all-servers.
Unless you did this to fix some sort of problem, I think the defaults are better (neither of these options set.) If "all-servers" is hardcoded to be on, there is no way to unset it in the GUI custom config.
If the user really wants it to do that, he can set it in his custom config.
BTW, I plan to have a dnsmasq config section in my upcoming release of the Advanced Ops Manual.
The default operation is:
"dnsmasq tests the servers on the first query: whichever replies first gets used. If the server in use stops responding or responds very slowly, the test will be done again, and a new server will be selected.
"strict-order" will stop the testing - the first server will always be used, and if it does not respond, then the query will go to the second one, etc."
"There's a trade-off with complexity, load-balancing and robustness. The existing algorithm tries hard to be simple and not to send a query to just one server unless it's known to be up. It therefore copes well with a list of servers, some of which are dead, without accidentally losing queries or causing long time-outs."
By default, when dnsmasq has more than one upstream server available, it will send queries to just one server. Setting all-servers forces dnsmasq to send all queries to all available servers.
"Q: How does it pick which server to use when all of them are up?
A: Every so often, it sends a query to all of them, and looks to see which one answers first."
"Every so often" is 50 queries or 20 seconds.
2) Dnsmasq's default cache size is 150. This may or may not be too small, but 4096 is probably too large as a default. I'd suggest maybe 256 or 512. Are you seeing a problem with 150?
IMHO, all-servers and strict-order are "expert" options. And cache-size is "semi-expert". Normal users shouldn't be setting them, because you really need to know what you are doing and why the defaults aren't adequate for you.
They can all be set in the Dnsmasq Custom configuration, along with extra servers, etc.
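For reference, a sketch of what such a custom config section might look like (the option names are standard dnsmasq options; the server addresses are placeholders, not recommendations):

```
# Dnsmasq Custom configuration - expert options, sketch only

# always try upstream servers in the order listed below
strict-order

# default is 150; 512 is a modest bump, 4096 is likely overkill on 16MB RAM
cache-size=512

# extra upstream servers (placeholder addresses)
server=8.8.8.8
server=8.8.4.4
```

Since dnsmasq only treats whole lines starting with # as comments, each option sits on its own line.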
My recent fix for the dnsmasq restart wasn't quite right. I had time to think about it, and fix it right.
I'm putting together a patchset to V52 to fix all the above.
Here is my cleanup patchset. Only fixes here, no new features.
TB, please let me know when you see this message, so I know you indeed saw it.
OHCI vs. UHCI modules. … There's no harm in loading both.
Actually, there is. Do you think I would go through all this hassle with disabling the non-existing controller just for the sake of it ;)?
It may not cause any issues on the 520GU running K24, but that doesn't mean it works the same way across the board. With K26 builds, the wrong driver being loaded makes the WL500GPv1 and WL500W stall or reboot. On other models both modules - if loaded - stay in memory (unlike K24, where the wrong driver gets unloaded immediately) - not a very big deal, but still some wasted RAM…
you moved start_jffs2() up
If you just move it down the way you did, it causes a remount attempt every time - for example - network settings are saved in the GUI. The remount errors out, resulting in an error message on the JFFS page in the GUI. Even though I later fixed the remounting problem with an additional check inside start_jffs2(), I didn't see any reason for moving it (until you explained it now), so I moved it back. In any case, I don't think the log message during startup is really important - it doesn't justify moving the start_jffs2() call for me - you can always check the JFFS mounting status by several other methods.
Why all-servers? This forces it to query all the configured servers. BTW, strict-order does not override all-servers.
Well, first of all, the dnsmasq config changes in the last build are experimental, and obviously we can adjust or roll them back if needed. The goal was to try to make the internal DNS server respond faster with the default settings, without the user needing to know the dnsmasq parameters.
That said, what's the harm in querying all servers? It does speed up DNS lookups in many cases, which is a good thing and which most people would like to get from it. Also, I tested strict-order placed below all-servers in the config file, and it does change the behavior - in that case I didn't see dnsmasq query all servers; it works as if the "all-servers" option was not there. So in the current state we have potentially faster lookups by default (for people who have no idea about custom options), with the ability to change it to strictly ordered lookups if you care and know about this option. I don't see a real need to support the "default" dnsmasq behavior of querying the servers one by one in the order determined by previous responses - but I agree that the GUI option for it might be useful, having it enabled by default. If someone convinces me that querying all servers is bad, I'll change my opinion :).
As for the cache size, I've experimented with it a long while ago, and ended up using 4096 ever since, based on the various recommendations on WRT forums as well as my own tests. But I don't have a strict preference on what should be the default value here - if you believe that 4096 is too much for 16MB RAM routers, we can decrease it.
My recent fix for the dnsmasq restart wasn't quite right. … I'm putting together a patchset to V52 to fix all the above.
Heh, once you put your mind on something, you can't move on ;)… I'm waiting to see what is wrong with it now…
Also, may I ask you to test your changes a bit more extensively - think about what else might be affected? You're the only one now who really helps with Tomato USB development on a regular basis, and it would be great if I could just check in your changes and keep doing other stuff. I'm not saying that anyone can produce 100% bug-free code :), it just seems that some extra testing would help in many cases. For example, the last time it was the "buttons" process dying after receiving an unexpected SIGALRM ;).
EDIT: I have not seen your patch yet when posting this. Got it now - thanks! Looks good to me for the most part. Not decided yet about the "all-servers" option and moving start_jffs2() call.
Please, please no all-servers. It needlessly pounds DNS servers, creates lots of sub-MTU traffic/packet interrupts, and is essentially the DNS version of HTTP prefetch. If someone wants to enable it, and *understands the impact*, fine - but dnsmasq does a _fine_ job of self-tuning based on the lowest latency response from all available servers (so long as strict-order is NOT used), which produces *the same net effect as all-servers without the additional workload*.
My Tomato utilities site: http://multics.minidns.net/blog/articles/tomato_utilities
Ok, 2 votes to get rid of it :).
Well, by default dnsmasq pounds all DNS servers every 50 queries or 20 seconds anyway. How much bigger is the impact of all-servers for an average internet user, considering a relatively large cache (512 to 4096) after a few days of uptime?
By the way, do you happen to know what the default behavior of the Windows client is in that regard?
+1 vote from me. While I do agree with Teddy_B concerning all-servers, it should not be on by default. It stresses DNS servers, and on slower links performance will probably decrease instead of increase. I do notice a performance increase with all-servers on my own fast cable connection, however, so I'm in for a GUI option.
start_jffs: Okay, I understand now. But is that the only potential problem on a restart? The load_files_from_nvram and init scripts will be done, too. I don't have a strong opinion one way or the other, I just thought it best to get syslog running as early as possible so error messages will be seen. For example, if there are errors in the jffs, the driver will log them. Look at all the printk statements in the jffs code. If syslog isn't up, will these messages appear?
There are interactions with the autorun scripts, too. Disk autoruns, cifs autoruns, and jffs autoruns should all happen the same way - they should all get done on START and RESTART.
Ahhh - I see it. There needs to be a stop_jffs in the SIGHUP case, and a start_jffs in the SIGUSR2 case. 'Course, if the jffs didn't unmount because it's busy, then we may need special handling just like the usbhost case.
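The intended sequencing can be sketched in shell, even though the real logic is C code in init; handle_restart_stop/handle_restart_start are hypothetical names, and stop_jffs2/start_jffs2 stand in for the actual handlers:

```shell
# Sketch of the restart sequencing: unmount jffs on the "stop" signal,
# remount it on the "start" signal.
handle_restart_stop() {     # the SIGHUP ("stop services") path
    # If the jffs is busy and won't unmount, extra handling is needed,
    # just like the usbhost case.
    stop_jffs2 || echo "jffs busy: unmount deferred"
}
handle_restart_start() {    # the SIGUSR2 ("start services") path
    start_jffs2
}
trap handle_restart_stop HUP
trap handle_restart_start USR2
```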
I'll look at this and do some testing. Clearly there's a few more side-effects than I originally thought!
dnsmasq restart wasn't quite right. … I'm putting together a patchset to V52 to fix all the above.
Heh, once you put your mind on something, you can't move on.
We used to have a motto hanging in the computer lab:
Our code, it is beautiful.
If it works, so much the better.
Some people call it attention to elegant design. Some people call it anal-retentiveness. ;-)
the "buttons" process dying after receiving an unexpected SIGALRM …
Ya got me! How was buttons being sent a SIGALRM? AFAIK, I didn't even touch buttons.c.
Ahhhhhh, okay, now I see. Buttons is calling a function that belongs to init. Bad bad programming. At work, we used to take people who wrote code like this out back and gently explain to them (with a baseball bat) that this is a Bad Thing to do. ;-)
Anyway, I'll fix it. Properly.
what's the harm in querying all servers? It does speed up the dns lookups in many cases
It's called being a bad netizen. It puts more load on the DNS servers, and more load on your own system. It's as impolite as configuring ntpd with "burst", or setting the ntpd maxpoll interval down to 8 or 16 seconds.
And it's not really needed. Dnsmasq does a damn fine job with its default operation. Only the first (i.e., fastest) reply is used anyway; the rest are redundant and get discarded. So they are just wasted overhead. It periodically rechecks to see which is the fastest server. Once the fastest server is picked, it is highly likely that it will *always* be the fastest one - the response times of the various servers are not going to vary much. So sending queries to servers that we pretty much know are slow is just a pure waste.
In adjusting tuning parameters, you always have to be wary of over-optimizing. "Best" doesn't always mean fastest. It's a balance of fastest and expense. Yes, a large cache will be faster, but it consumes more RAM - which is a resource that a 16MB router doesn't have an overabundance of. I did a lot of investigation of caching issues at my job, and one thing that stood out is that you very rapidly get into diminishing returns. The first little bit of cache helps a lot. The next additional bit helps some more, but much less of an improvement. I suspect that the default cachesize of 150 gets you 90% of the maximum possible improvement, and 512 only picks up an additional 4%-5%. Going to 4096 probably only picks up another 1%-2%. My vote would be for Tomato to either leave it off and go with dnsmasq's default, or maybe a hardcoded 512. And pull out the GUI config item for cachesize.
If a user wants to configure any of these things, we have the perfect place to do it - in the Custom Config section of the GUI. All we need is a section in the Tomato Advanced Ops Manual to explain it. Which I plan to put in, as I'm currently working on a new revision.
FWIW, I have 6 servers in my config. I'll look at the stats after a couple of days of running and see what's happening.