The parent currently uses a fixed-size 100000 byte buffer to receive
messages. Even with a tile size of 256x256 pixels, that is not enough for some
tiles that compress badly as PNG. Add a nextmessage: message that gives the
size of the immediately following message. Send a such before each tile:
message, and handle it appropriately. The nextmessage: message is used only
from child to parent process, it is never sent to the Websocket client.
Once Poco 1.6.1 is released, this will not be necessary.
It doesn't work to debug a program that has file capabilities set, it seems,
so to debug the loolwsd master process, one in practice needs to run sudo gdb
on it. But it is not necessarily a good idea to run all of the code as
root. When configured for debugging (--enable-debug), reset real and effective
uid to a non-root one, either one given with an --uid option (typically that
of the developer), or "nobody".
Otherwise, if we use the same port number and same HTTPServer, if enough
clients try to contact us and, we won't be able to accept child processes
having been spawned.
Also add some temporary debugging output here and there to debug lifecycle
management issues.
We don't want to use WNOHANG, not sure where I got that idea from. That leads
to busy looping. The loop is in a thread of its own, so it is fine to just
wait ("hang") for some child to die.
It was obviously very wrong to use both a unique_ptr to the
MasterProcessSession in WebSocketRequestHandler::handleRequest(), and then a
bare pointers to the peer object in the MasterProcessSession object. We got
crashes here and there related to the destructors.
Let's see if we can manage without mutexes.
There probably are more places where I should catch those and act
appropriately. At least in these places, where the websocket connection is
already closed, or being closed, anyway, the right thing to do is just to
ignore exceptions, which are generated from attempts to write to an already
closed Poco WebSocket, for instance.
But actually I wonder why I thought I would need signalfd at all; wouldn't it
be enough to just loop in the undertaker thread, calling waitpid(), as long as
there are child processes? I'll try after this commit.
(Besides, I now notice that when I client disconnects, we don't close the
websocket to the child process, so it never goes away. Will fix that.)
As the child processes are pre-spawned and just hang around waiting, there is
ample time to attach one in a debugger in a controlled debugging scenario
anyway.
Works now for the trivial 'connect' test program. Still need to add
pre-spawning of a new child process as soon as an existing one from the pool
has been taking into use. And need to test with the actual JS client.
Set the SLEEPFORDEBUGGER environment variable to the number of seconds a child
process should sleep before calling lok_init(), so that you have time to
attach the process in a debugger.
For now, each LOOL client has a separate child process (or none at all, if it
has accessed only information found in the cache). This will obviously have to
chnage to handle collaboration. Etc.
The parent process talks the same Websocket protocol with the child
processes. When there is a child process for a client, traffic from the client
is forwarded as such to the child process and vice versa.