Now we have two caching directories: 'persistent' and 'editing'.
The Persistent cache always represents the document as it is saved;
the Editing one caches the current edits. The Editing state is copied to
Persistent on save and destroyed on reload.
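A minimal sketch of what this means in terms of directory operations, using Poco::File (the directory names and helper functions are illustrative only, not the actual loolwsd code):

    #include <Poco/File.h>
    #include <string>

    // Illustrative helpers only; the real cache layout is more involved.
    void onSave(const std::string& editingDir, const std::string& persistentDir)
    {
        // On save the editing cache becomes the new persistent cache.
        Poco::File persistent(persistentDir);
        if (persistent.exists())
            persistent.remove(/*recursive=*/true);
        Poco::File(editingDir).copyTo(persistentDir);
    }

    void onReload(const std::string& editingDir)
    {
        // On reload the edits are discarded, so the editing cache is destroyed.
        Poco::File editing(editingDir);
        if (editing.exists())
            editing.remove(/*recursive=*/true);
    }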
Implementing this was harder than I first expected. The basic idea is as
follows: The master process puts each message arriving from a client that
isn't "canceltiles" into a (client-specific) queue. A separate thread pulls
messages from the queue at its own pace and handles them as before. Incoming
"canceltiles" messages are handled specially, though: the queue is emptied of
"tile" messages.
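A minimal sketch of such a per-client queue, assuming messages are plain strings and tile requests start with "tile " (the real class in loolwsd is structured differently):

    #include <algorithm>
    #include <condition_variable>
    #include <deque>
    #include <mutex>
    #include <string>

    class MessageQueue
    {
    public:
        // Called by the master process for each incoming client message.
        void put(const std::string& message)
        {
            std::unique_lock<std::mutex> lock(_mutex);
            if (message == "canceltiles")
            {
                // Drop all pending tile requests instead of queueing the cancel.
                _queue.erase(std::remove_if(_queue.begin(), _queue.end(),
                                            [](const std::string& m)
                                            { return m.compare(0, 5, "tile ") == 0; }),
                             _queue.end());
                return;
            }
            _queue.push_back(message);
            _cv.notify_one();
        }

        // Called by the separate thread that handles messages at its own pace.
        std::string get()
        {
            std::unique_lock<std::mutex> lock(_mutex);
            _cv.wait(lock, [this] { return !_queue.empty(); });
            std::string message = _queue.front();
            _queue.pop_front();
            return message;
        }

    private:
        std::mutex _mutex;
        std::condition_variable _cv;
        std::deque<std::string> _queue;
    };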
The above sounds simple but there are several details that were a bit tricky
to get right.
Remove any intersecting cached tiles. It is the parent process that handles
the tile cache, so it must look for invalidatetiles: messages, too, before
passing them on to the client. To know for which part we should remove tiles,
add an "ephemeral" curpart: message that the child process sends to the parent
process before the invalidatetiles: message.
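Roughly, the parent-side handling could look like this; the helper callbacks and the exact curpart: syntax are assumptions, not the actual loolwsd code:

    #include <functional>
    #include <string>

    void forwardChildMessage(const std::string& message, int& curPart,
                             const std::function<void(int, const std::string&)>& removeIntersectingTiles,
                             const std::function<void(const std::string&)>& sendToClient)
    {
        if (message.compare(0, 8, "curpart:") == 0)
        {
            // Ephemeral message from the child: remember which part the following
            // invalidatetiles: refers to, and do not forward it to the client.
            curPart = std::stoi(message.substr(8));
            return;
        }

        if (message.compare(0, 16, "invalidatetiles:") == 0)
        {
            // Remove cached tiles of that part intersecting the invalidated area,
            // then pass the message on so the client can re-request them.
            removeIntersectingTiles(curPart, message.substr(16));
        }

        sendToClient(message);
    }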
Why did I wait so long to do this? This is obviously the right thing to do, I
hope, and has a very significant impact on the robustness of the server...
Just call the getStatus() function directly in the child process, which will
always cause a complete status: message to be sent to the client. No
documentsizechanged: messages are now sent to the client at all.
When a child process sends a documentsizechanged: message to the parent, to be
forwarded to the client, parse it and update a cached status of the document,
if available, and send an updated status: message to the client instead.
Note that clients should not rely on getting only status: messages and never
documentsizechanged: messages, though; the cache might be cleaned at any time
even while the server is running. If there is no cached status of the document
to update and re-use, we have to forward the documentsizechanged: message as
such to the client.
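A sketch of the kind of token rewrite meant here, assuming both messages carry space-separated "width=…" and "height=…" tokens (the helper name is made up):

    #include <string>

    // Replace the value of a "name=value" token in a message; the parent would
    // apply this to the cached status: using the width and height parsed from
    // the documentsizechanged: message, then send the patched status: instead.
    static std::string replaceTokenValue(std::string message, const std::string& name,
                                         const std::string& newValue)
    {
        const std::size_t start = message.find(name + "=");
        if (start == std::string::npos)
            return message;
        const std::size_t valueStart = start + name.size() + 1;
        std::size_t valueEnd = message.find(' ', valueStart);
        if (valueEnd == std::string::npos)
            valueEnd = message.size();
        message.replace(valueStart, valueEnd - valueStart, newValue);
        return message;
    }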
No idea. Does not even work from the command line... But after
"fixing" this, we run into other weird problems anyway. (Maybe one
needs to use lower-level (Mach) APIs for esoteric stuff like chroot?)
So, getting loolwsd to work on OS X seems unexpectedly hard even
before considering what changes might be needed to LibreOfficeKit. Oh
well.
The parent currently uses a fixed-size 100000 byte buffer to receive
messages. Even with a tile size of 256x256 pixels, that is not enough for some
tiles that compress badly as PNG. Add a nextmessage: message that gives the
size of the immediately following message. Send one before each tile:
message, and handle it appropriately. The nextmessage: message is used only
from child to parent process, it is never sent to the Websocket client.
Once Poco 1.6.1 is released, this will not be necessary.
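On the parent side the receive loop then works roughly like this, assuming the announcement has the form "nextmessage: size=N" (the exact syntax and the message handler are placeholders):

    #include <Poco/Net/WebSocket.h>
    #include <functional>
    #include <string>
    #include <vector>

    void receiveLoop(Poco::Net::WebSocket& ws,
                     const std::function<void(const std::string&)>& handleMessage)
    {
        std::vector<char> buffer(100000);
        int flags = 0;
        int n;
        while ((n = ws.receiveFrame(buffer.data(), static_cast<int>(buffer.size()), flags)) > 0)
        {
            std::string message(buffer.data(), n);
            if (message.compare(0, 12, "nextmessage:") == 0)
            {
                const std::size_t sizePos = message.find("size=");
                if (sizePos != std::string::npos)
                {
                    // Grow the buffer so that the immediately following message
                    // (for example a large tile:) fits in one receiveFrame() call.
                    const std::size_t size = std::stoul(message.substr(sizePos + 5));
                    if (size > buffer.size())
                        buffer.resize(size);
                }
                continue;
            }
            handleMessage(message);
        }
    }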
Catch IOException when trying to copy the document, and send an error message.
Otherwise, the exception will just propagate up to handleRequest() and the
connection will be silently closed. Closing the connection is fine as such, I
think, but we need to send an error message first.
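Something along these lines; the error message format here is an illustration, not necessarily the exact protocol:

    #include <Poco/Exception.h>
    #include <Poco/File.h>
    #include <Poco/Net/WebSocket.h>
    #include <string>

    void copyDocumentOrReport(const std::string& docPath, const std::string& jailedPath,
                              Poco::Net::WebSocket& ws)
    {
        try
        {
            Poco::File(docPath).copyTo(jailedPath);
        }
        catch (const Poco::IOException& exc)
        {
            // Tell the client why the load failed instead of letting the
            // exception propagate and the connection close silently.
            const std::string error = std::string("error: cmd=load kind=") + exc.name();
            ws.sendFrame(error.data(), static_cast<int>(error.size()));
        }
    }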
Otherwise, if we used the same port number and the same HTTPServer, and enough
clients tried to contact us, we would not be able to accept connections from
the child processes that have been spawned.
Also add some temporary debugging output here and there to debug lifecycle
management issues.
Just a stopgap measure so far. Clearly will need some logic to get rid of too
many children if they are just hanging around with no clients incoming. Also,
we need some sanity check not to spawn an unlimited number of children.
Otherwise two or more linkOrCopy() functions running simultaneously will lead
to a mess.
Actually nftw() is not guaranteed to be thread-safe, I think, so we should
really use something that is, like Poco's SimpleRecursiveDirectoryIterator.
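For now, the simplest fix is a mutex around the call (a sketch only; linkOrCopy()'s real signature may differ):

    #include <mutex>

    // Running two linkOrCopy() walks at the same time would lead to a mess,
    // and nftw() is not guaranteed to be thread-safe, so serialize them.
    static std::mutex linkOrCopyMutex;

    void linkOrCopy(const char* source, const char* target); // the existing function

    void linkOrCopyLocked(const char* source, const char* target)
    {
        std::lock_guard<std::mutex> guard(linkOrCopyMutex);
        linkOrCopy(source, target);
    }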
It probably is not a good idea to keep depleting the entropy source needlessly
by having a separate RNG in the preSpawn(). Use just one shared RNG and
protect access to it with a mutex.
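A sketch of the shared-RNG idea with Poco::Random (the function name is made up):

    #include <Poco/Random.h>
    #include <mutex>

    namespace
    {
        std::mutex rngMutex;
        Poco::Random rng;   // seeded once at startup with rng.seed()
    }

    // All users, including preSpawn(), draw from the one shared generator.
    unsigned getNextRandom()
    {
        std::lock_guard<std::mutex> lock(rngMutex);
        return rng.next();
    }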
The JS code always passes in 0 for now. The server parses the parameter and
calls LibreOfficeKitDocument::setPart() before calling paintTile().
Probably also the status, key, mouse and selection messages will need a part
number. The intent is after all that the protocol is as stateless as
possible. (So maybe we should also pass the document URL in each message?)
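Server-side, the idea is roughly this, using the lok::Document C++ wrapper (the parsing helper is hypothetical):

    #include <LibreOfficeKit/LibreOfficeKit.hxx>
    #include <string>

    // Hypothetical helper: extract the "part=N" value from a tile message.
    static int parsePartNumber(const std::string& message)
    {
        const std::size_t pos = message.find("part=");
        return pos == std::string::npos ? 0 : std::stoi(message.substr(pos + 5));
    }

    void renderTile(lok::Document* loKitDocument, const std::string& tileMessage)
    {
        // Switch to the requested part before rendering, so that the protocol
        // stays as stateless as possible.
        loKitDocument->setPart(parsePartNumber(tileMessage));

        // ... then call paintTile() with the requested tile geometry as before.
    }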
The new function takes a map from keywords to integer values, and accepts
parameters in the form of either name=keyword, or for backward compatibility,
name='keyword'. Use it to parse the type parameter of the key, mouse,
selecttext and selectgraphic messages. This restricts the accepted keywords to
those actually valid for each message.
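A sketch of such a function (the real one in loolwsd may differ in name and signature):

    #include <map>
    #include <string>

    // Look up name=keyword (or name='keyword') in a map of the keywords that
    // are valid for this message, yielding the corresponding integer value.
    bool getTokenKeyword(const std::string& token, const std::string& name,
                         const std::map<std::string, int>& map, int& value)
    {
        if (token.compare(0, name.size() + 1, name + "=") != 0)
            return false;

        std::string keyword = token.substr(name.size() + 1);
        // Accept the quoted form name='keyword' for backward compatibility.
        if (keyword.size() >= 2 && keyword.front() == '\'' && keyword.back() == '\'')
            keyword = keyword.substr(1, keyword.size() - 2);

        const auto it = map.find(keyword);
        if (it == map.end())
            return false;

        value = it->second;
        return true;
    }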
Needed for /usr/share/fonts so that fontconfig trusts its cache, but no harm
in doing it for all directories. (Except a slight slowdown; need to see whether
it has any significant impact, or whether we should do the utime() only for
directories under /usr/share/fonts.)
Thus we need to pass the FTW_DEPTH flag to nftw() and handle FTW_DP instead of
FTW_D. We also need to make sure the directory is created also in the case of
an empty directory, for which no FTW_F callback for files inside it has been
received.
It is safest to not exit the nftw() in the FTW_SLN case.
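The resulting callback shape looks roughly like this; the actual linking and copying logic is omitted:

    #include <ftw.h>
    #include <sys/stat.h>

    // Sketch of the nftw() callback with FTW_DEPTH in effect.
    static int nftw_cb(const char* path, const struct stat* /*sb*/,
                       int typeflag, struct FTW* /*ftwbuf*/)
    {
        switch (typeflag)
        {
        case FTW_F:
            // Link or copy a regular file (creating its parent directories first).
            break;
        case FTW_DP:
            // With FTW_DEPTH a directory is reported after its contents; create it
            // here so that empty directories are not lost.
            break;
        case FTW_SLN:
            // Symlink to a nonexistent file: safest to just carry on.
            break;
        default:
            break;
        }
        return 0; // keep walking
    }

    // Usage: nftw(source, nftw_cb, /*nopenfd=*/20, FTW_DEPTH);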
It was obviously very wrong to use both a unique_ptr to the
MasterProcessSession in WebSocketRequestHandler::handleRequest(), and then
bare pointers to the peer object in the MasterProcessSession object. We got
crashes here and there related to the destructors.
Let's see if we can manage without mutexes.
There probably are more places where I should catch those and act
appropriately. At least in these places, where the websocket connection is
already closed, or being closed, anyway, the right thing to do is just to
ignore exceptions, which are generated from attempts to write to an already
closed Poco WebSocket, for instance.
Also, it seems that calling LOKitDocument::destroy() (in the child process's
LOOLSession dtor) causes crashes, so avoid that. There can be little need for
any cleanup as the process is about to exit anyway, and the user profile is a
temporary one that will be binned.
Otherwise it uses a timestamp with one-second granularity as seed, and thus
most of the child processes pre-spawned at start will use the same seed, which
causes breakage.
Works now for the trivial 'connect' test program. Still need to add
pre-spawning of a new child process as soon as an existing one from the pool
has been taken into use. And need to test with the actual JS client.
Add a new program, loadtest, that runs a requested number of client sessions
in parallel to a loolwsd server. A client session loads one of a list of test
documents, and does some operations on it.
Move the getTokenInteger() and getTokenString() functions out from LOOLSession
into a new namespace LOOLProtocol, as they are needed also in the loadtest
program.
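For reference, a minimal sketch of what those helpers do (the real LOOLProtocol functions may differ in signature and robustness):

    #include <string>

    namespace LOOLProtocol
    {
        // Extract the value of a space-separated "name=value" token from a message.
        // Simplified: does not guard against one token name ending another.
        inline bool getTokenString(const std::string& message, const std::string& name,
                                   std::string& value)
        {
            const std::string prefix = name + "=";
            std::size_t pos = message.find(prefix);
            if (pos == std::string::npos)
                return false;
            pos += prefix.size();
            const std::size_t end = message.find(' ', pos);
            value = message.substr(pos, end == std::string::npos ? std::string::npos
                                                                 : end - pos);
            return true;
        }

        // Same, but convert the value to an integer (assumes a numeric value).
        inline bool getTokenInteger(const std::string& message, const std::string& name,
                                    int& value)
        {
            std::string token;
            if (!getTokenString(message, name, token))
                return false;
            value = std::stoi(token);
            return true;
        }
    }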
Add, also in LOOLProtocol, functions to parse some of the messages from the
server. (In general that is done in client JavaScript code, of course; it is
needed in C++ code only for testing purposes.)
Handle the start of a child process when needed centrally, not in each command
handler.
Also, handle unrecognized commands always already in the parent
process. (Command syntax checks still done in child process, though.)
For now, each LOOL client has a separate child process (or none at all, if it
has accessed only information found in the cache). This will obviously have to
change to handle collaboration, etc.
The parent process talks the same Websocket protocol with the child
processes. When there is a child process for a client, traffic from the client
is forwarded as such to the child process and vice versa.
Having a 'close' would mean being able to do a new 'open', too, which
introduces unneeded complexity, at least at this stage.
Just start a fresh WebSocket connection for each document.