Remove any intersecting cached tiles. It is the parent process that handles
the tile cache, so it must look for invalidatetiles: messages, too, before
passing them on to the client. To know for which part we should remove tiles,
add an "ephemeral" curpart: message that the child process sends to the parent
process before the invalidatetils: message.
Why did I wait so long to do this? This is obviously the right thing to do, I
hope, and has a very significant impact on the robustness of the server...
Just call the getStatus() function directly in the child process, which will
always cause a complete status: message to be sent to the client. No
documentsizechanged: messages now sent to the client at all.
When a child process sends a documentsizechanged: message to the parent, to be
forwarded to the client, parse it and update a cached status of the doucment,
if available, and send an updated status: message to the client instead.
Note that clients should not rely on getting only status: messages and never
documentsizechanged: messages, though; the cache might be cleaned at any time
even while the server is running. If there is no cached status of the document
to update and re-use, we have to forward the documentsizechanged: message as
such to the client.
They must be on the same file system where LO is installed, and the TDF
tarballs puts that in /opt, and on some Linuxes in default configuration /opt
is on a different file system from /home, so we should not really use
/home/lool.
The amount of system frameworks (both public and private) needed by
the LO libraries is staggering. It will not work to try to list them
here. If you are crazy enough to want to run this on OS X, use some
other tool than this script to set up the system template for the
chroot jail. Like mkjail from https://github.com/glvnst/shlibs.
No idea. Does not even work from the command line... But after
"fixing" this, we run into other weird problems anyway. (Maybe one
needs to use lower-level (Mach) APIs for esoteric stuff like chroot?)
So, getting loolwsd to work on OS X seems unexpectedly hard even
before considering what changes might be needed to LibreOfficeKit. Oh
well.
Not sure if this is the right thing to do, but should work for our purposes
for now. If and when somebody else wants to fine-tune this stuff, feel free.
Wonder if it would be good style to expand the spec file from a .in one by the
configury, so that one could use autoconf expansions for the version number in
it?
At least when not running as the owner of those files. Refactor the capability
dropping so that we can separately drop the CAP_FOWNER and CAP_SYS_CHROOT
capabilities. The child process never needs CAP_FOWNER and the parent process
never needs CAP_SYS_CHROOT.
It was a bit misleading to have "chroot" in the name of the script as it isn't
a chroot jail as such that the script is setting up, but a *template* that
will be copied (hardlinked) into each chroot jail.
The script should be installed, so that a loolwsd package can include it. We
don't want to loolwsd use to depend on having the sources available,
obviously.
The parent currently uses a fixed-size 100000 byte buffer to receive
messages. Even with a tile size of 256x256 pixels, that is not enough for some
tiles that compress badly as PNG. Add a nextmessage: message that gives the
size of the immediately following message. Send a such before each tile:
message, and handle it appropriately. The nextmessage: message is used only
from child to parent process, it is never sent to the Websocket client.
Once Poco 1.6.1 is released, this will not be necessary.
I have made a pseudo-release tarball (for internal testing purposed) with the
version 1.0.0.
Use the following policy for versioning: When doing a release tarball, bump
the version in configure.ac so that the third (micro) version number is
even. Don't commit that to git, but do run make dist. After you have the
tarball, bump the micro version number again to an odd number. Commit. This
way, anything from git will always have an odd micro version number.
Useful when building loolwsd on a system that doesn't have new enough
LibreOfficeKit headers available. (It makes sense to use them only if the
resulting loolwsd will then be run on a system with a new enough LibreOffice
installation anyway.)
Pass the --with-lokit-path=bundled/include configure option to use them.
It doesn't work to debug a program that has file capabilities set, it seems,
so to debug the loolwsd master process, one in practice needs to run sudo gdb
on it. But it is not necessarily a good idea to run all of the code as
root. When configured for debugging (--enable-debug), reset real and effective
uid to a non-root one, either one given with an --uid option (typically that
of the developer), or "nobody".
Catch IOException when trying to copy the document and send error message.
Otherwise, the exception will just propagate up to handleRequest() and the
connection will be silently closed. Closing the connection is fine as such, I
think, but we need to send an error message first.
Currently we require the current version, 1.6.0, and as soon as 1.6.1 is out,
I will start requiring that, because I want to use a feature I submitted to
it.
Surround most of configure.ac with AC_LANG_PUSH([C++]) while at it, as this is
all C++ code anyway.
Otherwise, if we use the same port number and same HTTPServer, if enough
clients try to contact us and, we won't be able to accept child processes
having been spawned.
Also add some temporary debugging output here and there to debug lifecycle
management issues.
Just a stopgap measure so far. Clearly will need some logic to get rid of too
many children if they are just hanging around with no clients incoming. Also,
we need some sanity check not to spawn an unlimited number of children.
Otherwise two or more linkOrCopy() functions running simultaneously will lead
to a mess.
Actually nftw() is not guaranteed to be thread-safe, I think, so we should
really use something that is, like Poco's SimpleRecursiveDirectoryIterator.
It probably is not a good idea to keep depleting the entropy source needlessly
by having a separate RNG in the preSpawn(). Use just one shared RNG and
protect access to it with a mutex.
Use setuid root otherwise. (But note that the portablity to other
Unixes is a work in progress, and for instance it is known that this
doesn't work on OS X yet.)
Allow the directory parameters to be relative paths; turn them into absolute
ones for later use in the script as we change directories back and forth.
Use cpio instead of cp --parent for each file separately. (The latter has the
problem that parent directories are created using the protection the
corresponding source directory has, and tht might not permit you to copy other
files later into the same directory.)
Also copy usr/share/liblangtag, for the (common) case when LibreOffice is
built to use a system liblangtag.
Don't bother with the special handling of /usr/share/fonts/ghostscript unless
it exists and is a symlink.
With these changes, it worked for CentOS 7, too.
The JS code always passes in 0 for now. The server parses the parameter and
calls LibreOfficeKitDocument::setPart() before calling paintTile().
Probably also the status, key, mouse and selection messages will need a part
number. The intent is after all that the protocol is as stateless as
possible. (So maybe we should also pass the document URL in each message?)
The new function takes a map from keywords to integer values, and accepts
parameters in the form of either name=keyword, or for backward compatibility,
name='keyword'. Use it to parse the type parameter of the key, mouse,
selecttext and selectgraphic messages. This restricts the accepted keywords to
those actually valid for each message.
Needed for /usr/share/fonts so that fontconfig trusts its cache, but no harm
doing it for all directories. (Except a slight slowdown, need to see it if
has any significant impact if we would do the utime() only for directories
under /usr/share/fonts.)
Thus we need to pass the FTW_DEPTH flag to nftw() and handle FTP_DP instead of
FTP_D. We also need to make sure the directory is created also in the case of
an empty directory, for which no FTW_F callback of files inside it has been
received.
It is safest to not exit the nftw() in the FTW_SLN case.
Makefile.am:18: warning: deprecated feature: target 'SETCAP' overrides 'SETCAP$(EXEEXT)'
Makefile.am:18: change your target to read 'SETCAP$(EXEEXT)'
/usr/share/automake-1.13/am/program.am: target 'SETCAP$(EXEEXT)' was defined here
Makefile.am:1: while processing program 'SETCAP'
We don't want to use WNOHANG, not sure where I got that idea from. That leads
to busy looping. The loop is in a thread of its own, so it is fine to just
wait ("hang") for some child to die.
Just use find ... -type f instead of a for loop.
One note: I wonder how text rendering by the LO code in the child process
works even if we haven't copied any actual fonts into the sys template (and
thus chroot jail)? Is FreeType (or whatever it) already present in the
process, and font files open, even before the chroot? Weird.
It was obviously very wrong to use both a unique_ptr to the
MasterProcessSession in WebSocketRequestHandler::handleRequest(), and then a
bare pointers to the peer object in the MasterProcessSession object. We got
crashes here and there related to the destructors.
Let's see if we can manage without mutexes.
There probably are more places where I should catch those and act
appropriately. At least in these places, where the websocket connection is
already closed, or being closed, anyway, the right thing to do is just to
ignore exceptions, which are generated from attempts to write to an already
closed Poco WebSocket, for instance.
Also, seems that calling LOKitDocument::destroy() (in the child process's
LOOLSession dtor) causes crashes, avoid that. The can be little need for any
cleanup as the process is about to exit anyway, and the user profile is a
temporary one that will be binned.
But actually I wonder why I thought I would need signalfd at all; wouldn't it
be enough to just loop in the undertaker thread, calling waitpid(), as long as
there are child processes? I'll try after this commit.
(Besides, I now notice that when I client disconnects, we don't close the
websocket to the child process, so it never goes away. Will fix that.)
As the child processes are pre-spawned and just hang around waiting, there is
ample time to attach one in a debugger in a controlled debugging scenario
anyway.
Otherwise it uses a timestamp with one-second granularity as seed, and thus
most of the child processes pre-spawned at start will use the same seed, which
causes breakage.
Works now for the trivial 'connect' test program. Still need to add
pre-spawning of a new child process as soon as an existing one from the pool
has been taking into use. And need to test with the actual JS client.