This can happen most notably when the kit crashes.
In this case the DocBroker is terminated so all
client connections are closed, allowing clients
to re-connect and re-open the doc.
Change-Id: I9854e11b002ca6e3c2a1a0bbb91ca087679d25bb
The rendering options must be last in the load
command. This is because it could contain
characters that interfere with the parsing.
Change-Id: I14930d580d59014532d07410251acbd1316b4f61
When not sending ping the ping time is not set
which results in the setting the poll timeout to
a negative value, forcing it to return immediately.
This happens when sending ping before upgrading
to WebSocket, which isn't common. One way to
reproduce it, however, is to connect to the
admin console with an unauthenticated socket.
Change-Id: I9f3db1a02b8f8e2781d23d843e848068ad434958
Normally the last session either stopped and terminated
the DocBroker upon saving the doc, or immediately upon
removal when nothing to save.
However, saving could fail, or the session could be
disconnected by the client, or saving could timeout, etc.
In all those failure scenarios the DocBroker should
not linger as a zombie (alive but without sessions).
Here we detect that we are left with no sessions
and terminate correctly.
Change-Id: I31862234e321f63e686f54fa69daacc1fa06ae75
The prisoner poll should wake every so often
to check and rebalance the new children.
However this didn't happen correctly and
WSD would starve of children every so often.
The frequency of checking and rebalancing of
children should be reviewed and optimized.
Also simplified the code to avoid rebalancing
DocBrokers and only do NewChildren.
Change-Id: Id3be34ed3a47c739b606ee7969088397d3807e7a
Termination should normally be initiated by the
DocumentBroker in question, so sending of termination
message on the sockets come from the correct thread.
When termination happens from elsewhere
(f.e. cleanupDocBrokers) we cannot send socket
messages, and have to resort to rude termination.
Change-Id: I94acb7b314f5dbdc45c57049fc1ac8527ba72fb9
Once a socket has changed ownership to a new
poll it will assert thread affinity with said
new poll. So we cannot do any IO on the old
poll's thread at that point and on.
Change-Id: I662f188dea7c377a18f3e546839ec43f2875dc7b
We can calculate the timeout ourselves easily and add it to the
polling loop simplifying life.
Also ensure we never send messages to a non-authenticated thread.
While message emplacement happens in the DocumentBroker poll, we
can be sure that the next iteration of the poll will call
hasQueuedWrites before polling.
ClientSession still isn't getting the notification,
so document is not uploaded to the client just yet.
Change-Id: Ifda8ed394f6df1ec1a5bc1975d296dea496c0aed
This reverts commit 388d7b1dbf.
It is vital to have clean control of thread start. By starting
a thread during init. of a member (or base-clase) we loose lots of
control, some examples:
Any members initialized after a member that auto-starts a
thread, is effectively un-defined, and cannot be accessed
reliably.
This is particularly problematic for sub-classes.
I've seen threads started before the base-class has
finished constructing in the original creating thread -
such that the vtable was not yet updated to the sub-class
causing the transient parent vtable used during construction
to be used in-error.
Now that DocumentBroker has SocketPoll thread,
it's isAlive() must be defined by the lifetime of
both the SocketPoll thread and the ChildProcess,
which it previously did.
Change-Id: I093f8774cf4374d01729a383f6c535de4143fec6
Reviewed-on: https://gerrit.libreoffice.org/35122
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Since all TerminatingPoll instances need to fire
a thread, no reason to do it manually and risk
races. Now TerminatingPoll does it in the ctor.
Change-Id: I59850ad48b3789f3a23d01abb05a7f28e5717031
Reviewed-on: https://gerrit.libreoffice.org/35114
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
DocumentBrokerPoll is always owned by a
single DocumentBroker instance, so we
can hold a reference to it. This eliminates
the need to hold a shared_ptr to the owner
which, in turn, eliminates the need for
a create wrapper.
Change-Id: I954c9dddcc3b2cfdd5dfcc8248ab3d47a897f684
Reviewed-on: https://gerrit.libreoffice.org/35113
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
When we read data, we must also handle it,
otherwise the next poll might have no
more data (the request data was read
completely the first time) and we dead
lock waiting for data to process.
Change-Id: I26c69ecc1f0550e8371cf77a6f3928a7a877eff7
Reviewed-on: https://gerrit.libreoffice.org/35080
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
When the last client session is disconnected
docBroker must first issue a save and wait
until the kit processes the save and sends
back notfication. Since said notification
goes to the ChildSession (which is the last)
and said ChildSession is the one that signals
to docBroker to persist the doc to the Storage,
we need to keep all components alive and kicking
during this final saving.
As such, when the last session is to be removed
from docBroker, we instead issue an autosave and
continue everything as normal. When the save
notification even arrives and ChildSession signals
docBroker to persist the doc, we check if we were
destroying and in that even remove that last session
and stop the polling thread.
The docBroker instance itself will get cleaned up
in due time.
Change-Id: Ie84e784284e1ec12b0b201d6bf75170b31f66147
As there isn't support (yet) to send files
asynchronously, when the socket native buffer
is small, asynchronous writes naturally return
EWOULDBLOCK. As a temp solution, we send files
synchronously, so there is no need to poll.
This should be replaced witha file-server
polling/serving thread that is dedicated to
sending files only (which closes the connection
when done).
Change-Id: I062fea44bfe54ab8d147b745da97bd499bf00657
So startDestroy must not assume the session
was already added at that point, since addSession
is done when we poll for reading, but not when
the socket has already disconnected.
Change-Id: I7bb7222604269c1cc9f2f4b4dad3ea1054b3e0c9
Without this, it was impossible to connect to an existing session, because we
were trying to send messages to sessions that were not connected yet.
Change-Id: If9260a1f0ac8f5387f492541548724b0065df9d9
Don't block when creating new sessions, instead queue the requests, and handle
them in the DocumentBroker.
Change-Id: I200bbfc740f004c37178fa316d1eb91afdde5d4a
When we call readIncomingData() outside of the poll, we read the data, and
store them into the buffer. But then the next poll will not indicate that
there data actually some available (they've been already read from the FD),
and we will get stuck.
I susppect we should remove from the SSL case at some stage too, to be
symmetrict to the non-SSL case.
Change-Id: Ib8339400b41e797adb2eb8519f87093ebf6be9c5
Also dung out a chunk of older code.
FIXME: websocket / ClientSession needs to associate itself with the
DocumentBroker poll loop in place of the original loop.
Pull the notification pieces out of SigUtil.cpp - not signal safe,
and invoked only from LOOLWSD anyway.
In a non-blocking world, the socket close notification sends are
instant - so more work required to count-down / timeout remaining
clients.
With multipart streams the parsing isn't
done until all parts are in. We will need
to reparse the header when more data comes in
until we can parse all parts. Otherwise we
end up removing the header and losing it
when we can't find all parts in the body.
This fixes convert-to requests.
Change-Id: Ic1d5ccbd00fd6763eb91fdda35177f6df847f100
These are really GET requests that aren't
WebSocket upgrade. Should rename to something
less misleading.
Re-enabled testSlideShow which depended on this.
Change-Id: I52b7f67b650fcdcbae7c2bff020b756099263141
Apparently if we don't send immediately after the upgrade
the first frame doesn't get parsed as WebSocket (at least
in Poco).
Change-Id: Ieb30afae1423d8352d81c79af568947d10fca1e6
It's necessary to do the SSL handshake
and to get the request (if we get there)
on accepting so by the time we poll we know
what SSL needs to do next. No reason to
read on first poll when we should be
expecting a request upon connection anyway.
Change-Id: I8eecaad5f8450075d45e487702972418cad125bc
Because the socket can be freed while a separate
thread is sending data via the handler, we must
have a locked reference to the socket instance
in the handler.
Change-Id: Iefad3fc2b147f96b8d538d9edd7cac3fce25b5bf
Requests need to be introspected and
dispatched to the appropriate handler.
File serving, admin, POST request, etc.
are all valid types that we need to support.
But of course the primary one is the WS request
to load to and interact with a document.
Change-Id: Id2c3214deb6b54b06b2735ec3370f09ed7a1ae51
ConsoleCertificateHandler will block on stdin
when a certificate error occures. This is hardly
helpful to users. Although in a server environment
this is likely to go to the journal, we should
properly log it and shutdown, rather than hang
waiting for the unlikely console input.
Change-Id: I54bb38ec60f443c579ef20dbb759941fae182e5b
When two sockets send data to each other in blocking
mode, they can both wait until the other end-point's
buffers are free enough to receive the data being
sent. Since in LOOLWebSocket we lock both send and
receive with the same lock, this prevents the
reader thread from freeing the buffer while we try
to send data. But since our peer is in the same
dilemma, neither of us will make progress--deadlock.
Since sockets are full-duplex, they are capable of
handling two way communication concurrently. Poco
seems to not share data between them either, so
this seems safe.
Change-Id: I1fd68cd4fb3b4250b93c8f94cd42e49ee78f6650
Reviewed-on: https://gerrit.libreoffice.org/34194
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
When cleaning up DocumentBrokers we hold
the global DocBrokersMutex. So we need
to keep this lock as short as possible
to serve new requests.
However when a given DocBroker is locked
and busy, or worse deadlocked, we'd end
up blocking any new client connection.
So here we skip busy DocBrokers while
cleaning up.
Change-Id: I188c9abc34fd90c4ba388894b47c1ab08d185129
Reviewed-on: https://gerrit.libreoffice.org/34119
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
...and not just when loading a new document.
Change-Id: If5b19e500c59a8d1fcf96666ef244833c05e2b99
Reviewed-on: https://gerrit.libreoffice.org/34118
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Poco documentation for getPath says
"Returns the decoded path part of the URI."
Unfortunately, this isn't true for encoded URIs.
It ends up returning the full URI (albeit decoded).
So we decode the URI ourselves before calling
getPath to ensure it will be able to parse it
and return only the path, as promised.
Change-Id: I23ed65f753f7e5db74ce7833b812f566b1964037
Reviewed-on: https://gerrit.libreoffice.org/34117
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
To perform one fuzzing iteration, use something like:
./loolwsd_fuzzer --config-file=loolwsd.xml --o:lo_template_path="/opt/libreoffice/instdir" --o:storage.filesystem[@allow]=true --fuzz=/tmp/looltrace
Change-Id: I27210d55a65f75e7d51e05c2f5f999acb758c4b1
By default snapshots are disabled, since trace recording
is enabled, to avoid unexpectedly flooding the disk.
Change-Id: I6c8728e14801f0a72accde1378455ec0e6046e3e
We no longer tell the clinet "This is embarrassing..."
when we disconnect and unload an idle document. Instead,
the client UI remains greyed out so the user can resume
as if it was inactive (and reload the document in this case).
Also, we now always send the "close: " message prior
to shutting down a client websocket. This is more
reasonable and consistent when we intentionally disconnect,
so clients can rely on it to signal intent and give reason.
Otherwise, a disconnection without this application-level
message should be unexpected and is therefore reasonable
to show the "This is embarrassing..." message.
Change-Id: Ic7439bcc9267be155586ccd5d122e9fe60225516
A race condition between the client socket thread
and the idle-document cleanup caused segfault
on the websocket.
Now the ChildProcess object doesn't reset
the websocket on closing, rather on destruction.
Change-Id: I10d0dfb1ba677a65479df85b7a53de8c5f1b44c3
Each Kit process now reports its own PSS,
which is much more accurate as they share
a significant ratio of their pages with
one another.
Admin tracks the PSS values of the Kits
and reports to the console.
Change-Id: Ifa66d17749c224f0dc211db80c44f7c913f2d6c4
Reviewed-on: https://gerrit.libreoffice.org/33864
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
- add loolwsd option --config-file=path
- search all data files in the actual data directory
instead of the default one.
(cherry-picked from commit c4e9681fd17381c7af2936726262d7357a7dda10)
Change-Id: I028ff8a696aa6336da55bcac2952f13b12ba8eb8
Reviewed-on: https://gerrit.libreoffice.org/33504
Reviewed-by: Jan Holesovsky <kendy@collabora.com>
Tested-by: Jan Holesovsky <kendy@collabora.com>
The connection limit MAX_CONNECTIONS
is for document WS connections, which
doesn't take into account the other
resources (.js, .css, etc.) that clients
request. These transient files are as
many as 10 per client. While they
are being requested, other clients
should not be blocked due to reaching
the limit.
We now take into account these connections
and allow at least half as many clients
as the limit to connect simultaneously
without blockage.
Change-Id: I42fd27b992457bcf765a7a85382e6d7caad4bc97
Reviewed-on: https://gerrit.libreoffice.org/33669
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
No need to expose PrisonerSession via ClientSession
when the marshalling of messages can be done by
ClientSession directly.
PrisonerSession can now be removed altogether.
Change-Id: I131e41f5d9ae50be7244fb92a6f391a757502111
Reviewed-on: https://gerrit.libreoffice.org/33431
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Throw when empty payload is enqueued
and return empty payload on get timeout
(instead of throwing).
Change-Id: Iab5df775caa46d5c212d0850645cda6cca16f20b
Reviewed-on: https://gerrit.libreoffice.org/33421
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Which is dangerous in general, but here the values are from
non-overlapping ranges. Make it a bit more explicit that this is not an
accident.
Change-Id: I56897473a755e28cd9e7f74ceacecbab2db5e829
Refactored the forkit process wait and
re-fork into separate function.
Change-Id: If6106ea3820d10b4448bb27740d757afcea4779f
Reviewed-on: https://gerrit.libreoffice.org/33137
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Forkit always spawns a single child.
This is done to validate that forking
children is working and to be ready to
serve clients.
However, this initial forking can be slow,
mostly due to cold file reading and loading.
This often causes multiple child spawning
and other undesirable effects.
This patch makes sure that a single child
is always at the ready before proceeding
while simplifying the code. Otherwise, if
forkit fails to fork a single child within
4x an expected command-timeout (currently 5
seconds) then we fail the service altogether.
Change-Id: Ie2a441a2479db98a54f7fb9b9c8e98cda2f07c1c
Reviewed-on: https://gerrit.libreoffice.org/33128
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
When a client disconnects, we autosave only when there
are no other sessions. The idea being that it'd be
wasteful to save on every client disconnect, so long
that we have a later chance of saving.
This doesn't hold, however, when the only other
remaining session is one that had just connected
and hasn't loaded a view yet. WSD wouldn't
know about this, but when unloading the
disconnecting session, Kit will exit the process
as the last view is gone, losing all unsaved changes.
This patch tracks the sessions in WSD that have
loaded a view and takes that into account when
deciding to autosave on disconnect or not, thereby
fixing this corner case that may result in data loss.
Change-Id: I699721f2d710128ed4db65943b9357dff54380f5
Reviewed-on: https://gerrit.libreoffice.org/33126
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
In some cases when the last session is destroying
and we expect the DocumentBroker to be removed,
while waiting for autosave, a new client might
connect to the very same document.
In those cases we shouldn't fail but should retry
loading the document again once it has been unloaded.
Change-Id: I97ebb73991b9739cfb4e93b7de9115bdb48fda7a
Reviewed-on: https://gerrit.libreoffice.org/33125
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Before setting up the socket where we listen for client requests, try
connecting to the same port. If that succeeds, another loolwsd process
is already running and listening on that port. That is obviously
undesirable.
Yes, there is a race condition if multiple loolwsd processes are
started simultaneously and do this check before any of them have
actually created the socket. Live with it. Multiple loolwsd processes
is a problem that happens accidentally for developers only anyway. In
a production environment systemd takes care of having just one, I
hope.
Thanks to Kendy for the idea.
Change-Id: Ifdde83472f9a56e592ec5dc7649dd7706efc2f7c
The server tells the client the hash of each tile it sends (calculated
from the contents of the tile, not its PNG encoding). When the client
asks for a tile to be refreshed, it tells the server what the hash of
the existing tile is. If the server notices that the tile contents
hasn't actually changed, it doesn't PNG encode it and doesn't send it
to the client.
The intent is that this will reduce load on the server and also avoid
unnecessary tile traffic.
Change-Id: Ia06ca68655ea984ed4319f24f4470afda322eccf
Prespawning proactively is optimistic, but is not critical.
When getting a space child, more will be spawn perforce,
so prespawnChildren shouldn't block.
Change-Id: I60cc8c1ab87cba384ebc9aca9e79b89f69a99252
Reviewed-on: https://gerrit.libreoffice.org/32858
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Previously tilecombine had its own version, which is
nonesensical, since it's not really a tile.
Now it passes the version to the tiles when
parsing and serializes version per-tile.
Change-Id: I5db8d94880431e3d2a40b6787c6fe51a05771305
Reviewed-on: https://gerrit.libreoffice.org/32633
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
There is no way to let the user of document currently being
opened, in case of failure, know that disk is low on space.
We check the disk space when forking children after which we try
to alert all users but this would end up doing nothing for
current document because document broker is not registered at
this time (we iterate through doc brokers when alerting). Another
conditional disk check is performed just before opening the
document but this is performed only if last disk check was
performed greater than 60 seconds which would never be the case
because document open is always preceded by a child fork (when
rebalancing children).
Lets not cache the disk check when forking the children to
prevent above mentioned situation while still minimizing the
number of disk checks performed.
Change-Id: Id3add998f94e23f9f8c144f09e5efe9f0b63821c
Since we always need to set the thread-pool size
anyway, we cannot have 'unlimited' connections.
Actually, we never did, so that was misleading
in configure.ac anyway.
The current defaults are 20 connections and
10 documents, instead of the previous 1024
connections.
The reason for this "low" limit is to
enable unittesting these limits automatically
for the default configure.
There is also a lower-limit (needed by unittests
and internal technical requirements) of 3 connections
and 2 documents.
Change-Id: I6ccf3a607c50bb2a86bf1c0a16ebb6326ee34c7d
Reviewed-on: https://gerrit.libreoffice.org/32712
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
In the rare event that it fails, we will
cleanup the DocBrokers correctly.
Change-Id: I6f5346c9ec5021bdadc8a01ed389951ca18d07a3
Reviewed-on: https://gerrit.libreoffice.org/32676
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
When saving we need to differentiate between no-op
and failure. The lastSaveTime must always be updated
when saving didn't fail (i.e. no modification or saved).
Change-Id: I0e2455afac22c82f0b623f9441fbc0bca8a7cb83
Reviewed-on: https://gerrit.libreoffice.org/32629
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
While time_t is much simpler, it's too
opaque. The new chrono library is type-safe
and does conversion correctly, as well as
guarantees monotonity and other desirable
properties.
Change-Id: Id41c44c397a31d73e894e8f1715ff18f2b67df53
Reviewed-on: https://gerrit.libreoffice.org/32627
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Since we already enumerate the DocumentBrokers and
remove dead ones before every autosave (every 30 seconds)
as well as when loading/unloading documents, there
is little reason to have a separate timeout and loop
for idle documents.
In addition, the previous code only killed the child
but didn't remove the dead DocumentBroker entry from
DocBrokers. Not clear if this was intentional, but it
would have been removed on the next cleanup anyway.
Change-Id: Ie1e09c77133165dd1255930b0b0be06fde7646b1
Reviewed-on: https://gerrit.libreoffice.org/32626
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Between waits on forkit we shouldn't spend too much time.
This is to recover from forkit crashes.
Now we first spawn children, and only when not successful
(i.e. no more spare children needed) we check for autosave
and DocBrokers cleanup. Only when none of the above is done
do we sleep.
This gives the best balance between forkit waits and reduces
the unittests by a good 25 seconds (crash tests down from 34s
to about 10s only).
Change-Id: If69284746859bc78d14f1c6eda07aef4e006709e
Reviewed-on: https://gerrit.libreoffice.org/32624
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Otherwise we throttle spawning to allow
time to fork the process. If we don't wait
we can bomb a loaded server and bring it down.
However during startup this is not necessary
as there are no in-flight spawn requests.
Change-Id: I1beac94571f6d8d96136d32c81310bea6547242b
Reviewed-on: https://gerrit.libreoffice.org/32622
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>