We should always set the shutdown flag first.
Otherwise, we run afoul of a race condition.
Change-Id: Ic99793d68b3b943496ff932b4bdafd336fef7f82
Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
With --unattended, do not wait for a debugger
upon seg-faulting. This avoids the unnecessary wait
that prolongs failed unit-tests in automated runs.
Now run_unit.sh and Cypress Makefile set this flag.
Note that the wait only happens when in debug
builds, or when envar COOL_DEBUG is set. This
prevents us from waiting when running a debug
build where we can't see the output, or indeed
the run is on a CI build machine.
This flag can also be used by devs when reproducing
failures where there is no interest in attaching
a debugger. The logs are shorter and more
readable, too. At least in trace level.
Change-Id: Ice15482c6724abc47f5955402295198eb7f671ee
Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
Based on previous crashes, it is useful to have more granularity
on what happened last.
Change-Id: If18a3a4d7817be23a6f8aadd301827a8e1bc007e
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
Certain parts of the code assume ShutdownRequested is set
when relying on the Termination flag. This is because
they serve as degrees not as independent flags.
The Termination flag is basically a more impatient
ShutdownRequested flag. It is used to forcefully
and immediately terminate. It is not designed to be
used independently from ShutdownRequested.
This pair is a good candidate for unifying as a single enum.
Change-Id: I8e3913a1959868197d8c5a059e89cbdbc6cef070
Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
During startup we wait for extended time until
a child process is up and running. In case this
takes a long time, or indeed if forking permanently
fails, we simply don't respond to shutdown requests.
This patch makes it possible to wait for about 20
seconds at a time for a child. This way on average
we will exit the process within about 10 seconds,
assuming we are blocked waiting for a child that
will never spawn.
Change-Id: I4409cbc60aa3c7bd30970d4638c820bc581b65ba
Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
(cherry picked from commit 6ed8b4bb1a6532d947a6bb00f936183767b54a80)
Assuming that the intention is to output a pair of strings on start and
a pair of string on finish.
Signed-off-by: Miklos Vajna <vmiklos@collabora.com>
Change-Id: I1ab3b03c353a8d0875e2fee451ca34965fc3038e
Hopefully helps to hunt segv's more effectively in development.
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
Change-Id: Ife800f9902fe8d95711b3b14867ee21bb3c3f21b
Records the uno commands from different instances of ChildSession and
dumps the last 4 uno commands into the crashlog during a fatal crash
Signed-off-by: Gopi Krishna Menon <krishnagopi487.github@outlook.com>
Change-Id: I838f71769dc08df7076c040f3d72c15f7607e9d3
It was a request an approved by @kendy despite the signal
handler limitations. After several tries, we were blind
when we introduce the commit using the heap.
This reverts commit 9720eb0f81.
Most C and Posix API clobber errno. By failing to save
it immediately after invoking an API we risk simply
reporting the result of an arbitrary subsequent API call.
This adds LOG_SYS_ERRNO to take errno explicitly.
This is necessary because sometimes logging is not done
immediately after calling the function for which we
want to report errno. Similarly, log macros that log
errno need to save errno before calling any functions.
This is necessary as the argements might contain calls
that clobber errno.
This also converts some LOG_SYS entries to LOG_ERR
because there can be no relevant errno in that context
(f.e. in a catch clause).
A couple of LOG_ macros have been folded into others,
reducing redundancy.
Finally, both of these log macros append errno to the
log message, so there is little point in ending the
messages with a period.
Change-Id: Iecc656f67115fec78b65cad4e7c17a17623ecf43
Signed-off-by: Ashod Nakashian <ashod.nakashian@collabora.co.uk>
loolmount now works and supports mounting and
unmounting, plus numerous improvements,
refactoring, logging, etc.. When enabled,
binding improves the jail setup time by anywhere
from 2x to orders of magnitude (in docker, f.e.).
A new config entry mount_jail_tree controls
whether mounting is used or the old method of
linking/copying of jail contents. It is set to
true by default and falls back to linking/copying.
A test mount is done when the setting is enabled,
and if mounting fails, it's disabled to avoid noise.
Temporarily disabled for unit-tests until we can
cleanup lingering mounts after Jenkins aborts our
build job. In a future patch we will have mount/jail
cleanup as part of make.
The network/system files in /etc that need frequent
refreshing are now updated in systemplate to make
their most recent version available in the jails.
These files can change during the course of loolwsd
lifetime, and are unlikely to be updated in
systemplate after installation at all. We link to
them in the systemplate/etc directory, and if that
fails, we copy them before forking each kit
instance to have the latest.
This reworks the approach used to bind-mount the
jails and the templates such that the total is
now down to only three mounts: systemplate, lo, tmp.
As now systemplate and lotemplate are shared, they
must be mounted as readonly, this means that user/
must now be moved into tmp/user/ which is writable.
The mount-points must be recursive, because we mount
lo/ within the mount-point of systemplate (which is
the root of the jail). But because we (re)bind
recursively, and because both systemplate and
lotemplate are mounted for each jails, we need to
make them unbindable, so they wouldn't multiply the
mount-points for each jails (an explosive growth!)
Contrarywise, we don't want the mount-points to
be shared, because we don't expect to add/remove
mounts after a jail is created.
The random temp directory is now created and set
correctly, plus many logging and other improvements.
Change-Id: Iae3fda5e876cf47d2cae6669a87b5b826a8748df
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/92829
Tested-by: Jenkins
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Seems to not cause any serious regressions in the iOS app or in "make
run", but of course I am not able to run a comprehensive check of all
functionality.
Change-Id: I44a0e8d60bdbc0a885db88475961575c5e95ce88
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/93037
Tested-by: Jenkins
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Tor Lillqvist <tml@collabora.com>
More readable and typically more efficient.
Change-Id: I9bd5bfc91f4ac255bb8ae0987708fb8b56b398f8
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/95285
Reviewed-by: Michael Meeks <michael.meeks@collabora.com>
Tested-by: Jenkins
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
LibreOffice core uses that, too, and we support an even more
restricted set of compilers.
Change-Id: I0d0e2c8608e323eb5ef0f35ee8c46d02ab49a745
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/92467
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Tor Lillqvist <tml@collabora.com>
1) Don't actually kill anything with the kill command, otherwise kill(0,
SIGKILL) will kill the fuzzer itself.
2) Don't require a valid signature when authenticating with JWT, since
the private key is generated on each process startup.
3) Log when the JWT would be invalid due to an expired timestamp.
Change-Id: I0da285617e27910329c0e7ed80a6d02e86344ccf
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/91737
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
ignoring the segv can lead to not making progress, while churning debug.
Change-Id: I97af266cec3feefe2dcbd9adb8dbf4b13a4d69bd
Reviewed-on: https://gerrit.libreoffice.org/c/online/+/87002
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-by: Michael Meeks <michael.meeks@collabora.com>
It is painful to check and search manually the PID to attach the LOKit
process when exists several pre-spawned waiting to load a document.
This patch helps to attach the debugger when the LOKit process is about
to load a document then send the "signal SIGUSR1" to resume it.
Change-Id: I3b15bd522c6ef3ef57dc3453b457dcf91f2661b9
Reviewed-on: https://gerrit.libreoffice.org/85430
Reviewed-by: Michael Meeks <michael.meeks@collabora.com>
Tested-by: Henry Castro <hcastro@collabora.com>
Reviewed-by: Henry Castro <hcastro@collabora.com>
This is the cleanest way to achieve the goal
of immediately exiting a child. This is used
for cleaning up kit instances when closing
docs, as well as in unit-tests.
Change-Id: I76870234b130a508044044b102419646abe81ac8
Reviewed-on: https://gerrit.libreoffice.org/83699
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
malloc is not signal safe, and must not be called
from signal-safe functions. If malloc itself signals,
calling it in the signal handler can deadlock.
Luckily, we only needed malloc for getting the
backtrace strings. Now we just write directly to
stderr, which is faster, cleaner, and safer.
Change-Id: I54093f45e05f2a0fd3c5cde0cc2104ffe6d81d2a
Reviewed-on: https://gerrit.libreoffice.org/83151
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
There are a few things acceptable/safe in a
signal handler, and taking locks is not one of them.
This replaces the logic with a simple counter that
serves the purpose just as well.
If we get a double signal, we log and ignore.
Change-Id: If589c18492468c120d00c213805467bcbba05d27
Reviewed-on: https://gerrit.libreoffice.org/83150
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
The following flags are affected:
ShutdownRequestFlag
TerminationFlag
DumpGlobalState
Since it's common to grep for all places
that set or reset these global flags, it
makes more sense to have explicit functions
for each operation. Now we have set and reset
accessors where appropriate and get is reserved
for read-only access.
This changes the getters to only return
the boolean value of these flags rather than
a reference to the atomic object, now that
they are read-only.
Also, a few Mobile-specific cases were folded
either with other Mobile-specific sections, or
they were now identical to the non-Mobile case
and therefore deduplicated, making the code
cleaner and more readable.
Change-Id: Icc852aa43e86695d4e7d5962040a9b5086d9d08c
Reviewed-on: https://gerrit.libreoffice.org/81978
Reviewed-by: Ashod Nakashian <ashnakash@gmail.com>
Tested-by: Ashod Nakashian <ashnakash@gmail.com>
Otherwise both loolwsd and unit-copy-paste.so would have a
SigHandlerTrap:
==26186==ERROR: AddressSanitizer: odr-violation (0x000002090ae0):
[1] size=40 'SigHandlerTrap' ../common/SigUtil.cpp:76:12
[2] size=40 'SigHandlerTrap' common/SigUtil.cpp:76:12
These globals were registered at these points:
[1]:
#0 0x5f9a28 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0x7f8f537f1d8b in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/test/../test/.libs/unit-copy-paste.so+0x60ad8b)
[2]:
#0 0x5f9a28 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0xe2bcfe in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/loolwsd+0xe2bcfe)
Change-Id: I54d5f6d4298848cacd437d302cff0e8c5003fb8c
Otherwise both loolwsd and unit-copy-paste.so would have a
ShutdownRequestFlag:
==13663==ERROR: AddressSanitizer: odr-violation (0x00000208f860):
[1] size=1 'ShutdownRequestFlag' ../common/SigUtil.cpp:60:19
[2] size=1 'ShutdownRequestFlag' common/SigUtil.cpp:60:19
These globals were registered at these points:
[1]:
#0 0x5f9a18 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0x7f9b903f1d0b in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/test/../test/.libs/unit-copy-paste.so+0x60ad0b)
[2]:
#0 0x5f9a18 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0xe2b9fe in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/loolwsd+0xe2b9fe)
Change-Id: I247760325f804813249e814dbb4576493619dee7
Otherwise both loolwsd and unit-copy-paste.so would have a
DumpGlobalState:
==5783==ERROR: AddressSanitizer: odr-violation (0x00000208f7a0):
[1] size=1 'DumpGlobalState' ../common/SigUtil.cpp:49:19
[2] size=1 'DumpGlobalState' common/SigUtil.cpp:49:19
These globals were registered at these points:
[1]:
#0 0x5f9a08 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0x7f5c5edf1c9b in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/test/../test/.libs/unit-copy-paste.so+0x60ac9b)
[2]:
#0 0x5f9a08 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0xe2b98e in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/loolwsd+0xe2b98e)
Change-Id: I4b7b0238eb9b38a30875e8788c1dcb27f1d1643f
Otherwise both loolwsd and unit-copy-paste.so would have a
TerminationFlag:
==11732==ERROR: AddressSanitizer: odr-violation (0x00000208f4a0):
[1] size=1 'TerminationFlag' ../common/SigUtil.cpp:41:19
[2] size=1 'TerminationFlag' common/SigUtil.cpp:41:19
These globals were registered at these points:
[1]:
#0 0x5f9988 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0x7f5df9cf18cb in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/test/../test/.libs/unit-copy-paste.so+0x60a8cb)
[2]:
#0 0x5f9988 in __asan_register_globals.part.13 /home/vmiklos/git/libreoffice/lode/packages/llvm-472c6ef8b0f53061b049039f9775ab127beafbe4.src/compiler-rt/lib/asan/asan_globals.cc:365
#1 0xe2b4fe in asan.module_ctor (/home/vmiklos/git/libreoffice/online-san/loolwsd+0xe2b4fe)
Change-Id: Ic620b143ecb77699f40676ff39d0fa7abceb34d5
Spent hours on trying to cleverly use the existing TerminationFlag
(with minor modifications to the code that checks it, and some
additional code to set and reset it), but could not get it to work.
This is simpler, but sure, using a global variable is ugly of course.
At least the new MobileTerminationFlag is very specific in semantics
and only used in the mobile apps.
Change-Id: I0775fdfa7880750ca12c6fd7ec41d3d3ceb2f0ad
(Note that when I say 'NUL' I mean the ASCII character called NUL,
i.e. a zero byte. Not to be confused with 'NULL'.)
Why FatalGdbString has to be a C style fixed size char array I don't
know. Or wait, I do know. Because SPEED!!! And using C strings safely
is trivial.
Change-Id: Id28b00a6e3219cf6f015c4209732f33216f83b22
We don't have any user-generated signals to handle by shutting down in
an app.
One less thing to worry about. Now it's just the global
TerminationFlag that is problematic when the code runs in just one
process.