13 commits

Author SHA1 Message Date
Patrick Luby
4b71e988b4 Fix compiler warnings when building on macOS Sonoma
Signed-off-by: Patrick Luby <plubius@neooffice.org>
Change-Id: I7e88e0ba272fc00892059c96a2cd0237657e23b9
2023-10-02 08:50:52 +01:00
Caolán McNamara
0f5c171433 do init_gather_lut at start if simd::init succeeds
and avoid local static in simd_initPixRowSimd

Signed-off-by: Caolán McNamara <caolan.mcnamara@collabora.com>
Change-Id: Idb89d5069da5ff10b346b5e4d767374d4529a96f
2023-09-26 08:39:20 +01:00
Caolán McNamara
77d1424b8d rleMask->rleMaskBlock in non-simd branch
Signed-off-by: Caolán McNamara <caolan.mcnamara@collabora.com>
Change-Id: I09d0c900535eba6b274e294ff39ea71a9d9c323f
2023-09-25 16:55:04 +01:00
Michael Meeks
fde98db394 Factor out CPU RLE into a function with similar signature.
Performance testing suggests that:
 + for dense text this is 2x faster.
 + for 'hello world' text this is 1.7x faster.

Change-Id: I4ff940663c44d0b22c9187deb4ee397a9d9953b0
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
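
For readers unfamiliar with the kind of routine a "CPU RLE" function refers to, here is a minimal scalar sketch. It is not the project's code: the name `rle_row_scalar`, the 256-pixel row width, and the output layout (a bitmask marking repeated pixels plus a packed list of the pixels that differ) are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed row width, chosen to match the 256-pixel runs mentioned in this log. */
#define ROW_PIXELS 256

/*
 * Hypothetical helper, not the project's function.
 * Bit i of mask is set when pixel i repeats the pixel before it;
 * otherwise pixel i is appended to out. The pixel "before" the row
 * is taken to be 0. Returns the number of pixels written to out.
 */
size_t rle_row_scalar(const uint32_t *from, uint32_t *out,
                      uint64_t mask[ROW_PIXELS / 64])
{
    uint32_t prev = 0;
    size_t n = 0;

    for (size_t w = 0; w < ROW_PIXELS / 64; ++w)
        mask[w] = 0;

    for (size_t i = 0; i < ROW_PIXELS; ++i) {
        if (from[i] == prev)
            mask[i / 64] |= (uint64_t)1 << (i % 64); /* repeat: record a bit only */
        else
            out[n++] = from[i];                      /* change: keep the pixel */
        prev = from[i];
    }
    return n;
}
```

A caller would pass a 256-pixel row, an output buffer of at least 256 entries, and a 4-word mask, then store the mask followed by the first `n` output pixels.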
Michael Meeks
8f2c7f4de7 Add idea to save an instruction by masking at the same time ...
Change-Id: I9ecd35c1655bd72994a297b8897db473a921bc20
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
Michael Meeks
2e1e0c1260 Save another variable and rename.
Change-Id: I279df2615f972acd3f8107b236d67232c3d6015f
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
Michael Meeks
81a8bb4589 Switch to a single loop to reduce branching.
Simply calculate our loop variables from the iteration we're on.

Change-Id: I0bb73302fb09963b2a1f5b3d93ef302316ef1d4f
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
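
The single-loop idea can be illustrated with a small scalar sketch. Everything here is assumed for the example: the chunk size of 8 pixels, the helper `chunk_mask` (a stand-in for a vector compare), and the two function names. The nested version walks mask words and bytes in two loops; the flat version derives the word index and shift from the iteration counter alone.

```c
#include <stdint.h>

enum { ROW_PIXELS = 256, CHUNK = 8 };

/* 8-bit mask: bit j is set when px[j] == ref.
 * Scalar stand-in for a vector compare; hypothetical helper. */
static uint8_t chunk_mask(const uint32_t *px, uint32_t ref)
{
    uint8_t m = 0;
    for (int j = 0; j < CHUNK; ++j)
        if (px[j] == ref)
            m |= (uint8_t)(1u << j);
    return m;
}

/* Nested form: outer loop over 64-bit mask words, inner loop over bytes. */
void build_mask_nested(const uint32_t *row, uint32_t ref, uint64_t mask[4])
{
    for (int w = 0; w < 4; ++w) {
        mask[w] = 0;
        for (int b = 0; b < 8; ++b)
            mask[w] |= (uint64_t)chunk_mask(row + (w * 8 + b) * CHUNK, ref)
                       << (b * 8);
    }
}

/* Flat form: one loop, with word and shift computed from the iteration. */
void build_mask_flat(const uint32_t *row, uint32_t ref, uint64_t mask[4])
{
    mask[0] = mask[1] = mask[2] = mask[3] = 0;
    for (int it = 0; it < ROW_PIXELS / CHUNK; ++it) {
        int word  = it / 8;       /* which 64-bit mask word */
        int shift = (it % 8) * 8; /* which byte within that word */
        mask[word] |= (uint64_t)chunk_mask(row + it * CHUNK, ref) << shift;
    }
}
```

Both functions produce the same mask; the flat form trades the inner loop's branch for a division and a modulo by a power of two, which compilers typically reduce to shifts and masks.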
Michael Meeks
ad768d2337 Handle only 256 pixel runs, to drop another variable.
Change-Id: I5e28b4f86ae191b181a69b82511d3393b5fc8c20
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
Michael Meeks
0153ddb554 First cut at getting aligned loads and simpler loop structure.
Remove the special case for the first pixel, and instead have a
previous pixel run initialized to zero.

AVX2 has no effective shift for the whole si256 so use permutation
to shift the last pixel of the previous run into the right place,
mask it and combine.

Saves a second un-aligned load of the same data, and branch.

Change-Id: I77c9cdead13d37aaf4d9f31d98cbd5c4a9c5ce24
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
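
The permute, mask and combine step described above can be sketched without AVX2 by emulating an 8-lane vector with a plain array. This is an illustrative assumption, not the committed code: `permute8` stands in for a `vpermd`-style lane permutation (the operation behind `_mm256_permutevar8x32_epi32`), and `shifted_previous` and `same_as_previous` are hypothetical names.

```c
#include <stdint.h>

enum { LANES = 8 };

/* Scalar stand-in for a vpermd-style permutation: dst[i] = src[idx[i] & 7]. */
static void permute8(uint32_t dst[LANES], const uint32_t src[LANES],
                     const uint32_t idx[LANES])
{
    uint32_t tmp[LANES];
    for (int i = 0; i < LANES; ++i)
        tmp[i] = src[idx[i] & 7];
    for (int i = 0; i < LANES; ++i)
        dst[i] = tmp[i];
}

/* Build the "pixel before each lane" vector for cur:
 * lane 0 takes the last lane of prev, lane i takes cur[i - 1]. */
void shifted_previous(uint32_t out[LANES], const uint32_t prev[LANES],
                      const uint32_t cur[LANES])
{
    static const uint32_t rot[LANES] = { 7, 0, 1, 2, 3, 4, 5, 6 };
    uint32_t cur_rot[LANES], prev_rot[LANES];

    permute8(cur_rot, cur, rot);   /* lane 0 holds cur[7], which is unwanted */
    permute8(prev_rot, prev, rot); /* lane 0 holds prev[7], which is wanted */

    /* Mask and combine: lane 0 from prev_rot, the rest from cur_rot. */
    for (int i = 0; i < LANES; ++i)
        out[i] = (i == 0) ? prev_rot[i] : cur_rot[i];
}

/* 8-bit mask: bit i is set when cur[i] equals the pixel before it. */
uint8_t same_as_previous(const uint32_t prev[LANES], const uint32_t cur[LANES])
{
    uint32_t before[LANES];
    uint8_t m = 0;

    shifted_previous(before, prev, cur);
    for (int i = 0; i < LANES; ++i)
        if (cur[i] == before[i])
            m |= (uint8_t)(1u << i);
    return m;
}
```

Starting with `prev` set to all zeros removes the need for a special case on the first pixel: the first chunk is compared against a zero pixel like any other chunk is compared against its predecessor.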
Michael Meeks
743fa7d91f Use a LUT and SIMD packing logic to accelerate RLE pixel copy.
Change-Id: I6874f1b33acf6f0f3c72c86f9fbe232e1f5a560a
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
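
One common way to combine a lookup table with lane packing is to index the table by an 8-bit mask and store, for each mask, the lane order that moves the kept pixels to the front. The sketch below shows that general technique in scalar C; the table layout, the convention that a set bit means "repeat, drop this lane", and the names `pack_lut`, `init_pack_lut` and `pack_lanes` are assumptions for illustration, since the log does not describe the actual table.

```c
#include <stdint.h>

enum { LANES = 8 };

/* pack_lut[m] lists the lanes to keep for mask m, in order, front-packed. */
static uint32_t pack_lut[256][LANES];

/* Fill the table once at start-up. */
void init_pack_lut(void)
{
    for (unsigned m = 0; m < 256; ++m) {
        unsigned n = 0;
        for (unsigned lane = 0; lane < LANES; ++lane)
            if (!(m & (1u << lane))) /* bit clear: pixel differs, keep it */
                pack_lut[m][n++] = lane;
        while (n < LANES)            /* padding; callers ignore these lanes */
            pack_lut[m][n++] = 0;
    }
}

/* Pack the kept lanes of src to the front of out.
 * Writes all 8 lanes, like a full vector store, so out needs room for 8;
 * only the first `kept` entries are meaningful. Returns kept. */
unsigned pack_lanes(uint32_t out[LANES], const uint32_t src[LANES], uint8_t mask)
{
    const uint32_t *idx = pack_lut[mask];
    unsigned kept = 0;

    for (unsigned lane = 0; lane < LANES; ++lane)
        kept += !((mask >> lane) & 1u);

    /* In a vector version this loop would be a single lane permutation. */
    for (unsigned i = 0; i < LANES; ++i)
        out[i] = src[idx[i]];
    return kept;
}
```

A caller would run `init_pack_lut()` once, then for each chunk call `pack_lanes` and advance its output pointer by the returned count, so the padding lanes are overwritten by the next chunk.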
Caolán McNamara
42e98bb2e4 experimentally bootstrap something using avx2 to generate bitmap
just enough to get the same results as before

https://github.com/CollaboraOnline/online/issues/7165

Signed-off-by: Caolán McNamara <caolan.mcnamara@collabora.com>
Change-Id: I109c9b8f1e7935782c72e0179aa0ed48712eadb6
2023-09-25 16:55:04 +01:00
Michael Meeks
6d6425336d SIMD - first cut at building LUT for vpermd gather.
Change-Id: I6ae13be0a36b4e30b3d535029313d8402da7de1d
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00
Michael Meeks
cce3767ba8 First cut SIMD wrappers / separation to accelerate RLE code.
Split it out as a C file, to avoid accidental C++ header inclusion,
and C is a cross-platform assembler anyway so a good match.

Change-Id: I6c042781713aecaf143b9663af8377659a7deaf1
Signed-off-by: Michael Meeks <michael.meeks@collabora.com>
2023-09-25 16:55:04 +01:00