office-gobmx/editeng/source/rtf
Armin Le Grand (allotropia) 290c8f6e04 ITEM: Change SfxItemSet to use unordered_set
With all the changes done for Items we can now do deeper
basic changes to the ItemSet itself with manageable risk.

I already did https://gerrit.libreoffice.org/c/core/+/166455
aka 'ITEM: Add measurements for SfxItemSet usages' to
get some statistical information about the fill/usage grade
of the ItemSet's fixed PtrArray to SfxPoolItems, check
that out to get an own picture.

Those results show that an average usage is between some
extremes ranging from 0% to 50%, but when using more checks
and using multiple files/interactions/edits in all
applications we end up with around typical 12%-19% of that
array used, the rest is nullptr's.

Thus I thought about one of the initial ideas of this
series of changes (ITEM), to use a std::unordered_map
(A) instead of that fixed array of SfxPoolItem Ptr's (B).
Tthat again was for a complete type-based rewrite, which
I once did a POC, but the code cannot be adapted to that,
just too much work.

Those are very different in architecture, (B) is done
since a long time (since ever), but as pointed out above,
(A) is now possible. There are many aspects to it, let's
grep some:

Speed (iterate): (A) and (B) are both linear. (A) has
less entries, but may touch different mem areas
(buckets). (B) is linear, but many empty spaces which
are usually uselessly iterated.

Speed (access Item by WhichID): (A) is hashed by
WhichID, so mostly linear for unordered_set/hash.
(B) is in a linear array, but has to calculate the
offset for each WhichID access.

So I guess speed will mostly equal out.

Memory: (A) will be dynamically allocated (buckets),
but stl is highly optimized and may even re-use areas,
has to provide some extra info but will need less
places for Items since it's dynamic and can start empty.
(B) will be allocated once (except AllItemSet) and may
even be 'derived' to the ItemSet (SfxItemSetFixed), but
has to allocate all space at once.

I can go in lots of more detail here, but due to the
results of the statistics I just made a test now,
including measuring some results (will include in
gerrit, not here). I used two pro versions for that.
That way I have some data now.

Result is:
- It is now possible to do this, it runs stable :-)
- Speed: As expected, mostly no change
- Memory: Depending on usage, 0% to 25% less,
  usually around 8-10%

Side effects:
- SfxAllItemSet could be done without WhichRanges,
  thus without expensive 'merges'
- SfxItemSetFixed is not needed. While the idea is
  good, it needs a lot of extra stuff
- Access to Items is linear if set
- WhichRanges: Still needed, but for vaildity
  checking/filtering of ItemSet content
- WhichRanges: Worth to think about if these are
  needed at all, probably just exist for historical
  reasons since allocation/number of added Items
  was never ever dynamic -> just not allocatable

Putting the current version on gerrit, may still
trigger some UTs (checked SW/SC/SD...)

I did not like that functionality at ItemSet that
hands out a vector of the set items for cases where
to avoid iterating and deleting items at the same
time at an ItemSet, so changed to using a local
vector of remembered WhichIDs and deleting after
the iterator is done. I also saw some strange
usages of SfxItemIter in sw which i will have to
check.

Since there are still problems with UTs which I can
not reproduce locally I have now added asserts to
the case that an ItemSet gets changed with still
having active SfxItemIter(s). That is always an
error, but with the architecture of that fixed
array of ItemPtrs did not have devastating effects
(purely by coincidence).
With that asserts, UTs run well in SC and SD, but
in SW I get 11 (eleven!) asserts from the UTs, all
of them from 'ITEM: SfxItemSet ClearItem' BTW.
I guess these have to be fixed before thinking
about this change...

Good news is that all 11 cases were the same in SW,
in SwHistorySetAttrSet::SwHistorySetAttrSet which
does some strange things using two SfxItemIter in
parallel. Thus SW UTs are also clear and I see no
more errors caused by ItemSets being changed while
SfxItemIters are alive.
Bad news is that I still have errors to hunt...

NOTE: Have now cleaned all UTs, this showed that
there are some unexpected side-effects of the Items
being processed in another order when SfxItemIter
is used, also found one case where a
WhichRangesContainer is constructed for a
SfxItemSet using the set items from another ItemSet
and SfxItemIter to do so. There *might* be more
cases not covered by UTs.

NOTE: While speed stays the same and mem is reduced
up to 25% (see measurements in 1st comment) another
*important* aspect is that this frees the way for
using ItemSets *without* WhichRanges - these are
necessary mainly to create that fixed array of
pointers to the Items in a *manageable* size. With
a dynamic structure like unordered_set there is
in principle *no need* anymore to use WhichRanges
to pre-define the Items a Set could hold. There
is one exception: We have cases where one ItemSet
is set at another one with defined WhichRanges to
*filter* the contained Items - these would have to
be identified. This is a rare case and we would have
to check cases where an ItemSet gets set at another
ItemSet. This would be as if all ItemSets would be
AllItemSets in principle - much easier for everyone.

NOTE: Waited for 24.8 split just to not take
unnecessary risks.

Change-Id: I75b0ac1d8a4495a3ee93c1117bdf618702785990
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/166972
Reviewed-by: Armin Le Grand <Armin.Le.Grand@me.com>
Tested-by: Jenkins
2024-06-11 17:26:42 +02:00
..
rtfitem.cxx
svxrtf.cxx