Access Control List Performance
The ACL concept in G2 might need to be reworked since its performance is really poor when having useralbums or just generally very heterogenous permissions for your albums and items.
Main Problem
- The queries are quite fast if the accessListIds IN (?, ?, ..) clause is really short
- The queries get super slow if that's not the case
- Our concept of ACL: one ACL id sums up all permissions all users / groups have for a specific item. If there's another item with the same sum of permissions for all users / groups, we reuse the same ACL id.
- Usually, this works quite good, because we assume that you we can reuse ACL ids very well since the permissions in a G2 are usually the same for a lot of items / albums
- The worst case is if every item in G2 has different permissions. Then #accessListId = #itemIds
- Useralbums come pretty close to the worst case. Joe7 has 1200 useralbums, that equals to 1200 ACL ids since every useralbum has different permissions
- Owner specific permissions could lead to a similar bad case
Slow queries
- GalleryCoreApi::fetchChildItemIds
- GalleryCoreApi::fetchDescendentCounts when it queries ::fetchUncachedDescendentCounts (and on Joe7's G2, it seems to do that everytime)
- ... everywhere we use fetchAccessListIds($permission, $userId) and use the returned ACL ids in another query
Discussion
- First and foremost, get the number of ACL ids for a specific activeUser down. Case useralbums: User X has the same permissions for all useralbums but his own. So there should be exactly 2 ACL ids involved for this user, and not #userAlbums ACL ids.
- Bharat proposed to allow multiple ACL ids per itemId
- Maybe we need some smart logic to decide when to use multiple ACL ids per itemId and when to merge multiple ACL ids into a single for a specific itemId
- In fetchChildItemIds, don't call fetchAccessListIds to get a list of all ACLs for this user, in the fetchAccessListIds call add more arguments to only fetch the ACL ids for the current album's children. This will get down the nr of ACL ids for the IN (...) clause to a few, unless we're viewing the parent album of all useralbums. Together with the above fixes, this should be a good solution
- Better use of the LIMIT clause where possible and of caching where needed -> analyze all current queries with tons of useralbums.
- The fetchChildItemIds and fetchUncachedDescendentCounts queries can be optimized dramatically
- An alternative for fetchChildItemIds:
SELECT g2_ChildEntity.g_id, g2_ItemAttributesMap.g_orderWeight
FROM g2_ChildEntity, g2_Item, g2_AccessSubscriberMap, g2_ItemAttributesMap, g2_AccessMap
WHERE g2_ChildEntity.g_parentId =7 AND g2_ChildEntity.g_id = g2_Item.g_id
AND g2_AccessSubscriberMap.g_itemId = g2_ChildEntity.g_id
AND g2_AccessSubscriberMap.g_accessListId = g2_AccessMap.g_accessListId
AND (g2_AccessMap.g_userId = $userId OR g2_AccessMap.g_groupId IN ( $groupIdMarkers ))
AND g2_ItemAttributesMap.g_itemId = g2_ChildEntity.g_id
GROUP BY g2_ChildEntity.g_id HAVING BIT_OR( g2_AccessMap.g_permission ) & $permBits = $permBits
ORDER BY g2_ItemAttributesMap.g_orderWeight, g2_ChildEntity.g_id
LIMIT 9
- Here, we're merging fetchAccessListIds and the actual fetchChildItemIds queries into a single one.
- Advantages:
- We're first limiting the query space by parentItemId<->childItemId, then we're using only the ACL id for this subspace
- A single query, all can be done with indices
- The join speeds up the query by another factor
- Disadvantages:
- We're doing redundant work. If #aclId is small and #childItemIds is large, we're doing BIT_OR() again and again for the same values. Thus, I think this approach isn't good at all. The above changes are better. But for the parent album of the useralbums, this query is quite optimal right now, since #childItemIds = #aclIds there.
- We've done the same for fetchUncachedDescendentCounts:
SELECT iam0.g_itemId, COUNT(iam1.g_itemId)
FROM g2_ItemAttributesMap AS iam0, g2_ItemAttributesMap AS iam1, g2_AccessSubscriberMap, g2_AccessMap
WHERE iam0.g_itemId IN ($parentSequenceMarkers)
AND iam1.g_parentSequence LIKE CONCAT(iam0.g_parentSequence, iam0.g_itemId, '/%')
AND iam1.g_itemId = g2_AccessSubscriberMap.g_itemId
AND g2_AccessSubscriberMap.g_accessListId = g2_AccessMap.g_accessListId
AND (g2_AccessMap.g_userId = $userId OR g2_AccessMap.g_groupId IN ($groupIdMarkers))
GROUP BY iam0.g_itemId HAVING BIT_OR( g2_AccessMap.g_permission ) & $permissionBits = $permissionBits;
Same problems here. Redundant BIT_OR() work. But better locality.
Further notes
GalleryEntity::save() makes an entry in AccessSubscriberMap for all newly created _Entity_.
if ($this->testPersistentFlag(STORAGE_FLAG_NEWLY_CREATED)) {
GalleryCoreApi::relativeRequireOnce(
'modules/core/classes/GalleryAccessSubscriberMap.class');
$ret = GalleryAccessSubscriberMap::addMapEntry(
array('itemId' => $this->getId(), 'accessListId' => 0));
if ($ret->isError()) {
return $ret->wrap(__FILE__, __LINE__);
}
}
Is it really needed? (like 130K of my 240K ASM entries has accessListId value 0 and I have only 90K rows/items in g2_Item.)
Shouldn't we make entries only for ItemIds?
An entity can be (a quick stat from my db):
g_entityType count(g_id)
GalleryAlbumItem 5575
GalleryComment 1936
GalleryDerivativeImage 148126
GalleryEntity 1
GalleryGroup 3
GalleryLinkItem 464
GalleryMovieItem 908
GalleryPendingUser 19
GalleryPhotoItem 84989
GalleryUnknownItem 47
GalleryUser 1261
ThumbnailImage 33
Do we really need rows for GalleryDerivativeImage in here?
(it's pretty sure we don't need rows for groups,users(pendinguser) as this kinda mapping is already done in AccessMap)
Don't we set permissions like viewcomment/viewresizesonly on albums/photoitems?
If the answer is yes, they could be ommitted (almost 70%).
Not to forget
- Migrate should call GalleryCoreApi::compactAccessList()
- Imageblock module has bad queries too
- GalleryTheme does quite a few fetchChild... calls. Make sure we're using caching where possible and optimize the queries