Current ACL Design
Goal: An efficient data structure for the 3-tuple (user, permission, item).
- Use user-groups to reduce the number of tuples exploiting the fact that most users have similar permissions
- Use Access Control Lists (ACL) for efficient lookups and to exploit that most items have the same permissions
- Use a single bit-array to store all permissions in a single value
- Easy to register new permissions
- Bit / integer operations are fast
Queries
Frequent:
- Given an albumId and a userId, return a list of all viewable child-items
- Given a list of itemIds and a userId, return all permissions for these items
Common:
- Given an albumId and a userId, return the viewable sub-tree (itemIds)
- Given a userId, return a list of all viewable/editable/... items.
Rare:
- Given an itemId, return all permissions all users have for that item
Architecture
- The 3-tuple (user, permission, item) is stored in two separate maps.
- AccessMap stores the 3-tuple (accessListId, userOrGroupId, permission)
- AccessSubsriberMap stores the 2-tuple (accessListId, itemId)
AccessMap defines Access Control Lists identified by accessListId and AccessSubscriberMap assigns the ACLs to the itemIds.
Current Algorithm
Get a List of Items
Given:
- The search space is limited by a list of itemIds or by an albumId designating the root of a sub-tree to be searched in.
- The userId is given. We expand the userId to a list of userOrGroupIds (resolving all group memeberships)
- A specific permission is required. Mostly 'core.view'.
- Rarely, a permission-mask is required (combination of permissions)
Assumptions:
- The list of ACLs is small.
- The user (with all its groups) is associated with < 20 ACLs
Algorithm:
- Limit the search space by userId and permission (AccessMap table): $accessListIds = fetchAccessListIds('core.view', $userId);
- Limit the search space by itemIds and $accessListIds: Join the AccessSubscriberMap with the Item table and use all given criteria.
Get all Permissions for some Items
Given:
- A list if itemIds.
- The userId is given. We expand the userId to a list of userOrGroupIds (resolving all group memeberships)
- Use a single query joining AccessMap with AccessSubscriberMap
- Limit by given itemIds
- Limit by userOrGroupId
- Convert the permission-bits to a better permission-representation in PHP
General Purpose
Given:
- Some required permission, e.g. 'core.edit'
- A userId
We're looking for a list of all items / entities we have access to. As a general purpose helper, there is:
- $accessListIds = fetchAccessListIds($permission, $userId)
Which can then later be used in any query which involved joining the AccessSubscriberMap table with other tables.
Limitations
The current design maps exactly 1 accessListId to each itemId.
Pro's:
- The cardinality of AccessSubscriberMap is always limited to #itemIds
- Lookup by user / group -> ACLs is fast
- Loopup by itemId -> ACLs is fast
- If there are only very few users that add items, it scales very well.
- If permissions are very uniform for all items, it scales very well.
Con's:
- For each item with slightly different permissions, there is a new accessListId.
- Scales very bad for social / community sites (a lot of users that add items)
- Scales bad with the number of permissions (since permissions are grouped into a single bit-array / tuple)
API
Current Usage:
- Mostly fetchAccessListIds('core.view', $userId) followed by a query joining Item with AccessSubscriberMap
- Could profit from using the itemId list to limit the number of accessListIds.
- Rarely used to query the whole item-tree.
- fetchAccessListId($itemId), part of core API, but not really used (just in GalleryItem::save)
Challenge
- Huge search space:
- Huge Nr. of items
- Huge Nr. of users
- Medium Nr. of groups
- Large Nr. of distinct permissions (bits)
- Very heterogeneous permissions
- Performance should scale very well
- For lookup by userId/groupId
- For lookup by itemId
- For lookup by permission bits
Scaling Issues By Example
Scaling With Users / Permission Diversity
- Community usage profile - Assume each user creates his own (user)album
- Each user album has different permissions. The owner (creator) usually has more permissions than everyone else.
- Thus, we need a different ACL for each user's album.
- Assuming you have 50'000 users, all basic queries for all users (including the guest user) have ~50'000 parameters (ACL ids).
- For each ACL, there are about 4 tuples. Admin permissions, registeres users permissions, everybody permissions, user (owner) permissions.
Summary:
- #accessListIds scales with #users about 1:1 assuming all users have their own album.
- #accessListIds by user scales with #users!!
- Cardinality of AccessMap ~ 4 * #accessListIds = 4 * #users
Scaling With Items
- This is where G2's ACL design shines and this is why it was designed as it is. The initial ACL design obsoleted a PermissionMap design which scaled pretty badly with the number of items.
- Assuming all items have the same permissions there is just a single ACL. AccessMap and AccessSubscubscriberMap are both relatively small.
Scaling With Permission Bits
- Each ACL entry includes all permission bits (1 bit for core.view, 1 for core.edit, 1 for core.delete, etc)
- Traditional ACL designs would have separate ACL structures for each permission bit. E.g. a AccessMap and AccessSubscriberMap table for each bit (or have separate accessListIds for each bit).
- By grouping the bits together, 32 ot 64 of them, we have a single accessListIds domain for all possible combinations of what users have what permissions for what items.
- At the same time, we don't need that information for most cases. For one of the major use cases, we only want to know if permission bit x is set for user y for item z.
Solutions / Counter Measures
Minimize the Number of AccessListIds Per User
- The current design minimizes the cardinality of AccessSubscriberMap (1 accessListId per ItemId)
- It's a clean design: It defines ACLs in AccessMap and associates them with items in AccessSubscriberMap
- At the cost of more accessListIds
- And the redundancy in AccessMap can be quite large (a lot of ACLs only differing in a single (userId, permission) tuple.
- Goal: Minimize the number of AccessListsIds Per User (minimize fetchAccessListIds('core.view', $userId)).
And if that is contradiction with reducing overall complexity, find a trade-off between the cardinality of ACLs/user and #AccessSubscriberMap.
Achieving Minimal ACLs
- Goal: Minimize ACLs per user
- Same as minimal ACLs overall?
- Expensive algorithm:
- Each original ACL of n (userOrGroupId, permission) tuples can be split into smaller ACLs having 1..n of these tuples
- TODO: Find an algorithm for this optimization. And find algebraic / combinatoric properties to reduce the complexity of the algorithm.
- Fallback: If the algorithm is too expensive, use some fallback rules to reduce the number of ACLs.
Allow For Multiple AccessListIds Per ItemId
- Change the PK in AccessSubscriberMap from itemID to (accessListId, itemId)
- Split ACLs into sub-ACLs
- Example:
- A single ACL for a user-album might have 3 tuples in AccessMap {(admins, all permissions), (everybody, view), (owner, all permissions)}
- Each user-album needs a separate ACL with a different owner userId.
- By separating an ACL for the admin and everybody group permissions, we can re-use that ACL for all user-albums.
- The owner of a user-album would receive additionally another ACL for (owner, all permissions)
- Warning: GalleryCoreApi::fetchAccessListId($itemId) is partially broken by this API change. But we can offer a somewhat graceful degradation and we can log such calls. And it's only used in GalleryItem, nowhere else.
- E.g. have it log the call and return the accessListId most relevant to the user.
- Remove the function after the next major API change.
Partition User from Group ACL Entries
- An easy way to optimize for most cases might be to use separate accessListIds for group and user permissions.
- Another easy algorithm is to search for redundancies between existing ACLs. We can easily identify that 3 of 4 tuples of all user-album related ACLs are redundant and create a new ACL for that case.
Partition By Permission Bits
- Use separate ACL structures for each permission bit.
- This can help for very large collections, but without other measures this won't get us very far.
Permission Inheritance In Item Tree
- Similarely to how user-groups help grouping users with the same permissions together
- Most data-items have the same permission as their parent-album
- Most albums have the same permissions as their parent-album
- Could reduce the cardinality of the Access*Map tables by one or two orders of magnitude
- Costs in terms of selecting all item ancestors to select all relevant ACLs and to compute the permissions
Distributed ACL Management
- Partition the ACL Management by AlbumItem, similar to Permission Inheritance
Store the ACL / permission information locally for each (sub-)album. E.g. done in Microsoft Windows.
Con's:
- While most queries in G2 are spatially limited to a sub-album, there are still some queries against the whole item tree.
Limit the Search Space
Do not query for large data sets / or in the whole itemId tree. Query very locally, restrict the queries always to a very narrow search space.
- Limit spatially by itemIds (e.g. itemId IN (7,8,9))
- Limit by user or group id (e.g. userOrGroupId IN (2,3,4))
- Limit by permission (e.g. 'core.view')
Example:
- Instead of using fetchAccessListIds(permission, userId), consider starting with limiting the search space by itemIds / sub-tree first. Or do all in a single, larger query.
Cache
- Use (persistent) caching as appropriate
- Prevent from frequent rebuilding of the cache, strive for iterative rebuilding
Summary
Problem
- In the current design fetchAccessListIds($permission, $userId) is wildly used to get the ACLs before using that list of ACLs in another query to get a list of items.
- This approach only works fine if the number of ACLs per user is very low (e.g. #ACLs < 20 per user). Else the subsequent queries become rather expensive.
- The current design is optimized for low numbers of ACLs which is usually the case for single-uploading-user installations.
- The number of ACLs increases linearly with the number of user-albums and with the diversity of the item-permissions.
Suggested Solution
Minimalize the number of ACLs per user:
- Remove redundancy in AccessMap by extracing and reusing new ACLs
- Thereby reducing / minimizing the ACLs per user (fetchAccessListIds($permission, $userId))
- At the cost of multiple accessListIds per itemId in AccessSubscriberMap
Details:
- Allow more than one accessListId to be mapped to itemId in AccessSubscriberMap
- Create an ACL for user-album permissions excluding the (ownerId, permission) tuple
- Periodically compact the ACL mapping by searching for redundancies between ACLs and extracting new ACLs.
- Maybe do that at addPermission() time.