Gallery2:Design Documents:ACL Scalability - Gallery Codex
Personal tools

Gallery2:Design Documents:ACL Scalability

From Gallery Codex

Current ACL Design

Goal: An efficient data structure for the 3-tuple (user, permission, item).

  • Use user-groups to reduce the number of tuples exploiting the fact that most users have similar permissions
  • Use Access Control Lists (ACL) for efficient lookups and to exploit that most items have the same permissions
  • Use a single bit-array to store all permissions in a single value
    • Easy to register new permissions
    • Bit / integer operations are fast

Queries

Frequent:

  • Given an albumId and a userId, return a list of all viewable child-items
  • Given a list of itemIds and a userId, return all permissions for these items

Common:

  • Given an albumId and a userId, return the viewable sub-tree (itemIds)
  • Given a userId, return a list of all viewable/editable/... items.

Rare:

  • Given an itemId, return all permissions all users have for that item

Architecture

  • The 3-tuple (user, permission, item) is stored in two separate maps.
    • AccessMap stores the 3-tuple (accessListId, userOrGroupId, permission)
    • AccessSubsriberMap stores the 2-tuple (accessListId, itemId)

AccessMap defines Access Control Lists identified by accessListId and AccessSubscriberMap assigns the ACLs to the itemIds.

Current Algorithm

Get a List of Items

Given:

  • The search space is limited by a list of itemIds or by an albumId designating the root of a sub-tree to be searched in.
  • The userId is given. We expand the userId to a list of userOrGroupIds (resolving all group memeberships)
  • A specific permission is required. Mostly 'core.view'.
  • Rarely, a permission-mask is required (combination of permissions)

Assumptions:

  • The list of ACLs is small.
  • The user (with all its groups) is associated with < 20 ACLs

Algorithm:

  1. Limit the search space by userId and permission (AccessMap table): $accessListIds = fetchAccessListIds('core.view', $userId);
  2. Limit the search space by itemIds and $accessListIds: Join the AccessSubscriberMap with the Item table and use all given criteria.

Get all Permissions for some Items

Given:

  • A list if itemIds.
  • The userId is given. We expand the userId to a list of userOrGroupIds (resolving all group memeberships)
  1. Use a single query joining AccessMap with AccessSubscriberMap
    1. Limit by given itemIds
    2. Limit by userOrGroupId
  2. Convert the permission-bits to a better permission-representation in PHP

General Purpose

Given:

  • Some required permission, e.g. 'core.edit'
  • A userId

We're looking for a list of all items / entities we have access to. As a general purpose helper, there is:

  1. $accessListIds = fetchAccessListIds($permission, $userId)

Which can then later be used in any query which involved joining the AccessSubscriberMap table with other tables.

Limitations

The current design maps exactly 1 accessListId to each itemId.

Pro's:

  • The cardinality of AccessSubscriberMap is always limited to #itemIds
  • Lookup by user / group -> ACLs is fast
  • Loopup by itemId -> ACLs is fast
  • If there are only very few users that add items, it scales very well.
  • If permissions are very uniform for all items, it scales very well.

Con's:

  • For each item with slightly different permissions, there is a new accessListId.
  • Scales very bad for social / community sites (a lot of users that add items)
  • Scales bad with the number of permissions (since permissions are grouped into a single bit-array / tuple)

API

Current Usage:

  • Mostly fetchAccessListIds('core.view', $userId) followed by a query joining Item with AccessSubscriberMap
    • Could profit from using the itemId list to limit the number of accessListIds.
    • Rarely used to query the whole item-tree.
  • fetchAccessListId($itemId), part of core API, but not really used (just in GalleryItem::save)

Challenge

  • Huge search space:
    • Huge Nr. of items
    • Huge Nr. of users
    • Medium Nr. of groups
    • Large Nr. of distinct permissions (bits)
    • Very heterogeneous permissions
  • Performance should scale very well
    • For lookup by userId/groupId
    • For lookup by itemId
    • For lookup by permission bits

Scaling Issues By Example

Scaling With Users / Permission Diversity

  • Community usage profile - Assume each user creates his own (user)album
  • Each user album has different permissions. The owner (creator) usually has more permissions than everyone else.
    • Thus, we need a different ACL for each user's album.
  • Assuming you have 50'000 users, all basic queries for all users (including the guest user) have ~50'000 parameters (ACL ids).
  • For each ACL, there are about 4 tuples. Admin permissions, registeres users permissions, everybody permissions, user (owner) permissions.

Summary:

  • #accessListIds scales with #users about 1:1 assuming all users have their own album.
  • #accessListIds by user scales with #users!!
  • Cardinality of AccessMap ~ 4 * #accessListIds = 4 * #users

Scaling With Items

  • This is where G2's ACL design shines and this is why it was designed as it is. The initial ACL design obsoleted a PermissionMap design which scaled pretty badly with the number of items.
  • Assuming all items have the same permissions there is just a single ACL. AccessMap and AccessSubscubscriberMap are both relatively small.

Scaling With Permission Bits

  • Each ACL entry includes all permission bits (1 bit for core.view, 1 for core.edit, 1 for core.delete, etc)
  • Traditional ACL designs would have separate ACL structures for each permission bit. E.g. a AccessMap and AccessSubscriberMap table for each bit (or have separate accessListIds for each bit).
  • By grouping the bits together, 32 ot 64 of them, we have a single accessListIds domain for all possible combinations of what users have what permissions for what items.
  • At the same time, we don't need that information for most cases. For one of the major use cases, we only want to know if permission bit x is set for user y for item z.

Solutions / Counter Measures

Minimize the Number of AccessListIds Per User

  • The current design minimizes the cardinality of AccessSubscriberMap (1 accessListId per ItemId)
    • It's a clean design: It defines ACLs in AccessMap and associates them with items in AccessSubscriberMap
    • At the cost of more accessListIds
    • And the redundancy in AccessMap can be quite large (a lot of ACLs only differing in a single (userId, permission) tuple.
Goal: Minimize the number of AccessListsIds Per User (minimize fetchAccessListIds('core.view', $userId)).

And if that is contradiction with reducing overall complexity, find a trade-off between the cardinality of ACLs/user and #AccessSubscriberMap.

Achieving Minimal ACLs

  • Goal: Minimize ACLs per user
  • Same as minimal ACLs overall?
  • Expensive algorithm:
    • Each original ACL of n (userOrGroupId, permission) tuples can be split into smaller ACLs having 1..n of these tuples
TODO: Find an algorithm for this optimization. And find algebraic / combinatoric properties to reduce the complexity of the algorithm.
Fallback: If the algorithm is too expensive, use some fallback rules to reduce the number of ACLs.

Allow For Multiple AccessListIds Per ItemId

  • Change the PK in AccessSubscriberMap from itemID to (accessListId, itemId)
  • Split ACLs into sub-ACLs
  • Example:
    • A single ACL for a user-album might have 3 tuples in AccessMap {(admins, all permissions), (everybody, view), (owner, all permissions)}
    • Each user-album needs a separate ACL with a different owner userId.
    • By separating an ACL for the admin and everybody group permissions, we can re-use that ACL for all user-albums.
    • The owner of a user-album would receive additionally another ACL for (owner, all permissions)
Warning: GalleryCoreApi::fetchAccessListId($itemId) is partially broken by this API change. But we can offer a somewhat graceful degradation and we can log such calls. And it's only used in GalleryItem, nowhere else.
  • E.g. have it log the call and return the accessListId most relevant to the user.
  • Remove the function after the next major API change.

Partition User from Group ACL Entries

  • An easy way to optimize for most cases might be to use separate accessListIds for group and user permissions.
  • Another easy algorithm is to search for redundancies between existing ACLs. We can easily identify that 3 of 4 tuples of all user-album related ACLs are redundant and create a new ACL for that case.

Partition By Permission Bits

  • Use separate ACL structures for each permission bit.
  • This can help for very large collections, but without other measures this won't get us very far.

Permission Inheritance In Item Tree

  • Similarely to how user-groups help grouping users with the same permissions together
  • Most data-items have the same permission as their parent-album
  • Most albums have the same permissions as their parent-album
  • Could reduce the cardinality of the Access*Map tables by one or two orders of magnitude
  • Costs in terms of selecting all item ancestors to select all relevant ACLs and to compute the permissions

Distributed ACL Management

Partition the ACL Management by AlbumItem, similar to Permission Inheritance

Store the ACL / permission information locally for each (sub-)album. E.g. done in Microsoft Windows.

Con's:

  • While most queries in G2 are spatially limited to a sub-album, there are still some queries against the whole item tree.


Limit the Search Space

Do not query for large data sets / or in the whole itemId tree. Query very locally, restrict the queries always to a very narrow search space.

  • Limit spatially by itemIds (e.g. itemId IN (7,8,9))
  • Limit by user or group id (e.g. userOrGroupId IN (2,3,4))
  • Limit by permission (e.g. 'core.view')

Example:

  • Instead of using fetchAccessListIds(permission, userId), consider starting with limiting the search space by itemIds / sub-tree first. Or do all in a single, larger query.

Cache

  • Use (persistent) caching as appropriate
  • Prevent from frequent rebuilding of the cache, strive for iterative rebuilding

Summary

Problem

  • In the current design fetchAccessListIds($permission, $userId) is wildly used to get the ACLs before using that list of ACLs in another query to get a list of items.
  • This approach only works fine if the number of ACLs per user is very low (e.g. #ACLs < 20 per user). Else the subsequent queries become rather expensive.
  • The current design is optimized for low numbers of ACLs which is usually the case for single-uploading-user installations.
  • The number of ACLs increases linearly with the number of user-albums and with the diversity of the item-permissions.

Suggested Solution

Minimalize the number of ACLs per user:

  • Remove redundancy in AccessMap by extracing and reusing new ACLs
  • Thereby reducing / minimizing the ACLs per user (fetchAccessListIds($permission, $userId))
  • At the cost of multiple accessListIds per itemId in AccessSubscriberMap

Details:

  • Allow more than one accessListId to be mapped to itemId in AccessSubscriberMap
  • Create an ACL for user-album permissions excluding the (ownerId, permission) tuple
  • Periodically compact the ACL mapping by searching for redundancies between ACLs and extracting new ACLs.
    • Maybe do that at addPermission() time.