Gallery2:Design Documents:Progress Bar for Item Add - Gallery Codex

From Gallery Codex

Goals

  • A progress bar isn't strictly necessary when adding a single image, but where would the threshold be? For simplicity, all add-item requests use a progress bar.
  • Improve the reliability, usability and robustness of long running item-add processes.
  • Connection Timeout - Periodic HTML output to keep the connection alive
  • Server timeout - Extending the PHP timeouts in all sub-processes (ItemAddPlugin, archive extraction, limiting the batch size for the ItemAddOption calls)
  • Memory limits - Only loading a limited number of items into memory at once.
  • User feedback - Give the user feedback for long running add item requests (progress-bar)
  • KISS - Relatively simple design
  • Backwards-compatible - No API change required. Batch ItemAddPlugins need to use the newly added handleRequest() param though.


  • Non-Goal: Don't offer a progress-bar for the upload process (item add from browser). The progress-bar is for processing the request, e.g. for "add items from local server."

Design

ItemAdd / ItemAddPlugin / ItemAddOption

  • Do the whole request handling in a progress-bar callback.
  • Following a simpler design than for ItemEdit. Rather than dynamically deciding whether a progress-bar is required, always delegate to the progress-bar view.
  • Robustness / Resource Limits
    • Limit the memory usage (split the processing into chunks).
    • Regular progress-bar updates for usability and to keep the connection alive.
    • Regular set_time_limit() calls to keep PHP alive.
    • Regular storage checkpoints to keep the filesystem in sync with the database in case an error happens later in the request handling.
  • ItemAddPlugin instances can and should use all of these methods to improve robustness.
  • ItemAddOption instances don't need to be adapted for this change. They're always called with a bounded number of items, which should easily be handled within 30 seconds.
  • Archive extraction and the invocation of ItemAddOption instances are now both referred to as post-processing.
  • For mass-adding items, ItemAddController->postProcessItems() should be used in ItemAddPlugin instances to ensure that post-processing happens shortly after an item is added to minimize the risk that an item is added without any post-processing (in case an error happens later in the process).
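
The robustness measures above can be sketched as a single loop. This is a minimal, language-neutral illustration in Python, not Gallery 2 code: `add_items_in_chunks` and all of its callback parameters are hypothetical names, and the chunk size of 50 is taken from the implementation notes further down.

```python
from typing import Callable, Iterable, List


def add_items_in_chunks(
    sources: Iterable[str],
    add_item: Callable[[str], int],
    post_process: Callable[[List[int]], None],
    checkpoint: Callable[[], None],
    update_progress: Callable[[int], None],
    chunk_size: int = 50,
) -> int:
    """Add items one by one and post-process them in small chunks."""
    pending: List[int] = []  # ids added but not yet post-processed
    added = 0
    for source in sources:
        pending.append(add_item(source))
        added += 1
        checkpoint()            # keep filesystem and database in sync per item
        update_progress(added)  # regular output also keeps the connection alive
        if len(pending) >= chunk_size:
            post_process(pending)  # post-process shortly after adding
            pending = []
    if pending:
        post_process(pending)      # final, possibly smaller chunk
    return added
```

Only the ids of the current chunk are held in `pending`, so memory use stays bounded no matter how many items are added.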

From Local Server

  • Split the request handling into 2 steps
    • Build an in-memory list of file names (should be < 2 MB even for 50k files), using a progress bar. This also yields the total file count.
    • Start adding items with the current algorithm, but now using a progress-bar and using the file-list that we generated in the above step
  • Checkpoint after each item (atomic transactions per item/file)
  • Local server / from web page need a progress bar. Other item-add methods might profit from it as well.
  • Need to count the number of files first to determine what 100% is. For "local server", this means that we need to recurse into all subdirectories twice: first to count, then to process. Or we can keep the list in memory.
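
A rough sketch of the two-step approach, written in Python purely for illustration. `build_file_list` and `add_files` are hypothetical names, and `add_file` / `report` stand in for the real item-add call and the progress-bar update.

```python
import os
from typing import Callable, List


def build_file_list(root: str) -> List[str]:
    """Step 1: collect all file paths below root (plain strings, no item objects)."""
    paths: List[str] = []
    for directory, _subdirs, filenames in os.walk(root):
        for name in sorted(filenames):
            paths.append(os.path.join(directory, name))
    return paths


def add_files(
    paths: List[str],
    add_file: Callable[[str], None],
    report: Callable[[float], None],
) -> None:
    """Step 2: add each file and report progress against the known total."""
    total = len(paths)
    for done, path in enumerate(paths, start=1):
        add_file(path)
        report(done / total)  # 100% is well defined because total is known
```

Keeping the list in memory avoids walking the directory tree a second time.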

From Web Page

  • File transfers can take a long time, thus there's an elevated likelihood of a timeout / error after a file has been added.
  • Checkpoint after each item (atomic transactions per item/file).

Memory Usage

  • The status data (filename, id) is probably < 50 bytes per item. The status for 50'000 items takes maybe 2.5 MB in memory. That's fine.
  • Each GalleryItem object takes maybe 1 kbyte in memory. For 50'000 items that would be 50 MB in memory, too much.

-> The new code shouldn't build a list of all added items as objects. Since entities are cached in memory anyway (and the GDC does not have an unlimited size), that shouldn't affect performance too dramatically.
-> $status can be kept in memory. We load it as a whole in ItemAddView to list the added items anyway.
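
The estimates above are simple multiplication. A quick check in Python, taking "1 kbyte" as 1000 bytes (an assumption; the per-item sizes themselves are the rough guesses from the text):

```python
ITEMS = 50_000
STATUS_BYTES_PER_ITEM = 50     # filename + id, estimate from the text
OBJECT_BYTES_PER_ITEM = 1_000  # one GalleryItem object, estimate from the text


def megabytes(num_bytes: int) -> float:
    return num_bytes / 1_000_000


status_mb = megabytes(ITEMS * STATUS_BYTES_PER_ITEM)   # status data for all items
objects_mb = megabytes(ITEMS * OBJECT_BYTES_PER_ITEM)  # all items as objects
print(status_mb, objects_mb)
```

This gives 2.5 MB for the status data and 50 MB for the objects, matching the figures above.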

Timeouts

  • Before calling each ItemAddOption, there should be a guaranteeTimeLimit(30) call.
  • In ItemAddOption / ItemAddPlugin implementations, use the progress bar, checkPoint and guaranteeTimeLimit as needed.
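
To illustrate the idea of guaranteeing a time budget before each option, here is a toy Python model. It assumes that a guaranteeTimeLimit(30) call means "make sure at least 30 seconds remain, without ever shortening the limit"; that reading, the `Deadline` class and `run_options` are assumptions made for this sketch, not the Gallery 2 implementation.

```python
import time
from typing import Callable, Iterable, List


class Deadline:
    """Toy stand-in for a script time limit that can be extended."""

    def __init__(self, limit: float, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._expires_at = clock() + limit

    def remaining(self) -> float:
        return self._expires_at - self._clock()

    def guarantee(self, seconds: float) -> None:
        """Ensure at least `seconds` remain; never shorten the limit."""
        if self.remaining() < seconds:
            self._expires_at = self._clock() + seconds


def run_options(
    options: Iterable[Callable[[List[int]], None]],
    items: List[int],
    deadline: Deadline,
) -> None:
    for option in options:
        deadline.guarantee(30)  # fresh 30-second budget before each option
        option(items)
```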

Item-Add Process

Overview

Overall, each add-item process goes through these steps:

1. ItemAddController loads the specified ItemAddPlugin and delegates to it with $itemAddPluginInstance->handleRequest($form, $item);
2. ItemAddPlugin checks the request parameters and adds the specified items
3. ItemAddController extracts archive items (e.g. .zip archives get replaced with an item for each file in the archive)
4. ItemAddController calls each ItemAddOption instance to do its post-processing with $itemAddOptionInstance->handleRequestAfterAdd($form, $items)
4.1 .. 4.n Each ItemAddOption instance handles the request


Progress Bar for Batch Operations

The request needs to be handled in a progress bar view when it is potentially long-running (more than 15-30 seconds).

When is a progress bar required?

  • ItemAddPlugin - When adding more than a dozen items, the ItemAddPlugin should return requiresProgressbar = true
  • Extract - If there is one or more items that can be extracted, it's very likely that one of them would be replaced by many items. Thus, delegate to the progress bar if there is one or more items that will get extracted.
  • ItemAddOption - If the number of added items multiplied by all ItemAddOption instances is larger than ~40, delegate to the progress bar.
Rule of thumb: When adding / processing more than 10 items, a progress bar is appropriate to ensure that the overall process doesn't time out on very slow systems.
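
The three criteria above can be combined into one check. A minimal Python sketch; the function name and parameters are hypothetical, and the thresholds (12, 1, 40) are the approximate values given in the list.

```python
def requires_progress_bar(
    items_to_add: int,
    extractable_items: int,
    added_items: int,
    option_count: int,
) -> bool:
    """Return True if any stage of the request is potentially long-running."""
    if items_to_add > 12:                 # ItemAddPlugin: more than a dozen items
        return True
    if extractable_items >= 1:            # Extract: any archive may expand to many items
        return True
    if added_items * option_count > 40:   # ItemAddOption: items x options above ~40
        return True
    return False
```

For example, adding 5 plain images with 3 active options stays below every threshold, while a single .zip archive already triggers the progress bar.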

Add Process without Progress Bar

1. ItemAddController loads the specified ItemAddPlugin and delegates to it with $itemAddPluginInstance->handleRequest($form, $item);
2.1 ItemAddPlugin.handleRequest() checks the request parameters to evaluate whether a progress bar is required. None is required.
2.2 ItemAddPlugin.handleRequest() adds the items, e.g. by using GalleryCoreApi::addItemToAlbum(...);
2.3 ItemAddPlugin.handleRequest() returns a list of all added items in the $status array
3. ItemAddController.postProcess() goes through the list of added items to check if any item can be extracted. None of them can.
4. ItemAddController.postProcess() loads all ItemAddOption instances and calls $optionInstance->handleRequestAfterAdd($form, $items); for each of them.
5. ItemAddController completes the process by redirecting to the ItemAddView to show the results

Batch Add Process with Progress Bar

1. ItemAddController loads the specified ItemAddPlugin and delegates to it with $itemAddPluginInstance->handleRequest($form, $item);
2.1 ItemAddPlugin.handleRequest() checks the request parameters to evaluate whether a progress bar is required. It needs to add a lot of items, thus a progress bar is required.
2.2 ItemAddPlugin.handleRequest() registers a trailer callback function to add the items in the progress bar view.
2.3 ItemAddPlugin.handleRequest() returns without adding any items, specifying that a progress bar is required.
3. ItemAddController delegates to the progress bar view
4. ProgressBarView calls $itemAddPluginInstance->callback(...)
5. ItemAddPlugin.callback() starts adding items based on the request data
6. ItemAddPlugin.callback() calls $itemAddController->postProcess() after every 50 items
7.1 ItemAddController.postProcess() loads all ItemAddOption instances
7.2 ItemAddController.postProcess() extracts items that can be extracted and calls all ItemAddOption instances with batches of max. 20 items
8. ItemAddController.callback(): after many iterations over steps 6 and 7, the process is finally finished

Notes:

  • ItemAddPlugin periodically updates the progress bar.
  • ItemAddController.postProcess() periodically updates the progress bar and calls the specific ItemAddOption instances with small enough batches to ensure that they don't need to manage the progress bar themselves.
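
The key point of the batch flow is that handleRequest() returns before any item is added, and the actual work runs later from the progress bar view. A small Python sketch of that deferral; `Plugin`, `handle_add` and the list standing in for the progress bar view are hypothetical, and the periodic postProcess() calls are left out for brevity.

```python
class Plugin:
    """Hypothetical stand-in for a batch ItemAddPlugin."""

    def __init__(self):
        self.added = []       # names of items added so far
        self.callback = None  # work deferred to the progress bar view

    def handle_request(self, files):
        # Steps 2.1-2.3: register the callback and return without adding anything.
        self.callback = lambda progress: self._add_all(files, progress)
        return {"requiresProgressbar": True}

    def _add_all(self, files, progress):
        # Step 5: add the items while the progress bar view is being rendered.
        for done, name in enumerate(files, start=1):
            self.added.append(name)
            progress.append(done / len(files))


def handle_add(plugin, files):
    """Controller side: steps 1, 3 and 4."""
    progress = []  # stands in for the progress bar view
    result = plugin.handle_request(files)
    added_before_view = len(plugin.added)
    if result["requiresProgressbar"]:
        plugin.callback(progress)
    return added_before_view, progress
```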

Add Process with Progress Bar for ItemAddOption Only

1. ItemAddController loads the specified ItemAddPlugin and delegates to it with $itemAddPluginInstance->handleRequest($form, $item);
2.1 ItemAddPlugin.handleRequest() checks the request parameters to evaluate whether a progress bar is required. None is needed.
2.2 ItemAddPlugin.handleRequest() adds the items, e.g. by using GalleryCoreApi::addItemToAlbum(...);
2.3 ItemAddPlugin.handleRequest() returns a list of all added items in the $status array
3. ItemAddController.postProcess() detects that the product of the added items and ItemAddOption instances is too large to handle the request without progress bar. It registers itself as a trailer callback function and delegates to the progress bar view.
4. ProgressBarView calls $itemAddController->postProcess(...)
5.1 ItemAddController.postProcess() loads all ItemAddOption instances
5.2 ItemAddController.postProcess() extracts items that can be extracted and calls all ItemAddOption instances with batches of max. 20 items
6. ItemAddController.callback(): after many iterations over step 5, the process is finally finished

Discussion

Alternative Design - Progress Bar for all Add Processes

The design described above has a few disadvantages:

  • It is complex - there are 3 different cases and one has to check whether a progress bar is required in multiple stages of the request handling.
  • In the end, a progress bar is required unless only 1-10 items are added at a time.
  • Auto-redirect for progress-bar - We could also extend the progress-bar interface allowing the API to force an automated redirect. We could make use of that to jump to the ItemAddView result page for rather short running add processes.

The main disadvantage of always using a progress bar is that adding only 1-10 items is the majority case for the default plugin ItemAddFromBrowser. A progress bar slightly complicates the control flow for the user, since the user has to click on the continue link when the progress bar view reaches 100%. But with an auto-redirect progress bar, that shouldn't matter that much.

Therefore we explore a simpler design alternative here where we always use a progress bar:

Concrete Implementation

1. ItemAddController.handleRequest() registers a trailer callback 'handleRequestWithProgressBar' and delegates to the progress bar view
2. ProgressBarView calls $itemAddController->handleRequestWithProgressBar(...)
3. ItemAddController.handleRequestWithProgressBar() delegates the request to the ItemAddPlugin with $itemAddPlugin->handleRequest($form, $item, $this)
4. ItemAddPlugin.handleRequest() starts adding items based on the request data
5. ItemAddPlugin.handleRequest() calls $itemAddController->postProcessItems($status) after every 50 items
6. ItemAddController.postProcess() extracts items that can be extracted and calls all ItemAddOption instances with batches of max 50 items
(Steps 5 and 6 repeat in a loop while items are being added.)
7. ItemAddController.handleRequestWithProgressBar() does a final postProcessItems() call (for backwards compatibility)

Notes:

  • On request param error ($error, not $ret), we redirect to the ItemAdd view showing the errors. We can't delegate since we're already in the progress-bar view.
  • Alternatively, we could separate ItemAddPlugin|Option->handleRequest() into 2 methods: One to check the request parameters for $errors, and the other to do the processing. We could then call the input validation method before delegating to the progress-bar view. I opted to conform with the rest of the framework and keeping a single handleRequest() method.
  • I do an auto-redirect to the success page if the elapsed time is less than 15 seconds. That way users don't have to click the continue link when adding just a few files.
  • The progress-bar is updated regularly in most involved methods.
    • In the ItemAddPlugin
    • When adding items from extracted archives
    • Before calling each ItemAddOption (ItemAddOption are expected to handle 50 items without the need for progress-bar updates)
  • The progress-bar starts from 0% multiple times in the request. It's hard to compute what 100% is when the number of items can increase at any time (extracting archives) and when switching between different sub-processes (adding items, extracting items, post-processing items). I went with the simple approach of just starting from 0 in each sub-process and using the progress-bar not as an exact indicator, but to let the user know that we're still processing their request successfully.
  • No changes to ItemAddOption implementations necessary
  • It's backwards-compatible to old itemAddPlugin implementations (only a PHP 5 strict notice for the additional / unused 3rd param to ItemAddPlugin::handleRequest())
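
The end-of-request behaviour described in these notes boils down to a small decision. A Python sketch under stated assumptions: `next_step` and the three return labels are hypothetical names, and only the 15-second threshold comes from the notes.

```python
AUTO_REDIRECT_SECONDS = 15  # threshold from the notes above


def next_step(elapsed_seconds: float, had_form_errors: bool) -> str:
    """Decide what the progress bar view does once the callback returns."""
    if had_form_errors:
        # Request parameter errors: go back to the ItemAdd view and show them.
        return "redirect_to_item_add_with_errors"
    if elapsed_seconds < AUTO_REDIRECT_SECONDS:
        # Short request: spare the user the click on the continue link.
        return "auto_redirect_to_results"
    return "show_continue_link"
```

With this rule, the common case of adding a handful of files from the browser behaves almost as if no progress bar were involved.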

Postprocessing Periodically vs. After the ItemAdd Stage

Postprocessing is about extracting archive items and calling ItemAddOption instances.

There are two design alternatives when it comes to postprocessing.

A) Call postProcess() during the item-add process after every 50 items that have been added.
B) Call postProcess() after the item-add process (after the ItemAddPlugin).

The advantage of A) is that items reach their final state much sooner. E.g. assume a long-running process of adding thousands of items. The first 10 images could be viewed already while it is still adding new items. Better apply the postprocessing soon after adding the items rather than waiting until all items have been added.

Also, with B), options like the DiskQuotaLimit could only start deleting items once the item-add process has been completed. Until then, items keep being added until the actual physical disk quota is exceeded (errors).

And the most compelling argument for A) - In case of an error, A) is much better. Assume we do regular transaction checkpoints. If we add 500 items and then have an error, we'd end up with 500 items without postprocessing (no watermarks, not removed (quota), etc.). If A) is used, only the last sub-batch is affected by incomplete postprocessing.

The only advantage of B) is its simplicity (less convoluted code).

Thus A) is highly recommended.
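
The error argument can be made concrete with a tiny model. This Python sketch assumes that every added item is checkpointed, that under A) postprocessing runs and completes right after each full batch, and that the request dies immediately after `fail_after` items; the function name is hypothetical.

```python
def unprocessed_after_failure(fail_after: int, batch_size: int, periodic: bool) -> int:
    """Count items that were added but never post-processed when the request fails."""
    if periodic:
        # A) only the incomplete last sub-batch is missing its postprocessing
        return fail_after % batch_size
    # B) postprocessing had not started yet, so every added item is affected
    return fail_after


print(unprocessed_after_failure(520, 50, periodic=True))
print(unprocessed_after_failure(520, 50, periodic=False))
```

Under these assumptions A) never leaves more than one batch minus one item in an incomplete state, while B) leaves everything added so far.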

Resource Limits

The two main resource limits are:

  • Timeouts
  • Memory Usage

The timeout constraint means that we need to use the progress bar view and do the necessary actions that go hand in hand with long-running tasks (periodically do checkpoints, extend the time limit, update the progress bar).

The memory limit means that we can't keep all items (objects) in memory. E.g. when postprocessing, we have to pay attention not to load all items into memory but instead load up to xx items and then let each ItemAddOption instance handle that batch. Then load the next batch, etc.
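
A sketch of this batch-wise loading in Python. `in_batches`, `post_process_all` and the `load_items` callback are hypothetical; the batch size is deliberately left as a required parameter since the text does not fix it.

```python
from typing import Callable, Iterable, Iterator, List, Sequence


def in_batches(ids: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size ids."""
    for start in range(0, len(ids), batch_size):
        yield ids[start:start + batch_size]


def post_process_all(
    ids: Sequence[int],
    load_items: Callable[[Sequence[int]], List[dict]],
    options: Iterable[Callable[[List[dict]], None]],
    batch_size: int,
) -> None:
    options = list(options)
    for batch_ids in in_batches(ids, batch_size):
        items = load_items(batch_ids)  # only this batch exists as objects
        for option in options:
            option(items)              # every option handles the same batch
```

Only the list of ids spans the whole request; the loaded objects of one batch can be released before the next batch is loaded.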

