Default-Schedule-Worker

Overview

The Default-Schedule-Worker is one of the available Schedule-Workers.

Actions

When the Default-Schedule-Worker is started, the following actions take place:

A) check state of scheduled entities

  1. The default schedule-worker searches for entities with state = SCHEDULED
  2. it checks for all associated storageentities whether their state is FETCHED
  3. if all associated storageentities are FETCHED, the state of the entity is set to FETCHED. If its customdata has no entry "downloadurl", it is created.
  4. if one storageentity has state FETCHED_ERROR, the state is reset to NEW, provided the number of attempted fetches is less than the number of fetchtries (from the customdata of the user or role)
  5. if a storageentity has state FATAL_FETCHED_ERROR, the state of the entity is set to FATAL_FETCHED_ERROR
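
A minimal sketch of this check in Java; the Entity and StorageEntity types, the check() method, and the example download URL are assumptions, not the actual binarystore classes:

    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    public class ScheduledCheckSketch {

        enum State { NEW, SCHEDULED, FETCHED, FETCHED_ERROR, FATAL_FETCHED_ERROR }

        static class StorageEntity { State state; }

        static class Entity {
            State state = State.SCHEDULED;
            int triedFetches;
            Map<String, String> customdata = new HashMap<>();
            List<StorageEntity> storageEntities;
        }

        // fetchTries would come from the customdata of the user or role
        static void check(Entity entity, int fetchTries) {
            List<StorageEntity> parts = entity.storageEntities;
            if (parts.stream().anyMatch(p -> p.state == State.FATAL_FETCHED_ERROR)) {
                entity.state = State.FATAL_FETCHED_ERROR;   // unrecoverable
            } else if (parts.stream().allMatch(p -> p.state == State.FETCHED)) {
                entity.state = State.FETCHED;
                // create the "downloadurl" entry if it is missing (URL source assumed)
                entity.customdata.putIfAbsent("downloadurl", "https://example.invalid/download");
            } else if (parts.stream().anyMatch(p -> p.state == State.FETCHED_ERROR)
                    && entity.triedFetches < fetchTries) {
                entity.state = State.NEW;                   // schedule a retry
            }
        }
    }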

B) handle entities that should be downloaded

  1. The default schedule-worker searches for entities with state = NEW
  2. if there is no datasource, the state of the entity is set to FETCHED and the scheduler jumps to point C. Otherwise:
  3. it reads the customdata of the entity:
    1. de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.fetchcount: number of fetches
    2. de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.fetchduedate: the timestamp when the fetch should start
    3. de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.copyduedate: the timestamp when the copy should start - if fetchcount > 1, the first storageentity is fetched and the other ones are copied from the first one.
    4. de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.fetchstorage: the storage that should be used for fetching
    5. de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.copystorage: the storage that should be used for copying
  4. it scores the available storages in order to determine the order for fetching (if there is no fetchstorage and no copystorage)
  5. it creates a storageentity with state NEW and, if one is given, fetchduedate = the given fetchduedate
  6. if fetchcount > 1, it creates additional storageentities with the customdata-field "copyfrom" and sets the state of these storageentities to TO_BE_COPIED. If a copyduedate is given, it sets fetchduedate = copyduedate.
  7. it updates the modification-date of the associated storage so that the scoring-function knows when it was last used for planning
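
A hedged sketch of steps 3, 5 and 6 in Java; the customdata key names are the documented ones, while the types, the timestamp format, and the detail that "copyfrom" points at the uuid of the first storageentity are illustrative assumptions:

    import java.time.Instant;
    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;

    public class ScheduleNewSketch {

        static final String PREFIX =
            "de.jenomics.ikona.binarystore.worker.schedule.StandardScheduleProcessor.";

        static class StorageEntity {
            String state;
            Instant fetchDueDate;
            UUID uuid = UUID.randomUUID();
            Map<String, String> customdata = new HashMap<>();
        }

        static List<StorageEntity> plan(Map<String, String> customdata) {
            int fetchCount = Integer.parseInt(customdata.getOrDefault(PREFIX + "fetchcount", "1"));
            String fetchDue = customdata.get(PREFIX + "fetchduedate");
            String copyDue  = customdata.get(PREFIX + "copyduedate");

            List<StorageEntity> planned = new ArrayList<>();

            // the first storageentity performs the actual fetch
            StorageEntity first = new StorageEntity();
            first.state = "NEW";
            if (fetchDue != null) first.fetchDueDate = Instant.parse(fetchDue); // format assumed
            planned.add(first);

            // additional storageentities are copied from the first one
            for (int i = 1; i < fetchCount; i++) {
                StorageEntity copy = new StorageEntity();
                copy.state = "TO_BE_COPIED";
                copy.customdata.put("copyfrom", first.uuid.toString());
                if (copyDue != null) copy.fetchDueDate = Instant.parse(copyDue);
                planned.add(copy);
            }
            return planned;
        }
    }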

C) check entities that should be deleted

  1. The default schedule-worker searches for entities with state = TO_BE_DELETED
  2. it retrieves all relations where this entity is source (= parent) and deletes them
  3. it retrieves all storageentities that belong to this entity. If there is no storageentity left, it deletes the entity.
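
A minimal sketch of this cleanup; the Repository interface and its methods are assumptions, not the actual API:

    import java.util.List;
    import java.util.UUID;

    public class DeleteCheckSketch {

        interface Repository {
            List<UUID> findRelationsBySource(UUID entityUuid);
            void deleteRelation(UUID relationUuid);
            List<UUID> findStorageEntities(UUID entityUuid);
            void deleteEntity(UUID entityUuid);
        }

        static void handleToBeDeleted(UUID entityUuid, Repository repo) {
            // delete all relations where this entity is the source (= parent)
            repo.findRelationsBySource(entityUuid).forEach(repo::deleteRelation);
            // the entity itself is only deleted once no storageentities remain
            if (repo.findStorageEntities(entityUuid).isEmpty()) {
                repo.deleteEntity(entityUuid);
            }
        }
    }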

D) handle entities to be deleted (no ref)

  1. The default schedule-worker reads the "unreferencethresholdinminutes" property from the customdata of the user. If this field is missing, it tries to read it from the customdata of the role of the user. If this field is also missing in the role, "minutesToWait" is set to 60.
  2. it searches for entities whose creationdate is older than "minutesToWait" and that are not referenced, i.e. the uuid of the entity is not used as target (= child) of any relation
  3. it sets the state of the entity to TO_BE_DELETED
  4. it searches for storageentities that belong to this entity and sets the state of the storageentities to TO_BE_DELETED
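
A small sketch of the fallback lookup in step 1; passing the customdata maps of user and role in directly is an assumption:

    import java.util.Map;

    public class UnreferencedThresholdSketch {

        static int resolveMinutesToWait(Map<String, String> userCustomdata,
                                        Map<String, String> roleCustomdata) {
            String value = userCustomdata.get("unreferencethresholdinminutes");
            if (value == null) {
                value = roleCustomdata.get("unreferencethresholdinminutes"); // role fallback
            }
            return value != null ? Integer.parseInt(value) : 60; // documented default
        }
    }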

E) handle entities to be deleted (expirydate)

  1. The default schedule-worker searches for entities with an expirydate < now that are not in "deleting-mode", i.e. whose state is none of IS_DELETING, IS_DELETING_BINARY, DELETED_BINARY, TO_BE_DELETED or TO_BE_DELETED_BINARY
  2. it sets the state of the entity to TO_BE_DELETED
  3. it searches for storageentities that belong to this entity and sets the state of the storageentities to TO_BE_DELETED

F) handle storageentities to be deleted (expirydate)

  1. The default schedule-worker searches for storageentities with an expirydate < now that are not in "deleting-mode", i.e. whose state is none of IS_DELETING, IS_DELETING_BINARY, DELETED_BINARY, TO_BE_DELETED or TO_BE_DELETED_BINARY
  2. it sets the state of the storageentity to TO_BE_DELETED
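
Sections E and F apply the same filter, once to entities and once to storageentities. A minimal sketch of the shared predicate; the state names follow the text above, the method itself is an assumption:

    import java.time.Instant;
    import java.util.Set;

    public class ExpirySweepSketch {

        // states that count as "deleting-mode" and are therefore skipped
        static final Set<String> DELETING_MODE = Set.of(
            "IS_DELETING", "IS_DELETING_BINARY", "DELETED_BINARY",
            "TO_BE_DELETED", "TO_BE_DELETED_BINARY");

        static boolean shouldBeDeleted(String state, Instant expiryDate) {
            return expiryDate != null
                && expiryDate.isBefore(Instant.now())
                && !DELETING_MODE.contains(state);
        }
    }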

G) handle storages to be drained

  1. The default schedule-worker searches for storages that should be drained (tobedrained = true)
  2. if the storage has the customdata-field "drainduedate" and its timestamp is > now, the worker jumps to point H, otherwise:
  3. it searches for storageentities that belong to this storage with state FETCHED and no expirydate or expirydate > now
  4. it searches for an available storage using the scoring. Via the customdata-field "downloadurl" it tries to check whether the found storage is in the same data center (DC) (= same "downloadurl")
  5. for each storageentity, it sets the state to TO_BE_MOVED and creates a new storageentity with state TO_BE_COPIED and the customdata-field "copyfrom" set to the uuid of the old storage.
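
A hedged sketch of the drain logic; Storage, StorageEntity, and the Repository interface are assumptions about the surrounding code, not the real API:

    import java.time.Instant;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.Objects;
    import java.util.UUID;

    public class DrainSketch {

        static class Storage {
            UUID uuid = UUID.randomUUID();
            Map<String, String> customdata = new HashMap<>();
        }

        static class StorageEntity {
            String state;
            UUID storageUuid;
            Map<String, String> customdata = new HashMap<>();
        }

        interface Repository {
            List<StorageEntity> findFetchedNotExpired(UUID storageUuid);
            List<Storage> storagesByScore();  // ordered result of the scoring below
            void save(StorageEntity storageEntity);
        }

        static void drain(Storage storage, Repository repo) {
            String due = storage.customdata.get("drainduedate");
            if (due != null && Instant.parse(due).isAfter(Instant.now())) {
                return; // drain not due yet, continue with step H
            }
            // prefer a target in the same data center (= same "downloadurl")
            List<Storage> scored = repo.storagesByScore();
            Storage target = scored.stream()
                .filter(s -> Objects.equals(s.customdata.get("downloadurl"),
                                            storage.customdata.get("downloadurl")))
                .findFirst()
                .orElse(scored.get(0));
            for (StorageEntity old : repo.findFetchedNotExpired(storage.uuid)) {
                old.state = "TO_BE_MOVED";
                StorageEntity copy = new StorageEntity();
                copy.state = "TO_BE_COPIED";
                copy.storageUuid = target.uuid;
                copy.customdata.put("copyfrom", storage.uuid.toString()); // uuid of the old storage
                repo.save(copy);
            }
        }
    }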

H) handle moved storageentities

  1. The default schedule-worker searches for storageentities with state MOVED
  2. it sets the state of the storageentity to TO_BE_DELETED
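
A minimal sketch; the StorageEntity type and the lookup by state are assumed:

    import java.util.List;

    public class MovedSweepSketch {

        static class StorageEntity { String state; }

        static void sweep(List<StorageEntity> movedEntities) {
            // MOVED means the copy has succeeded, so the original can be cleaned up
            for (StorageEntity se : movedEntities) {
                se.state = "TO_BE_DELETED";
            }
        }
    }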

Scoring

The scoring of the storages is done like this:

  1. find by bytes available: the customdata-field "storageminpercent" is taken into account to decide whether there is enough free space. The storage must also not be locked, in draining mode or read-only. Order is desc => the storage with the most available free space is first
  2. find by last created: the storages are sorted by creationdate desc => the most recently created one is first
  3. find by first modified: the storages are sorted by modificationdate asc => the one that has not been changed for the longest time is first
  4. find by reliability: the storages are sorted by the customdata-field "reliability" desc => the one with the highest value is first
  5. find by performance: the storages are sorted by the customdata-field "performance" desc => the one with the highest value is first

In step 1, a LinkedHashMap<UUID,Integer> is created and the storages are added with their uuid and score-points = (maxcount - position in list) => the first one gets the highest score.

During the following steps, the storages collect additional score-points. In the end, the list is sorted by total score and returned.
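
A hedged sketch of the scoring in Java: each ranking step awards (maxcount - position) points and the totals are summed per storage uuid. The five ordered lists themselves are assumed to come from the queries described above:

    import java.util.ArrayList;
    import java.util.LinkedHashMap;
    import java.util.List;
    import java.util.Map;
    import java.util.UUID;

    public class ScoringSketch {

        // rankings = the five ordered uuid lists (bytes available, last created, ...)
        static Map<UUID, Integer> score(List<List<UUID>> rankings) {
            LinkedHashMap<UUID, Integer> scores = new LinkedHashMap<>();
            for (List<UUID> ranking : rankings) {
                int maxCount = ranking.size();
                for (int pos = 0; pos < maxCount; pos++) {
                    int points = maxCount - pos;              // first place gets the most
                    scores.merge(ranking.get(pos), points, Integer::sum);
                }
            }
            // sort by total score, best storage first
            List<Map.Entry<UUID, Integer>> sorted = new ArrayList<>(scores.entrySet());
            sorted.sort(Map.Entry.<UUID, Integer>comparingByValue().reversed());
            LinkedHashMap<UUID, Integer> result = new LinkedHashMap<>();
            sorted.forEach(e -> result.put(e.getKey(), e.getValue()));
            return result;
        }
    }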