Automatic periodic archiving and purging of your Models

Business applications that generates transactional data need to perform periodic housekeeping to optimise between database space usage and performance. Maymyo allows you to archive, purge and reload (to snapshot tables) your transactional data in your models automatically (using Scheduled Jobs).

However, database tables that go through frequent inserts and deletes will become fragmented over time. You should also perform periodic database housekeeping in conjunction with our archiving and purging.

Defining your model’s archiving and purging criteria

../_images/arc_model.png

Maintain Archive Models

You can archive and purge your models together with its children models (joined by a Foreign Key field of your model). Each Archive Model is identified uniquely by its Model Name, ie its app_label.model_name.

First you have to decide when and which rows to archive or purge by using Dynamic Conditions and Filters. The Archiving Condition and Archiving Filter will decide when and which rows to archive. In the example above, the infra.AuditHeader and its child infra.AuditLog are archived on the First Business Day of the Month. The rows to be archived are found in the predicate returned by the ARCHIVE-AUDIT-LOG Dynamic Filter, which we show below:

def get_predicates(model_instances, params):
    from infra.objects.app_registry import get_posting_date
    from infra.objects.app_calendar import add_months
    # Archive previous month's data (not this month because data is still being updated)
    archive_date = add_months(get_posting_date(), -1)
    return {'changed_on__year': archive_date.year, 'changed_on__month': archive_date.month}

The predicate returned by the Filter above tells us that the AuditHeader rows to be archived are those whose changed_on date falls in the previous month (using the current posting date). This predicate is applied only on the model infra.AuditHeader. All its children AuditLog rows will also be archived and output into the same archive file.

The purging of AuditHeader happens at the End of the Month, ie the last business day of the month. The rows to be purged are changed_on more than 12 months ago (from current posting date):

def get_predicates(model_instances, params):
    from infra.objects.app_registry import get_posting_date
    from infra.objects.app_calendar import add_months, first_day
    from infra.objects.app_datetimes import to_aware_date
    # 12 month window, ie purge anything older
    purge_date = first_day(add_months(get_posting_date(), -12))
    purge_on = to_aware_date(purge_date)
    return {'changed_on__lt': purge_on}

If you have more than one Children Models, then use comma to separate them.

Note

You can choose to purge without archiving or vice-versa. You should ensure that the rows you purge has been archived previously, ie the purging window should be longer.

The Snapshot Table field specifies the database table, using schema.table_name format, to reload previously archived rows to. Archived children data are reloaded to Children Tables. The snapshot tables are clones of the archived models (parent and its children), except that they must have an additional field for the Archive Date field (ie the archiving date, which is the First Business Day of the Month). If you were to write reports against these snapshot tables, make sure your where clause has the archive date.

If you archive a parent and 2 children models together, then you must provide 3 snapshot tables to reload the archive data to, otherwise the reloading will fail.

The django view to reload archive data to snapshot tables are still in our TODO list. However the function to do so is in infra.objects.archive_model.perform_reloading and you may use it in your own Application Commands.

Where are archive files kept?

Archive files are kept in the same directory as Archived Spool files. The date of the archive sub-directories follows the archival date, which is the First Business Day of the Month.

Archiving, Purging and Reloading History

Every time an archiving and purging is performed, an entry is made in the Archiving History (infra.ArchivingHistory) and Purging History (infra.PurgingHistory) respectively.

The former is also used to select archived data to reload to the snapshot tables. A successful reloading will also create a Reloading History (infra.ReloadingHistory) instance.

Currently, these are log tables that are accessible only directly from the database. Creating views for them are in our TODO list.

The Archiving and Purging of Maymyo models

These are Application Jobs provided by our fixtures data. They are ARCHIVE-INFRA-MODELS and PURGE-INFRA-MODELS. We currently schedule the former on the First Business Day and the latter on the Last Business Day of the month respectively.

You can append your models’ archiving and purging to our Jobs or create your own.