Usage and tips

It’s recommended to specify all models and all their fields in dj_anonymizer. This helps to avoid the situation in the future when some changes were introduced to project models (e.g. some fields with sensitive data were added) and it was forgotten to clean or fake them.

If there’s no need to specify all models anonymizer may be used with - - soft_mode argument.

Usage pattern suggestion

For better structuring of dj_anonymizer, it may be used with a pattern described below.

For example if project structure looks like below:

project/
    manage.py
    project/
        ...
    app1/
        ...
    app2/
        ...
    ...
    appN/
        ...
    ...

dj_anonymizer may be placed as a separate Django app. Model definitions may be defined within the same naming structure as it used for applications:

project/
    manage.py
    project/
        ...
    app1/
        ...
    app2/
        ...
    ...
    appN/
        ...
    ...
    anonymizer/
        app1.py
        app2.py
        ...
        appN.py

Definition directory may be changed via ANONYMIZER_MODEL_DEFINITION_DIR.

dj_anonymizer by default also read base.py in the model’s definition directory where may be placed django.contrib.*.models as well as models defined by any 3rd-party libraries.

Model registration

register_anonym(models)

Register models for anonymization

  • models - list of tuples (model, cls_anonym), where model is a model class and cls_anonym - anonymization class, inherited form AnonymBase.

register_clean(models)

Register models which should be cleaned

  • models - list of tuples (model, cls_anonym), where model is a model class and cls_anonym - anonymization class, inherited form AnonymBase with specified queryset for deletion or just AnonymBase.

If AnonymBase class have truncate=True, parameter table will be truncated instead of performing an SQL delete query. Additionally, cascade=True may be set to truncate foreign-key related tables (supported for postgresql and oracle).

register_skip(models)

List of models which dj_anonymizer will skip.

Data anonymization

Anonymization class must be inherited from AnonymBase. Anonymization class contains attributes mapped to model fields. Also anonymization class may contain class Meta where you can specify queryset and excluded fields.

Example:

from datetime import datetime

from django.contrib.auth.models import User

from dj_anonymizer.register_models import (
    AnonymBase,
    register_anonym
)
from dj_anonymizer import fields

from faker import Factory


# using faker lib for generating nice names
fake = Factory.create()


# create anonymizer class
class UserAnonym(AnonymBase):
    last_name = fields.function(fake.last_name)
    first_name = fields.function(fake.first_name)
    email = fields.string(
        "test_email_{seq}@preply.com", seq_callback=datetime.now
    )
    username = fields.string("user_name{seq}")
    is_staff = fields.function(lambda: False)
    password = fields.password(
        "some_test_password", salt='salt', hasher="sha1"
    )

    class Meta:
        # anonymize all users except the first one
        queryset = User.objects.exclude(id=1)
        # list of fields which will not be changed
        exclude_fields = [
            "groups", "user_permissions", "is_active",
            "is_superuser", "last_login", "date_joined"
        ]


register_anonym([
    (User, UserAnonym),
    (SomeClass, SomeClassAnonym)
])
In class Meta you can specify queryset and exclude_fields:
  • queryset - model queryset to which anonymization will be applied. If you don’t specify this attribute, anonymization will be applied to all rows of model (like MyModel.objects.all())

  • exclude_fields - list of model fields which should not be anonymized. If you don’t specify this attribute, the excluded fields will be inferred automatically

dj_anonymizer provides certain helpful field types for anonymization classes:

fields.function(callback, *args, **kwargs)

Result of execution of callback function will be set to the model field. callback function will be called for every record of your model.

  • callback - function which will generate data for the model

  • *args - tuple of args for callback

  • **kwargs - dict of args for callback

fields.password(password, *args, **kwargs)

Gives the possibility to set the same password to all anonymized dump. Args and kwargs are the same as for make_password.

  • password - password in plain-text format

fields.string(field_value, seq_start=0, seq_step=1, seq_callback=None, seq_args=(), seq_kwargs=None, seq_slugify=True)

Generate string for every record of the model.

  • field_value - string which will be set to field. It may contain {seq} parameter which will be replaced by sequence value (e.g. “username_{seq}” will generate username_1, username_2 etc.)

  • seq_start - value of sequence start

  • seq_step - step of sequence

  • seq_callback - function which will generate data for {seq} parameter in string (e.g. (“test_email_{seq}@preply.com”, seq_callback=datetime.now))

  • seq_args - tuple of args for seq_callback

  • seq_kwargs - dict of kwargs for seq_callback

  • seq_slugify - flag, slugify or not result of execution of seq_callback

Clean data

Register your model with register_clean.

Example 1 - delete all data from model User:

from django.contrib.auth.models import User

from dj_anonymizer.register_models import AnonymBase
from dj_anonymizer.register_models import register_clean


register_clean([
    (User, AnonymBase),
])

Example 2 - truncate all data from model User:

from django.contrib.auth.models import User

from dj_anonymizer.register_models import AnonymBase
from dj_anonymizer.register_models import register_clean


register_clean([
    (User, AnonymBase(truncate=True)),
])

Example 3 - truncate all data from model User with cascade option:

from django.contrib.auth.models import User

from dj_anonymizer.register_models import AnonymBase
from dj_anonymizer.register_models import register_clean


register_clean([
    (User, AnonymBase(truncate=True, cascade=True)),
])

Example 4 - delete all data from model User, except user with id=1:

from django.contrib.auth.models import User

from dj_anonymizer.register_models import AnonymBase
from dj_anonymizer.register_models import register_clean


class UserAnonym(AnonymBase):
    class Meta:
        queryset = User.objects.exclude(id=1)


register_clean([
    (User, UserAnonym),
])