Generate files with fake data

All core dependencies of this package are MIT licensed. Most optional dependencies are MIT licensed as well, while a few are BSD, Apache 2 or GPLv3 licensed. Each license is given in parentheses below.

  • Core package requires Python 3.7, 3.8, 3.9, 3.10 or 3.11.

  • Faker (MIT) is the only required dependency.

  • Django (BSD) integration with factory_boy (MIT) has been tested with Django 2.2, 3.0, 3.1, 3.2, 4.0 and 4.1.

  • DOCX file support requires python-docx (MIT).

  • EPUB file support requires xml2epub (MIT) and jinja2 (BSD).

  • ICO, JPEG, PNG, SVG and WEBP file support requires imgkit (MIT).

  • MP3 file support requires gtts (MIT) or edge-tts (GPLv3).

  • PDF file support requires pdfkit (MIT).

  • PPTX file support requires python-pptx (MIT).

  • ODP file support requires odfpy (Apache 2).

  • ODS file support requires tablib (MIT) and odfpy (Apache 2).

  • ODT file support requires odfpy (Apache 2).

  • XLSX file support requires tablib (MIT) and openpyxl (MIT).

  • PathyFileSystemStorage storage support requires pathy (Apache 2).

  • AWSS3Storage storage support requires pathy (Apache 2) and boto3 (Apache 2).

  • AzureCloudStorage storage support requires pathy (Apache 2) and azure-storage-blob (MIT).

  • GoogleCloudStorage storage support requires pathy (Apache 2) and google-cloud-storage (Apache 2).

  • AugmentFileFromDirProvider provider requires nlpaug (MIT), torch (BSD), transformers (Apache 2), numpy (BSD), pandas (BSD) and tika (Apache 2).
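Since most of the dependencies above are optional, it can help to check which ones are importable before relying on a given file type. The following is a minimal sketch using only the standard library. The import names (for example docx for python-docx and odf for odfpy) are assumptions based on each project's usual top-level module, and available_features is a hypothetical helper, not part of faker-file. MP3 is left out because either of two back ends satisfies it.

```python
from importlib.util import find_spec

# Feature -> top-level modules it needs. Import names are assumptions.
OPTIONAL_MODULES = {
    "docx": ["docx"],  # python-docx
    "epub": ["xml2epub", "jinja2"],
    "images": ["imgkit"],
    "pdf": ["pdfkit"],
    "pptx": ["pptx"],  # python-pptx
    "odp": ["odf"],  # odfpy
    "ods": ["tablib", "odf"],
    "odt": ["odf"],
    "xlsx": ["tablib", "openpyxl"],
}


def available_features(modules=OPTIONAL_MODULES):
    """Map each feature to True if all of its modules can be located."""
    return {
        feature: all(find_spec(name) is not None for name in names)
        for feature, names in modules.items()
    }


# Print a short report of what is usable in the current environment.
for feature, ok in sorted(available_features().items()):
    print(f"{feature}: {'available' if ok else 'missing'}")
```

A feature reported as missing can usually be enabled by installing the matching extra.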



Latest stable version from PyPI

With all dependencies

pip install faker-file[all]

Only core

pip install faker-file

With most common dependencies

Everything except the ML libraries, which are required only for data augmentation

pip install faker-file[common]

With DOCX support

pip install faker-file[docx]

With EPUB support

pip install faker-file[epub]

With image support

pip install faker-file[images]

With MP3 support

pip install faker-file[mp3]

With XLSX support

pip install faker-file[xlsx]

With ODS support

pip install faker-file[ods]

With ODT support

pip install faker-file[odt]

With data augmentation support

pip install faker-file[data-augmentation]

Or development version from GitHub

pip install https://github.com/barseghyanartur/faker-file/archive/main.tar.gz


Supported file types

  • BIN

  • CSV

  • DOCX

  • EML

  • EPUB

  • ICO

  • JPEG

  • MP3

  • ODS

  • ODT

  • ODP

  • PDF

  • PNG

  • RTF

  • PPTX

  • SVG

  • TAR

  • TXT

  • WEBP

  • XLSX

  • ZIP

Additional providers

  • AugmentFileFromDirProvider: Make an augmented copy of a randomly picked file from the given directory. The following types are supported: DOCX, EML, EPUB, ODT, PDF, RTF and TXT.

  • RandomFileFromDirProvider: Pick a random file from the given directory.
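To make the idea of picking a random file from a directory concrete, here is a conceptual sketch using only the standard library. It is not how RandomFileFromDirProvider is implemented. The function name copy_random_file and its parameters are hypothetical and serve only as an illustration.

```python
import random
import shutil
import tempfile
from pathlib import Path


def copy_random_file(source_dir, target_dir, rng=random):
    """Copy one randomly chosen regular file from source_dir into target_dir."""
    candidates = sorted(p for p in Path(source_dir).iterdir() if p.is_file())
    if not candidates:
        raise FileNotFoundError(f"No files found in {source_dir}")
    chosen = rng.choice(candidates)
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    # shutil.copy2 returns the path of the newly created copy.
    return Path(shutil.copy2(chosen, target / chosen.name))


# Demo with throwaway directories.
demo_src = Path(tempfile.mkdtemp())
demo_dst = Path(tempfile.mkdtemp())
for name in ("a.txt", "b.txt"):
    (demo_src / name).write_text(name)
demo_copy = copy_random_file(demo_src, demo_dst)
```

Passing a seeded random.Random instance as rng makes the choice reproducible in tests.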

Supported file storages

  • Native file system storage

  • AWS S3 storage

  • Azure Cloud Storage

  • Google Cloud Storage

Usage examples

With Faker

One way

from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider

FAKER = Faker()

file = TxtFileProvider(FAKER).txt_file()

Or another

from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider

FAKER = Faker()
FAKER.add_provider(TxtFileProvider)  # Register the provider so FAKER.txt_file() exists

file = FAKER.txt_file()

With factory_boy


from django.db import models

class Upload(models.Model):

    # ...
    file = models.FileField()


Note that when using faker-file with Django and the native file system storage, you need to pass your MEDIA_ROOT setting as the root_path value to the chosen file storage, as shown below.

import factory
from django.conf import settings
from factory import Faker
from factory.django import DjangoModelFactory
from faker_file.providers.docx_file import DocxFileProvider
from faker_file.storages.filesystem import FileSystemStorage

from upload.models import Upload

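FS_STORAGE = FileSystemStorage(
    root_path=settings.MEDIA_ROOT,
)
# Register the provider so that Faker("docx_file") can be resolved
factory.Faker.add_provider(DocxFileProvider)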

class UploadFactory(DjangoModelFactory):

    # ...
    file = Faker("docx_file", storage=FS_STORAGE)

    class Meta:
        model = Upload

File storages

All file operations are delegated to a separate abstraction layer of storages.

The following storages are implemented:

  • FileSystemStorage: Does not have additional requirements.

  • PathyFileSystemStorage: Requires pathy.

  • AzureCloudStorage: Requires pathy and Azure related dependencies.

  • GoogleCloudStorage: Requires pathy and Google Cloud related dependencies.

  • AWSS3Storage: Requires pathy and AWS S3 related dependencies.
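The benefit of delegating file operations to a storage layer is that a provider never needs to know where the bytes end up. The sketch below illustrates that pattern in plain Python. All class and function names in it (SketchStorage, LocalSketchStorage, make_txt_file) are hypothetical and are not the library's own classes or signatures.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from uuid import uuid4


class SketchStorage(ABC):
    """Hypothetical minimal storage interface (not the library's own class)."""

    @abstractmethod
    def generate_filename(self, extension): ...

    @abstractmethod
    def write_text(self, filename, data): ...

    @abstractmethod
    def exists(self, filename): ...


class LocalSketchStorage(SketchStorage):
    """Stores files under root_path/rel_path on the local file system."""

    def __init__(self, root_path, rel_path="tmp"):
        self.base = Path(root_path) / rel_path

    def generate_filename(self, extension):
        self.base.mkdir(parents=True, exist_ok=True)
        return str(self.base / f"{uuid4().hex}.{extension}")

    def write_text(self, filename, data):
        return Path(filename).write_text(data)

    def exists(self, filename):
        return Path(filename).exists()


def make_txt_file(storage, content):
    """A provider-like function that only talks to the storage interface."""
    filename = storage.generate_filename("txt")
    storage.write_text(filename, content)
    return filename


storage = LocalSketchStorage(root_path=tempfile.mkdtemp())
path = make_txt_file(storage, "hello")
```

Swapping in a cloud-backed class with the same three methods would leave make_txt_file unchanged, which is the point of the abstraction.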

Usage example with storages

FileSystemStorage example

Native file system storage. Does not have dependencies.

import tempfile
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.filesystem import FileSystemStorage

FS_STORAGE = FileSystemStorage(
    root_path=tempfile.gettempdir(),  # Use settings.MEDIA_ROOT for Django
)

FAKER = Faker()

file = TxtFileProvider(FAKER).txt_file(storage=FS_STORAGE)


PathyFileSystemStorage example

Native file system storage. Requires pathy.

import tempfile
from pathy import use_fs
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.cloud import PathyFileSystemStorage

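use_fs(tempfile.gettempdir())  # Back pathy with a local directory
PATHY_FS_STORAGE = PathyFileSystemStorage(
    bucket_name="bucket_name",  # Placeholder value
)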

FAKER = Faker()

file = TxtFileProvider(FAKER).txt_file(storage=PATHY_FS_STORAGE)


AWSS3Storage example

AWS S3 storage. Requires pathy and boto3.

from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.aws_s3 import AWSS3Storage

S3_STORAGE = AWSS3Storage(
    bucket_name="bucket_name",  # Placeholder value
    root_path="tmp",  # Optional
    rel_path="sub-tmp",  # Optional
    # Credentials are optional too. If your AWS credentials are properly
    # set in ~/.aws/credentials, you don't need to pass them
    # explicitly.
    credentials={
        "key_id": "YOUR KEY ID",
        "key_secret": "YOUR KEY SECRET",
    },
)

FAKER = Faker()

file = TxtFileProvider(FAKER).txt_file(storage=S3_STORAGE)



Testing

Simply type:

pytest -vrx

Or use tox:

tox

Or use tox to check specific env:

tox -e py310-django41

Writing documentation

For security issues contact me at the e-mail given in the Author section.

Artur Barseghyan <artur.barseghyan@gmail.com>

Project documentation