faker-file
Create files with fake data. In many formats. With no efforts.
Prerequisites
All core dependencies of this package are MIT licensed. Most optional dependencies are MIT licensed, while a few are BSD, Apache 2, GPL or HPND licensed.
All licenses are given below in parentheses.
Core package requires Python 3.7, 3.8, 3.9, 3.10 or 3.11.
Faker (MIT) is the only required dependency.
Django (BSD) integration with factory_boy (MIT) has been tested with Django versions 2.2 to 4.2 (although only maintained versions of Django are currently being tested against).
BMP, GIF and TIFF file support requires either just Pillow (HPND), or a combination of WeasyPrint (BSD), pdf2image (MIT), Pillow (HPND) and poppler (GPLv2).
DOCX file support requires python-docx (MIT).
ICO, JPEG, PNG, SVG and WEBP file support requires either just Pillow (HPND), or a combination of imgkit (MIT) and wkhtmltopdf (LGPLv3).
PDF file support requires either Pillow (HPND), or a combination of pdfkit (MIT) and wkhtmltopdf (LGPLv3), or reportlab (BSD).
PPTX file support requires python-pptx (MIT).
ODP and ODT file support requires odfpy (Apache 2).
ODS file support requires tablib (MIT) and odfpy (Apache 2).
PathyFileSystemStorage storage support requires pathy (Apache 2).
AWSS3Storage storage support requires pathy (Apache 2) and boto3 (Apache 2).
AzureCloudStorage storage support requires pathy (Apache 2) and azure-storage-blob (MIT).
GoogleCloudStorage storage support requires pathy (Apache 2) and google-cloud-storage (Apache 2).
SFTPStorage storage support requires paramiko (LGPLv2.1).
AugmentFileFromDirProvider provider requires either a combination of textaugment (MIT) and nltk (Apache 2) or a combination of nlpaug (MIT), PyTorch (BSD), transformers (Apache 2), numpy (BSD), pandas (BSD), tika (Apache 2) and Apache Tika (Apache 2).
Documentation
Documentation is available on Read the Docs.
For bootstrapping check the Quick start.
For various ready-to-use code examples see the Recipes.
For tips on PDF creation see Creating PDF.
For tips on DOCX creation see Creating DOCX.
For tips on ODT creation see Creating ODT.
For tips on image creation see Creating images.
For CLI options see the CLI.
Read the Methodology.
For guidelines on contributing check the Contributor guidelines.
Installation
Latest stable version from PyPI
With all dependencies
pip install faker-file[all]
Only core
pip install faker-file
With most common dependencies
Everything, except ML libraries which are required for data augmentation only
pip install faker-file[common]
With DOCX support
pip install faker-file[docx]
With EPUB support
pip install faker-file[epub]
With images support
pip install faker-file[images]
With PDF support
pip install faker-file[pdf]
With MP3 support
pip install faker-file[mp3]
With XLSX support
pip install faker-file[xlsx]
With ODS support
pip install faker-file[ods]
With ODT support
pip install faker-file[odt]
With data augmentation support
pip install faker-file[data-augmentation]
With GoogleCloudStorage support
pip install faker-file[gcs]
With AzureCloudStorage support
pip install faker-file[azure]
With AWSS3Storage support
pip install faker-file[s3]
Or development version from GitHub
pip install https://github.com/barseghyanartur/faker-file/archive/main.tar.gz
Features
Supported file types
BIN
BMP
CSV
DOCX
EML
EPUB
ICO
GIF
JPEG
JSON
MP3
ODS
ODT
ODP
PDF
PNG
RTF
PPTX
SVG
TAR
TIFF
TXT
WEBP
XLSX
XML
ZIP
For all image formats (BMP, ICO, GIF, JPEG, PNG, SVG, TIFF and WEBP) and PDF, there are both graphic-only and mixed-content file providers (that also have text-to-image capabilities).
Additional providers
AugmentFileFromDirProvider: Make an augmented copy of a randomly picked file from the given directory. The following types are supported: DOCX, EML, EPUB, ODT, PDF, RTF and TXT.
AugmentRandomImageFromDirProvider: Augment a random image file from the given directory. The following types are supported: BMP, GIF, JPEG, PNG, TIFF and WEBP.
AugmentImageFromPathProvider: Augment an image file from the given path. Supported file types are the same as for the AugmentRandomImageFromDirProvider provider.
GenericFileProvider: Create files in any format from raw bytes or a predefined template.
RandomFileFromDirProvider: Pick a random file from the given directory.
FileFromPathProvider: File from the given path.
Supported file storages
Native file system storage
AWS S3 storage
Azure Cloud Storage
Google Cloud Storage
SFTP storage
Usage examples
With Faker
Recommended way
from faker import Faker
# Import the file provider we want to use
from faker_file.providers.txt_file import TxtFileProvider
FAKER = Faker() # Initialise Faker instance
FAKER.add_provider(TxtFileProvider) # Register the TXT file provider
file = FAKER.txt_file() # Generate a TXT file
# Meta-data is stored inside a ``data`` attribute (``dict``).
# The following line would produce something like /tmp/tmp/tmphzzb8mot.txt
print(file.data["filename"])
# The following line would produce a text generated by Faker, used as
# the content of the generated file.
print(file.data["content"])
Note
Note that in this case the file value is a StringValue instance, which inherits from str but contains meta-data such as the absolute path to the generated file and the text used to generate the file, stored in the filename and content keys of the data attribute respectively. See Meta-data for more information.
If you just need bytes back (instead of creating the file), provide the raw=True argument (works with all provider classes and inner functions):
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
FAKER = Faker()
FAKER.add_provider(TxtFileProvider)
raw = FAKER.txt_file(raw=True)
Note
Note that in this case the raw value is a BytesValue instance, which inherits from bytes but contains meta-data such as the absolute path to the generated file and the text used to generate the file, stored in the filename and content keys of the data attribute respectively. See Meta-data for more information.
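Since the raw value behaves like ordinary bytes, it can be handed to anything that accepts bytes. The following is a minimal sketch of that idea using only the standard library; the literal bytes are a stand-in for what a call such as `FAKER.txt_file(raw=True)` would return, and the `name` attribute is a hypothetical choice for illustration.

```python
import io

# Stand-in for ``FAKER.txt_file(raw=True)``; any bytes value works here.
raw = b"Some generated text."

# Wrap the bytes in a file-like object, e.g. to pass to code under test
# that expects an open file rather than a path.
buffer = io.BytesIO(raw)
buffer.name = "example.txt"  # hypothetical name; some consumers expect one

print(buffer.getvalue() == raw)
```

This avoids touching the file system at all, which can be convenient in unit tests.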
But this works too
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
FAKER = Faker()
file = TxtFileProvider(FAKER).txt_file()
If you just need bytes back:
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
FAKER = Faker()
raw = TxtFileProvider(FAKER).txt_file(raw=True)
With factory_boy
upload/models.py
from django.db import models


class Upload(models.Model):
    # ...
    file = models.FileField()
upload/factories.py
Note that when using faker-file with Django and native file system storages, you need to pass your MEDIA_ROOT setting as the root_path value to the chosen file storage, as shown below.
import factory
from django.conf import settings
from factory import Faker
from factory.django import DjangoModelFactory
from faker_file.providers.docx_file import DocxFileProvider
from faker_file.storages.filesystem import FileSystemStorage

from upload.models import Upload

# Files are written under MEDIA_ROOT/tmp
FS_STORAGE = FileSystemStorage(
    root_path=settings.MEDIA_ROOT,
    rel_path="tmp",
)

# Register the DOCX file provider with factory_boy's Faker
factory.Faker.add_provider(DocxFileProvider)


class UploadFactory(DjangoModelFactory):
    # ...
    file = Faker("docx_file", storage=FS_STORAGE)

    class Meta:
        model = Upload
Meta-data
The return value of any file provider file generator function is either StringValue or BytesValue, which inherit from str and bytes respectively.
Both StringValue and BytesValue instances have a meta-data attribute named data (type dict). Various file providers use data to store meta-data, such as filename (absolute path to the generated file; valid for all file providers), or content (text used when generating the file; valid for most file providers, except FileFromPathProvider, RandomFileFromDirProvider, TarFileProvider and ZipFileProvider).
All file providers store an absolute path to the generated file in the filename key of the data attribute and an instance of the storage used in the storage key. See the table below.
Key name | File provider
---|---
filename | all
storage | all
content | all except FileFromPathProvider, RandomFileFromDirProvider, TarFileProvider, ZipFileProvider and all graphic file providers such as GraphicBmpFileProvider, GraphicGifFileProvider, GraphicIcoFileProvider, GraphicJpegFileProvider, GraphicPdfFileProvider, GraphicPngFileProvider, GraphicTiffFileProvider and GraphicWebpFileProvider
inner | only EmlFileProvider, TarFileProvider and ZipFileProvider
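The idea of a string that carries meta-data can be sketched with a plain str subclass. The class below is a hypothetical stand-in written for illustration only, not the library's actual StringValue implementation; the name `StringValueSketch` and the sample paths are assumptions.

```python
class StringValueSketch(str):
    """Hypothetical stand-in for a str subclass that carries meta-data."""

    def __new__(cls, value, data=None):
        instance = super().__new__(cls, value)
        # Meta-data lives in a plain dict attribute named ``data``.
        instance.data = dict(data or {})
        return instance


# Illustrative values only; a real provider would fill these in.
file = StringValueSketch(
    "tmp/example.txt",
    data={"filename": "/tmp/tmp/example.txt", "content": "Hello"},
)

print(isinstance(file, str))   # the value still behaves like a string
print(file.data["filename"])   # absolute path kept as meta-data
```

Because the value is still a str, it can be assigned wherever a path string is expected, while callers who need more detail can read the data attribute.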
File storages
All file operations are delegated to a separate abstraction layer of storages.
The following storages are implemented:
FileSystemStorage: Does not have additional requirements.
PathyFileSystemStorage: Requires pathy.
AzureCloudStorage: Requires pathy and Azure related dependencies.
GoogleCloudStorage: Requires pathy and Google Cloud related dependencies.
AWSS3Storage: Requires pathy and AWS S3 related dependencies.
SFTPStorage: Requires paramiko and related dependencies.
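To illustrate what delegating file operations to a storage layer means, here is a minimal sketch using only the standard library. All names (`StorageSketch`, `InMemoryStorageSketch`, `create_file`) and the method signatures are assumptions made for this example; they are not the library's actual base class or API.

```python
from abc import ABC, abstractmethod


class StorageSketch(ABC):
    """Hypothetical minimal storage interface (not the library's base class)."""

    @abstractmethod
    def write_bytes(self, filename: str, data: bytes) -> None:
        ...

    @abstractmethod
    def exists(self, filename: str) -> bool:
        ...


class InMemoryStorageSketch(StorageSketch):
    """Keeps files in a dict; enough to illustrate the idea."""

    def __init__(self) -> None:
        self._files = {}

    def write_bytes(self, filename: str, data: bytes) -> None:
        self._files[filename] = data

    def exists(self, filename: str) -> bool:
        return filename in self._files


def create_file(storage: StorageSketch, filename: str, content: bytes) -> str:
    """A provider-like function that never touches the file system itself."""
    storage.write_bytes(filename, content)
    return filename


storage = InMemoryStorageSketch()
name = create_file(storage, "tmp/example.bin", b"\x00\x01")
print(storage.exists(name))
```

The point of the design is that the file-generating code only talks to the interface, so swapping a local directory for a cloud bucket does not change the generating code.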
Usage example with storages
FileSystemStorage example
Native file system storage. Does not have dependencies.
root_path: Path to the root directory. Given the example of Django, this would be the path to the MEDIA_ROOT directory. It’s important to know that root_path will not be embedded into the string representation of the file. Only rel_path will.
rel_path: Relative path from the root directory. Given the example of Django, this would be the rest of the path to the file.
import tempfile
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.filesystem import FileSystemStorage
FS_STORAGE = FileSystemStorage(
    root_path=tempfile.gettempdir(),  # Use settings.MEDIA_ROOT for Django
    rel_path="tmp",
)
FAKER = Faker()
file = TxtFileProvider(FAKER).txt_file(storage=FS_STORAGE)
FS_STORAGE.exists(file)
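The relationship between root_path and rel_path described above can be sketched with plain path arithmetic. This is an illustration under stated assumptions, not the storage's internal code; `basename` is a hypothetical generated file name.

```python
import os
import tempfile

root_path = tempfile.gettempdir()  # e.g. settings.MEDIA_ROOT in Django
rel_path = "tmp"
basename = "example.txt"           # hypothetical generated file name

# What the string representation of the file would contain.
relative_name = os.path.join(rel_path, basename)
# Where the file would actually live on disk.
absolute_name = os.path.join(root_path, rel_path, basename)

print(relative_name)
print(absolute_name)
```

Keeping root_path out of the string representation is what lets the value be stored directly in a field that expects a path relative to the media root.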
PathyFileSystemStorage example
Native file system storage. Requires pathy.
import tempfile
from pathy import use_fs
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.cloud import PathyFileSystemStorage
use_fs(tempfile.gettempdir())
PATHY_FS_STORAGE = PathyFileSystemStorage(
    bucket_name="bucket_name",
    root_path="tmp",
    rel_path="sub-tmp",
)
FAKER = Faker()
file = TxtFileProvider(FAKER).txt_file(storage=PATHY_FS_STORAGE)
PATHY_FS_STORAGE.exists(file)
AWSS3Storage example
AWS S3 storage. Requires pathy and boto3.
from faker import Faker
from faker_file.providers.txt_file import TxtFileProvider
from faker_file.storages.aws_s3 import AWSS3Storage
S3_STORAGE = AWSS3Storage(
    bucket_name="bucket_name",
    root_path="tmp",  # Optional
    rel_path="sub-tmp",  # Optional
    # Credentials are optional too. If your AWS credentials are properly
    # set in the ~/.aws/credentials, you don't need to send them
    # explicitly.
    credentials={
        "key_id": "YOUR KEY ID",
        "key_secret": "YOUR KEY SECRET",
    },
)
FAKER = Faker()
file = TxtFileProvider(FAKER).txt_file(storage=S3_STORAGE)
S3_STORAGE.exists(file)
Testing
Simply type:
pytest -vrx
Or use tox:
tox
Or use tox to check specific env:
tox -e py310-django41
Writing documentation
Keep the following hierarchy.
=====
title
=====
header
======
sub-header
----------
sub-sub-header
~~~~~~~~~~~~~~
sub-sub-sub-header
^^^^^^^^^^^^^^^^^^
sub-sub-sub-sub-header
++++++++++++++++++++++
sub-sub-sub-sub-sub-header
**************************
License
MIT
Support
For security issues contact me at the e-mail given in the Author section.
For overall issues, go to GitHub.
Citation
Please, use the following entry when citing faker-file in your research:
@software{faker-file,
author = {Artur Barseghyan},
title = {faker-file: Create files with fake data. In many formats. With no efforts.},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {https://github.com/barseghyanartur/faker-file},
}
Project documentation
Contents:
- faker-file
- Quick start
- Recipes
- When using with Faker
- Imports and initializations
- Create a TXT file with static content
- Create a DOCX file with dynamically generated content
- Create a ZIP file consisting of TXT files with static content
- Create a ZIP file consisting of 3 DOCX files with dynamically generated content
- Create a ZIP file of 9 DOCX files with content generated from template
- Create a nested ZIP file
- Create a ZIP file with variety of different file types within
- Another way to create a ZIP file with variety of different file types within
- Create an EML file consisting of TXT files with static content
- Create a EML file consisting of 3 DOCX files with dynamically generated content
- Create a nested EML file
- Create an EML file with variety of different file types within
- Create a PDF file with predefined template containing dynamic fixtures
- Create a DOCX file with table and image using DynamicTemplate
- Create a ODT file with table and image using DynamicTemplate
- Create a PDF using reportlab generator
- Create a PDF using pdfkit generator
- Create a graphic PDF file using Pillow
- Graphic providers
- Create a MP3 file
- Create a MP3 file by explicitly specifying MP3 generator class
- Create a MP3 file with custom MP3 generator
- Pick a random file from a directory given
- File from path given
- Generate a file of a certain size
- Generate files using multiprocessing
- Generating files from existing documents using NLP augmentation
- nlpaug augmenter
- textaugment augmenter
- Using raw=True features in tests
- Create a HTML file from predefined template
- Working with storages
- When using with Django (and factory_boy)
- Creating images
- Creating PDF
- Creating DOCX
- Creating ODT
- CLI
- Methodology
- Testing files like a pro
- Introduction
- Why/motivation
- Intermezzo
- How does faker-file help to solve that problem?
- Without faker-file
- Recap/conclusion
- Security Policy
- Contributor Covenant Code of Conduct
- Contributor guidelines
- Release history and notes
- Package
- faker_file package
- Indices and tables