byfile Mode

In byfile mode, values are generated using the factories defined in a factory definition. The format of the command is as follows:

randog byfile FACTORY_PATH [...] [--regenerate PROB_REGEN] [--discard PROB_DISCARD] [--csv ROW_NUM] [--error-on-factory-stopped] [common-options]

The argument FACTORY_PATH is a filename of the factory definition. It must be python code that creates an instance of factory in the variable FACTORY as in the following example:

import uuid

FACTORY = randog.factory.from_example({
    "uuid": uuid.uuid4,
    "name": "",
    "age": 20,
})

# (optional) Settings used for CSV output
CSV_COLUMNS = ["uuid", "name", "age"]

Note

In factory definition file, import randog can be omitted.

Arguments and Options

  • FACTORY_PATH [...]:

  • --regenerate PROB_REGEN (default=0.0):

    • the probability that the factory generation value is not returned as is, but is regenerated. It affects cases where the original factory returns a value that is not completely random.

  • --discard PROB_DISCARD (default=0.0):

    • the probability that the factory generation value is not returned as is, but is discarded. If discarded, the number of times the value is generated is less than --repeat/-r or --list/-L or --csv.

  • --csv ROW_NUM (optional):

    • if specified, it outputs generated ROW_NUM objects as CSV. When using this option, it is recommended to use a factory that generates dictionaries and to define CSV_COLUMNS in the definition file to specify the fields of the CSV.

  • --error-on-factory-stopped (optional):

    • If specified, error is occurred in case the factory cannot generate value due to StopIteration. If not specified, the generation simply stops in the case.

  • common-options

Examples

The simplest example is the following:

randog byfile factory_def.py

If the definition file defines a factory that generates a dict equivalent to one record in the database, you can obtain data for testing by generating multiple dict as shown below:

# Generate list which contains 10 values
randog byfile factory_def.py -L 10

You may want to generate multiple values while outputting each one to a separate file. In that case, you can utilize -O and -r as follows:

# Repeat 10 times and output each of them into out_001.json, out_002.json, ... with json format
randog byfile factory_def.py -r 10 -O 'out_{:03}.json' --json

You may want to discard some of the generated values, for example, if you are using PK with missing some timestamps. In the case, the output can be made missing by --discard or --regenerate. For example:

# output at most 20 values (each value will be discarded at 10% probability)
randog byfile factory_def.py --repeat 20 --discard 0.1

# output exactly 20 values (each value will be regenerated at 10% probability)
randog byfile factory_def.py --repeat 20 --regenerate 0.1

output as CSV

To output in CSV format, use the --csv option. The value of each field is determined by the CSV_COLUMNS defined in the definition file.

# output CSV which contains 20 rows
randog byfile factory_def.py --csv 20

Warning

Even if factory generates objects other than dict or CSV_COLUMNS is not defined in the definition file, it will output something in CSV format if the --csv option is specified, but this is not recommended. This behavior may be changed in the future.

CSV output can also be output to multiple files with the --repeat/-r and --output/-O options. In the following example, it outputs 20 lines to each of 10 CSV files.

# output 10 CSV files; each file contains 20 rows
randog byfile factory_def.py --csv 20 -r 10 -O 'out_{:03}.csv'

In the example at the top of this page, CSV_COLUMNS was defined as a list of strings, but you can also specify a function that returns a field instead of a string that specifies a dictionary key. In the following example, the third field is a string that is processed from the value of age.

import uuid

FACTORY = randog.factory.from_example({
    "uuid": uuid.uuid4,
    "name": "",
    "age": 20,
})

# output example: 17642547-0a4c-4897-a8da-2d495558b8fa,d40s8Jqs,20 years old
CSV_COLUMNS = [
    "uuid",
    "name",
    lambda d: f"{d['age']} years old",
]

You may want to discard some of the generated values, for example, if you are using PK with missing some timestamps. In the case, the output can be made missing by --discard or --regenerate. For example:

import uuid
from datetime import datetime, timedelta
import randog

def timestamp_iter():
    next = datetime(2002, 1, 1, 0)
    while True:
        yield next
        next += timedelta(hours=1)

FACTORY = randog.factory.randdict(
    timestamp=randog.factory.by_iterator(timestamp_iter()),
    name=randog.factory.randstr(),
    age=randog.factory.randint(0, 100),
)

CSV_COLUMNS = ["timestamp", "name", "age"]
# output at most 20 rows (each row will be discarded at 10% probability)
randog byfile factory_def.py --csv 20 --discard 0.1

# output exactly 20 rows (Gaps of 'timestamp' at 10% probability)
randog byfile factory_def.py --csv 20 --regenerate 0.1

Note

Missing rows by --discard will result in fewer rows of output than the number specified by --csv.

Note

Skipping rows by --regenerate will result in higher generations than the number specified by --csv.

Change behavior patterns by environment variables

One useful idea is to allow the detailed settings of the factory definition to be changed by environment variables. For example, the following definition file allows the initial value of id to be specified by an environment variable.

import itertools
import os
import randog

initial_id = int(
    os.environ.get("INIT_ID", "0")
)

FACTORY = randog.factory.randdict(
    id=randog.factory.by_iterator(itertools.count(initial_id)),
    name=randog.factory.randstr(),
    age=randog.factory.randint(0, 100),
)

In addition to the standard shell method, the env option of randog can be used to specify environment variables. All of the following examples work the same way:

# Can use it in bash, etc., but not in powershell
INIT_ID=5 randog byfile factory_def.py

# Can use it in any shell
randog byfile factory_def.py --env INIT_ID=5

Note

Multiple environment variables can also be specified as follows:

randog byfile factory_def.py --env INIT_ID=5 VAR=foo
randog byfile factory_def.py --env INIT_ID=5 --env VAR=foo

Note

If you want to make the definition file importable, it may be better to implement the reading of environment variables in if __name__ == "__randog__". See Importable definition files for details.