Odyssey
Classes | Functions | Variables
ody_migr_benchmarks Namespace Reference

Classes

class  BulkInsertionTester
 

Functions

def timer (name="duration")
 
def pg_prepare_insert_script_cursor_mogrify (_tbl_name, _columns, _cursor, _values_list)
 
def pg_prepare_insert_script_executemany_dict (_tbl_name, _columns)
 
def pg_prepare_insert_script_executemany_list (_tbl_name, _columns)
 
def get_data ()
 
def connect ()
 
def execute (sql, params={})
 
def fast_insertion_rate (count)
 
def normal_insertion_rate (count)
 

Variables

 DATABASE_HOST = os.environ.get("DATABASE_HOST")
 
 DATABASE_PORT = os.environ.get("DATABASE_PORT")
 
 DATABASE_NAME = os.environ.get("DATABASE_NAME")
 
 DATABASE_USER = os.environ.get("DATABASE_USER")
 
 DATABASE_PASSWORD = os.environ.get("DATABASE_PASSWORD")
 
string TEST_TABLE_NAME_STR = "psycopg2_insert_time_test"
 
 SETUP_SQL
 
 CLEANUP_SQL
 
string COL_FNAME = "first_name"
 
string COL_LNAME = "last_name"
 
string COL_EMAIL = "email"
 
string COL_GENDER = "gender"
 
string COL_IP = "ip_address"
 
string COL_SSN = "ssn"
 
string COL_CMPNYNAME = "company_name"
 
string COL_RACE = "race"
 
string COL_DPMT = "department"
 
list fixture_columns
 
list experiment_row_counts
 
 tester = BulkInsertionTester(rec_count)
 
list rates
 
 frame = pandas.DataFrame(rates).set_index('count')
 

Detailed Description

Script to profile execution times for multiple bulk insertion methods

Findings from this script align to the conclusions made in the following
tutorial: https://trvrm.github.io/bulk-psycopg2-inserts.html
Conclusion:

Using insertion script with select + unnest option is the most efficient way
of bulk insertion. Using unnest to load multiple rows simultaneously has the
following advantages:

- It is significantly faster than a regular insert loop, especially when
  inserting thousands of rows. Also aster with psycopg2.extra.
  execute_values and inserting list of values with a normal single insertion
  script with cursor.mogrify.
- The benefits of using unnest() increase at least up to 50,000 rows
- It still allows us to write straightforward parameterised and safe SQL
  with no (injection prone) string concatenations - we use psycopg2.sql
  module to generate safe sql scripts.

But using `unnest` requires explicit type conversion incase of NULL values.
A little less efficient (but still order of magnitude faster than interative
approach) but more pythonic option is to use `execute_values`. `execute_values`
require a template of placeholders to be specified beforehand which is why
we should explicitly specify default values for the missing columns during
the source to destionation table mappings.

psycopg2 API resources: http://initd.org/psycopg/docs/extras.html#fast-exec

Function Documentation

◆ pg_prepare_insert_script_cursor_mogrify()

def ody_migr_benchmarks.pg_prepare_insert_script_cursor_mogrify (   _tbl_name,
  _columns,
  _cursor,
  _values_list 
)
Prepare insertion script

A little arduous but reasonably safe sql with placeholders and
cursor.mogrify formatted columns and values for bulk insertion.

Example script:
    INSERT INTO psycopg2_insert_time_test ("first_name", "last_name",
    "email", "gender", "ip_address", "ssn", "company_name", "race",
    "department")
    VALUES
    ('254.70.235.138','Product Management','Noir','Marlyn','Livefish',
    'Female','mnoir0@bizjournals.com','Guamanian','651-05-8691'),
    ('151.11.81.28','Sales','Doxey','Berty','Skaboo','Male',
    'bdoxey1@amazon.com','Malaysian','658-15-9923')
    ...

Definition at line 120 of file ody_migr_benchmarks.py.

120 def pg_prepare_insert_script_cursor_mogrify(_tbl_name,
121  _columns,
122  _cursor,
123  _values_list):
124  """Prepare insertion script
125 
126  A little arduous but reasonably safe sql with placeholders and
127  cursor.mogrify formatted columns and values for bulk insertion.
128 
129  Example script:
130  INSERT INTO psycopg2_insert_time_test ("first_name", "last_name",
131  "email", "gender", "ip_address", "ssn", "company_name", "race",
132  "department")
133  VALUES
134  ('254.70.235.138','Product Management','Noir','Marlyn','Livefish',
135  'Female','mnoir0@bizjournals.com','Guamanian','651-05-8691'),
136  ('151.11.81.28','Sales','Doxey','Berty','Skaboo','Male',
137  'bdoxey1@amazon.com','Malaysian','658-15-9923')
138  ...
139  """
140  col_names = sql.SQL(", ").join(map(sql.Identifier, _columns))
141  placeholders = ",".join(["%s"] * len(fixture_columns))
142 
143  args_str = ','.join(_cursor.mogrify("({})".format(
144  placeholders), x).decode('utf-8') for x in _values_list)
145 
146  return sql.SQL("INSERT INTO {} ({}) VALUES {}").format(
147  sql.SQL(_tbl_name),
148  col_names,
149  sql.SQL(args_str)
150  )
151 
152 

◆ pg_prepare_insert_script_executemany_dict()

def ody_migr_benchmarks.pg_prepare_insert_script_executemany_dict (   _tbl_name,
  _columns 
)
Prepare insertion script

Uses named placeholders to insert list of records (dictonary)

Example script:
INSERT INTO psycopg2_insert_time_test ("first_name", "last_name", "email",
"gender", "ip_address", "ssn", "company_name", "race", "department")
VALUES (%(first_name)s, %(last_name)s, %(email)s, %(gender)s,
%(ip_address)s, %(ssn)s, %(company_name)s, %(race)s, %(department)s)

`values` attribute of cursor.executemany() must be a list of named dict

Definition at line 153 of file ody_migr_benchmarks.py.

153 def pg_prepare_insert_script_executemany_dict(_tbl_name, _columns):
154  """Prepare insertion script
155 
156  Uses named placeholders to insert list of records (dictonary)
157 
158  Example script:
159  INSERT INTO psycopg2_insert_time_test ("first_name", "last_name", "email",
160  "gender", "ip_address", "ssn", "company_name", "race", "department")
161  VALUES (%(first_name)s, %(last_name)s, %(email)s, %(gender)s,
162  %(ip_address)s, %(ssn)s, %(company_name)s, %(race)s, %(department)s)
163 
164  `values` attribute of cursor.executemany() must be a list of named dict
165  """
166  return pg_handler.pg_prepare_insert_single_dict(_tbl_name, _columns)
167 
168 

◆ pg_prepare_insert_script_executemany_list()

def ody_migr_benchmarks.pg_prepare_insert_script_executemany_list (   _tbl_name,
  _columns 
)
Prepare insertion script

Uses named placeholders to insert list of records (list)

Example script:
INSERT INTO "psycopg2_insert_time_test" ("first_name", "last_name",
"email", "gender", "ip_address", "ssn", "company_name", "race",
"department") VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)


`values` attribute of cursor.executemany() must be a list of multiple
records - list of list (each record values must have same order
as _columns)

Definition at line 169 of file ody_migr_benchmarks.py.

169 def pg_prepare_insert_script_executemany_list(_tbl_name, _columns):
170  """Prepare insertion script
171 
172  Uses named placeholders to insert list of records (list)
173 
174  Example script:
175  INSERT INTO "psycopg2_insert_time_test" ("first_name", "last_name",
176  "email", "gender", "ip_address", "ssn", "company_name", "race",
177  "department") VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s)
178 
179 
180  `values` attribute of cursor.executemany() must be a list of multiple
181  records - list of list (each record values must have same order
182  as _columns)
183  """
184  return pg_handler.pg_prepare_insert_single_list(_tbl_name, _columns)
185 
186 

◆ timer()

def ody_migr_benchmarks.timer (   name = "duration")
Utility function for timing execution

Definition at line 48 of file ody_migr_benchmarks.py.

48 def timer(name="duration"):
49  '''Utility function for timing execution'''
50  start = time.time()
51  yield
52  duration = time.time() - start
53  print("{}: {:.5f} second(s)".format(name, duration))
54 
55 
56 # import database environment variables

Variable Documentation

◆ CLEANUP_SQL

ody_migr_benchmarks.CLEANUP_SQL
Initial value:
1 = sql.SQL().format(
2  sql.SQL(TEST_TABLE_NAME_STR)
3 )

Definition at line 90 of file ody_migr_benchmarks.py.

◆ experiment_row_counts

list ody_migr_benchmarks.experiment_row_counts
Initial value:
1 = [
2  1, 50, 100, 200, 500, 1000, 2000,
3  5000, 10000, 20000, 35000, 50000
4  ]

Definition at line 534 of file ody_migr_benchmarks.py.

◆ fixture_columns

list ody_migr_benchmarks.fixture_columns
Initial value:
1 = [
2  COL_FNAME,
3  COL_LNAME,
4  COL_EMAIL,
5  COL_GENDER,
6  COL_IP,
7  COL_SSN,
8  COL_CMPNYNAME,
9  COL_RACE,
10  COL_DPMT,
11 ]

Definition at line 107 of file ody_migr_benchmarks.py.

◆ rates

list ody_migr_benchmarks.rates
Initial value:
1 = [
2  {
3  "count": count,
4  'rate of insertion (fast)': fast_insertion_rate(count),
5  'rate of insertion (normal)': normal_insertion_rate(count)
6 
7  }
8  for count in experiment_row_counts
9  ]

Definition at line 545 of file ody_migr_benchmarks.py.

◆ SETUP_SQL

ody_migr_benchmarks.SETUP_SQL
Initial value:
1 = sql.SQL().format(
2  sql.SQL(TEST_TABLE_NAME_STR),
3  sql.SQL(TEST_TABLE_NAME_STR),
4  sql.SQL(TEST_TABLE_NAME_STR)
5 )

Definition at line 66 of file ody_migr_benchmarks.py.