API Documentation

Note that JH utils is excluded due to side-effects of importing the module. See the source file for details.

ff_utils

class dcicutils.ff_utils.CountSummary(are_even, summary_total)
are_even

Alias for field number 0

summary_total

Alias for field number 1

class dcicutils.ff_utils.SearchESMetadataHandler(key=None, ff_env=None)

Wrapper class for executing lucene queries directly on ES. Resolves ES instance location via health page of the given environment. Requires AWS permissions to use. Can be used directly but is used through search_es_metadata.

NOTE: use this class directly if you plan on making multiple subsequent requests to the same environment.

Executes lucene query on this client’s index.

Parameters:
  • index – index to search under
  • query – query to run
  • is_generator – boolean on whether or not to use a generator
  • page_size – if using a generator, how many results to give per request
Returns:

list of results of query or None

dcicutils.ff_utils.authorized_request(url, auth=None, ff_env=None, verb='GET', retry_fxn=<function standard_request_with_retries>, **kwargs)

Generalized function that handles authentication for any type of request to FF. Takes a required url, request verb, auth, fourfront environment, and optional retry function and headers. Any other kwargs provided are also passed into the request. For example, provide a body to a request using the ‘data’ kwarg. A timeout of 60 seconds is used by default but can be overwritten as a kwarg.

Verb should be one of: GET, POST, PATCH, PUT, or DELETE. auth should be obtained using s3Utils.get_key; if not provided, the key will be fetched using s3_utils if ‘ff_env’ is in kwargs.

usage:
authorized_request('https://data.4dnucleome.org/<some path>', (authId, authSecret))
OR
authorized_request('https://data.4dnucleome.org/<some path>', ff_env='fourfront-webprod')

dcicutils.ff_utils.convert_param(parameter_dict, vals_as_string=False)

Converts a dictionary of the form {argument_name: value, argument_name: value, …} to a list of dictionaries of the form {‘workflow_argument_name’: argument_name, ‘value’: value}
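The conversion can be sketched as follows; this is a hypothetical re-implementation shown only to illustrate the described shape, not the library code (which lives in dcicutils.ff_utils.convert_param):

```python
def convert_param_sketch(parameter_dict, vals_as_string=False):
    # Hypothetical illustration: each (name, value) pair becomes one
    # {'workflow_argument_name': ..., 'value': ...} dictionary.
    return [
        {"workflow_argument_name": name,
         "value": str(value) if vals_as_string else value}
        for name, value in parameter_dict.items()
    ]
```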

dcicutils.ff_utils.delete_field(obj_id, del_field, key=None, ff_env=None)

Given string obj_id and string del_field, delete a field (or fields separated by commas). To support the old syntax, obj_id may be a dict item. Same auth mechanism as the other metadata functions

dcicutils.ff_utils.delete_metadata(obj_id, key=None, ff_env=None)

Given standard key/ff_env authentication, simply set the status of the given object to ‘deleted’

dcicutils.ff_utils.dump_results_to_json(store, folder)

Takes results from expand_es_metadata and dumps them into the given folder in json format.

Args:
store (dict): results from expand_es_metadata
folder: folder for storing output
dcicutils.ff_utils.expand_es_metadata(uuid_list, key=None, ff_env=None, store_frame='raw', add_pc_wfr=False, ignore_field=None, use_generator=False, es_client=None)

Starting from a list of uuids, tracks all linked items (in the object frame by default). If you want to add processed files and workflow runs, set add_pc_wfr to True. Returns a dictionary with item types (schema names) as keys and lists of items in the requested frame as values. Sometimes certain fields need to be skipped (i.e. relations); use ignore_field for this.

Args:
uuid_list (list): starting nodes for the search; use uuids only
key (dict): standard ff_utils authentication key
ff_env (str): standard ff environment string
store_frame (str, default ‘raw’): depending on use case, can store frame raw, object, or embedded

Note: If you store in embedded, the total collection can have references to items that are not in the store

add_pc_wfr (bool): include workflow_runs and linked items (processed/ref files, wf, software…)
ignore_field (list): remove these keys from items, skipping any linking through them (i.e. relations)
use_generator (bool): use a generator when querying ES; less memory used but takes longer
es_client: optional result from es_utils.create_es_client - note this could be regenerated in this method if the signature expires
Returns:
dict: contains all item types as keys, with values of lists of dictionaries, i.e.:

{
'experiment_hi_c': [{'uuid': '1234', '@id': '/a/b/', …}, {…}],
'experiment_set': [{'uuid': '12345', '@id': '/c/d/', …}, {…}]
}

list: contains all uuids from all items.

# TODO: if more file types (currently FileFastq and FileProcessed) get workflowrun calculated properties
we need to add them to the add_from_embedded dictionary.

dcicutils.ff_utils.faceted_search(**kwargs)

Wrapper method for search_metadata that provides an easier way to search items based on facets

kwargs should contain the following 5 things:
  • key (if not using built-in aws auth)
  • ff_env (if not using built-in aws auth)
  • item_type (if not searching for experiment sets)
  • item_facets (if you don’t want to resolve these in this function)
  • any facets (| separated values) you’d like to search on (see example below)

Example: search for all experiments under the 4DN project with experiment type Dilution Hi-C

kwargs = {
    'Project': '4DN',
    'Experiment Type': 'Dilution Hi-C',
    'key': key,
    'ff_env': ff_env,
    'item_type': 'ExperimentSetReplicate'
}

results = faceted_search(**kwargs)

dcicutils.ff_utils.fetch_files_qc_metrics(data, associated_files=None, ignore_typical_fields=True, key=None, ff_env=None)

Utility function to grab all the qc metrics from associated types of files such as: ‘processed_files’, ‘other_processed_files’, ‘files’.

Args:
data: the metadata of an ExperimentSet or Experiment
associated_files: a list of the types of the file fields the qc metrics will be extracted from; examples are [‘files’, ‘processed_files’, ‘other_processed_files’]
ignore_typical_fields: flag to ignore 4DN custom fields from the qc metric object
key: authentication key for ff_env (see get_authentication_with_server)
ff_env: the relevant ff beanstalk environment name.

Returns:
a dictionary of dictionaries containing the qc_metric information
dcicutils.ff_utils.fetch_network_ids(subnet_names, security_group_names)

Takes lists of subnet and security group names, and fetches their corresponding ids from AWS.

dcicutils.ff_utils.get_associated_qc_metrics(uuid, key=None, ff_env=None, include_processed_files=True, include_raw_files=False, include_supplementary_files=False)

Given a uuid of an experimentSet, return a dictionary of dictionaries with each dictionary representing a quality metric.

Args:
uuid: uuid of an experimentSet
key: authentication key for ff_env (see get_authentication_with_server)
ff_env: the relevant ff beanstalk environment name.
include_processed_files: if False, will exclude QC metrics on processed files. Default: True
include_raw_files: if True, will provide QC metrics on raw files as well. Default: False
include_supplementary_files: if True, will also give QCs associated with non-processed files. Default: False
Returns:
a dictionary of dictionaries with the following structure:

{<qc_metric_uuid>: {
'values': <the values of the qc_metric object>,
'source_file_association': <the file class (processed_file or raw_files)>,
'source_file': <the accession of the file that the qc is linked to>,
'source_file_type': <the description of the file that the qc is linked to>,
'experiment_description': <the description of the experiment or experimentset>,
'organism': <the organism>,
'experiment_type': <the experiment type (in situ Hi-C, ChIP-seq)>,
'experiment_subclass': <the experiment subclass (Hi-C)>,
'source_experiment': <the experiment the qc is linked to (if applicable)>,
'source_experimentSet': <the experimentSet the qc is linked to>,
'biosource_summary': <the experiment biosource>
}}

dcicutils.ff_utils.get_authentication_with_server(auth=None, ff_env=None)

Pass in authentication information and ff_env; attempts to either retrieve the server info from the auth or, if it cannot, to get the key with s3_utils given the ff_env

dcicutils.ff_utils.get_counts_page(key=None, ff_env=None)

Gets DB/ES counts page in JSON

dcicutils.ff_utils.get_counts_summary(env)

Given an FF env name to check, returns a CountSummary named tuple representing the counts state:

are_even: boolean on whether or not counts are even
summary_total: raw value of counts
dcicutils.ff_utils.get_es_metadata(uuids, es_client=None, filters=None, sources=None, chunk_size=200, is_generator=False, key=None, ff_env=None)

Given a list of string item uuids, returns a dictionary response of the full ES record for those items (or an empty dictionary if the items don’t exist / are not indexed).

Returns:
A dictionary with the following keys:
- keys with metadata: properties (raw frame without uuid), embedded, object
- keys summarizing interactions: linked_uuids_object, linked_uuids_embedded, links, rev_linked_to_me
- others: paths, aggregated_items, rev_link_names, item_type, principals_allowed, unique_keys, sid, audit, uuid, propsheets
Args
uuids:
list of uuids to fetch from ES
es_client:
You can pass in an Elasticsearch client (initialized by create_es_client) through the es_client param to save init time.
filters:
Advanced users can optionally pass a dict of filters that will be added to the Elasticsearch query.
For example: filters={'status': 'released'}
You can also specify NOT fields, for example: filters={'status': '!released'}
You can also specify lists of values for fields, for example: filters={'status': ['released', 'archived']}
NOTES:
  • different filter fields are combined using AND queries (all must match)
    example: filters={'status': ['released'], 'public_release': ['2018-01-01']}
  • values for the same field are combined with OR (such as multiple statuses)
sources:

You may also specify which fields are returned from ES by passing a list of source fields with the sources argument. Each entry MUST include the full path of the field, such as 'embedded.uuid' (for the embedded frame) or 'object.uuid' (for the object frame). You may also use a wildcard, such as 'embedded.*' for all fields in the embedded frame. Follow the dictionary structure of the get_es_metadata result.

i.e. for getting uuids on the linked field 'files':
sources = ['properties.files'] or sources = ['embedded.files.uuid']
i.e. for getting all fields for lab in the embedded frame:
sources = ['embedded.lab.*']
i.e. for getting only the object frame:
sources = ['object.*']
chunk_size:
Integer chunk_size may be used to control the number of uuids that are passed to Elasticsearch in each query; setting this too high may cause ES reads to timeout.
is_generator:
Boolean is_generator will return a generator for individual results if True; if False (default), returns a list of results.

key: authentication key for ff_env (see get_authentication_with_server)
ff_env: authentication by env (needs system variables)
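The filter semantics above (AND across fields, OR within a field, ! for negation) can be sketched as an Elasticsearch bool query. This helper is illustrative only and is not the library's internal query builder:

```python
def filters_to_bool_query(filters):
    # Hypothetical sketch of the documented filter semantics.
    must, must_not = [], []
    for field, values in filters.items():
        if not isinstance(values, list):
            values = [values]
        positive = [v for v in values if not v.startswith("!")]
        negative = [v[1:] for v in values if v.startswith("!")]
        if positive:
            must.append({"terms": {field: positive}})      # OR within a field
        if negative:
            must_not.append({"terms": {field: negative}})  # NOT values
    return {"bool": {"must": must, "must_not": must_not}}  # AND across fields
```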

dcicutils.ff_utils.get_es_search_generator(es_client, index, body, page_size=200)

Simple generator behind get_es_metadata which takes an es_client (from es_utils create_es_client), a string index, and a dict query body. Also takes an optional int page_size, which controls pagination size. NOTE: ‘index’ must be namespaced

dcicutils.ff_utils.get_health_page(key=None, ff_env=None)

Simple function to return the json for a FF health page

dcicutils.ff_utils.get_indexing_status(key=None, ff_env=None)

Gets indexing status counts page in JSON

dcicutils.ff_utils.get_item_facet_values(item_type, key=None, ff_env=None)

Gets all facets and returns all possible values for each one with counts, i.e. a dictionary of facets, each mapping to a dictionary of that facet’s possible values mapped to their counts. Format: {‘Project’: {‘4DN’: 2, ‘Other’: 6}, ‘Lab’: {…}}

dcicutils.ff_utils.get_item_facets(item_type, key=None, ff_env=None)

Gets facet query string information ie: mapping from facet to query string

dcicutils.ff_utils.get_metadata(obj_id, key=None, ff_env=None, check_queue=False, add_on='')

Function to get metadata for a given obj_id (uuid or @id, most likely). Either takes a dictionary form authentication (MUST include ‘server’) or a string fourfront-environment. Also a boolean ‘check_queue’, which if True will use information from the queues and/or datastore=database to ensure that the metadata is accurate. Takes an optional string add_on that should contain things like “frame=object”. Join query parameters in the add_on using “&”, e.g. “frame=object&force_md5” REQUIRES ff_env if check_queue is used

Given standard key/ff_env authentication, return result for @@links view

dcicutils.ff_utils.get_response_json(res)

Very simple function to return json from a response or raise an error if it is not present. Used with the metadata functions.

dcicutils.ff_utils.get_schema_names(key=None, ff_env=None)

Create a dictionary mapping all schema names to item class names, i.e. FileFastq: file_fastq.

Args:
key (dict): standard ff_utils authentication key
ff_env (str): standard ff environment string
Returns:
dict: keys are schema names, values are item class names
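The FileFastq -> file_fastq style of mapping can be illustrated with a small camel-to-snake helper; this is hypothetical and shown only to clarify the naming convention, not the library's implementation:

```python
import re

def camel_to_snake(name):
    # Insert an underscore before each interior capital letter, then lowercase.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()
```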
dcicutils.ff_utils.get_search_generator(search_url, auth=None, ff_env=None, page_limit=50)

Returns a generator given a search_url (which must contain server!), an auth and/or ff_env, and an int page_limit, which is used to determine how many results are returned per page (i.e. per iteration of the generator)

Paginates by changing the ‘from’ query parameter, incrementing it by the page_limit size until fewer results than the page_limit are returned. If ‘limit’ is specified in the query, the generator will stop when that many results are collectively returned.
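The ‘from’-based pagination described above can be sketched as follows. fetch_page is a hypothetical callable(from_, size) -> list standing in for the HTTP request, and results are collected into a list rather than yielded, for brevity:

```python
def page_through(fetch_page, page_limit=50, limit=None):
    # Sketch of 'from'-incrementing pagination with an optional overall limit.
    results, from_ = [], 0
    while True:
        size = page_limit if limit is None else min(page_limit, limit - len(results))
        if size <= 0:
            break
        page = fetch_page(from_, size)
        results.extend(page)
        if len(page) < size:
            # fewer results than requested: this was the last page
            break
        from_ += len(page)
    return results
```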

dcicutils.ff_utils.get_url_params(url)

Returns a dictionary of url params using urlparse.parse_qs. Example: get_url_params(‘<server>/search/?type=Biosample&limit=5’) returns {‘type’: [‘Biosample’], ‘limit’: [‘5’]}
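A minimal equivalent using the stdlib directly (note that parse_qs returns every value as a list, so ‘limit’ comes back as [‘5’]); the helper name is illustrative:

```python
from urllib.parse import urlparse, parse_qs

def get_url_params_sketch(url):
    # Thin stdlib-only sketch of the described behavior.
    return parse_qs(urlparse(url).query)

params = get_url_params_sketch("https://example.org/search/?type=Biosample&limit=5")
# e.g. {'type': ['Biosample'], 'limit': ['5']}
```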

dcicutils.ff_utils.patch_metadata(patch_item, obj_id='', key=None, ff_env=None, add_on='')

Patch metadata given the patch body and an optional obj_id (if not provided, will attempt to use accession or uuid from patch_item body). Either takes a dictionary form authentication (MUST include ‘server’) or a string fourfront-environment.

dcicutils.ff_utils.post_metadata(post_item, schema_name, key=None, ff_env=None, add_on='')

Post metadata given the post body and a string schema name. Either takes a dictionary form authentication (MUST include ‘server’) or a string fourfront-environment. add_on is the string that will be appended to the post url (used with tibanna)

dcicutils.ff_utils.process_add_on(add_on)

Simple function to ensure that a query add-on string starts with “?”
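A sketch of that guarantee (hypothetical helper name):

```python
def process_add_on_sketch(add_on):
    # Ensure a non-empty query add-on starts with "?".
    if add_on and not add_on.startswith("?"):
        return "?" + add_on
    return add_on
```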

dcicutils.ff_utils.purge_metadata(obj_id, key=None, ff_env=None)

Given standard key/ff_env authentication, attempt to purge the item from the DB (FULL delete). If the item cannot be deleted due to other items still linking to it, this function provides information in the response @graph

dcicutils.ff_utils.purge_request_with_retries(request_fxn, url, auth, verb, **kwargs)

Example of using a non-standard retry function. This one is for purges, which return a 423 if the item is locked. This function returns a list of locked items to facilitate easier purging

dcicutils.ff_utils.search_es_metadata(index, query, key=None, ff_env=None, is_generator=False)

Executes a lucene search query on the ES instance for this environment.

NOTE: It is okay to use this function directly but for repeat usage please use SearchESMetadataHandler as it caches an expensive API request to AWS

Parameters:
  • index – index to search under
  • query – dictionary of query
  • key – optional, 2-tuple authentication key (access_key_id, secret)
  • ff_env – ff_env to use
  • is_generator – boolean on whether or not to use a generator
Returns:

list of results of query or None

dcicutils.ff_utils.search_metadata(search, key=None, ff_env=None, page_limit=50, is_generator=False)

Make a get request of form <server>/<search> and returns a list of results using a paginated generator. Include all query params in the search string. If is_generator is True, return a generator that yields individual search results. Otherwise, return all results in a list (default) Either takes a dictionary form authentication (MUST include ‘server’) or a string fourfront-environment.

dcicutils.ff_utils.search_request_with_retries(request_fxn, url, auth, verb, **kwargs)

Example of using a non-standard retry function. This one is for searches, which return a 404 on an empty search. Handle this case so an empty array is returned as a search result and not an error

dcicutils.ff_utils.search_result_generator(page_generator)

Simple wrapper function to return a generator to iterate through item results on the search page

NOTE: Depending on the nature of the page generator, which may involve separate external calls to a resource like elastic search that is not transactionally managed, the data being queried may change between those calls, usually to add (though theoretically even to remove) an element.

Consider a case where the data to be queried is indexed in elastic search as A,C,E,G,I,K,M, but where a page size of 3 is used with start position 0. That call will return A,C,E. The user may expect G,I on the second page, but before that request is made, suppose an element D is indexed and the stored data becomes A,C,D,E,G,I,K,M. Requesting data from start position 0 would now return A,C,D, but we already had the first page, so we request data starting at position 3 for the second page and get E,G,I. That means our sequence of return values would be A,C,E,E,G,I,K,M, or, in other words, showing a duplication. To avoid this, we keep track of the IDs we’ve seen and show only the first occurrence of each element, so A,C,E,G,I,K,M. (We won’t see the D, but we weren’t going to see it anyway, and it wasn’t available at the time we started, so the timing was already close.)

Unfortunately, we aren’t so lucky for deletion, though that happens more rarely. Deletion will cause an element to fall out. So if we have A,C,E,G,I,K,M and C is deleted between the first and second call, getting us A,C,E first, and then on the second call, when the data is A,E,G,I,K,M, getting us I,K,M, we’ll see the sequence A,C,E,I,K,M and will have missed legitimate element G. There is little to be done about this without restarting (which might not terminate, or might be O(n^2) in the worst case). But deletion is unusual.
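The duplicate-suppression strategy described above (yield each item only the first time its identifier is seen across pages) can be sketched as follows; the ‘uuid’ key is an assumption about the item shape:

```python
def dedupe_by_id(pages):
    # Track ids already yielded so an item pushed onto a later page by a
    # concurrent insertion is not emitted twice.
    seen = set()
    for page in pages:
        for item in page:
            if item["uuid"] not in seen:
                seen.add(item["uuid"])
                yield item

# Simulating the A,C,E / E,G,I scenario from the text:
pages = [[{"uuid": "A"}, {"uuid": "C"}, {"uuid": "E"}],
         [{"uuid": "E"}, {"uuid": "G"}, {"uuid": "I"}]]
deduped = [item["uuid"] for item in dedupe_by_id(pages)]
# the duplicated E is suppressed: ['A', 'C', 'E', 'G', 'I']
```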

dcicutils.ff_utils.standard_request_with_retries(request_fxn, url, auth, verb, **kwargs)

Standard function to execute the request made by authorized_request. If desired, you can write your own retry handling, but make sure the arguments are formatted identically to this function. request_fxn is the request function, url is the string url, auth is the tuple standard authentication, and verb is the string kind of verb. Any additional kwargs are passed to the request. Handles errors and returns the response if it has a status code under 400.

dcicutils.ff_utils.stuff_in_queues(ff_env, check_secondary=False)

Used to guarantee up-to-date metadata by checking the contents of the indexer queues. If items are currently waiting in the primary queue, return False. If check_secondary is True, will also check the secondary queue.

dcicutils.ff_utils.unified_authentication(auth=None, ff_env=None)

One authentication function to rule them all. Has several options for authentication, which are:
  • manually provided tuple auth key (pass to key param)
  • manually provided dict key, like the output of s3Utils.get_access_keys() (pass to key param)
  • string name of the fourfront environment (pass to ff_env param)

(They are checked in this order.) Handles errors for authentication and returns the tuple key to use with your request.

dcicutils.ff_utils.update_url_params_and_unparse(url, url_params)

Takes a string url and url params (in the format returned by get_url_params). Returns a string url with the newly formatted params
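A stdlib sketch of the described behavior (the helper name is illustrative; the real function is update_url_params_and_unparse):

```python
from urllib.parse import urlparse, urlunparse, urlencode, parse_qs

def update_url_params_sketch(url, url_params):
    # Merge the new params over the existing ones, then re-assemble the url.
    parts = urlparse(url)
    params = parse_qs(parts.query)
    params.update(url_params)
    return urlunparse(parts._replace(query=urlencode(params, doseq=True)))
```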

dcicutils.ff_utils.upsert_metadata(upsert_item, schema_name, key=None, ff_env=None, add_on='')

UPSERT metadata given the upsert body and a string schema name. UPSERT means POST or PATCH on conflict. Either takes a dictionary form authentication (MUST include ‘server’) or a string fourfront-environment. This function checks to see if an existing object already exists with the same body, and if so, runs a patch instead. add_on is the string that will be appended to the upsert url (used with tibanna)

s3_utils

es_utils

class dcicutils.es_utils.ElasticSearchServiceClient(region_name='us-east-1')

Implements utilities for interacting with the Amazon ElasticSearch Service. The idea is, for the production setup, we implement a hot/cold cluster configuration where during the day (say 6 am to 8 pm EST) we run a larger cluster than at night/on weekends. Foursight will implement this mechanism.

resize_elasticsearch_cluster(*, domain_name, master_node_type, master_node_count, data_node_type, data_node_count=2)

Triggers a resizing of the given cluster name (the env name).

Parameters:
  • domain_name – name of domain we’d like to resize
  • master_node_type – instance type we’d like to use for master nodes
  • master_node_count – number of master nodes (disabled if 0)
  • data_node_type – instance type we’d like to use for data nodes
  • data_node_count – # of data nodes, 2 by default
Returns:

True if successful, False otherwise

dcicutils.es_utils.create_es_client(es_url, use_aws_auth=True, **options)

Use to create an Elasticsearch client that supports the signature version 4 signing process, needed for role-based IAM access control on AWS-hosted ES. Takes a string ES server url, a boolean for whether or not to use the aws auth signing procedure, and any additional kwargs that will be passed to creation of the Elasticsearch client.

dcicutils.es_utils.create_snapshot_repo(client, repo_name, s3_bucket)

Creates a repo to store ES snapshots on

info about snapshots on AWS https://docs.aws.amazon.com/elasticsearch-service/latest/developerguide/es-managedomains-snapshots.html

dcicutils.es_utils.execute_lucene_query_on_es(client, index, query)

Executes the given lucene query (in dictionary form)

Parameters:
  • client – elasticsearch client
  • index – index to search under
  • query – dictionary of query
Returns:

result of query or None

dcicutils.es_utils.get_bulk_uuids_embedded(client, index, uuids, is_generator=False)

Gets the embedded view for all uuids in an index with a single multi-get ES request.

NOTE: because an index is required, when passing uuids to this method they must all be of the same item type. The index can be determined by:

‘’.join([eb_env_name, item_type]) ex: fourfront-mastertestuser or fourfront-mastertestfile_format
Parameters:
  • client – elasticsearch client
  • index – index to search
  • uuids – list of uuids (all of the same type)
  • is_generator – whether to use a generator over the response (NOT paginate)
Returns:

list of embedded views of the given uuids, if any

beanstalk_utils

Utilities related to ElasticBeanstalk deployment and management. This includes, but is not limited to: ES, s3, RDS, Auth0, and Foursight.

exception dcicutils.beanstalk_utils.WaitingForBoto3
dcicutils.beanstalk_utils.add_to_auth0_client(new)

Given an ElasticBeanstalk env name, find the url and use it to update the callback URLs for Auth0

Args:
new (str): EB environment name
Returns:
None
dcicutils.beanstalk_utils.auth0_client_remove(url)

Get a JWT for programmatic access to Auth0 using Client/Secret env vars. Then use that to remove the given url from the Auth0 callbacks list.

Args:
url (str): url to remove from callbacks
Returns:
None
dcicutils.beanstalk_utils.auth0_client_update(url)

Get a JWT for programmatic access to Auth0 using Client/Secret env vars. Then add the given url to the Auth0 callbacks list.

Args:
url (str): url to add to callbacks
Returns:
None
dcicutils.beanstalk_utils.beanstalk_info(env)

Describe an ElasticBeanstalk environment given an environment name

Args:
env (str): ElasticBeanstalk environment name
Returns:
dict: Environments result from describe_beanstalk_environments
dcicutils.beanstalk_utils.clone_beanstalk_command_line(old, new, prod=False, copy_s3=False)

Maybe useful command to clone an existing ElasticBeanstalk environment to a new one. Will create an Elasticsearch instance, s3 buckets, clone the existing RDS of the environment, and optionally copy s3 contents. Also adds the new EB url to Auth0 callback urls. Should be run exclusively via command line, as it requires manual input and subprocess calls of AWS command line tools.

Note:
The eb cli tool sets up a configuration file in the directory of the project repository. As such, this command MUST be called from that directory. Will exit if not called from an eb initialized directory.
Args:
old (str): environment name of existing ElasticBeanstalk
new (str): new ElasticBeanstalk environment name
prod (bool): set to True if this is a prod environment. Default False
copy_s3 (bool): set to True to copy s3 contents. Default False
Returns:
None
dcicutils.beanstalk_utils.compute_cgap_prd_env()

Returns the name of the current CGAP production environment.

dcicutils.beanstalk_utils.compute_cgap_stg_env()

Returns the name of the current CGAP staging environment, or None if there is none.

dcicutils.beanstalk_utils.compute_ff_prd_env()

Returns the name of the current Fourfront production environment.

dcicutils.beanstalk_utils.compute_ff_stg_env()

Returns the name of the current Fourfront staging environment.

dcicutils.beanstalk_utils.compute_prd_env_for_env(envname)

Given an environment, returns the name of the prod environment for its owning project.

dcicutils.beanstalk_utils.copy_s3_buckets(new, old)

Given a new ElasticBeanstalk environment name and existing “old” one, create the given buckets and copy contents from the corresponding existing ones

Args:
new (str): new EB environment name
old (str): existing EB environment name
Returns:
None
dcicutils.beanstalk_utils.create_db_from_snapshot(db_identifier, snapshot_name, delete_db_if_present=True)

Given an RDS instance identifier and a snapshot ARN/name, create an RDS instance from the snapshot. If an instance already exists with the given identifier and delete_db_if_present is True, attempt to delete it and return “Deleting”. Otherwise, return the instance ARN.

Args:
db_identifier (str): RDS instance identifier
snapshot_name (str): identifier/ARN of RDS snapshot to restore from
Returns:
str: resource ARN if successful, otherwise “Deleting”
dcicutils.beanstalk_utils.create_db_snapshot(db_identifier, snapshot_name)

Given an RDS instance identifier, create a snapshot using the given name. If a snapshot with the given name already exists, attempt to delete it and return “Deleting”. Otherwise, return the snapshot ARN.

Args:
db_identifier (str): RDS instance identifier
snapshot_name (str): identifier/ARN of RDS snapshot to create
Returns:
str: resource ARN if successful, otherwise “Deleting”
dcicutils.beanstalk_utils.create_foursight(dest_env, bs_url, es_url, fs_url=None)

Creates a new Foursight environment based off of dest_env. Since Foursight environments don’t include “fourfront-” by convention, remove this if it is part of the env name. Takes some other options for settings on the env

Note: this will cause all checks in all schedules to be run, to initialize
environment.
Args:
dest_env (str): ElasticBeanstalk environment name
bs_url (str): url of the ElasticBeanstalk for FS env
es_url (str): url of the ElasticSearch for FS env
fs_url (str): if provided, used to override the dest_env-based FS url
Returns:
dict: response from Foursight PUT to /api/environments
Raises:
Exception: if cannot get body from Foursight response
dcicutils.beanstalk_utils.create_s3_buckets(new)

Given an ElasticBeanstalk env name, create the following s3 buckets that are standard for any of our EB environments.

Args:
new (str): EB environment name
Returns:
None
dcicutils.beanstalk_utils.delete_beanstalk_command_line(env)

Maybe useful command to delete an existing ElasticBeanstalk environment, including associated ES, s3, and RDS resources. Will also remove the associated callback url from Auth0. Should be run exclusively via command line, as it requires manual input and subprocess calls of AWS command line tools.

Note:
The eb cli tool sets up a configuration file in the directory of the project repository. As such, this command MUST be called from that directory. Will exit if not called from an eb initialized directory.
Args:
env (str): EB environment name to delete
Returns:
None
dcicutils.beanstalk_utils.delete_bs_env_cli(env_name)

Use the eb command line client to remove an ElasticBeanstalk environment with some extra options.

Args:
env_name (str): EB environment name
Returns:
None
dcicutils.beanstalk_utils.delete_db(db_identifier, take_snapshot=True, allow_delete_prod=False)

Given db_identifier, delete an RDS instance. If take_snapshot is true, will create a final snapshot named “<db_identifier>-final-<yyyy-mm-dd>”.

Args:
db_identifier (str): name of RDS instance
take_snapshot (bool): if True, take a final snapshot before deleting
allow_delete_prod (bool): must be True to allow deletion of ‘webprod’ DB
Returns:
dict: boto3 response from delete_db_instance
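The final-snapshot naming scheme mentioned above can be sketched with a small helper (hypothetical, shown only to make the “<db_identifier>-final-<yyyy-mm-dd>” format concrete):

```python
import datetime

def final_snapshot_name(db_identifier):
    # Build "<db_identifier>-final-<yyyy-mm-dd>" using today's date.
    return "{}-final-{}".format(
        db_identifier, datetime.date.today().strftime("%Y-%m-%d"))
```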
dcicutils.beanstalk_utils.delete_es_domain(env_name)

Given an Elasticsearch domain name, delete the domain

Args:
env_name (str): Fourfront EB environment name used for ES instance
Returns:
None
dcicutils.beanstalk_utils.delete_s3_buckets(env_name)

Given an ElasticBeanstalk env name, remove the following s3 buckets that are standard for any of our EB environments.

Args:
env_name (str): EB environment name
Returns:
None
dcicutils.beanstalk_utils.describe_beanstalk_environments(client, **kwargs)

Generic function for retrying client.describe_environments to avoid AWS throttling errors. Passes all given kwargs to describe_environments

Args:
client (botocore.client.ElasticBeanstalk): boto3 client
Returns:
dict: response from client.describe_environments
Raises:
Exception: if a non-ClientError exception is encountered during
describe_environments or cannot complete within retry framework
dcicutils.beanstalk_utils.get_beanstalk_environment_variables(env)

Acquires the environment variables used to deploy the given environment.

VERY IMPORTANT NOTE: this function will echo extremely sensitive data if run. Ensure that if you are using this you are not logging the output of this anywhere.

dcicutils.beanstalk_utils.get_beanstalk_real_url(env)

Return the real url for the elasticbeanstalk with given environment name. Name can be ‘cgap’, ‘data’, ‘staging’, or an actual environment.

Args:
env (str): ElasticBeanstalk environment name
Returns:
str: url of the ElasticBeanstalk environment
dcicutils.beanstalk_utils.get_bs_env(envname)

Given an ElasticBeanstalk environment name, get the env variables from that environment and return them. Returned variables are in form: <name>=<value>

Args:
envname (str): name of ElasticBeanstalk environment
Returns:
list: of environment variables in <name>=<value> form
dcicutils.beanstalk_utils.get_foursight_env(dest_env, bs_url=None)

Get the Foursight environment name corresponding to the given ElasticBeanstalk environment name, optionally providing the EB url for more robustness

Args:
dest_env (str): ElasticBeanstalk environment name
bs_url (str): optional url for the ElasticBeanstalk instance
Returns:
str: Foursight environment name
dcicutils.beanstalk_utils.is_db_ready(db_identifier)

Checker function used with the torb waitfor lambda; output must be standardized. Checks to see if an RDS instance with the given name is ready

Args:
db_identifier (str): RDS instance identifier
Returns:
bool, str: True if done, RDS address
dcicutils.beanstalk_utils.is_snapshot_ready(snapshot_name)

Checker function used with the torb waitfor lambda; output must be standardized. Checks to see if an RDS snapshot with the given name is available

Args:
snapshot_name (str): RDS snapshot name
Returns:
bool, str: True if done, identifier of snapshot
dcicutils.beanstalk_utils.log_to_foursight(event, lambda_name='', overrides=None)

Use Foursight as a logging tool within in a lambda function by doing a PUT to /api/checks. Requires that the event has “_foursight” key, which is a subobject with the following: fields:

“check”: required, in the form “<fs environment>/<check name>”
“log_desc”: will set “summary” and “description” if those are missing
“full_output”: optional; use to provide info on the lambda
“brief_output”: optional
“summary”: optional; if not provided, uses the “log_desc” value
“description”: optional; if not provided, uses the “log_desc” value
“status”: optional; if not provided, uses “WARN”

Can also optionally provide a dictionary to the overrides param, which will update event[“_foursight”].

Args:
event (dict): Event input, most likely from a lambda with a workflow
lambda_name (str): Name of the lambda that is calling this
overrides (dict): Optionally override event[‘_foursight’] with this
Returns:
Response object from foursight
Raises:
Exception: if cannot get body from Foursight response
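A minimal sketch of the event shape log_to_foursight expects, along with the documented defaulting of “summary”/“description” from “log_desc” and “status” to “WARN”. The helper and the check/environment names are hypothetical, written only to illustrate the documented fallbacks:

```python
def apply_foursight_defaults(fs):
    """Apply the documented fallbacks to a '_foursight' subobject (sketch)."""
    fs = dict(fs)  # don't mutate the caller's event
    fs.setdefault('summary', fs.get('log_desc'))
    fs.setdefault('description', fs.get('log_desc'))
    fs.setdefault('status', 'WARN')
    return fs

# Hypothetical lambda event carrying the required "_foursight" subobject:
event = {
    '_foursight': {
        'check': 'mastertest/my_check',   # "<fs environment>/<check name>"
        'log_desc': 'lambda run finished',
    }
}
```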
dcicutils.beanstalk_utils.remove_from_auth0_client(env_name)

Given an ElasticBeanstalk env name, find the url and remove it from the callback urls for Auth0

Args:
env_name (str): EB environment name
Returns:
None
dcicutils.beanstalk_utils.snapshot_and_clone_db(db_identifier, snapshot_name)

Given an RDS instance identifier and a snapshot name, create a snapshot with that name and then spin up a new RDS instance named after the snapshot.

Args:
db_identifier (str): original RDS identifier of the DB to snapshot
snapshot_name (str): identifier of the snapshot AND the new instance
Returns:
str: address of the new instance
dcicutils.beanstalk_utils.source_beanstalk_env_vars(config_file='/opt/python/current/env')

Set environment variables if we are on Elastic Beanstalk. The presence of AWS_ACCESS_KEY_ID indicates whether env vars have already been sourced.

Args:
config_file (str): filepath to load env vars from
dcicutils.beanstalk_utils.swap_cname(src, dest)

Does a CNAME swap and Foursight configuration (pulled in from Torb). NOTE: this is the mechanism by which CNAME swaps must occur as of 9/15/2020.

dcicutils.beanstalk_utils.whodaman()

Returns the name of the current Fourfront production environment.

log_utils

class dcicutils.log_utils.ElasticsearchHandler(env, es_server)

Custom handler to post logs to Elasticsearch. Needed to sign ES requests with the AWS V4 Signature. Loosely based on code here: https://github.com/cmanaha/python-elasticsearch-logger

calculate_log_index()

Simple function to name the ES log index by month. The convention is logs-<yyyy>-<mm>. Uses UTC.
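The naming convention can be reproduced directly; a sketch of the logs-<yyyy>-<mm> computation in UTC (the helper name and optional parameter are illustrative, not the method's real signature):

```python
from datetime import datetime, timezone

def monthly_log_index(now=None):
    """Name the monthly ES log index as logs-<yyyy>-<mm>, using UTC."""
    now = now or datetime.now(timezone.utc)
    return 'logs-{:04d}-{:02d}'.format(now.year, now.month)
```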

emit(record)

Overload the emit method to post logs to ES

static get_namespace(env)

Grabs ES namespace from health page

resend_messages()

Send all records held in self.messages_to_resend to Elasticsearch in bulk. Keep track of subsequent errors and retry them if they have been retried fewer times than self.retry_limit.

schedule_resend()

Create a threading Timer as self.resend_timer to schedule resending any records in self.resend_messages after 5 seconds. If already resending, do nothing
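The resend logic described above — retry failed records until they exceed a retry limit — can be sketched independently of Elasticsearch. The send function below is a stand-in for the bulk ES call, and the (record, retries) bookkeeping is an assumption about how the handler might track attempts:

```python
def resend_messages(messages, send, retry_limit=3):
    """Attempt to send each (record, retries) pair; requeue failures.

    Returns the (record, retries) pairs still pending. A record is
    dropped once it has been retried retry_limit times.
    """
    still_pending = []
    for record, retries in messages:
        try:
            send(record)
        except Exception:
            if retries + 1 < retry_limit:
                still_pending.append((record, retries + 1))
    return still_pending

def flaky_send(record):
    # Stand-in for the bulk ES call: always fails for records marked 'bad'.
    if record == 'bad':
        raise RuntimeError('ES unavailable')
```

In the real handler, a threading.Timer reschedules this pass after 5 seconds rather than looping synchronously.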

class dcicutils.log_utils.ElasticsearchLoggerFactory(env=None, ignore_frame_names=None, es_server=None, in_prod=False)

Needed to bind the ElasticsearchHandler to the structlog logger. Use for the logger_factory arg in the structlog.configure function. See: https://github.com/hynek/structlog/blob/master/src/structlog/stdlib.py

dcicutils.log_utils.add_log_uuid(logger, log_method, event_dict)

This function adds a uuid to the log entry.

dcicutils.log_utils.convert_ts_to_at_ts(logger, log_method, event_dict)

This function is used to ensure filebeats uses our own timestamp if we logged one.
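Structlog processors receive (logger, log_method, event_dict) and return the (possibly modified) event_dict. A sketch of what a timestamp-renaming processor like convert_ts_to_at_ts might do — the exact field names here are assumptions inferred from the description, not the library's verified implementation:

```python
def convert_ts_to_at_ts(logger, log_method, event_dict):
    """Copy our 'timestamp' field to '@timestamp' so the value inserted
    by filebeats is overridden by the time we actually logged (sketch)."""
    if 'timestamp' in event_dict:
        event_dict['@timestamp'] = event_dict.pop('timestamp')
    return event_dict
```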

dcicutils.log_utils.set_logging(env=None, es_server=None, in_prod=False, level=30, log_name=None, log_dir=None)

Set logging is a function to be used everywhere, to encourage all subsystems to generate structured JSON logs, for easy insertion into ES for searching and visualizing later.

Providing an Elasticsearch server name (es_server) will cause the logs to be automatically written to that server. Setting ‘skip_es’ to True for any individual logging statement will cause it not to be written.

Currently this only JSONifies our own logs; the bit at the very bottom would JSONify other logs (e.g. botocore), but that is probably more than we want to store.

Also sets some standard handlers for the following:

add_logger_name - python module generating the log
timestamper - timestamps… big surprise there
convert_ts_to_at_ts - takes our timestamp and overrides the @timestamp key used by filebeats, so queries in ES will be against our times, which in things like indexing can differ a fair amount from the timestamp inserted by filebeats
StackInfoRenderer - capture stack trace and insert into JSON
format_exc_info - capture exception and insert into JSON