Examples
Get Data Raw
Initialize a DataMonster object:
dm = DataMonster(<key_id>, <secret_key>)
Initialize a Datasource object (we will use a fake small data source from the provider XYZ for the purposes of this example):
ds = dm.get_datasource_by_name(
    'XYZ Data Source'
)
Get raw data from the data source, producing a schema and pandas dataframe:
schema, df = dm.get_data_raw(ds)
The schema will contain metadata for the data source, with keys and values showing the roles different columns play in the data. In the case of the above data source:
>>> schema
{
    'lower_date': ['period_start'],
    'upper_date': ['period_end'],
    'section_pk': ['section_pk'],
    'value': ['panel_sales'],
    'split': ['category']
}
This result indicates that the period_start column represents the lower date for each data point, and so on.
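Because the schema maps roles to column names, you can look columns up programmatically instead of hard-coding them. A minimal sketch, using the schema shown above in place of a live result:

```python
# Schema as returned for the example data source above.
schema = {
    'lower_date': ['period_start'],
    'upper_date': ['period_end'],
    'section_pk': ['section_pk'],
    'value': ['panel_sales'],
    'split': ['category'],
}

# Invert the mapping: each role -> its column name.
# The schema values are lists; each list here has a single entry.
role_to_column = {role: cols[0] for role, cols in schema.items()}

value_column = role_to_column['value']   # the measurement column
date_column = role_to_column['lower_date']
```

This keeps downstream code working even if a provider names its value column something other than panel_sales.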
Next, looking at the dataframe, we see:
>>> df.head(4)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | 617 |
| Not Specified | -0.0523 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Note that the section_pk column, which represents which company each data point refers to, is currently in the form of an internal DataMonster identifier and is not particularly useful for external use. To convert it to a more usable form, try:
# Build a map from section_pk to company name and ticker
comps = ds.companies
section_map = {}
for comp in comps:
    section_map[comp.pk] = {"name": comp.name,
                            "ticker": comp.ticker}
def map_pk_to_ticker_and_name(section_map, df):
    """Replace the section_pk column with ticker and company name columns."""
    ticker_dict = {
        pk: v["ticker"] for pk, v in section_map.items()
    }
    name_dict = {
        pk: v["name"] for pk, v in section_map.items()
    }
    df["ticker"] = df["section_pk"].map(ticker_dict)
    df["comp_name"] = df["section_pk"].map(name_dict)
    df = df.drop(["section_pk"], axis=1)
    return df
We can now use map_pk_to_ticker_and_name to produce a more human-readable dataframe. For example:
>>> map_pk_to_ticker_and_name(section_map, df).head(4)
| category | panel_sales | period_end | period_start | ticker | comp_name |
|---|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | PRTY | PARTY CITY |
| Not Specified | -0.0523 | 2018-07-02 | 2018-04-02 | RUTH | RUTH’S HOSPITALITY GROUP |
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | RUTH | RUTH’S HOSPITALITY GROUP |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | HD | HOME DEPOT |
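As an alternative to the two map calls above, the same lookup can be done in a single merge. The sketch below uses an illustrative stand-in frame and section map rather than live API data:

```python
import pandas as pd

# Stand-in for the raw result (values illustrative, from the example above).
df = pd.DataFrame({
    "section_pk": [617, 742],
    "panel_sales": [-0.1139, -0.0523],
})

# Stand-in for the section_map built from ds.companies.
section_map = {
    617: {"name": "PARTY CITY", "ticker": "PRTY"},
    742: {"name": "RUTH'S HOSPITALITY GROUP", "ticker": "RUTH"},
}

# Turn the section map into a lookup frame and merge it in one pass.
lookup = (
    pd.DataFrame.from_dict(section_map, orient="index")
    .rename_axis("section_pk")
    .reset_index()
    .rename(columns={"name": "comp_name"})
)
df = df.merge(lookup, on="section_pk", how="left").drop(columns=["section_pk"])
```

A left merge preserves every raw row; companies missing from the map simply get NaN ticker and comp_name, which is easy to spot afterwards.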
Filtering to Specific Dimensions
The raw data endpoint supports filtering dimensions to specific values by passing key-value pairs as a dictionary, where each key is a dimension name and each value is a list of allowed values for that dimension. Using the example above, we could do this in a variety of ways.
Filtering to specific companies (in this case, Party City and Home Depot):
>>> filters = {'section_pk': [617, 205]}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | 617 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Filtering to specific dimension values (in this case, "Category1"):
>>> filters = {'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Combining filters across dimensions (in this case, "Category1" for Ruth’s Hospitality Group):
>>> filters = {'section_pk': [742], 'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
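Since section_pk filters take internal identifiers, it is often convenient to translate tickers into pks first. A minimal sketch, using hypothetical stand-in company objects in place of ds.companies (only the pk and ticker attributes are assumed):

```python
# Stand-in for the company objects returned by ds.companies;
# only the .pk and .ticker attributes are assumed here.
class FakeCompany:
    def __init__(self, pk, ticker):
        self.pk = pk
        self.ticker = ticker

comps = [FakeCompany(617, "PRTY"), FakeCompany(742, "RUTH"), FakeCompany(205, "HD")]

def filters_for_tickers(comps, tickers):
    """Build a section_pk filter dict for the given tickers."""
    wanted = {t.upper() for t in tickers}
    pks = [c.pk for c in comps if c.ticker and c.ticker.upper() in wanted]
    return {"section_pk": pks}

# Equivalent to the Party City / Home Depot filter above.
filters = filters_for_tickers(comps, ["PRTY", "HD"])
```

The resulting dict can be passed straight to dm.get_data_raw(ds, filters=filters), and can be extended with other dimension keys such as 'category'.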
Aggregating Results on Different Cadences
The raw data endpoint can also take an optional Aggregation object to request data with a time-based aggregation applied. For example:
from datamonster_api import DataMonster, Aggregation
dm = DataMonster(<key_id>, <secret_key>)
# Get Company for Home Depot
hd = dm.get_company_by_ticker('hd')
# Get our Data Source
ds = dm.get_datasource_by_name('XYZ Data Source')
# Filter to Home Depot data and aggregate by Home Depot's fiscal quarters
filters = {'section_pk': [hd.pk]}
agg = Aggregation(period='fiscalQuarter', company=hd)
dm.get_data_raw(ds, filters=filters, aggregation=agg)
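If you need a cadence the endpoint does not provide, you can also aggregate the raw dataframe yourself with pandas. A sketch over calendar quarters (fiscal quarters would additionally require the company's fiscal calendar), using illustrative data in place of a live result:

```python
import pandas as pd

# Stand-in for a raw result frame (dates and values illustrative).
df = pd.DataFrame({
    "period_start": pd.to_datetime(["2019-01-07", "2019-02-04", "2019-04-01"]),
    "panel_sales": [0.10, 0.20, 0.30],
})

# Mean of panel_sales per calendar quarter, keyed on period_start.
quarterly = (
    df.set_index("period_start")["panel_sales"]
    .resample("QS")          # quarter-start bins
    .mean()
)
```

Note this is only an approximation of the server-side Aggregation: it averages data points into calendar bins and does not know about each company's reporting periods.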
Get Dimensions for Datasource
Assuming dm is a DataMonster object, and given this fake data source and company:
datasource = next(
    dm.get_datasources(query="Fake Data Source")
)
the_gap = dm.get_company_by_ticker("GPS")
this call to get_dimensions_for_datasource:
dimset = dm.get_dimensions_for_datasource(
    datasource,
    filters={
        "section_pk": the_gap.pk,
        "category": "Banana Republic",
    },
)
returns an iterable, dimset, over a collection containing just one dimension dict.
Assuming from pprint import pprint, the following loop:
for dim in dimset:
    pprint(dim)
pretty-prints the single dimension dict:
{
    "max_date": "2019-06-21",
    "min_date": "2014-01-01",
    "row_count": 1998,
    "split_combination": {
        "category": "Banana Republic",
        "country": "US",
        "section_pk": 707,
    },
}
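Beyond pretty-printing, the dimension dicts are plain data and easy to summarize. A small sketch, using the example dict above in place of a live dimset:

```python
# Stand-in for the iterable returned by get_dimensions_for_datasource;
# here a plain list holding the single dict from the example above.
dimset = [
    {
        "max_date": "2019-06-21",
        "min_date": "2014-01-01",
        "row_count": 1998,
        "split_combination": {
            "category": "Banana Republic",
            "country": "US",
            "section_pk": 707,
        },
    }
]

# Total rows across all dimension combinations, and the split keys present.
total_rows = sum(dim["row_count"] for dim in dimset)
split_keys = sorted({k for dim in dimset for k in dim["split_combination"]})
```

With an unfiltered call the same two lines give a quick sense of how large a data source is and along which dimensions it splits.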