Examples

Get Data Raw

Initialize a DataMonster object:

from datamonster_api import DataMonster

dm = DataMonster(<key_id>, <secret_key>)

Initialize a Datasource object (we will use a fake small data source from the provider XYZ for the purposes of this example):

ds = dm.get_datasource_by_name(
    'XYZ Data Source'
)

Get raw data from the data source, producing a schema and pandas dataframe:

schema, df = dm.get_data_raw(ds)

The schema will contain metadata for the data source, with keys and values showing the roles different columns play in the data. In the case of the above data source:

>>> schema
{
    'lower_date': ['period_start'],
    'upper_date': ['period_end'],
    'section_pk': ['section_pk'],
    'value': ['panel_sales'],
    'split': ['category']
}

This result indicates that the period_start column represents the lower date for each data point, and so on.
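
Because the schema maps each role to the column name(s) that play it, you can look up columns by role instead of hard-coding names. A minimal sketch using the schema shown above (the variable names are illustrative):

# Look up column names by the role they play in this data source
value_col = schema['value'][0]        # 'panel_sales'
start_col = schema['lower_date'][0]   # 'period_start'
end_col = schema['upper_date'][0]     # 'period_end'

df[[start_col, end_col, value_col]].head()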

Next, looking at the dataframe we see:

>>> df.head(4)
     category  panel_sales  period_end period_start  section_pk
Not specified      -0.1139  2017-01-01   2016-10-02         617
Not Specified      -0.0523  2018-07-02   2018-04-02         742
    Category1      -0.2233  2018-07-02   2018-04-02         742
    Category1      -0.4132  2019-03-31   2019-01-01         205

Note that the section_pk column, which indicates which company each data point refers to, is an internal DataMonster identifier and is not particularly meaningful on its own. To convert it to a more usable form, try:

# Build a lookup from each company's section_pk to its name and ticker
comps = ds.companies
section_map = {}
for comp in comps:
    section_map[comp.pk] = {"name": comp.name,
                            "ticker": comp.ticker}

def map_pk_to_ticker_and_name(section_map, df):
    """Replace the internal section_pk column with ticker and company name columns."""
    ticker_dict = {
        pk: v["ticker"] for pk, v in section_map.items()
    }

    name_dict = {
        pk: v["name"] for pk, v in section_map.items()
    }

    df["ticker"] = df["section_pk"].map(ticker_dict)
    df["comp_name"] = df["section_pk"].map(name_dict)

    df = df.drop(["section_pk"], axis=1)

    return df

We can now use map_pk_to_ticker_and_name to produce a more human-readable dataframe. For example:

>>> map_pk_to_ticker_and_name(section_map, df).head(4)
     category  panel_sales  period_end period_start  ticker                 comp_name
Not specified      -0.1139  2017-01-01   2016-10-02    PRTY                PARTY CITY
Not Specified      -0.0523  2018-07-02   2018-04-02    RUTH  RUTH’S HOSPITALITY GROUP
    Category1      -0.2233  2018-07-02   2018-04-02    RUTH  RUTH’S HOSPITALITY GROUP
    Category1      -0.4132  2019-03-31   2019-01-01      HD                HOME DEPOT
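
Because the mapped frame now carries tickers and company names, it can be sliced and sorted like any other pandas DataFrame. A small follow-on sketch (column names taken from the output above):

mapped = map_pk_to_ticker_and_name(section_map, df)

# Restrict to one dimension value and order by company and period
category1 = mapped[mapped["category"] == "Category1"]
category1.sort_values(["ticker", "period_start"])[
    ["ticker", "comp_name", "period_start", "panel_sales"]
].head()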

Filtering to Specific Dimensions

The raw data endpoint supports filtering on specific dimension values by passing a dictionary of filters, where each key is a dimension name and each value is a list of acceptable values for that dimension. Using the example above, we could filter in a variety of ways.

Filtering to specific companies (in this case, Party City and Home Depot):

>>> filters = {'section_pk': [617, 205]}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
>>> df
     category  panel_sales  period_end period_start  section_pk
Not specified      -0.1139  2017-01-01   2016-10-02         617
    Category1      -0.4132  2019-03-31   2019-01-01         205

Filtering to specific dimension values (in this case, "Category1"):

>>> filters = {'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
>>> df
     category  panel_sales  period_end period_start  section_pk
    Category1      -0.2233  2018-07-02   2018-04-02         742
    Category1      -0.4132  2019-03-31   2019-01-01         205

Combining filters across dimensions (in this case, "Category1" for Ruth’s Hospitality Group):

>>> filters = {'section_pk': [742], 'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
>>> df
     category  panel_sales  period_end period_start  section_pk
    Category1      -0.2233  2018-07-02   2018-04-02         742
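
For comparison, the same rows could also be selected client-side from the unfiltered frame with ordinary pandas indexing, though passing filters to get_data_raw avoids transferring data you do not need:

schema, full_df = dm.get_data_raw(ds)

# Equivalent client-side selection of "Category1" rows for section_pk 742
full_df[
    (full_df["section_pk"] == 742)
    & (full_df["category"] == "Category1")
]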

Aggregating Results on Different Cadences

The raw data endpoint can also take an optional Aggregation object to request data with a time-based aggregation applied. For example:

from datamonster_api import DataMonster, Aggregation

dm = DataMonster(<key_id>, <secret_key>)

# Get Company for Home Depot
hd = dm.get_company_by_ticker('hd')

# Get our Data Source
ds = dm.get_datasource_by_name('XYZ Data Source')

# Filter to Home Depot data and aggregate by Home Depot's fiscal quarters
filters = {'section_pk': [hd.pk]}
agg = Aggregation(period='fiscalQuarter', company=hd)
schema, df = dm.get_data_raw(ds, filters=filters, aggregation=agg)

Get Dimensions for Datasource

Assuming dm is a DataMonster object, and given this fake data source and company:

datasource = next(
    dm.get_datasources(query="Fake Data Source")
)
the_gap = dm.get_company_by_ticker("GPS")

this call to get_dimensions_for_datasource:

dimset = dm.get_dimensions_for_datasource(
    datasource,
    filters={
        "section_pk": the_gap.pk,
        "category": "Banana Republic",
    },
)

returns an iterable, dimset, over a collection containing just one dimensions dict. Assuming from pprint import pprint, the following loop:

for dim in dimset:
    pprint(dim)

pretty-prints the single dimensions dict:

{
    "max_date": "2019-06-21",
    "min_date": "2014-01-01",
    "row_count": 1998,
    "split_combination": {
        "category": "Banana Republic",
        "country": "US",
        "section_pk": 707,
    },
}
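
Calling get_dimensions_for_datasource without filters returns one such dict per split combination, which makes it possible to discover the dimension values available for a datasource. A minimal sketch (assuming the filters argument can be omitted; field names are taken from the output above):

# Collect every category value present for this datasource
categories = set()
for dim in dm.get_dimensions_for_datasource(datasource):
    categories.add(dim["split_combination"].get("category"))

sorted(c for c in categories if c is not None)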

How to Upload Data

The API is meant for programmatically refreshing existing data uploads; the initial upload that defines the schema must still be done through the UI. Currently the API supports searching for the data groups you own, checking their processing status, and uploading valid DataFrames to existing data groups.

To list your data groups:

for data_group in dm.get_data_groups():
    print(data_group)

Alternatively, one can fetch a data group by its ID and view its status:

dg = dm.get_data_group_by_id(1012)
dg.get_current_status()

To view the columns (and schema) of the data group in order to verify the type of data we wish to re-upload:

dg.columns
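
A quick way to eyeball those columns before building the refresh frame, assuming dg.columns is iterable (its exact element type may vary by library version):

for col in dg.columns:
    print(col)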

To refresh the data, call start_data_refresh with a valid pandas.DataFrame object that matches the schema of the data group:

import pandas

df = pandas.DataFrame({
    'Start_Date': ['2019-01-01'],
    'end date': ['2019-01-02'],
    'dummy data 1': [1],
    'dummy data_2': [1],
    'Ticker': ['AAP'],
    ...
})
dg.start_data_refresh(df)
dg.get_current_status()

The status of the data group object will change to reflect the latest processing state.

If the schema of the DataFrame does not match the schema expected by the data group, an exception is raised with a helpful message.
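
A minimal sketch of guarding a refresh against such a mismatch (catching the broad Exception type here, since the exact exception class is not specified above):

try:
    dg.start_data_refresh(df)
except Exception as exc:  # surfaces the schema-mismatch message
    print(exc)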