Examples
Get Data Raw
Initialize a DataMonster object:
dm = DataMonster(<key_id>, <secret_key>)
Initialize a Datasource object (we will use a fake small data source from the provider XYZ for the purposes of this example):
ds = dm.get_datasource_by_name(
    'XYZ Data Source'
)
Get raw data from the data source, producing a schema and pandas dataframe:
schema, df = dm.get_data_raw(ds)
The schema will contain metadata for the data source, with keys and values showing the roles different columns play in the data. In the case of the above data source:
>>> schema
{
    'lower_date': ['period_start'],
    'upper_date': ['period_end'],
    'section_pk': ['section_pk'],
    'value': ['panel_sales'],
    'split': ['category']
}
This result indicates that the period_start column represents the lower date for each data point, and so on.
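Because the schema maps roles to column names, you can look columns up programmatically instead of hard-coding them. A minimal sketch, using the schema shown above in place of a live result:

```python
# Schema as returned for the example data source above.
schema = {
    'lower_date': ['period_start'],
    'upper_date': ['period_end'],
    'section_pk': ['section_pk'],
    'value': ['panel_sales'],
    'split': ['category'],
}

# Invert the mapping: each role -> its column name.
# The schema values are lists; each list here has a single entry.
role_to_column = {role: cols[0] for role, cols in schema.items()}

value_column = role_to_column['value']   # the measurement column
date_column = role_to_column['lower_date']
```

This keeps downstream code working even if a provider names its value column something other than panel_sales.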
Next, looking at the dataframe, we see:
>>> df.head(4)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | 617 |
| Not Specified | -0.0523 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Note that the section_pk column, which represents which company each data point refers to, is currently in the form of an internal DataMonster identifier and is not particularly useful for external use. To convert it to a more usable form, try:
# Build a map from section_pk to company name and ticker
comps = ds.companies
section_map = {}
for comp in comps:
    section_map[comp.pk] = {"name": comp.name,
                            "ticker": comp.ticker}
def map_pk_to_ticker_and_name(section_map, df):
    """Replace the section_pk column with ticker and company name columns."""
    ticker_dict = {
        pk: v["ticker"] for pk, v in section_map.items()
    }
    name_dict = {
        pk: v["name"] for pk, v in section_map.items()
    }
    df["ticker"] = df["section_pk"].map(ticker_dict)
    df["comp_name"] = df["section_pk"].map(name_dict)
    df = df.drop(["section_pk"], axis=1)
    return df
We can now use map_pk_to_ticker_and_name to produce a more human-readable dataframe. For example:
>>> map_pk_to_ticker_and_name(section_map, df).head(4)
| category | panel_sales | period_end | period_start | ticker | comp_name |
|---|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | PRTY | PARTY CITY |
| Not Specified | -0.0523 | 2018-07-02 | 2018-04-02 | RUTH | RUTH’S HOSPITALITY GROUP |
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | RUTH | RUTH’S HOSPITALITY GROUP |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | HD | HOME DEPOT |
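As an alternative to the two map calls above, the same lookup can be done in a single merge. The sketch below uses an illustrative stand-in frame and section map rather than live API data:

```python
import pandas as pd

# Stand-in for the raw result (values illustrative, from the example above).
df = pd.DataFrame({
    "section_pk": [617, 742],
    "panel_sales": [-0.1139, -0.0523],
})

# Stand-in for the section_map built from ds.companies.
section_map = {
    617: {"name": "PARTY CITY", "ticker": "PRTY"},
    742: {"name": "RUTH'S HOSPITALITY GROUP", "ticker": "RUTH"},
}

# Turn the section map into a lookup frame and merge it in one pass.
lookup = (
    pd.DataFrame.from_dict(section_map, orient="index")
    .rename_axis("section_pk")
    .reset_index()
    .rename(columns={"name": "comp_name"})
)
df = df.merge(lookup, on="section_pk", how="left").drop(columns=["section_pk"])
```

A left merge preserves every raw row; companies missing from the map simply get NaN ticker and comp_name, which is easy to spot afterwards.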
Filtering to Specific Dimensions
The raw data endpoint supports filtering dimensions to specific values by passing key-value pairs as a dictionary, where each key is a dimension name and each value is a list of allowed values for that dimension. Using the example above, we could do this in a variety of ways.
Filtering to specific companies (in this case, Party City and Home Depot):
>>> filters = {'section_pk': [617, 205]}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Not specified | -0.1139 | 2017-01-01 | 2016-10-02 | 617 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Filtering to specific dimension values (in this case, "Category1"):
>>> filters = {'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
| Category1 | -0.4132 | 2019-03-31 | 2019-01-01 | 205 |
Combining filters across dimensions (in this case, "Category1" for Ruth’s Hospitality Group):
>>> filters = {'section_pk': [742], 'category': ['Category1']}
>>> schema, df = dm.get_data_raw(ds, filters=filters)
| category | panel_sales | period_end | period_start | section_pk |
|---|---|---|---|---|
| Category1 | -0.2233 | 2018-07-02 | 2018-04-02 | 742 |
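Since section_pk filters take internal identifiers, it is often convenient to translate tickers into pks first. A minimal sketch, using hypothetical stand-in company objects in place of ds.companies (only the pk and ticker attributes are assumed):

```python
# Stand-in for the company objects returned by ds.companies;
# only the .pk and .ticker attributes are assumed here.
class FakeCompany:
    def __init__(self, pk, ticker):
        self.pk = pk
        self.ticker = ticker

comps = [FakeCompany(617, "PRTY"), FakeCompany(742, "RUTH"), FakeCompany(205, "HD")]

def filters_for_tickers(comps, tickers):
    """Build a section_pk filter dict for the given tickers."""
    wanted = {t.upper() for t in tickers}
    pks = [c.pk for c in comps if c.ticker and c.ticker.upper() in wanted]
    return {"section_pk": pks}

# Equivalent to the Party City / Home Depot filter above.
filters = filters_for_tickers(comps, ["PRTY", "HD"])
```

The resulting dict can be passed straight to dm.get_data_raw(ds, filters=filters), and can be extended with other dimension keys such as 'category'.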
Aggregating Results on Different Cadences
The raw data endpoint can also take an optional Aggregation object to request data with a time-based aggregation applied. For example:
from datamonster_api import DataMonster, Aggregation
dm = DataMonster(<key_id>, <secret_key>)
# Get Company for Home Depot
hd = dm.get_company_by_ticker('hd')
# Get our Data Source
ds = dm.get_datasource_by_name('XYZ Data Source')
# Filter to Home Depot data and aggregate by Home Depot's fiscal quarters
filters = {'section_pk': [hd.pk]}
agg = Aggregation(period='fiscalQuarter', company=hd)
dm.get_data_raw(ds, filters=filters, aggregation=agg)
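If you need a cadence the endpoint does not provide, you can also aggregate the raw dataframe yourself with pandas. A sketch over calendar quarters (fiscal quarters would additionally require the company's fiscal calendar), using illustrative data in place of a live result:

```python
import pandas as pd

# Stand-in for a raw result frame (dates and values illustrative).
df = pd.DataFrame({
    "period_start": pd.to_datetime(["2019-01-07", "2019-02-04", "2019-04-01"]),
    "panel_sales": [0.10, 0.20, 0.30],
})

# Mean of panel_sales per calendar quarter, keyed on period_start.
quarterly = (
    df.set_index("period_start")["panel_sales"]
    .resample("QS")          # quarter-start bins
    .mean()
)
```

Note this is only an approximation of the server-side Aggregation: it averages data points into calendar bins and does not know about each company's reporting periods.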
Get Dimensions for Datasource
Assuming dm is a DataMonster object, and given this fake data source and company:
datasource = next(
    dm.get_datasources(query="Fake Data Source")
)
the_gap = dm.get_company_by_ticker("GPS")
this call to get_dimensions_for_datasource:
dimset = dm.get_dimensions_for_datasource(
    datasource,
    filters={
        "section_pk": the_gap.pk,
        "category": "Banana Republic",
    },
)
returns an iterable, dimset, over a collection containing just one dimension dict.
Assuming from pprint import pprint, the following loop:
for dim in dimset:
    pprint(dim)
pretty-prints the single dimension dict:
{
    "max_date": "2019-06-21",
    "min_date": "2014-01-01",
    "row_count": 1998,
    "split_combination": {
        "category": "Banana Republic",
        "country": "US",
        "section_pk": 707,
    },
}
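Beyond pretty-printing, the dimension dicts are plain data and easy to summarize. A small sketch, using the example dict above in place of a live dimset:

```python
# Stand-in for the iterable returned by get_dimensions_for_datasource;
# here a plain list holding the single dict from the example above.
dimset = [
    {
        "max_date": "2019-06-21",
        "min_date": "2014-01-01",
        "row_count": 1998,
        "split_combination": {
            "category": "Banana Republic",
            "country": "US",
            "section_pk": 707,
        },
    }
]

# Total rows across all dimension combinations, and the split keys present.
total_rows = sum(dim["row_count"] for dim in dimset)
split_keys = sorted({k for dim in dimset for k in dim["split_combination"]})
```

With an unfiltered call the same two lines give a quick sense of how large a data source is and along which dimensions it splits.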