Given two dataframes df_1 and df_2, how can I aggregate values of df_2 into rows of df_1 such that date in df_1 is between open and close in df_2?
print(df_1)
        date         A         B
0 2021-11-01  0.020228  0.026572
1 2021-11-02  0.057780  0.175499
2 2021-11-03  0.098808  0.620986
3 2021-11-04  0.158789  1.014819
4 2021-11-05  0.038129  2.384590
print(df_2)
        open       close location division  size
0 2021-11-07  2021-11-14      LDN    Alpha   120
1 2021-11-01  2021-11-14      PRS    Alpha   450
2 2021-10-14  2021-11-27       HK     Beta   340
I have tried this solution for joining my dataframes; now I need to find a way to aggregate. What I have done so far is:
df_2.index = pd.IntervalIndex.from_arrays(df_2['open'],df_2['close'],closed='both')
df_1['events'] = df_1['date'].apply(lambda x : df_2.iloc[df_2.index.get_loc(x)])
print(calls['code'].iloc[0].groupby(['location', 'division'])['size'].sum())
location  division
LDN       Alpha       421.0
LDN       Beta        515.0
NY        Alpha       369.0
PRQ       Alpha       132.0
          Gamma       110.0
I need something that looks like this:
        date         A         B  LDN_Alpha  LDN_Beta  LDN_Gamma  PRS_Alpha  ...
0 2021-11-01  0.020228  0.026572        120       300          0        530
1 2021-11-02  0.057780  0.175499  ...
2 2021-11-03  0.098808  0.620986
3 2021-11-04  0.158789  1.014819
4 2021-11-05  0.038129  2.384590
Where the created columns are the sum of size grouped by location and division.
The idea is to first expand each open/close range in df_2 into one row per date, join the original columns of df_2 back on, and then use DataFrame.pivot_table with DataFrame.join:
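df_1['date'] = pd.to_datetime(df_1['date'])

# Map every date between open and close to the index label of its df_2 row
s = pd.concat([pd.Series(r.Index, pd.date_range(r.open, r.close)) for r in df_2.itertuples()])
# Swap values and index so the dates join back to df_2 on its index
df = df_2.join(pd.Series(s.index, s).rename('date'))
# Sum size per date for each (location, division) pair
df = df.pivot_table(index='date',
                    columns=['location', 'division'],
                    values='size',
                    aggfunc='sum',
                    fill_value=0)
# Flatten the MultiIndex columns into names like LDN_Alpha
df.columns = df.columns.map(lambda x: f'{x[0]}_{x[1]}')
df = df_1.join(df, on='date')
print(df)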
        date         A         B  HK_Beta  LDN_Alpha  PRS_Alpha
0 2021-11-01  0.020228  0.026572      340          0        450
1 2021-11-02  0.057780  0.175499      340          0        450
2 2021-11-03  0.098808  0.620986      340          0        450
3 2021-11-04  0.158789  1.014819      340          0        450
4 2021-11-05  0.038129  2.384590      340          0        450
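
If the date ranges are long, expanding each one into daily rows can get large. A possible alternative, sketched here rather than taken from the answer above, is to cross-join the dates with df_2 and keep only the pairs where the date falls inside the range. It assumes pandas 1.2 or later (for how='cross') and rebuilds the sample data from the question so that it runs on its own; the names pairs, wide and result are just illustrative.

```python
import pandas as pd

# Sample data copied from the question
df_1 = pd.DataFrame({
    'date': pd.to_datetime(['2021-11-01', '2021-11-02', '2021-11-03',
                            '2021-11-04', '2021-11-05']),
    'A': [0.020228, 0.057780, 0.098808, 0.158789, 0.038129],
    'B': [0.026572, 0.175499, 0.620986, 1.014819, 2.384590],
})
df_2 = pd.DataFrame({
    'open': pd.to_datetime(['2021-11-07', '2021-11-01', '2021-10-14']),
    'close': pd.to_datetime(['2021-11-14', '2021-11-14', '2021-11-27']),
    'location': ['LDN', 'PRS', 'HK'],
    'division': ['Alpha', 'Alpha', 'Beta'],
    'size': [120, 450, 340],
})

# Pair every date with every row of df_2 (assumes pandas >= 1.2),
# then keep only the pairs where open <= date <= close
pairs = df_1[['date']].merge(df_2, how='cross')
pairs = pairs[pairs['date'].between(pairs['open'], pairs['close'])]

# Sum size per date for each (location, division) pair
wide = pairs.pivot_table(index='date',
                         columns=['location', 'division'],
                         values='size',
                         aggfunc='sum',
                         fill_value=0)
# Flatten the MultiIndex columns into names like HK_Beta
wide.columns = [f'{loc}_{div}' for loc, div in wide.columns]

result = df_1.join(wide, on='date')
print(result)
```

One difference from the output above: a location/division pair that matches no date in df_1 (LDN_Alpha in this sample) gets no column at all instead of a column of zeros. The cross join also builds len(df_1) * len(df_2) rows before filtering, so it is only a reasonable trade when that product stays manageable.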