• Jobs
  • About Us
  • professionals
    • Home
    • Jobs
    • Courses and challenges
  • business
    • Home
    • Post vacancy
    • Our process
    • Pricing
    • Assessments
    • Payroll
    • Blog
    • Sales
    • Salary Calculator

0

186
Views
Counting all combinations of values in multiple columns

The following is an example of items rated by 1,2 or 3 stars. I am trying to count all combinations of item ratings (stars) per month.

In the following example, item 10 was rated in month 1 and has two ratings equal 1, one rating equal 2 and one rating equal 3.

inp = pd.DataFrame({'month':[1,1,1,1,1,2,2,2], 
                    'item':[10,10,10,10,20,20,20,20], 
                    'star':[1,2,1,3,3,2,2,3]}
                  )

 month item star
0   1   10  1
1   1   10  2
2   1   10  1
3   1   10  3
4   1   20  3
5   2   20  2
6   2   20  2
7   2   20  3

For the given above input frame output should be:

   month    item    star_1_cnt  star_2_cnt  star_3_cnt
0   1       10      2           1           1
1   1       20      0           0           1
2   2       20      0           2           1

I am trying to solve the problem starting with the following code, which result still needs to be converted to the desired format of the output frame and which gives the wrong answers:

1   20  3   (1, 1)
2   20  3   (1, 1)

Anyway, there should be a better way to create the output table, then finalizing this one:

months = [1,2]
items = [10,20]
stars = [1,2,3]

d = {'month': [], 'item': [], 'star': [], 'star_cnts': [] }

for month in months:
    for star in stars:
        for item in items:
            star_cnts=dict(inp[(inp['item']==item) & (inp['star']==star)].value_counts()).values()
            d['month'].append(month)
            d['item'].append(item)
            d['star'].append(star)
            d['star_cnts'].append(star_cnts)
            
pd.DataFrame(d)

    month   item    star    star_cnts
0   1       10      1       (2)
1   1       20      1       ()
2   1       10      2       (1)
3   1       20      2       (2)
4   1       10      3       (1)
5   1       20      3       (1, 1)
6   2       10      1       (2)
7   2       20      1       ()
8   2       10      2       (1)
9   2       20      2       (2)
10  2       10      3       (1)
11  2       20      3       (1, 1)

​

over 3 years ago · Santiago Trujillo
4 answers
Answer question

0

Series.value_counts + Series.unstack to convert to dataframe

out = inp.value_counts()\
         .unstack('star', fill_value=0)\
         .rename(lambda x: f'star_{x}_cnt', axis=1)\
         .reset_index().rename_axis(columns=None)

print(out)

   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1
over 3 years ago · Santiago Trujillo Report

0

You can use value_counts on the whole DataFrame and unstack:

out = (
 inp.value_counts()
    .unstack('star', fill_value=0)
 )

Alternatively, you can use crosstab.

As it requires 1D data as input, you can use tuples:

cols = ['month','item']
out = pd.crosstab(inp[cols].apply(tuple, axis=1), inp['star'])

out.index = pd.MultiIndex.from_tuples(out.index, names=cols)

Output:

star        1  2  3
month item         
1     10    2  1  1
      20    0  0  1
2     20    0  2  1
formated output

This was already nicely demonstrated by @ansev

(inp.value_counts()
    .unstack('star', fill_value=0)
    .rename(lambda c: f'star_{c}_cnt', axis=1)
    .reset_index()
    .rename_axis(columns=None)
 )

Output:

   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1
over 3 years ago · Santiago Trujillo Report

0

One option, with pivot_table :

(inp
.pivot_table(index=['month', 'item'], 
             values = 'star', 
             columns='star', 
             aggfunc='size', 
             fill_value = 0)
.rename(columns = lambda col: f"star_{col}_cnt")
.rename_axis(columns = None)
.reset_index()
)
   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1

Another option, with groupby(pivot_table is a wrapper around groupby + unstack):

(inp
.groupby(['month', 'item', 'star'])
.size()
.unstack(fill_value = 0)
.rename(columns = lambda col: f"star_{col}_cnt")
.rename_axis(columns = None)
.reset_index()
)

   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1

Another option, with pd.get_dummies + groupby:

(pd.get_dummies(inp, columns=['star'])
.groupby(['month', 'item'])
.sum()
.add_suffix('_cnt')
.reset_index()
)
   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1

performance wise, only tests can tell - I'd expect pivot_table to be slower than the groupby

over 3 years ago · Santiago Trujillo Report

0

This seems like a nice problem for pd.get_dummies:

new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [df['star'].unique()]
    .sum()
)

Output:

>>> new_df
   month  item  1  2  3
0      1    10  2  1  1
1      1    20  0  0  1
2      2    20  0  2  1

Renaming, too:

u = df['star'].unique()
new_df = (
    pd.concat([df, pd.get_dummies(df['star'])], axis=1)
    .groupby(['month', 'item'], as_index=False)
    [u]
    .sum()
    .rename({k: f'star_{k}_cnt' for k in df['star'].unique()}, axis=1)
)

Output:

>>> new_df
   month  item  star_1_cnt  star_2_cnt  star_3_cnt
0      1    10           2           1           1
1      1    20           0           0           1
2      2    20           0           2           1

Obligatory one- (or two-) liners:

# Renames the columns
u = df['star'].unique()
new_df = pd.concat([df, pd.get_dummies(df['star'])], axis=1).groupby(['month', 'item'], as_index=False)[u].sum().rename({k: f'star_{k}_cnt' for k in df['star'].unique()}, axis=1)
over 3 years ago · Santiago Trujillo Report
Answer question
Find remote jobs

Discover the new way to find a job!

Top jobs
Top job categories
Business
Post vacancy Pricing Our process Sales
Legal
Terms and conditions Privacy policy
© 2025 PeakU Inc. All Rights Reserved.

Andres GPT

Recommend me some offers
I have an error