Working with data

In the past few weeks, I've been quite busy despite staying at home most of the time. I find this lockdown the perfect time for introspection, and I'd like to share some of my thoughts on the role of data in organisations, a book recommendation, and an interesting paper on understanding listener behaviour at Spotify.

Virtual guest lecture

I was incredibly honoured to deliver a short guest lecture on organisational change in a data-driven world for my friend Carys’ course, Contemporary Management, at RMIT.

Working in the tech industry, I sometimes take it for granted that data is always available and valued by decision makers. However, after some brief research online, I realised that many companies still struggle with being data-driven. Despite seeing improvements in operational efficiency and cost reduction, executives have yet to justify the investments in data-driven innovation and cultural transformation [link].

The key takeaway is that effective data-driven change normally comes from the top and is supported by a well-architected data infrastructure, the backbone of all the other intelligence and innovation that data can bring. I constantly come back to Monica Rogati's article The AI Hierarchy of Needs and the chart below:

[Chart: The AI Hierarchy of Needs]
source: https://hackernoon.com/the-ai-hierarchy-of-needs-18f111fcc007

Ultimately, the value of data science lies in making informed decisions and solving real-world problems with data. As a domain expert, it is your job to help avoid the trap of being data rich but information poor.

Below are my slides for the guest lecture; towards the very end, you can also find out how Canva's data specialities are structured and what each of them does!

Build a career in data science

I recently finished reading Build a Career in Data Science by Emily Robinson and Jacqueline Nolis, which I found extremely insightful and a great resource for anyone working in data science.

The book covers almost everything you want to know about data science as a career (nothing too technical here, as there are tons of books and articles out there for that 😉), with topics ranging from:

  • job expectations in companies of different types
  • how to gain the skills
  • how to build a portfolio
  • job searching
  • first month on the job
  • doing effective analysis
  • deploying ML models
  • working with stakeholders
  • further advancing your career

In my honest opinion, the book is so detailed that it really reflects what it is like to work in the industry, your day-to-day interactions with different people, and what data science is meant to be.

Both at work and in the book, I am constantly reminded that asking the right questions and building relationships are the most effective ways to become productive.

“Asking questions helps you understand the details of your job more quickly. Building relationships allows you to understand the context of your role in the organization.”

I shared the book with the entire Data Analytics team at Canva, especially our newbies who joined the company at a very special time. I hope you find it useful too!

Diversity in music consumption at Spotify

As I've had my head down exploring the relationship between user behaviour on Canva's premium product and long-term retention on the platform, my friend and colleague Paul shared with me a paper on the algorithmic effects on the diversity of consumption at Spotify.

As a big music fan and a loyal subscriber of Spotify, I find this paper fascinating and enlightening.

In data work, we are constantly looking for signals that reveal relationships. Here, the researchers are trying to understand how listening diversity is shaped by algorithmic recommendations, and how diversity is associated with long-term user metrics such as conversion (i.e. from free to premium) and retention (i.e. staying on the platform, or staying a premium subscriber).

Long story short, high diversity in music consumption is positively correlated with long-term metrics, and algorithm-driven recommendations tend to reduce that diversity. In addition, users whose consumption becomes more diverse over time do so by switching from recommended to organic consumption.

In the paper, listening diversity is not simply the variety of genres or artists. Songs are embedded in a vector space using word2vec so that related songs are clustered closer together. As a result, songs that frequently appear in the same genre or playlist end up close to each other in the space.
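To make "songs sharing playlists end up close together" more concrete, here is a minimal Python sketch. It is not word2vec: as a deliberately simpler, count-based stand-in, each song is represented by the playlists it appears in, and relatedness is measured with cosine similarity. The playlists, song IDs and function names are made up for illustration.

```python
import math
from collections import Counter, defaultdict

def playlist_vectors(playlists):
    """Represent each song by the playlists it appears in (playlist index -> count).

    A count-based stand-in for word2vec, used only for illustration.
    """
    vectors = defaultdict(Counter)
    for index, playlist in enumerate(playlists):
        for song in playlist:
            vectors[song][index] += 1
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * v[key] for key, count in u.items())  # missing keys count as 0
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical playlists of song IDs.
playlists = [
    ["jazz_a", "jazz_b", "jazz_c"],
    ["jazz_a", "jazz_b"],
    ["rock_a", "rock_b", "rock_c"],
    ["rock_b", "rock_c"],
    ["jazz_c", "rock_a"],
]
vectors = playlist_vectors(playlists)
print(round(cosine(vectors["jazz_a"], vectors["jazz_b"]), 3))  # 1.0: always in the same playlists
print(round(cosine(vectors["jazz_a"], vectors["jazz_c"]), 3))  # 0.5: share one of their playlists
print(round(cosine(vectors["jazz_a"], vectors["rock_c"]), 3))  # 0.0: never share a playlist
```

A real word2vec model would instead treat playlists as sentences and learn dense vectors, which can also place two songs close together when they appear in similar contexts without ever sharing a playlist directly.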

[Figure from the paper]
source: https://dl.acm.org/doi/10.1145/3366423.3380281

Another interesting concept is segmenting Spotify listeners into generalists (users listening to a diverse set of songs) and specialists (users listening to very similar songs) via the generalist-specialist score (GS-score). For a given time period, a user's GS-score is computed from the centre of mass of the song vectors they listened to, weighted by the number of times they listened to each song: it is the weighted average cosine similarity between each song vector and that centre. Specialists will have a higher score than generalists.
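The GS-score can be sketched in a few lines of Python, assuming each song already has an embedding vector. This follows the definition described above (a play-count-weighted centre of mass, then the weighted average cosine similarity to it); the function name and the toy 2-D vectors are hypothetical, and the paper's exact normalisation may differ.

```python
import math

def gs_score(song_vectors, play_counts):
    """Generalist-specialist score: weighted mean cosine similarity to the centre of mass.

    song_vectors: list of equal-length, non-zero embedding vectors (lists of floats).
    play_counts: how many times the user listened to each song (the weights).
    Scores close to 1 indicate a specialist; lower scores indicate a generalist.
    """
    total = sum(play_counts)
    dim = len(song_vectors[0])
    # Weighted centre of mass of the user's song vectors.
    centre = [
        sum(w * v[i] for v, w in zip(song_vectors, play_counts)) / total
        for i in range(dim)
    ]
    centre_norm = math.sqrt(sum(c * c for c in centre))
    if centre_norm == 0:
        raise ValueError("song vectors cancel out; the centre of mass has no direction")

    def cosine_to_centre(v):
        dot = sum(a * b for a, b in zip(v, centre))
        return dot / (math.sqrt(sum(a * a for a in v)) * centre_norm)

    # Weighted average of each song's cosine similarity to the centre.
    return sum(w * cosine_to_centre(v) for v, w in zip(song_vectors, play_counts)) / total


# Hypothetical vectors: a specialist replays near-identical songs,
# a generalist splits listening between two unrelated songs.
specialist = gs_score([[1.0, 0.0], [1.0, 0.0]], [3, 5])
generalist = gs_score([[1.0, 0.0], [0.0, 1.0]], [1, 1])
print(round(specialist, 3), round(generalist, 3))  # 1.0 0.707
```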

To evaluate consumption diversity against user retention, the paper focused on tens of millions of premium users active in July 2018 and their GS-scores during that month. Controlling for users' activity level, the researchers computed the empirical probability that users are still active a year later and compared the resulting churn against the global baseline average churn rate.
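A rough Python sketch of this kind of comparison: users are grouped by activity bucket and GS-score bucket, and each group's churn rate is compared with the global baseline. Stratifying by activity bucket is one simple way to control for activity level and is an assumption here, as are the bucket labels, the function name and the tiny made-up dataset. The same ratio (segment rate divided by baseline rate, minus one) is also one way to read a figure like "40% more likely to convert" relative to a benchmark.

```python
from collections import defaultdict

def relative_churn_by_segment(users):
    """Churn rate of each (activity bucket, GS-score bucket) segment relative to the global baseline.

    users: iterable of (activity_bucket, gs_bucket, retained) tuples, where
    retained is True if the user was still active a year later.
    Returns {(activity_bucket, gs_bucket): segment_churn / baseline_churn - 1}.
    """
    users = list(users)
    baseline = sum(1 for _, _, retained in users if not retained) / len(users)
    if baseline == 0:
        raise ValueError("no churned users, so relative churn is undefined")

    totals = defaultdict(int)
    churned = defaultdict(int)
    for activity, gs, retained in users:
        totals[(activity, gs)] += 1
        if not retained:
            churned[(activity, gs)] += 1

    # Each segment sits inside one activity bucket, so specialists and generalists
    # are only compared against users with a similar activity level.
    return {seg: (churned[seg] / n) / baseline - 1.0 for seg, n in totals.items()}


# Hypothetical toy data: (activity bucket, GS-score bucket, still active a year later?)
users = (
    [("low", "specialist", False)] * 2 + [("low", "specialist", True)] * 2
    + [("low", "generalist", False)] * 1 + [("low", "generalist", True)] * 3
    + [("high", "specialist", False)] * 1 + [("high", "specialist", True)] * 3
    + [("high", "generalist", True)] * 4
)

for segment, lift in sorted(relative_churn_by_segment(users).items()):
    print(segment, f"{lift:+.0%}")
```

With this toy data, low-activity specialists churn at twice the baseline rate (+100%), while high-activity generalists never churn (-100%).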

[Figure: probability of churn relative to the baseline, by GS-score and activity percentile]
source: https://dl.acm.org/doi/10.1145/3366423.3380281

The line trending upwards to the right shows that the less diverse a user's music consumption, the more likely they are to churn relative to the baseline. The gap in churn probability is also wider for less active users (in the 10th activity percentile) than for more active users.

Similar trends can be found in conversion from free to premium. Relative to the global benchmark, the most active generalist listeners are almost 40% more likely to convert into paid subscribers, whereas low-activity, low-diversity users are among the least likely to convert.

[Figure: conversion from free to premium relative to the global benchmark]
source: https://dl.acm.org/doi/10.1145/3366423.3380281

The paper also finds that a typical user's music diversity remains relatively stable over time. The researchers collected data between July 2018 and June 2019 and studied the evolution of user behaviour via the correlation between each user's first-month GS-score and their GS-score in each subsequent month. This finding implies that ensuring a diverse music experience at the very beginning of a user's journey could have long-term consequences for user engagement on the platform.
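The stability analysis can be sketched in Python as a Pearson correlation between each user's first-month GS-score and their score in each later month: a correlation that stays high means users keep roughly the same diversity level. The function names, month labels and scores below are invented for illustration, and the paper may have used a different correlation measure or filtering.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)

def stability_over_time(monthly_scores):
    """Correlate each user's first-month GS-score with their score in every later month.

    monthly_scores: dict mapping month label -> list of GS-scores, one per user,
    with users in the same order in every list. The first key is the baseline month.
    """
    months = list(monthly_scores)
    first = monthly_scores[months[0]]
    return {month: pearson(first, monthly_scores[month]) for month in months[1:]}


# Hypothetical GS-scores for four users over three months.
scores = {
    "2018-07": [0.60, 0.70, 0.80, 0.90],  # first month (baseline)
    "2018-08": [0.62, 0.71, 0.79, 0.91],
    "2018-09": [0.65, 0.68, 0.85, 0.86],
}
for month, r in stability_over_time(scores).items():
    print(month, round(r, 3))
```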

[Figure: correlation between first-month GS-score and GS-score in subsequent months]
source: https://dl.acm.org/doi/10.1145/3366423.3380281

I hope this is a good summary of the original paper. However, if you're not into reading papers (🙋🏻‍♀️ trust me, I'm not), here's a tweetstorm by one of the co-authors on Twitter.

What makes Canva's problem more complicated is that users come with more than one JTBD (job to be done), whereas most general Spotify users are mostly there to listen to music; there is also no clear measure of design diversity. However, the paper really sheds light on the direction of my work exploring users' design diversity, premium feature usage diversity, and their implications for long-term retention.