Why R? Text Mining Hackathon Summary

It’s been 2 weeks since the end of the Text Mining Hackathon (2020.whyr.pl/hackathon/) at Why R?. This was a promotional event ahead of the Why R? 2020 conference, aimed at spreading knowledge about text analysis. We prepared 4 different challenges so that teams could pick the tasks that best suited their skills. At the beginning of this week we published videos presenting the winning solutions on the youtube.com/WhyRFoundation channel. If you are interested in how the event went, read on.

Around 300 people expressed interest in joining the hackathon before we even announced the theme and challenges. 2 weeks before the start we announced that the event would focus on text mining and that we were aiming for group submissions from teams of 4 or 5 members. Around 60 people from all over the world formed 13 teams, of which 7 submitted at least one solution to any of the challenges presented at the hackathon. We’ve seen people from Australia, India, Germany, Poland, Senegal, the US, Canada, Nepal, Spain, the UK and Brazil!

Intro

A language-agnostic competition devoted to text mining, where every machine learning practitioner could find challenges to test their team!

At this hackathon you could scale the difficulty and choose the area of the challenges yourself. Depending on your skills and the time you had, you could tune the fun to your liking!

Table of Contents

1. Why text mining?
2. Why Hackathon?
3. Challenges
4. Competition Rules
5. Mentors and Judges
6. Sponsor
7. Talks
8. For whom?
9. Organizers

Winners

Challenge 1 - Predictions

Challenge 2 - Segmentation

Challenge 3 - Churn

Challenge 4 - Text Analysis / Revealing the content

Why text mining?

Text mining is widely known among machine learning practitioners. The increased interest in text mining is driven by the growing number of internet users and by the rapid growth of internet data, a large share of which is said to be text. Extracting information from articles, news, posts and comments has become a desirable skill, but what is needed even more are tools for diagnosing and visualizing text mining models.

Even though a lot of tools, books and webinars are available online, there is still room for improvement and development.

Why Hackathon?

Hackathons are events where enthusiasts of a specific topic gather in one place to work together on challenges that arose for a particular community.

Hackathons tend to be time-pressured events, where solutions need to be created quickly and active cooperation between participants is necessary. To set the pace of the event, participants are divided into teams that compete to prepare the most valuable solution and win a prize.

For a participant such an undertaking is a great chance to:

  • develop the ability to work in a group
  • learn from more experienced practitioners
  • take part in lectures and workshops related to text mining
  • have a remarkable networking experience
  • participate in healthy and fair competition
  • test your skills against others
  • win prizes
  • learn new data analysis techniques
  • use new tools
  • brainstorm new business use cases

Challenges

Challenges and guidance for solutions are published here:

github.com/WhyR2020/hackathon

Competition Rules

Since the event was a competition with symbolic prizes, we wanted to grade the solutions. Solutions were sent as videos (we aimed at max 5 min per video!). Each video should present the insights developed to solve the stated challenge. Each team could send a solution for each challenge in a separate video (one video per challenge). The hackathon criteria were announced at the opening of the event and are listed below!

  • Are there at least 3 people in the team?
  • Is the presentation based on HackeR News data?
  • Is the solution a result of the teamwork?
  • Is the solution hosted in a public place?
  • Is this solution useful for the imaginary business team at Hacker News or has potential/clear business applications/story?
  • Is there a clear business problem/story that you are explaining?
  • How attractive is the use case?
  • How well are you able to present your solution?
  • Is the solution explainable?
  • Does the solution have any statistical validation?

Each solution had to be submitted as a video. It was nice to have if the solution was based on a presentation or a dashboard. For challenges 2-4 the winning solution was chosen based on the insightfulness and usefulness of the identified patterns. For challenge 1 the winning solution was chosen based on a cost function; however, we also wanted to know how teams arrived at their predictions.

Mentors and Judges

Speakers of the hackathon: Julia Silge, Kenneth Benoit.

Judges and mentors: Mateusz Zawisza, Piotr Zielonka, Marcin Kosinski, Maciej Eder, Michał Burdukiewicz.

Sponsor

McKinsey Analytics in Poland combines advanced data analytics solutions with in-depth industry and business knowledge, including multiple sectors such as commerce, banking, insurance, telecommunications, industrial production and heavy industry. McKinsey data scientists and architects, together with machine learning and data engineers, complement strategic and operational consulting and provide clients with advanced and robust data-driven solutions.

McKinsey Analytics experts specialize in many different areas: statistical learning, deep learning, evolutionary and multi-criteria optimization, multi-agent simulations, game theory, reinforcement learning, advanced econometrics, causal & Bayesian inference, uplift modelling, Explainable Artificial Intelligence, visualization and data engineering.

We are all looking forward to sharing with you some insights on how to identify and capture the most value and meaningful insights from data, and turn them into competitive advantages!

Talks

  • Julia Silge: Data visualization for machine learning practitioners
  • Kenneth Benoit: Why you should stop using other text mining packages and embrace quanteda
  • Why McKinsey Analytics? And how we use technology, data and global capabilities to serve our clients?

For whom?

We strongly encouraged people with analytical thinking skills to participate in the event. Data analysts, developers, storytellers, BI consultants, web designers, researchers and data enthusiasts were all welcome, since they could learn a lot from one another!

  • Had a good understanding of text mining challenges?
  • Eager to get familiar with text mining concepts and good practices?
  • Wanted to meet people devoted to text analyses?
  • Enthusiastic about presenting insights related to text mining analysis?

The event was made just for you!

Event details

  • Place: Remote Global Challenge
  • Date: 23.09.2020 - 24.09.2020
  • Started - 5:00pm UTC 23.09.2020
  • Ended - 5:30pm UTC 24.09.2020

  • Talks during the event
    • 2020-09-23 5:00pm UTC Julia Silge: Data visualization for machine learning practitioners
    • 2020-09-24 1:00pm UTC Kenneth Benoit: Why you should stop using other text mining packages and embrace quanteda
    • 2020-09-24 5:30pm UTC Why McKinsey Analytics? And how we use technology, data and global capabilities to serve our clients?
  • For? For everyone interested in text analysis and data science!
  • Tech? Any software that helps you win can be used!

Organizers

Why R? Foundation - whyr.pl.