My First Experience at DataHack Summit 2019 (DHS 2019)

Nov 24, 2019

DataHack Summit 2019 (DHS 2019), India’s largest applied artificial intelligence and machine learning conference, took place a few days ago (November 13 to November 16, 2019) at the NIMHANS Convention Centre, Bengaluru. Here is my first experience at DHS 2019, from day 1 to day 4.

November 13, 2019 — Day 1

We (Sasidhar, Silpa Balagopal and I), a team of three from Amrita Vishwa Vidyapeetham, Coimbatore, reached the NIMHANS Convention Centre around 7:45 am and collected our badges from the registration desk. When we entered, American Express, Intel, H2O.ai, Ericsson, Analytics Vidhya and AWS were still setting up their installations. I spotted two familiar faces, Sudalai Rajkumar and Sanyam Bhutani, the great yet humble Kaggle toppers. I have followed them on social platforms for a very long time and always wanted to talk to them about their journeys in Kaggle and the data science field. I introduced myself; they were very sweet and, in spite of their busy schedules, gave me inputs on how to improve. :-) The event then kicked off with a keynote from Kunal Jain (Founder and CEO of Analytics Vidhya).

1. Why is Enterprise Problem-Solving Not like Chess or Go or ImageNet or Kaggle Challenges?

Speaker: Dr. Vikas Agarwal works as a Senior Principal Data Scientist in Cognitive Computing for Oracle Analytics Cloud. His current interests are in automated discovery, adaptive anomaly detection in streaming data, intelligent context-aware systems, and explaining black-box model predictions.

Learning:

Robustness of the model
Importance of causal dynamics: ML models remain vulnerable to out-of-distribution data (outliers)
With real-world data, the problem statement is not clearly defined, unlike with academic data.
Probabilistic and statistical knowledge is really important for understanding the features of data.

2. Data Science is not about how many models you build

Speaker: Eric Weber is currently a Senior Director and Head of Data Science & Strategy at ListReports. He is an expert in building the best infrastructure to support communication between business and technical leaders to ensure maximum impact of data science and research science efforts.

Learning:

Once you join a company, try to understand its goals and act accordingly.
Express your ideas simply (data visualization, presented in a non-technical manner).
Impress others through the impact of your work on the company (financial or otherwise).

3. Video Encoding & Classification using Deep Learning

Speaker: Axel de Romblay is currently a Machine Learning Engineer at Dailymotion, a global video streaming company. His main domains of interest include Automated Machine Learning (he is the author of MLBox, an auto-ML open source package), Deep Learning, Recommendation systems and Reinforcement learning. He is passionate about Machine Learning and is always eager to contribute to the open source community.

Learning:

Got introduced to video classification
TFRecords (.tfrecords files) to read the frames and images faster
RNN and LSTM on Video frames
Average pooling for each frame
Multimodal classification
https://github.com/AxeldeRomblay/DHS2019
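One way to read the "average pooling" note above is that the per-frame feature vectors of a video are averaged into a single video-level vector before classification. The sketch below illustrates only that idea in plain Python. The function name and the toy feature values are my own assumptions and are not taken from the speaker's repository.

```python
from typing import List


def average_pool_frames(frame_features: List[List[float]]) -> List[float]:
    """Average per-frame feature vectors into one video-level vector."""
    if not frame_features:
        raise ValueError("need at least one frame")
    dim = len(frame_features[0])
    if any(len(f) != dim for f in frame_features):
        raise ValueError("all frames must have the same feature size")
    n = len(frame_features)
    # Mean over the time axis, one output value per feature dimension.
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


# Three hypothetical frames, each described by a 4-dimensional feature vector.
frames = [
    [1.0, 0.0, 2.0, 4.0],
    [3.0, 0.0, 2.0, 0.0],
    [2.0, 3.0, 2.0, 2.0],
]
video_vector = average_pool_frames(frames)
print(video_vector)  # [2.0, 1.0, 2.0, 2.0]
```

Averaging discards the order of the frames, which is presumably why RNN/LSTM models over the frame sequence were discussed as an alternative.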

4. Evolution of Deep Learning: A biological perspective

Speaker: Jie Mei is a computational neuroscience researcher who has completed her studies at the Ecole normale supérieure, Paris and Charité Universitätsmedizin Berlin. Her research interests include computational neuroscience, neurorobotics, machine learning and data analytics in healthcare and medicine. She is also an active startup advisor.

Learning:

Anyone interested in AI for neuroscience should read about Jie Mei’s work.
The session was full of technical terms (many of which I didn’t understand)
Overall, the talk was about applying fundamental neuroscience principles to AI

5. Borderless AI — A Healthcare Perspective

Speaker: Tarry Singh is Chairman & CEO of DeepKapha Ventures. He is co-founder and AI Researcher of the AI startups deepkapha.ai and curae.ai. He also participates in co-supervising Deep Learning PhD projects related to the above areas with the world’s leading universities in Germany, US, and China.

Learning:

Tarry talked about the limitless goals of AI in healthcare, his team’s successful projects in cancer and ophthalmology, and the aims of deepkapha.ai and curae.ai for the next year. I personally talked to him about applying GANs to diabetic retinopathy images and got a few inputs for my project.

6. Identifying the operational and Transitional States of a machine

Speaker: Anurag is the VP & Global Business Unit Head, AI and Data Sciences, at Nagarro. He works on time series prediction problems such as machine maintenance, and on statistical modelling for ML models.

Learning:

Machine maintenance prediction
Importance of univariate and multivariate distribution of time series data
Combining multivariate analysis and auto-encoders to understand the behavior of data

7. Graph Convolutional Networks for semi-supervised classification

Speaker: Samiran Roy is currently working as a Senior Lead Data Scientist at Envestnet Yodlee. His role involves deploying Deep Learning, Reinforcement Learning, and Semi-Supervised learning-based products.

Learning:

Basics of Graph Convolutional Network (GCN)
Difference between Neural Network and GCN
Tackling the interdependence between the data features
Advantages of GCN
GCN may not work well for predicting exact values
https://github.com/samiranrl/DHS2019_GNN
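To make the difference between a plain neural network layer and a GCN layer concrete, here is a minimal sketch of one graph convolution step in plain Python. It assumes the commonly used propagation rule ReLU(D^-1/2 (A + I) D^-1/2 H W). The helper names and the tiny two-node graph are illustrative assumptions, not code from the speaker's repository.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of a (n x k) and b (k x m)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """Add self-loops, then apply symmetric degree normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One graph convolution: mix neighbour features, project, apply ReLU."""
    propagated = matmul(matmul(normalized_adjacency(adj), features), weights)
    return [[max(0.0, v) for v in row] for row in propagated]


# Two connected nodes, one feature each, identity weight (all hypothetical).
adjacency = [[0.0, 1.0], [1.0, 0.0]]
node_features = [[1.0], [3.0]]
layer_weights = [[1.0]]
print(gcn_layer(adjacency, node_features, layer_weights))  # [[2.0], [2.0]]
```

A plain dense layer would transform each node's features independently. Here each node's output also depends on its neighbour, which is how the interdependence between data points enters the model.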

November 14, 2019 — Day 2

1. Haptic Learning: Inferring Anatomical Features using Deep Networks

Speaker: Akshay Bahadur is a software engineer at Symantec, India. He is passionate about machine learning in computer vision, deep learning and OpenCV. He regularly contributes to open-source libraries in the ML field.

Learning:

Normalization
Feeding content using Webcam: MNIST
Feeding data by ‘Writing’ on screen: Quick, Draw and Digit Encoder
Exploring Hand Gestures: Emojinator and Rock + Paper + Scissors
Eye Aspect Ratio — (dlib library)
Facial Recognition — (read FaceNet Paper)
DeepSign — his project on sign language
https://github.com/akshaybahadur21/DHS-2019
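The eye aspect ratio (EAR) mentioned above is usually computed from six eye landmarks p1..p6 as (|p2 - p6| + |p3 - p5|) / (2 |p1 - p4|). The sketch below uses hand-written coordinates in place of landmarks from a detector such as dlib. The coordinates and the 0.2 threshold are assumptions for illustration only.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def eye_aspect_ratio(eye: List[Point]) -> float:
    """EAR from six landmarks: p1/p4 are the corners, p2/p3 top, p6/p5 bottom."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


# Hypothetical landmark coordinates for an open and a nearly closed eye.
open_eye = [(0, 0), (2, 2), (4, 2), (6, 0), (4, -2), (2, -2)]
closed_eye = [(0, 0), (2, 0.3), (4, 0.3), (6, 0), (4, -0.3), (2, -0.3)]

EAR_THRESHOLD = 0.2  # assumed cut-off; tune on real data
for name, eye in [("open", open_eye), ("closed", closed_eye)]:
    ear = eye_aspect_ratio(eye)
    state = "closed" if ear < EAR_THRESHOLD else "open"
    print(f"{name}: EAR={ear:.2f} -> {state}")
```

The ratio stays roughly constant while the eye is open and drops towards zero during a blink, so a simple threshold is often enough for blink or drowsiness detection.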

2. Using Technology to save Lives

Speaker: Dr. Geetha Manjunath is the Founder, CEO and CTO of NIRAMAI. She holds a Ph.D. in Computer Science. Currently, she is developing an AI-based solution for detecting early-stage breast cancer from thermal images.

Learning:

Importance of Thermal images in detecting cancer
How important it is to patent novel work
Impact of her team’s project on society
How deep learning can make a difference in the healthcare sector

3. Evaluating ML Models for Bias — Build an Interpretable Model using a Financial Dataset

Speaker: Prateek and Rajesh both work for IBM India Software Labs. Both are interested in building ML models for financial data, and they are good at tackling bias in a dataset.

Learning:

Introduced IBM’s open source toolkit (AIF360)
Importance of bias in data
Bias is not always bad, so it does not always need to be removed from data
Metrics to detect bias: statistical parity difference and disparate impact
LIME: Local Interpretable Model-agnostic Explanations (I need to explore this)
https://github.com/IBM/AIF360
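The two bias metrics above compare how often the favorable outcome occurs for an unprivileged group versus a privileged group: statistical parity difference is the difference of the two rates, and disparate impact is their ratio. The sketch below computes both in plain Python. It deliberately does not use the AIF360 API, and the outcome lists are made-up numbers.

```python
from typing import List


def favorable_rate(outcomes: List[int]) -> float:
    """Share of favorable outcomes (1 = favorable, 0 = unfavorable)."""
    return sum(outcomes) / len(outcomes)


def statistical_parity_difference(unprivileged: List[int], privileged: List[int]) -> float:
    """Rate(unprivileged) - Rate(privileged); 0 means parity."""
    return favorable_rate(unprivileged) - favorable_rate(privileged)


def disparate_impact(unprivileged: List[int], privileged: List[int]) -> float:
    """Rate(unprivileged) / Rate(privileged); 1 means parity."""
    return favorable_rate(unprivileged) / favorable_rate(privileged)


# Hypothetical loan approvals: 3 of 10 vs 6 of 10 applicants approved.
unprivileged_group = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
privileged_group = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]

spd = statistical_parity_difference(unprivileged_group, privileged_group)
di = disparate_impact(unprivileged_group, privileged_group)
print(f"statistical parity difference: {spd:.2f}")  # -0.30
print(f"disparate impact: {di:.2f}")  # 0.50
```

A negative difference, or a ratio well below 1, suggests that the unprivileged group receives the favorable outcome less often.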

4. Creating and Deploying a pocket yoga trainer using Deep Learning (One of my favourite sessions in DHS 2019)

Speaker: Apurva and Mohsin are two fantastic humans who work as a team at HealthifyMe. Both of them are now working on a tool to teach yoga using AI.

Learning:

Pose detection from a video feed
TensorFlow records (TFRecords) to reduce memory usage
Keypoint regression to predict the pose
Train the user by comparing the poses of the trainer and the user
Performance is measured by how exactly the user replicates the pose
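One simple way to score how closely a user replicates the trainer's pose is to normalize both sets of keypoints for position and scale and then take their cosine similarity. The speakers' actual scoring method was not part of my notes, so the functions and keypoints below are purely an assumed illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def normalize_pose(keypoints: List[Point]) -> List[Point]:
    """Center keypoints on their mean and scale them to unit norm."""
    n = len(keypoints)
    cx = sum(x for x, _ in keypoints) / n
    cy = sum(y for _, y in keypoints) / n
    centered = [(x - cx, y - cy) for x, y in keypoints]
    scale = math.sqrt(sum(x * x + y * y for x, y in centered))
    if scale == 0:
        raise ValueError("all keypoints coincide")
    return [(x / scale, y / scale) for x, y in centered]


def pose_similarity(trainer: List[Point], user: List[Point]) -> float:
    """Cosine similarity of two normalized poses (1.0 = identical shape)."""
    a, b = normalize_pose(trainer), normalize_pose(user)
    return sum(ax * bx + ay * by for (ax, ay), (bx, by) in zip(a, b))


# Hypothetical 4-keypoint poses: hip, neck, right hand, left hand.
trainer_pose = [(0.0, 0.0), (0.0, 2.0), (1.0, 3.0), (-1.0, 3.0)]
user_same = [(5.0, 5.0), (5.0, 9.0), (7.0, 11.0), (3.0, 11.0)]  # shifted, 2x larger
user_off = [(5.0, 5.0), (5.0, 9.0), (7.0, 9.0), (3.0, 9.0)]  # arms not raised

print(round(pose_similarity(trainer_pose, user_same), 3))  # 1.0
print(round(pose_similarity(trainer_pose, user_off), 3))  # 0.949
```

Normalizing first means the score does not depend on where the user stands or how far they are from the camera.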

5. Automated Portfolio Management using Reinforcement Learning

Speaker: Sonam Srivastava is the founder of Wright Research. She specializes in investment decision making using artificial intelligence (ML and RL models).

Learning:

Introduction to deep RL: how to define an RL problem
Introduction to the problem statement and definition of the network architecture
The types of networks used
The scoring functions
How to optimise for costs and other nuances
The demo on the notebook
Exploratory data analysis
Demonstration of the RL framework and other comparative frameworks
Demonstration of results in comparison with other comparative frameworks
Resources/Papers to find more about deep RL for portfolio optimization
http://www.wrightresearch.in/

6. Image Captioning using Attention Models

Speaker: Souradip Chakraborty and Rajesh Shreedhar Bhat work as a Statistical Research Analyst and a Data Scientist, respectively, at Walmart Labs. Souradip’s research interests lie in the areas of Representation Learning, Graphical Networks, NLP & Vision, while Bhat is primarily focused on building reusable machine/deep learning solutions that can be used across various business domains.

Learning:

Read the paper “Show, Attend and Tell”
Good introduction to the encoder-decoder framework
Got ideas for Kaggle competitions
How to caption an image
https://github.com/rajesh-bhat/dhs_summit_2019_image_captioning
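The core of soft attention in captioning is that, at each decoding step, every image region gets a score, the scores are turned into weights with a softmax, and the decoder receives the weighted sum of region features as context. The sketch below shows only that step. It scores regions with a dot product for brevity (the paper uses a small learned network), and all vectors are toy values I made up.

```python
import math
from typing import List, Tuple


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(
    region_features: List[List[float]], decoder_state: List[float]
) -> Tuple[List[float], List[float]]:
    """Return attention weights over regions and the resulting context vector."""
    # Assumed scoring function: dot product between region and decoder state.
    scores = [sum(f * h for f, h in zip(region, decoder_state)) for region in region_features]
    weights = softmax(scores)
    dim = len(region_features[0])
    context = [
        sum(w * region[i] for w, region in zip(weights, region_features))
        for i in range(dim)
    ]
    return weights, context


# Three hypothetical image regions with 2-dimensional features.
regions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
state = [2.0, 1.0]
attn_weights, context_vector = attend(regions, state)
print([round(w, 3) for w in attn_weights])
print([round(c, 3) for c in context_vector])
```

The weights show which region the decoder "looks at" while producing the next word of the caption.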

7. Why do we need to start solving our problems spatially?

Speaker: Rishabh is the co-founder of Locale.ai, where he provides geospatial analytics to companies to help them analyze their ground performance using location data. Think providing Uber-like location intelligence to every company with moving supply and/or demand.

Learning:

Using ML techniques for spatial data
How Swiggy/Uber find location clusters
Detecting fraud in a particular geographical location

8. Demystifying BERT — How to interpret NLP models?

Speaker: Logesh is a Data Scientist at Mopro, building NLP products with awesome people. His work involves bringing Deep learning-based NLP solutions from early prototypes to production.

Learning:

Interpretation of the BERT model
Encoder-decoder architecture in NLP
Token-based sentiment analysis
Target-based sentiment analysis
BertViz library to visualize the BERT model

November 15, 2019 — Day 3
The best day of DHS 2019: I learnt a lot of new ideas, including GANs and inputs from Kaggle Grandmasters, and had discussions with Pavel, SRK and Rohan Rao. Wow!!

1. Deploying DL models in Production

Speaker: Vishnu Subramanian currently works as an independent AI consultant, helping companies with AI strategy, architecting solutions, and mentoring teams. He has done several AI/Big data-related projects for large automobile, travel and retail companies. He has also finished in the top 2% in several Kaggle competitions.

Learning:

To deploy, one can use a Flask API, AWS Lambda, or Kubernetes
Mixed precision: using float16 instead of float32
Quantization support was launched a few days back (I need to read about this)
Both mixed precision and quantization are used to reduce the size of the model
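The size reduction from storing values as float16 instead of float32 can be seen with nothing more than Python's standard `struct` module. This is only a storage illustration with made-up weights; it is not how a deep learning framework implements mixed-precision training or quantization.

```python
import struct
from typing import List

# Hypothetical model weights.
weights = [0.1234567, -1.5, 3.1415927, 0.0009765625]


def pack(values: List[float], fmt: str) -> bytes:
    """Serialize values little-endian; 'f' = float32, 'e' = float16."""
    return struct.pack(f"<{len(values)}{fmt}", *values)


as_float32 = pack(weights, "f")
as_float16 = pack(weights, "e")
print(len(as_float32), len(as_float16))  # 16 8

# Reading the half-precision values back shows the rounding error introduced.
restored = struct.unpack(f"<{len(weights)}e", as_float16)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(f"largest rounding error: {max_error:.6f}")
```

Half precision halves the storage at the cost of a small rounding error per value, which is the trade-off behind both techniques.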

2. Top Hacks from Kaggle GrandMaster

Speaker: Pavel Pleskov currently works at H2O.ai. Pavel reached 1st in Russia and 3rd in the world in the Kaggle competition rankings. Since then, he has been professionally participating in and helping to organize online competitions and hackathons in DS/ML.

Learning:

Blending techniques
imgaug for augmentation
Catalyst
fastai
H2O.ai (already stacked models)
https://gitlab.com/ppleskov/datahack-summit-2019
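Blending, in its simplest form, is a weighted average of the predictions of several models. The sketch below shows that basic form with made-up predictions and weights. It is an assumed minimal example, not the specific blending recipe Pavel presented.

```python
from typing import List


def blend(predictions: List[List[float]], weights: List[float]) -> List[float]:
    """Weighted average of per-model predictions for the same samples."""
    if len(predictions) != len(weights):
        raise ValueError("need one weight per model")
    total = sum(weights)
    n_samples = len(predictions[0])
    return [
        sum(w * preds[i] for w, preds in zip(weights, predictions)) / total
        for i in range(n_samples)
    ]


# Hypothetical predicted probabilities from two models on three samples.
model_a = [0.9, 0.2, 0.6]
model_b = [0.7, 0.4, 0.2]

equal_blend = blend([model_a, model_b], [1.0, 1.0])
print([round(p, 2) for p in equal_blend])  # [0.8, 0.3, 0.4]

# Trust model_a three times as much as model_b.
weighted_blend = blend([model_a, model_b], [3.0, 1.0])
print([round(p, 2) for p in weighted_blend])  # [0.85, 0.25, 0.5]
```

In competitions the weights are usually chosen by checking the blended score on a validation set.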

3. Image ATM(Automatic Tagging Machine) — Image Classification for Everyone

Speaker: Dat is the Head of AI at Axel Springer Ideas Engineering, the innovation unit of Axel Springer SE, which is the largest digital publishing house in Europe. His team mainly focuses on computer vision problems, from teaching a computer to understand aesthetics to upscaling low-resolution images.

Learning:

Learn how to use Image ATM, e.g. what kind of input is needed, followed by preprocessing, training and evaluation
Learn how you can contribute to it as well
Learn about their image classification problems
https://github.com/idealo/imageatm — idealo’s open source code

4. Morphing images using Deep Generative Models(GANs)

Speaker: Xander is currently head of applied ML research at the Belgian AI scale-up ML6. Xander has wide expertise in domains such as computer vision (object tracking, optical character recognition, image classification, etc.), natural language processing (chatbots, text classification, etc.) and many others, using open source libraries like TensorFlow and PyTorch in combination with powerful compute resources on the Google Cloud Platform. Xander’s latest side-project involves applying Generative Adversarial Networks (GANs) to new types of content creation for digital artists.

Learning:

This session was a big eye-opener on GAN usage, especially StyleGAN
Introduction to GANs
The StyleGAN algorithm and how it differs from a vanilla GAN
How to generate the random noise vector for a particular image
How to morph a face from another image
https://artbreeder.com/
https://www.youtube.com/channel/UCNIkB2IeJ-6AmZv7bQ1oBYg

5. What sets the top hackers apart?

Speakers:

Sourabh Jha is an Associate at JP Morgan Chase and Co. He actively participates in data science competitions on Kaggle (highest global rank 252) and Analytics Vidhya (highest global rank 76).
Kiran R is the director of the Data Sciences CoE for VMware globally. He drives data sciences & advanced analytics projects across sales & marketing, digital, partner, pricing, and e-commerce in his functional role.
Mohsin Khan (a.k.a. ‘Tezdhar’) currently works as a Machine Learning Engineer @ HealthifyMe. Mohsin Khan has made winning Data Science competitions a habit at Analytics Vidhya in the past couple of years and has won several of the most competitive hackathons, namely the Innoplexus Hackathon, the Capillary Machine Learning Hackathon and the Churn Prediction Hackathon.
Sudalai Rajkumar (a.k.a. SRK) is a Data Scientist at H2O.ai Inc, building Driverless AI, an automated machine learning platform. He is currently leading the NLP efforts for this platform. He has solved a lot of interesting data science problems in multiple domains including finance, customer support, e-commerce, health care and transportation. He is a Kaggle Grandmaster in the Competitions and Kernels sections. He is currently ranked #3 on AV’s platform as well.
Sahil Verma is a Senior Data Scientist at Aditya Birla Finance Limited. His work is primarily focused on creating machine learning solutions in the Fintech domain for private banks, NBFCs and Startups. He is a Chemical Engineering graduate from IIT Delhi and is currently ranked 2nd on Analytics Vidhya’s global leaderboard.
Rohan Rao (a.k.a. ‘vopani’) currently works as a Data Scientist @ H2O.ai. He is a post-graduate in Applied Statistics from IIT-Bombay and part of the elite group of Kaggle Grandmasters. His core expertise lies in driving, pipelining and building Machine Learning solutions hands-on.

Learning:

Rohan Rao:

Understand different industry problems through Kaggle
Applied maths and statistics (background)

SRK:

Team up with others to learn more
Linear Logistic Regression
2013–14 Kaggle
Don’t try harder problems first
Mechanical Engineer

Sourabh:

Material Science
Learn from other kernels to improve yourself

Mohsin:

Aerospace Engineer
He applied ML to machine maintenance

Common Tips:

Never give up
Hands on practice
Create a baseline model as soon as you can
Get out of your comfort zone
Yes, a GPU is an advantage for image-based problems
Try transfer learning

6. Exploring PyTorch for AI Assistance in Medical Imaging

Speaker: Abhishek is currently working as a Deep Learning Researcher at CVEDIA. He has worked with Predible Health as a Computer Vision Engineer where he built state of the art segmentation algorithms/models in Computer Vision for Medical Imaging.

Learning:

I expected something on medical images
But it covered only the basics of PyTorch and CNNs

7. Generating Synthetic images from textual description

Speaker: Shibsankar is a Data Scientist at Walmart Labs, currently improving “Search” relevance in eCommerce through Machine Learning and Deep Learning.

Learning:

Complete basics of GANs
How to generate an image based on a text description
Train the GAN with both text and images for this application (multi-modal training)
Triplet loss
https://github.com/sanku-lib/text-to-image/blob/master/text-to-image-training.ipynb
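Triplet loss pulls an anchor embedding towards a matching (positive) example and pushes it away from a non-matching (negative) one: loss = max(0, d(anchor, positive) - d(anchor, negative) + margin). In a text-to-image setting the anchor could be a text embedding and the positive and negative could be matching and mismatched image embeddings. That mapping, the Euclidean distance, the margin and the toy vectors below are all my assumptions and are not taken from the linked notebook.

```python
import math
from typing import List


def triplet_loss(
    anchor: List[float],
    positive: List[float],
    negative: List[float],
    margin: float = 0.2,
) -> float:
    """Hinge-style triplet loss using Euclidean distance."""
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Hypothetical 2-dimensional embeddings.
text_embedding = [0.0, 0.0]
matching_image = [0.0, 1.0]  # distance 1 from the anchor
mismatched_image = [3.0, 4.0]  # distance 5 from the anchor

# Well-separated triplet: the positive is already much closer, so no loss.
print(triplet_loss(text_embedding, matching_image, mismatched_image))  # 0.0

# Swapping the roles gives a badly ordered triplet and a large loss.
print(round(triplet_loss(text_embedding, mismatched_image, matching_image), 2))  # 4.2
```

The margin forces the negative to be further away than the positive by at least that amount before the loss reaches zero.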

November 16, 2019 — Day 4

Building Scalable Recommendation Systems

Speaker: Anand Mishra is Head of Engineering at Analytics Vidhya. He is an entrepreneur, an engineer and a data science professional all rolled into one.

With lots of expectations, I went to Hotel La Marvella for the last day of DHS 2019. The session started nicely with the application of math to recommendation systems. After a break, he stuck with the same math discussion, so I did not gain the new ideas I was expecting from the session. For a beginner, it might be a good eye-opener.

https://github.com/RamjiB/Building-Scalable-Recommendation-System-DHS

https://github.com/recommendationsystemworkshop

Written by Ramji Balasubramanian