Diving into a Data Scientist’s Perspective with Ekta Shah
This interview covers how to approach a Machine Learning problem, interview tips, a message for aspiring Data Scientists, and how technologies like AutoML and OpenAI’s GPT-3 are going to affect the Data Scientists of the future.
Hi Ekta, please introduce yourself to our readers.
I am Ekta Shah. I am an IT engineer at the core. I completed my Masters in Data Science at NMIMS University. I previously worked with BNP Paribas as a Software Developer, and I am now working as a Data Scientist at Viacom 18. My core expertise is in Data Science and Machine Learning, and I have worked on traditional machine learning, image classification, and reinforcement learning problems. I am a Data Science Mentor with Yay – Girlscript Foundation, where I mentor data science courses for students who are looking to break into the field of Data Science. I am also a writer and blogger, and I regularly contribute articles to Analytics India Magazine and AnalyticsVidhya.
Please shed some light on your work related to audio analytics at Viacom 18.
Media companies face the issue of muting Bollywood songs when they are broadcast on their video OTT platforms. This is currently done via third-party applications such as Shazam. Companies are now looking to build in-house capabilities to solve this problem using Machine Learning and Deep Learning.
Your take on approaching an ML problem?
I would approach any ML problem using the following steps:
- Ensure that the data is structured and consistent
- Understand the different features and information about the data
- Exploratory Data Analysis
- Data Preprocessing
- Data Modelling
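As a rough illustration, the steps above can be sketched in plain Python. Everything here is an assumption made for the example: the toy `rows` dataset, the column names `hours` and `score`, and the choice of a simple least-squares line as the model. It is not the workflow used at Viacom 18, only a minimal sketch of the order of operations.

```python
import statistics

# Hypothetical toy data (invented for illustration): hours watched vs. a score.
rows = [
    {"hours": 1.0, "score": 2.1},
    {"hours": 2.0, "score": 3.9},
    {"hours": 3.0, "score": None},  # a missing value to deal with
    {"hours": 4.0, "score": 8.2},
    {"hours": 5.0, "score": 9.8},
]
COLUMNS = ("hours", "score")


def is_consistent(data):
    """Step 1: check that every record has exactly the expected columns."""
    return all(set(record) == set(COLUMNS) for record in data)


def count_missing(data):
    """Step 2: understand the features, here by counting missing values."""
    return {col: sum(record[col] is None for record in data) for col in COLUMNS}


def summarize(data):
    """Step 3: EDA, here the mean and standard deviation of each column."""
    summary = {}
    for col in COLUMNS:
        values = [r[col] for r in data if r[col] is not None]
        summary[col] = (statistics.mean(values), statistics.stdev(values))
    return summary


def drop_incomplete(data):
    """Step 4: preprocessing, here simply dropping records with gaps."""
    return [r for r in data if all(r[col] is not None for col in COLUMNS)]


def fit_line(data):
    """Step 5: modelling, a least-squares line score = slope * hours + intercept."""
    xs = [r["hours"] for r in data]
    ys = [r["score"] for r in data]
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


assert is_consistent(rows)
missing = count_missing(rows)
summary = summarize(rows)
clean = drop_incomplete(rows)
slope, intercept = fit_line(clean)
print(missing, summary, slope, intercept)
```

On real projects each step is far larger, and dropping incomplete records is only one of several possible ways to handle missing values.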
What are the basic requirements for a fresher looking for a Data Science job? What are the things freshers must know for sure?
Whenever I am asked this question, I always go back to the first thing that I learned about Data Science, i.e. the Venn diagram that describes the basic capabilities required to understand and ace data science concepts.
The first bubble is about Mathematics and Statistics, which includes basic statistics, linear algebra, calculus, etc. Your basics in Mathematics should be very clear. The second bubble is about Computer Science, which revolves around the basic concepts of data structures and algorithms, programming, networking, etc. The third bubble is about Domain Expertise, which is generally gained through real-world experience in the industry. Knowing these three bubbles is a must for all data scientists.
For a Computer Vision engineer interested in stepping into the field of Audio Analytics, what technologies or tech stack would you suggest for making the change? Or, to rephrase it, from the Deep Learning perspective, how is voice data different from image data?
Image data is a matrix of pixels and is mainly modeled using CNNs, while voice data is continuous and has very different properties such as amplitude, offsets, etc. So when you want to model audio data, you can use traditional Machine Learning models as well as simple neural network models. The algorithms remain the same irrespective of how the data is structured. The tech stack you prepare should be generic so that you can cater to any of these problems.
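To make the contrast concrete, here is a minimal sketch in plain Python. The pixel values, sample rate, tone frequency, amplitude, and offset below are all made-up assumptions for illustration; real audio would be loaded from a file rather than synthesized.

```python
import math

# A tiny grayscale "image": a 2-D grid of pixel intensities (rows x columns).
image = [
    [0, 128, 255],
    [64, 192, 32],
]
image_shape = (len(image), len(image[0]))

# A short audio "clip": a 1-D sequence of samples taken over time.
# All of these constants are assumptions chosen for the example.
SAMPLE_RATE = 8000   # samples per second
FREQUENCY = 440.0    # tone frequency in Hz
AMPLITUDE = 0.5      # height of the wave around its centre
DC_OFFSET = 0.1      # constant shift of the whole wave
NUM_SAMPLES = 800

signal = [
    DC_OFFSET + AMPLITUDE * math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
    for n in range(NUM_SAMPLES)
]

# Properties that describe the audio signal rather than a pixel grid.
duration = NUM_SAMPLES / SAMPLE_RATE               # length of the clip in seconds
offset = sum(signal) / len(signal)                 # estimated offset: the mean level
peak = max(abs(s - offset) for s in signal)        # estimated amplitude around that level

print(image_shape, duration, round(offset, 3), round(peak, 3))
```

The image is indexed by row and column, while the signal is indexed by time, which is why properties such as amplitude and offset only make sense for the audio.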
A lot of people, especially aspiring Data Science students, are curious about what the day-to-day work of a Data Scientist looks like in the workplace. Could you share your experience?
On a day-to-day basis, I deal with a lot of different kinds of data, which may be structured or unstructured. Data scientists are generally occupied with one of the stages of the data science lifecycle, such as obtaining the data, data preprocessing, EDA, modeling, etc. Most of the time is spent on getting the data ready for processing. Modeling is just 5-10% of the work.
Do you think Data Scientists will be replaced due to the rise of AutoML companies like DataRobot and H2O?
With the advance of technology, automation is something that will grow exponentially. AutoML will help you with the modeling part. But if you look at the different kinds of problems in the field of Data Science, data is often not easily available or structured. So the data preprocessing pipeline can be automated, but Data Scientists will still be required to make decisions at every point. There cannot be a complete replacement of people by technology.
Your take on OpenAI’s GPT-3? Will it disrupt the job market for Software Engineers and Machine Learning Engineers?
GPT-3 is a disruptive technology in natural language and text generation. It can do things that we always thought would only be possible in the future, and it will be adopted wherever it fits. It will certainly not take the place of Software Engineers and Machine Learning Engineers for now. Code will be written automatically at some point in the future, but that will bring with it new kinds of work. So technology will lead to change, but you need to stay in sync with the times and keep upskilling yourself as technology advances.
But GPT-3 can be trained very quickly to write SQL queries and Python scripts. Your take on this?
Yes, GPT-3 is quite useful for writing SQL queries and Python scripts. This will make the work easier, but it will not replace engineers entirely. GPT-3 will just make things a bit easier. After all, not every kind of code can be automated.
Most speech recognition and real-time transcription software struggles with regional languages (like Hindi) compared to a more globally spoken language such as English. What do you think could be the reason for this?
English is the global language of the tech industry, so every piece of technology is tested on English first, which makes English recognition systems much better. For Hindi to reach the same level, we will need the same amount of experimentation done with Hindi as a language. There is a clear lack of technology experiments done on the Hindi language.
Your guidance for interview preparation from an aspirant’s point of view?
Always be well prepared with your basics. Have a clear understanding of the three bubbles mentioned in one of the questions above. That is enough for any kind of interview.
Your concluding thoughts for the future of Data Science and datamahadev.com as a platform?
The future of Data Science is very bright. Data Science is nothing but the basic application of maths to numerical data. Data science existed even in the 70s and 80s, but we have only recently realized its potential and impact, which has created this huge spark around it. Once companies see the benefits it can bring to the organization, there will be a larger need for Data Scientists, and the field will be here to stay.
Datamahadev.com is a great platform to keep yourself up to date on all kinds of technologies, be it AI, ML, DL, NLP, Python, etc. It’s good to stay updated about the latest trends in your field, and it keeps you in sync with the times. This is one such platform where you will get a lot of knowledge about the latest trends in a concise format. You can not only read articles but also write and contribute on this platform.