hark99
Junior Data Scientist
Wah Cantt, Pakistan
Speaks: English
193 points
current rank #906, best rank #895
Joined Zindi 12 months ago

Programming Languages: Python

Data Science Specializations: Predictive modelling

What do you enjoy about data science and machine learning?

It's transforming everything, just as electricity did in the past.

Bio

Developed my expertise in machine learning by completing core AI specializations on Coursera, and I am still completing more.

Projects

House Price Prediction with Improved ML Techniques

· Estimated residential prices in Beijing and London from property attributes, without relying on data from previous years.

· Implemented a versatile model-training approach that used not only ensemble methods but also modern machine learning techniques such as hybrid models and stacked generalization, reaching 15% error.

· Performed data cleaning to replace all Chinese characters with English words, exploratory analysis to identify outliers, feature engineering on spatial data to capture how a house's price changes with its distance from the city centre, and correlation analysis to find statistically significant features.

Statistical Analysis with Python using National Health & Nutrition Examination Survey Data

· Applied inferential statistics, such as constructing confidence intervals for the difference between two population proportions (male and female smokers) and two population means (male and female Body Mass Index).

· Conducted a hypothesis test (at the 0.05 level) for the null hypothesis that the proportion of women who smoke is equal to the proportion of men who smoke.

· Applied statistical modelling techniques such as linear and logistic regression, multilevel and marginal models, and Bayesian inference to different features of NHANES and other datasets.

Loan Predictions with Deep Learning

· Filtered thousands of student loan applications from IPEDS (Integrated Postsecondary Education Data System) data through a shallow neural network built in Keras to predict successful ones with 87% accuracy.

Image Search Application with OCR, OpenCV, and Tesseract

· Built an application for character recognition and object detection in images.

· Returned a contact sheet of images for the searched keywords.

Sentiment Classifier with Python

· Designed a sentiment classifier that calculates the net score on hundreds of positive and negative tweets.

· Analyzed text and CSV files to extract only the tweets, using Python keywords and data types.

Clustering Model with Foursquare API

· Explored tourist or common locations in Manhattan and Downtown Toronto using Foursquare APIs.

· Clustered those locations based on foot traffic activity in the respective neighborhoods.

Energy Consumption in the Netherlands, a Nonlinear Regression Analysis

· Implemented all the architectural decisions, including ETL, data cleansing, feature engineering, model design (examining a non-linear relationship between the variables), and evaluation, to predict energy consumption for the year 2019.

Accident Risk Places, an Analysis of US Traffic Data

· Identified specific places in different boroughs where the accident rate is high.

· Predicted accident risks using text mining and data visualization, with the help of a neural network.

IoT Data with Apache Spark, Node-RED & NoSQL

· Configured an IBM Node-RED application with a NoSQL database to handle data from IoT devices such as a mobile sensor, a washing machine, and wheel bearings.

· Ran data analysis on the thousands of stored records with Apache Spark.

Kaggle Machine Learning Competitions

· Ranked 234 (out of 4245) in Jane Street Market Prediction, a competition in which models are run against future real market trading data. Handled this classification problem by omitting high-volatility days (outliers) and training the model with the LightGBM algorithm.

· Among the top 4% in predicting prices of residential homes in Ames, Iowa, using a gradient boosting technique evaluated with mean absolute error, in a competition for Kaggle Learn users.

Work & Publications
Aug 2018–present
Self-employed
Work
Deliver machine-learning solutions to customers in different industries. Analyzed datasets from industries including, but not limited to, automotive, banking, education, energy, housing, medical, and telecom to design algorithms for predictive modelling and clustering.
Sep 2004–Aug 2008
Air University, BE Electrical (Telecom)
University