Machine Learning Engineer.
Enthusiastic about AI, programming, and constructing intelligent systems aimed at enhancing human experiences.
I am a graduate in Data Science and Artificial Intelligence from Princess Sumaya University for Technology. My passion for understanding intelligence and its real-world applications in AI is boundless. I have a deep interest in Reinforcement Learning and LLMs. I currently work at Samsung Research & Development in Jordan.
• Recent Skills and Technologies:
• Worked on multiple projects during this internship:
1) Topic Modeling: This project aims to find the set of topics that best describe a document.
  • Data: the ABC News dataset, which contains news headlines published over a period of 14 years.
  • Algorithms:
    ⚬ Stemming and lemmatization as text preprocessing steps.
    ⚬ TF-IDF.
    ⚬ LDA for assigning documents to topics.
2) Movie Rating Prediction: This project aims to predict the rating of a movie based on a set of attributes.
  • Data: the Netflix Movies and TV Shows dataset, which lists all the movies and TV shows available on Netflix, along with details such as cast, directors, ratings, release year, and duration.
  • Algorithms:
    ⚬ Data cleaning, feature engineering, and preprocessing.
    ⚬ Logistic Regression & Linear Regression.
3) House Price Prediction competition: This project aims to predict the price of a house based on a set of attributes.
  • Data: the Housing Prices dataset, with 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa; the competition challenges participants to predict the final price of each home.
  • Algorithms & Code:
    ⚬ Data cleaning, feature engineering, and preprocessing.
    ⚬ XGBoost.
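To make the TF-IDF step of the topic modeling project above concrete, here is a minimal sketch in plain Python. It is an illustration only, not the project's code: the function name `tf_idf` and the toy headlines are hypothetical, and it uses the plain textbook weighting (term frequency times log(N / document frequency)), whereas libraries such as scikit-learn apply a smoothed variant by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Compute textbook TF-IDF weights.

    docs: list of token lists (already stemmed/lemmatized).
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical, already-tokenized headlines.
headlines = [
    ["rain", "floods", "town"],
    ["rain", "eases"],
    ["election", "results"],
]
weights = tf_idf(headlines)
```

In this toy corpus "rain" appears in two of the three headlines, so it receives a lower weight in the first headline than "floods", which appears only once. Weights like these are one possible input representation for a topic model.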
This internship focused on solving a large data problem facing the company.
Problem Description: The company had more than 100k company records, each with many attributes (name, domain, contacts, etc.), but the records were duplicated in a fuzzy way: the same company appeared under multiple names because of data entry and old company policy issues. The task was therefore to reduce the number of duplicated records based on name similarity and data analysis.
Solution:
• Technologies:
  ⚬ Python
  ⚬ SQL
• Approach:
  ⚬ First, I used the fuzzy matching algorithms in the fuzzywuzzy library, in a Python script, to find companies with similar names.
  ⚬ Then I went deeper into data analysis with the Pandas library to find other attributes that could help identify duplicate companies, such as domain name, region, and email correspondence.
  ⚬ After that, I classified companies into:
    ▬ Deleted: company records that are not necessary.
    ▬ Merged: company records that need to be merged with their fuzzy duplicate.
    We then wrote stored procedures in SQL to handle these companies.
  ⚬ Using the pyodbc library, I connected to the company's live SQL server, and with a Python script I resolved the company's data problems, leaving a cleaner and more valuable database for the company.
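The name-matching step described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the script used in the internship: it swaps in the standard library's `difflib.SequenceMatcher` for fuzzywuzzy so that it runs without extra dependencies, and the suffix list, the threshold of 85, and the sample names are all hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"inc", "ltd", "llc", "co", "corp"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop legal suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    tokens = [t for t in cleaned.split() if t not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def similarity(a, b):
    """Return a 0-100 similarity score between two company names."""
    na, nb = normalize(a), normalize(b)
    if not na or not nb:
        return 0  # nothing left to compare after normalization
    return round(100 * SequenceMatcher(None, na, nb).ratio())


def find_duplicate_pairs(names, threshold=85):
    """Compare every pair of names and keep those at or above the threshold."""
    pairs = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            score = similarity(names[i], names[j])
            if score >= threshold:
                pairs.append((names[i], names[j], score))
    return pairs


# Hypothetical sample records.
candidates = find_duplicate_pairs(["Acme Inc.", "ACME, Inc", "Globex Corp"])
```

Comparing every pair is quadratic in the number of records, so at the scale of 100k records a real run would likely need to restrict comparisons first, for example by grouping records on another attribute such as domain name or region.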
• Extracurricular Activities:
  ⚬ Former member of PSUT's Data Science Club: helped organize and give workshops on various topics, including Python, Kaggle competitions, and Jupyter Notebook analysis.
  ⚬ Organizer for the AI–Ability initiative by PSUT, which taught foundational ML to school students in Jordan.
  ⚬ Participant in IEEEXtreme 16.0. Certificate
  ⚬ Calculus One instructor.
  ⚬ Ana-Ushark initiative volunteer.
⚬ This course is taught by Andrew Ng, an adjunct professor at Stanford University and the founder & CEO of DeepLearning.AI.
⚬ It consists of the following 5 separate courses (which, combined, take about 180 hours to complete):
  1) Neural Networks and Deep Learning
  2) Improving Deep Neural Networks: Hyperparameter Tuning, Regularization and Optimization
  3) Structuring Machine Learning Projects
  4) Convolutional Neural Networks
  5) Sequence Models
⚬ Certificate