Assistant Data Scientist - Tenders Global

Assistant Data Scientist

United Nations High Commissioner for Refugees



UNHCR’s registration system (proGres) exists to establish the identity of refugees and other forcibly displaced and stateless persons upon first encounter. This is essential for the subsequent delivery of protection and assistance. Registration data covers biographical data, demographic characteristics, socio-economic variables, and specific needs. It underpins the delivery of critical services, from cash delivery to travel to third countries for resettlement. In a number of countries, UNHCR registration data serves as KYC (know-your-customer) data. proGres includes 17 million registered individuals and is managed by over 15,000 operators worldwide, covering over 130 countries.
To ensure the quality and maintain the integrity of the data in this system, data science techniques are increasingly sought after, for instance to identify outliers, including in the context of fraud detection. To this end, data science methods need to be developed and implemented to strengthen data integrity, including faster detection of potential duplicates and of unusual use of the system itself.
In addition, proGres is used by UNHCR teams around the world to support the identification of refugees for the purpose of finding solutions, such as resettlement to third countries. This is presently done using qualitative data and interview processes. For the purpose of this assignment, data science models will be scoped to predict the probability that an individual will be successful in a resettlement process, thereby allowing for a segmentation approach and shortening the overall identification period for the majority of the individuals under consideration.
Duties and responsibilities:
In order to carry out this function, the Assistant Data Scientist will have the following duties: 
  • Collaborate with the team on the development of the data science methodology to support deduplication within UNHCR registration data, with responsibilities including:
  • Data familiarization: Gain a deep understanding of the UNHCR registration data structure and data entities as well as deduplication policy and processes as applied on a daily basis in UNHCR operations.
  • Database understanding: Analyze data entities, relationships, and data quality issues within registration data.
  • Identifying and developing suitable models: Utilize machine learning algorithms to implement deduplication, drawing on the registration data.
  • Collaboration and communication: Continuously collaborate with relevant teams within the organization on registration data to gather requirements and ensure alignment.
  • Identify and pre-process data: Utilize data cleansing and transformation techniques to prepare registration data for modeling.
  • Model development and optimization: Train and test machine learning models for the detection of duplicate case entries within the registration data, incorporating data from proGres.
  • Documentation: Ensure thorough documentation of all processes, including model selection, data preprocessing steps, and model performance, as well as the integration of registration data.
Support for fraud detection tools:
  • Collaborate with the registration team to support the development of ML/AI-enhanced fraud detection tools, focusing on operator-driven manipulations and fraud.
  • Conceptualize a scalable set of analysis products, which can be deployed to all 130+ country operations using the system.
Resettlement model:
  • Understand the complexities of the resettlement process, selection criteria and the nuances of UNHCR’s proGres data that inform the process.
  • Draw on good practices from operations that have developed a data-driven resettlement selection process.
  • Identify relevant data entities within proGres and recognize challenges and caveats in resettlement data.
  • Support the development, testing, and evaluation of predictive models capable of assessing the probability of an individual's resettlement based on historical proGres data and selected features.
  • Documentation: Ensure comprehensive documentation of the entire process within Microsoft Power Platform, including key steps and limitations of AI models and the use of proGres data. 
Monitoring and Progress Controls
Main deliverables:
  • Support the development and deployment of a deduplication tool for identifying duplicates in UNHCR’s registration database.
  • Support the development of a predictive model based on registration data to assess statistical probabilities for the purpose of expediting resettlement identification. 
  • Support the development of fraud detection algorithms based on registration data, specifically log data.
  • Communicate with other internal teams to develop collaborative frameworks and to harmonize ways of working together.
Qualifications and Experience:
Education:
  • University Degree in statistics, data science, mathematics, economics, or other quantitative social sciences.
Required Work Experience:
  • 1 year of relevant experience with an Undergraduate degree; or no experience with a Graduate degree; or no experience with a Doctorate degree.
  • Demonstrated experience in data collection, management, cleaning, processing, and applied data analysis.
  • Demonstrated experience using the statistical programming languages R and/or Python.
  • Demonstrated experience using SQL.
  • Demonstrated experience working with alternative data sources and/or statistical learning methods.
  • Demonstrated experience in ensuring the operational relevance of analytical and/or research work.
  • Demonstrated experience writing technical reports.
  • Demonstrated experience presenting work to both technical and non-technical audiences. 


