• Data Profiling & Monitoring:
• Develop and enhance data profiling engines to assess completeness, validity, and integrity of datasets.
• Collaborate on AI-driven cataloguing projects, including a Purview Proof of Concept (POC).
• Monitor data lake quality and performance using tools such as Databricks or other cloud-based data platforms.
• Machine Learning for Data Quality:
• Apply ML techniques (e.g., Random Forest) to improve duplicate identification and matching.
• Build LLM-based models to validate taxonomy mappings, for example by inferring roles or departments from job titles.
• Automation and API Integration:
• Automate data update processes using APIs, reducing reliance on manual scripts, extracts, and CSVs.
• Design scalable solutions for automated data reconciliation and integrity checks.
• Data Quality Analysis:
• Conduct detailed data quality assessments, measuring completeness, validity, and consistency across datasets.
• Identify gaps in data pipelines and propose actionable solutions.
• New challenge in an international company. SSC. English. Barcelona.
• Python, SQL, Machine Learning
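As a rough illustration of the completeness and validity measurements mentioned under Data Quality Analysis, the sketch below computes both for a small in-memory dataset. The records, field names, and the simple email rule are hypothetical and chosen only for the example; real profiling would typically run against the data lake rather than a Python list.

```python
import re

# Hypothetical sample records; field names are illustrative only.
records = [
    {"id": 1, "email": "ana@example.com", "job_title": "Data Analyst"},
    {"id": 2, "email": "", "job_title": "Senior Data Engineer"},
    {"id": 3, "email": "not-an-email", "job_title": None},
    {"id": 4, "email": "joan@example.com", "job_title": "QA Developer"},
]

# Deliberately simple validity rule (an assumption, not a full email validator).
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and value.strip() == "")


def completeness(rows, field):
    """Share of rows in which `field` is populated."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if not is_missing(row.get(field)))
    return filled / len(rows)


def validity(rows, field, is_valid):
    """Share of populated values of `field` that pass the `is_valid` rule."""
    values = [row.get(field) for row in rows if not is_missing(row.get(field))]
    if not values:
        return 0.0
    return sum(1 for value in values if is_valid(value)) / len(values)


email_completeness = completeness(records, "email")
email_validity = validity(records, "email", lambda v: bool(EMAIL_PATTERN.match(v)))
title_completeness = completeness(records, "job_title")

print(f"email completeness:     {email_completeness:.2f}")
print(f"email validity:         {email_validity:.2f}")
print(f"job_title completeness: {title_completeness:.2f}")
```

Measuring validity only over populated values keeps the two metrics independent, so a missing value is counted once (as incomplete) rather than twice.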
Expertise
• ML Model Development and Evaluation:
• Understanding statistical distributions and probabilities is key to choosing the right features, algorithms, and evaluation metrics for ML tasks (e.g., precision, recall, F1-score).
• Advanced tasks like enhancing duplicate detection or inferring roles with LLMs may involve probabilistic approaches.
• Data Quality Analysis:
• Quantifying and diagnosing data completeness, validity, and integrity often require statistical tests and descriptive analytics.
• General Problem-Solving:
• Statistical reasoning aids in diagnosing anomalies, reconciling datasets, and creating predictive models.
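For readers less familiar with the evaluation metrics named above, here is a minimal sketch of how precision, recall, and F1-score could be computed for a duplicate-detection model. The labels are hypothetical, and the helper `precision_recall_f1` is written for this example rather than taken from any particular library.

```python
def precision_recall_f1(actual, predicted):
    """Compute precision, recall and F1 for binary labels (True = duplicate)."""
    pairs = list(zip(actual, predicted))
    true_pos = sum(1 for a, p in pairs if a and p)
    false_pos = sum(1 for a, p in pairs if not a and p)
    false_neg = sum(1 for a, p in pairs if a and not p)

    # Guard against division by zero when a class is never predicted or present.
    precision = true_pos / (true_pos + false_pos) if (true_pos + false_pos) else 0.0
    recall = true_pos / (true_pos + false_neg) if (true_pos + false_neg) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ground truth and model output for eight candidate record pairs.
actual = [True, True, True, False, False, False, True, False]
predicted = [True, True, False, True, False, False, True, False]

precision, recall, f1 = precision_recall_f1(actual, predicted)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

In duplicate detection, precision reflects how many flagged pairs are real duplicates, while recall reflects how many real duplicates were found; which one matters more depends on the cost of a wrong merge versus a missed one.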
Must-Have:
• Proficiency in Python (essential for ML and automation tasks).
• Strong understanding of statistics and probability, including hypothesis testing, regression, and probabilistic reasoning.
• Experience with machine learning techniques (e.g., Random Forest, clustering, or NLP-based models).
• Solid grasp of data quality concepts: completeness, validity, reconciliation, and profiling.
• Strong problem-solving skills and the ability to design scalable solutions.
Should-Have:
• Hands-on experience with Databricks (for data lake monitoring and ML implementation).
• Familiarity with data cataloguing tools like Purview or similar platforms.
• Working knowledge of SQL and large datasets.
Could-Have:
• Experience with R for statistical analysis or visualization.
• Knowledge of LLMs for advanced text or taxonomy-related projects.
• Familiarity with data governance frameworks or compliance requirements.
Are you looking for a place to work that inspires and challenges you? A place to unleash your potential? Then the PageGroup Barcelona Shared Service Center (SSC), with its flexible, open culture and meritocratic structure, is the place for you.
• Meal vouchers
• Bonus
• Remote working (2 days per week)
• Medical insurance (after 6 months)
• Life insurance
• Private pension (after 2 years)
• Flexible compensation (after 6 months)
• 36-hour working week in July and August
• 25 days of holiday per year
• 20 working days per year to work from abroad
• EAP - since day one