• Data Profiling & Monitoring:
  • Develop and enhance data profiling engines to assess the completeness, validity, and integrity of datasets.
  • Collaborate on AI-driven cataloguing projects, including a Purview Proof of Concept (POC).
  • Monitor data lake quality and performance using tools such as Databricks or other cloud-based data platforms.
• Machine Learning for Data Quality:
  • Apply ML techniques (e.g., Random Forest) to improve duplicate identification and matching.
  • Build models that use LLMs to validate taxonomy mappings, similar to inferring roles or departments from job titles.
• Automation and API Integration:
  • Automate data update processes using APIs, reducing reliance on manual scripts, extracts, and CSVs.
  • Design scalable solutions for automated data reconciliation and integrity checks.
• Data Quality Analysis:
  • Conduct detailed data quality assessments, measuring completeness, validity, and consistency across datasets.
  • Identify gaps in data pipelines and propose actionable solutions.
• New challenge in an international company. SSC. English. Barcelona.
• Python, SQL, Machine Learning
Expertise
• ML Model Development and Evaluation:
  • Understanding statistical distributions and probabilities is key to choosing the right features, algorithms, and evaluation metrics for ML tasks (e.g., precision, recall, F1-score).
  • Advanced tasks like enhancing duplicate detection or inferring roles with LLMs may involve probabilistic approaches.
• Data Quality Analysis:
  • Quantifying and diagnosing data completeness, validity, and integrity often require statistical tests and descriptive analytics.
• General Problem-Solving:
  • Statistical reasoning aids in diagnosing anomalies, reconciling datasets, and creating predictive models.
Must-Have:
• Proficiency in Python (essential for ML and automation tasks).
• Strong understanding of statistics and probability, including hypothesis testing, regression, and probabilistic reasoning.
• Experience with machine learning techniques (e.g., Random Forest, clustering, or NLP-based models).
• Solid grasp of data quality concepts: completeness, validity, reconciliation, and profiling.
• Strong problem-solving skills and the ability to design scalable solutions.
Should-Have:
• Hands-on experience with Databricks (for data lake monitoring and ML implementation).
• Familiarity with data cataloguing tools like Purview or similar platforms.
• Working knowledge of SQL and large datasets.
Could-Have:
• Experience with R for statistical analysis or visualization.
• Knowledge of LLMs for advanced text or taxonomy-related projects.
• Familiarity with data governance frameworks or compliance requirements.
Are you looking for a place to work that inspires and challenges you? A place to unleash your potential? Then the PageGroup Barcelona Shared Service Center (SSC), with its flexible, open culture and meritocratic structure, is the place for you.
• Meal vouchers
• Bonus
• Remote working (2 days per week)
• Medical insurance (after 6 months)
• Life insurance
• Private pension (after 2 years)
• Flexible compensation (after 6 months)
• 36-hour working week in July and August
• Holidays per year - 25 days
• 20 working days per year to work from abroad
• EAP - from day one