Senior Data Engineer LatAm
Colombia, Costa Rica
About the project
Join us at Provectus to be part of a team dedicated to building cutting-edge technology solutions that have a positive impact on society. Our company specializes in AI and ML technologies, cloud services, and data engineering, and we take pride in our ability to innovate and push the boundaries of what's possible.
We are seeking a talented and experienced Senior Data Engineer. You will work alongside a multidisciplinary team of data engineers, machine learning engineers, and application developers drawn from our practices in Data, Machine Learning, DevOps, Application Development, and QA. You will tackle a wide range of technical challenges, contribute to exciting open-source projects, build internal solutions, and engage in R&D activities, making this an excellent environment for professional growth.
Let's work together to build a better future for everyone!
Requirements:
- 5+ years of hands-on experience in data engineering, with a strong emphasis on Databricks and Apache Spark.
- Expertise in cloud platforms (AWS preferred; GCP or Azure also considered), ideally with relevant certifications.
- Proficiency with cloud data warehouse technologies (Snowflake, Redshift, or ClickHouse).
- Proven experience in handling batch data workflows using Airflow or similar tools.
- Experience handling real-time data with Kafka and stream processing frameworks.
- Strong command of Python and SQL for building scalable and efficient data pipelines.
- Strong communication skills with the ability to collaborate effectively in English.
- Exceptional problem-solving skills and the ability to thrive in fast-paced environments.
Nice to Have:
- Solid experience with Infrastructure as Code tools such as Terraform or AWS CloudFormation.
- Databricks Data Engineering certification.
- Experience designing scalable APIs using frameworks such as FastAPI or Flask.
- Familiarity with BI tools such as Power BI, QuickSight, Looker, or Tableau.
- Experience in implementing Data Mesh architecture or distributed data solutions.
- In-depth knowledge of Data Governance principles, including Data Quality, Lineage, Security, and Cost Optimization.
- Familiarity with Machine Learning frameworks (e.g., AWS SageMaker, MLflow) and classical ML tasks (e.g., OCR).
- Experience in building Generative AI applications such as chatbots and RAG systems.
Responsibilities:
- Collaborate with clients to understand their IT environments, applications, business requirements, and digital transformation objectives.
- Collect, process, and manage large volumes of structured and unstructured data from diverse sources.
- Work directly with Data Scientists and ML Engineers to design and maintain robust, resilient data pipelines that power Data Products.
- Define and develop data models that unify disparate data across the organization.
- Design, implement, and optimize ETL/ELT pipelines, ensuring scalability, performance, and maintainability.
- Leverage Databricks and Apache Spark to perform complex data transformations and processing.
- Build, test, and deploy scalable data-driven solutions in collaboration with cross-functional teams.