Provectus is a Silicon Valley-based Artificial Intelligence consultancy and solutions provider. We offer AI/ML expertise to emerging startups and well-established enterprises in the US market.
Currently, we are seeking a highly motivated and self-driven Data Engineer.
Large international team
Many internal projects with differing business objectives
- 3-5 years of relevant experience working on big data platforms like Hadoop, Spark, S3, EMR, Presto, etc.
- 3+ years of hands-on experience working with data processing pipelines and SQL queries
- 2-3 years of hands-on experience in modeling and designing schemas for data lakes or RDBMS platforms
- Experience with RDBMS, including at the DBA level, is strongly preferred, with Oracle, Mongo, MySQL, or a combination
- Strong understanding of distributed data processing concepts like data partitioning, bucketing, distributed joins and aggregation, Map/Reduce, file formats, etc.
- Strong experience in tuning and optimizing SQL queries on distributed query processing engines like Spark or Presto
- Good experience with programming languages like Java, Python, etc.
- Familiarity with AWS services; Solutions Architect certification preferred.
- Excellent communication and interpersonal skills.
- BS or MS in Computer Science, or equivalent.
- Fast learner, able to pick up new ideas and approaches quickly
Nice to have:
- Experience with streaming frameworks like Kafka, Spark Streaming
- Experience working with REST APIs, streaming APIs, or other data ingress techniques such as SFTP or browser-based upload/download, web crawlers, etc.
- Familiarity with other cloud vendor services, like GCS technologies.
- Data Storage: S3, Delta.io, data file formats like Parquet and ORC
- Data Processing: Apache Spark & EMR
- RDBMS: Oracle, MySQL, Mongo
- Data Querying: Dremio
- Metastore: Hive Metastore or AWS Glue
- Cloud: AWS required, multi-cloud nice to have