Senior Data Engineer (Remote)

  • Full Time
  • Remote
  • Mid Level

Want to help everyday Americans invest and build wealth? Financial inequality is increasing, and too many people are getting left behind. At Stash, we are passionate about democratizing wealth creation through education, advice, and products that help customers achieve greater financial freedom.

At Stash, data is at the core of how we make decisions and build great products for millions of users. As a Data Engineer, you will join our Data Platform Team, which leads the architectural design and implementation of a modern data infrastructure at scale. You will build distributed services and large-scale processing systems that help teams across the company work faster and smarter. You will also partner with Data Science to productionize machine learning models and algorithms into data-driven products that better serve our users.

Tools and technologies in our tech stack (evolving):

Spark, Scala, Python, Hadoop, YARN, Hive, Kafka, MongoDB, Elasticsearch, Airflow, Looker, CircleCI, Terraform
AWS: EMR, EC2, Lambda, Kinesis, S3, Glue, DynamoDB, API Gateway, Redshift, SNS, SQS

What you’ll do:

  • Contribute to the design and architecture of new initiatives such as real-time streaming pipelines, data governance tooling, and job orchestration abstractions for managing resources on AWS
  • Collaborate with the team to build tools for data science/marketing teams
  • Design integration pipelines for new data sources and improve existing pipelines to perform efficiently at scale
  • Provide technical guidance to the team
  • Leverage best practices in continuous integration and deployment to our cloud-based infrastructure
  • Optimize data access and consumption for our business and product colleagues

Who you are:

  • 4+ years of professional experience in data warehousing, data architecture, and/or data engineering environments, especially with Spark, Hadoop, and Hive, plus a solid understanding of streaming pipelines
  • 1+ years of experience in streaming pipeline development
  • Proficiency in at least one high-level programming language, such as Scala
  • Good understanding of databases
  • BS / MS in Computer Science, Engineering, Mathematics, or a related field
  • You have built large-scale data products and understand the tradeoffs made when building these features
  • You have a deep understanding of system design, data structures, and algorithms
  • You have excellent knowledge of distributed computing frameworks such as Hadoop MapReduce and Spark
  • You have strong knowledge of the following AWS services: EMR, S3, and Redshift
  • You have a strong understanding of data quality and governance
  • You are a self-driven, highly motivated team player who loves to learn new things

Gold stars:

  • Experience with machine learning infrastructure
  • Experience with search engines

**no recruiters please**

Tagged as: airflow, AWS, hadoop, lambda, python, redshift, scala, spark, terraform, yarn

To apply for this job please visit grnh.se.