Senior Data Engineer
Verana Health, a digital health company that delivers quality drug lifecycle and medical practice insights from an exclusive real-world data network, recently secured a $150 million Series E led by Johnson & Johnson Innovation – JJDC, Inc. (JJDC) and Novo Growth, the growth-stage investment arm of Novo Holdings.
Existing Verana Health investors GV (formerly Google Ventures), Casdin Capital, and Brook Byers also joined the round, as well as notable new investors, including the Merck Global Health Innovation Fund, THVC, and Breyer Capital.
We are driven to create quality real-world data in ophthalmology, neurology and urology to accelerate quality insights across the drug lifecycle and within medical practices. Additionally, we are driven to advance the quality of care and quality of life for patients. DRIVE defines our internal purpose and is the galvanizing force that helps ground us in a shared corporate culture. DRIVE is: Diversity, Responsibility, Integrity, Voice-of-Customer and End-Results.
Our headquarters are located in San Francisco, and we have additional offices in Knoxville, TN and New York City, with employees working remotely in AZ, CA, CO, CT, FL, GA, IL, LA, MA, NC, NJ, NY, OH, OR, PA, TN, TX, UT, VA, WA, and WI. All employees are required to have permanent residency in one of these states. Candidates who are willing to relocate are also encouraged to apply.
Job Title: Senior Data Engineer
As a Senior Data Engineer at Verana Health, you will be responsible for leading key initiatives that help achieve business objectives. You will have strong hands-on experience in the design and development of cloud services on Databricks/AWS using PySpark, Spark SQL, and other big data frameworks, along with a deep understanding of data quality, metadata management, data ingestion, and curation. You will analyze systems and requirements to provide the best technical solutions with regard to the flexibility, scalability, and reliability of the underlying architecture, and you will document and improve software testing and release processes across the entire data team.
Job Duties and Responsibilities:
- Build scalable, automated data pipelines in AWS/Databricks using PySpark and Spark SQL.
- Design solutions to problems related to the ingestion of highly variable data structures in a highly concurrent cloud environment.
- Design, develop hands-on, and implement automated data pipelines to enable ETL of large data sets using PySpark, Apache Airflow, and Docker on
- Build microservices that provide an abstraction over data ingestion processes and information.
- Research, run proofs of concept for, and leverage performant database technologies (such as Aurora Postgres, Elasticsearch, and Redshift) to support end-user applications that need sub-second response times.
- Retain metadata that tracks execution details to support reproducibility and provide operational metrics.
- Continually improve the integrity of the data platform by implementing data quality checks, observability, and resiliency in the pipeline.
- Work closely with technology teams to understand processes and policies driving the team goals.
- Develop data services using RESTful APIs that are secure (OAuth/SAML), scalable (containerized with Docker), observable (using monitoring tools such as Datadog and the ELK Stack), and documented with OpenAPI/Swagger, built with Python/Java frameworks and deployed through automated CI/CD using GitHub Actions.
- Participate in code reviews.
Qualifications:
- A minimum of a BS degree in computer science, software engineering, or related scientific discipline.
- A minimum of 4 years of experience in engineering and operationalizing data pipelines with large and complex data sets.
- 3+ years of experience with AWS/Databricks
- 2+ years of experience orchestrating data pipelines with Apache Airflow
- Demonstrated ability to build product- and customer-driven software tools in a collaborative, team-oriented environment.
- Experience in ETL design, implementation and maintenance, preferably in PySpark
- Experience with OO programming in a production setting, preferably Python
- Extensive experience with advanced SQL
- Good understanding of relational databases and NoSQL databases (graph databases, document databases, etc.)
- Experience using source code version control.
- Hands-on experience with Docker containers and container orchestration.
Preferred Qualifications:
- Healthcare and medical data experience is a plus.
- Additional experience with modern compiled programming languages (C++, Go, Rust)
- Experience building HTTP/REST APIs using popular frameworks
- Experience building out extensive automated test suites
Benefits:
- We provide health, vision, and dental coverage for employees
- Verana pays 100% of employee insurance coverage and 70% of family coverage
- Plus an additional monthly $100 individual / $200 HSA contribution with HDHP
- Spring Health mental health support
- Flexible vacation plans
- A generous parental leave policy and family building support through the Carrot app
- $500 learning and development budget
- $25/wk in DoorDash credit
- Headspace meditation app - unlimited access
- Gympass - 3 free live classes per week + monthly discounts for gyms like SoulCycle
You do not need to match every listed expectation to apply for this position. Here at Verana, we know that diverse perspectives foster the innovation we need to be successful, and we are committed to building a team that encompasses a variety of backgrounds, experiences, and skills.
Please note pay ranges for major metropolitan areas may be different.