Databricks Data Engineer (CI/CD on Azure)

Job Details

About Us:
At Derevo, we are dedicated to empowering businesses and individuals to unleash the value of data within their organizations. We do this by implementing analytics processes and platforms with a comprehensive approach that covers the entire cycle needed to achieve that goal.

Derevo started in 2010 with a simple idea: to create more than a company, a community and a space where everyone has the opportunity to build a dream.

At Derevo, we believe in human talent that is free and creative. Being human is our superpower!

**Databricks Data Engineer**

**Summary**:
The ideal candidate has at least 5 years of hands-on experience designing, establishing, and maintaining data management and storage systems. They are skilled in collecting, processing, cleaning, and deploying large datasets, understanding ER data models, and integrating multiple data sources. They can analyze, communicate, and propose different ways of building Data Warehouses, Data Lakes, End-to-End Pipelines, and Big Data solutions for clients, using either batch or streaming strategies.

**Technical Proficiencies**:

- SQL:
Data Definition Language, Data Manipulation Language, Intermediate/advanced queries for analytical purposes, Subqueries, CTEs, Data types, Joins with business rules applied, Grouping and Aggregates for business metrics, Indexing and optimizing queries for efficient ETL processes, Stored Procedures for transforming and preparing data, SSMS, DBeaver
- Python:
Experience in object-oriented programming, Managing and processing datasets, Use of variables, lists, dictionaries, and tuples, Conditional and iterative constructs, Optimization of memory consumption, Structures and data types, Data ingestion from various structured and semi-structured data sources, Knowledge of libraries such as pandas, numpy, and sqlalchemy, Good practices when writing code
- Databricks / Pyspark:
Intermediate knowledge of:

Understanding of narrow and wide transformations, actions, and lazy evaluations

How DataFrames are transformed, executed, and optimized in Spark

Use DataFrame API to explore, preprocess, join, and ingest data in Spark

Use Delta Lake to improve the quality and performance of data pipelines

Use SQL and Python to write production data pipelines to extract, transform, and load data into tables and views in the Lakehouse

Understand the most common performance problems associated with data ingestion and how to mitigate them

Monitor Spark UI: Jobs, Stages, Tasks, Storage, Environment, Executors, and Execution Plans

Configure a Spark cluster for maximum performance given specific job requirements

Configure Databricks to access Blob Storage and ADL using SAS, user tokens, Secret Scopes, and Azure Key Vault

Configure governance solutions through Unity Catalog and Delta Sharing

Use Delta Live Tables to manage an end-to-end pipeline with unit and integration tests
- Azure:
Intermediate/Advanced knowledge of:

Azure Storage Account:
Provision Azure Blob Storage or Azure Data Lake instances

Build efficient file systems for storing data in folders with static or parameterized names, considering possible security rules and risks

Experience identifying use cases for open-source file formats like Parquet, Avro, and ORC

Understanding optimized column-oriented file formats vs optimized row-oriented file formats

Implementing security configurations through Access Keys, SAS, AAD, RBAC, ACLs

Azure Data Factory:
Provision Azure Data Factory instances

Use Azure IR, Self-Hosted IR, and Azure-SSIS IR to establish connections to different data sources

Use Copy or PolyBase activities for loading data

Build efficient and optimized ADF Pipelines using linked services, datasets, parameters, triggers, data movement activities, data transformation activities, control flow activities and mapping data flows

Build Incremental and Re-Processing Loads
- CI/CD (desirable):

**Process Automation**: Automate the deployment, scale-up, and scale-down of Azure Databricks clusters using tools like ARM Templates, Terraform, or Azure DevOps Pipelines.

**Monitoring and Performance Optimization**: Set up alerts and monitor key performance metrics in Azure Databricks using Azure Monitor and other monitoring tools. Optimize cluster and workload performance to ensure efficiency and scalability.

**Security and Compliance**: Implement security controls and compliance policies in Azure Databricks.

**Integration with Azure Services**: Integrate Azure Databricks with other Azure services such as Azure Data Lake Storage, Azure SQL Database, Azure Synapse Analytics, and Azure DevOps to create end-to-end data analytics solutions.

**Configuration and Secrets Management**: Manage configurations and sensitive secrets using Azure Key Vault or other secrets management solutions. Ensure the security of credentials and access keys.

**Training and Support**: Provide training and technical support to development and data analytics teams in the effective use of Azure Databricks. Documen


Nominal Salary: Negotiable

Source: Whatjobs_Ppc
