Role Summary
- Connect and model complex, distributed data sets to build repositories such as data warehouses and data lakes, using appropriate technologies
- Lead teams in managing data across a range of contexts: small to large data sets; structured, unstructured, or streaming data; extraction, transformation, curation, and modelling; building data pipelines; identifying the right tools; and writing SQL/Java/Python code
- Serve as a leader within the Community of Practice/Center of Excellence to create and enhance standards and best practices
**Responsibilities**:
- Partner with the Senior Data Solution Architect to create and maintain optimal solutions aligned to published standards, with a focus on automation and orchestration
- Lead efforts to ensure the health and hygiene of platforms including upgrades, migrations, etc.
- Assemble large, complex data sets that meet functional/non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of a wide variety of data (a minimal pipeline sketch follows this list)
- Develop models/prototypes to provide observations and identify trends and patterns, working with leadership to assess potential solutions
- Develop statistical models and algorithms of high complexity needed for reporting and analytics
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics
- Work with stakeholders including Management, Domain leads, and Teams to assist with data-related technical issues and support their data infrastructure needs
- Based on new/enhanced data security policies and procedures, build/enhance the technology footprint for encryption, obfuscation, and role-based access (the pipeline sketch after this list includes a sample obfuscation step)
- Create data tools for analytics and data science team members
- Apply extensive knowledge of data and analytics frameworks supporting data lakes, warehouses, marts, reporting, etc.
- Define data retention policies, monitor performance, and advise on necessary infrastructure changes based on functional and non-functional requirements
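As a concrete illustration of the pipeline and data-security responsibilities above, here is a minimal sketch assuming PySpark; the S3 paths, column names, and application name are hypothetical and not taken from this posting:

```python
# A minimal ETL sketch, assuming PySpark: extract raw CSV, transform
# (including SHA-256 obfuscation of a PII column, per the security bullet
# above), and load to partitioned Parquet. All paths/columns are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("customer-etl").getOrCreate()

# Extract: read raw customer records from a (hypothetical) landing zone.
raw = (spark.read
       .option("header", "true")
       .csv("s3://example-bucket/raw/customers/"))

# Transform: obfuscate PII, stamp the load date, and de-duplicate.
curated = (raw
           .withColumn("email", F.sha2(F.col("email"), 256))
           .withColumn("load_date", F.current_date())
           .dropDuplicates(["customer_id"]))

# Load: write curated data partitioned by load date.
(curated.write
 .mode("overwrite")
 .partitionBy("load_date")
 .parquet("s3://example-bucket/curated/customers/"))
```

In a production pipeline these steps would typically be parameterized and orchestrated (e.g. by a scheduler such as Airflow) rather than hard-coded.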
**Requirements**:
- In-depth knowledge of the data engineering discipline
- Extensive experience working with Big Data tools and building data solutions for advanced analytics
- 7+ years of hands-on experience with a strong data background
- Solid programming skills in Java, Python and SQL
- Clear hands-on experience with database systems: the Hadoop ecosystem; cloud technologies (e.g. AWS, Azure, Google Cloud); in-memory database systems (e.g. HANA, Hazelcast); traditional RDBMS (e.g. Teradata, SQL Server, Oracle); and NoSQL databases (e.g. Cosmos DB, MongoDB, DynamoDB)
- Practical knowledge of data extraction and transformation tools, from traditional ETL tools (e.g. Informatica, Databricks) to more recent big data tools
- Distributed computing for ML, including parallel dataset processing for training and Python coding with GPU parallelism (see the training sketch at the end of this list)
- Expertise with Docker and Kubernetes (k8s), including familiarity with YAML deployment manifests (see the manifest sketch at the end of this list)
- Extensive background in programming, databases, and/or big data technologies, OR
- BS/MS in software engineering, computer science, economics, or other engineering fields
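To illustrate the distributed-ML requirement above, here is a minimal sketch assuming PyTorch; the model, dataset, and hyperparameters are placeholders rather than anything specified by this role:

```python
# A minimal training sketch, assuming PyTorch: parallel dataset processing
# via DataLoader workers plus multi-GPU training via torch.nn.DataParallel.
# The model, data, and hyperparameters below are all placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def main():
    # Toy in-memory dataset; a real pipeline would stream sharded data.
    X = torch.randn(10_000, 128)
    y = torch.randint(0, 2, (10_000,))
    loader = DataLoader(TensorDataset(X, y), batch_size=256,
                        shuffle=True, num_workers=4)  # parallel CPU-side loading

    model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)  # replicate across available GPUs
    model.to(device)

    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for xb, yb in loader:
        xb, yb = xb.to(device), yb.to(device)
        opt.zero_grad()
        loss_fn(model(xb), yb).backward()
        opt.step()


if __name__ == "__main__":
    main()
```

For multi-node training, torch.nn.parallel.DistributedDataParallel with a DistributedSampler is the usual step up from DataParallel.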
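And to illustrate the Docker/k8s and YAML-deployment requirement, here is a minimal sketch that builds a Kubernetes Deployment manifest in Python and serializes it with PyYAML; the image name, replica count, and resource limits are hypothetical:

```python
# A minimal sketch, assuming PyYAML is installed: generate a Kubernetes
# Deployment manifest programmatically and write it out as YAML.
import yaml

deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "etl-worker"},
    "spec": {
        "replicas": 3,
        # The selector must match the pod template's labels.
        "selector": {"matchLabels": {"app": "etl-worker"}},
        "template": {
            "metadata": {"labels": {"app": "etl-worker"}},
            "spec": {
                "containers": [{
                    "name": "etl-worker",
                    "image": "registry.example.com/etl-worker:1.0",  # hypothetical image
                    "resources": {"limits": {"cpu": "1", "memory": "2Gi"}},
                }],
            },
        },
    },
}

with open("deployment.yaml", "w") as f:
    yaml.safe_dump(deployment, f, sort_keys=False)
```

The resulting file can then be applied with `kubectl apply -f deployment.yaml`.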