jolisper/data-engineering-rust

data-engineering-rust

Data Engineering with Rust Course (Duke University)

Learning Goals

Week 1

In week 1, you will explore Rust's data structures: sequences, maps, and sets, along with their distinguishing characteristics and typical use cases. Through hands-on exercises and projects, you'll apply these concepts in practical scenarios: creating a Fruit Salad CLI with a variety of sequences, comparing languages using both HashMap and BTreeMap, managing unique fruits with HashSet and BTreeSet, and prioritizing data using a BinaryHeap. These exercises build a working understanding of how to use Rust data structures for data organization and manipulation, which the following weeks rely on.
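As a rough illustration of the collections named above, here is a minimal sketch using only the standard library. The function names (`count_fruits`, `unique_sorted`, `top_priority`) and the sample data are hypothetical and are not taken from the course material.

```rust
use std::collections::{BTreeMap, BTreeSet, BinaryHeap, HashMap};

/// Count how often each fruit appears.
/// HashMap gives fast lookups but no guaranteed iteration order.
fn count_fruits(fruits: &[&str]) -> HashMap<String, usize> {
    let mut counts = HashMap::new();
    for fruit in fruits {
        *counts.entry(fruit.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Return the unique fruits in sorted order.
/// BTreeSet removes duplicates and keeps its elements ordered.
fn unique_sorted(fruits: &[&str]) -> Vec<String> {
    let set: BTreeSet<String> = fruits.iter().map(|f| f.to_string()).collect();
    set.into_iter().collect()
}

/// Return the name with the highest priority.
/// BinaryHeap is a max-heap, so `pop` yields the largest (priority, name) tuple.
fn top_priority(items: &[(u32, &str)]) -> Option<String> {
    let mut heap: BinaryHeap<(u32, &str)> = items.iter().copied().collect();
    heap.pop().map(|(_, name)| name.to_string())
}

fn main() {
    let fruits = ["pear", "apple", "fig", "apple"];

    // Collecting a HashMap into a BTreeMap yields the same entries with sorted keys.
    let ordered: BTreeMap<String, usize> = count_fruits(&fruits).into_iter().collect();
    println!("{:?}", ordered);

    println!("{:?}", unique_sorted(&fruits));
    println!("{:?}", top_priority(&[(2, "fig"), (9, "apple"), (5, "pear")]));
}
```

The choice between HashMap and BTreeMap (or HashSet and BTreeSet) mostly comes down to whether ordered iteration matters for the task at hand.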

Week 2

In week 2, you will study the safety features and security principles that make Rust a reliable language for systems programming. You'll learn how to prevent data races, manage memory effectively to avoid leaks, safely interoperate with C libraries using Rust's Foreign Function Interface (FFI), and build robust concurrent applications using Rust's rules for mutable and immutable data. By exploring automatic bounds checking and safe transmutes, you will learn to avoid common errors and undefined behavior that are prevalent in other languages. To apply what you've learned, you will complete a series of hands-on exercises that demonstrate these principles in real-world scenarios. You'll create a multi-threaded web server, a command-line application that processes diverse types of user input, and systems that would typically be vulnerable to bugs in other languages but remain secure thanks to Rust's safety mechanisms. Through these exercises, you will gain a concrete understanding of how Rust's safety features lead to more secure, robust, and efficient code.
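Two of the ideas above, data-race prevention and bounds checking, can be sketched with the standard library alone. The helper names `parallel_sum` and `safe_get` are hypothetical, and the chunking strategy is only one possible design, not the course's own solution.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Sum `values` across up to `workers` threads.
/// The running total lives behind Arc<Mutex<..>>, so the compiler
/// rejects any attempt to mutate it without taking the lock.
fn parallel_sum(values: Vec<u64>, workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division, with a minimum of 1 because `chunks(0)` would panic.
    let chunk_size = ((values.len() + workers - 1) / workers).max(1);
    let total = Arc::new(Mutex::new(0u64));

    let mut handles = Vec::new();
    for chunk in values.chunks(chunk_size) {
        let chunk = chunk.to_vec(); // each thread owns its own copy of the data
        let total = Arc::clone(&total);
        handles.push(thread::spawn(move || {
            let partial: u64 = chunk.iter().sum();
            *total.lock().unwrap() += partial;
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}

/// Bounds-checked access: `get` returns None instead of reading out of range.
fn safe_get(data: &[i32], index: usize) -> Option<i32> {
    data.get(index).copied()
}

fn main() {
    println!("{}", parallel_sum((1..=100).collect(), 4));
    println!("{:?}", safe_get(&[1, 2, 3], 10));
}
```

Indexing with `data[index]` is also bounds-checked, but it panics on an out-of-range index; `get` turns that case into a value the caller has to handle.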

Week 3

In week 3, you will explore the libraries and tools in the Rust ecosystem that are specific to data engineering. You will learn how to process CSV and Parquet files and use Rust's async capabilities to build efficient web scrapers and API consumers. You'll also get acquainted with popular data processing libraries such as Polars and Apache Arrow. Furthermore, you'll discover how Rust interfaces with data processing systems for message passing, how it handles the REST and gRPC protocols, and how to integrate with the AWS SDK for cloud-based data operations. Each project is designed to highlight Rust's efficiency, safety, and concurrent processing capabilities in large-scale data engineering tasks. With a focus on hands-on practice, this week aims to equip you with the skills to solve real-world data engineering problems using Rust.
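To give a feel for the kind of CSV aggregation this week covers, here is a deliberately simplified sketch that uses only the standard library. It is a stand-in, not the course's approach: real projects would more likely use a dedicated CSV crate or Polars, and the naive comma split below does not handle quoted fields. The function name `totals_by_region` and the two-column layout are assumptions made for the example.

```rust
use std::collections::BTreeMap;

/// Sum the second column per value of the first column.
/// Assumes a header row and exactly two comma-separated fields per record;
/// rows that do not match, or whose amount is not a number, are skipped.
fn totals_by_region(csv_text: &str) -> BTreeMap<String, f64> {
    let mut totals = BTreeMap::new();
    for line in csv_text.lines().skip(1) {
        let fields: Vec<&str> = line.split(',').map(str::trim).collect();
        if let [region, amount] = fields.as_slice() {
            if let Ok(value) = amount.parse::<f64>() {
                *totals.entry(region.to_string()).or_insert(0.0) += value;
            }
        }
    }
    totals
}

fn main() {
    let data = "region,amount\nnorth,10.5\nsouth,4\nnorth,1.5\n";
    for (region, total) in totals_by_region(data) {
        println!("{region}: {total}");
    }
}
```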

Week 4

In week 4, you will learn how to design and implement data storage solutions and pipelines using Rust, focusing on leveraging Rust's unique capabilities to manage storage technologies effectively. You'll explore strategies for constructing robust data processing solutions and migrating existing warehousing and processing systems to more efficient Rust-oriented solutions. A significant emphasis will be placed on understanding and utilizing key Rust libraries for data processing and mastering best practices for error handling in Rust to enhance system resilience and reliability. You will apply your knowledge by working on a series of practical assignments that involve creating a Rust-based data pipeline, migrating a simple data processing system to Rust, and utilizing Rust libraries to improve a sample data processing scenario. You'll also handle simulated errors in a Rust-based data processing environment, which will help solidify your understanding and application of Rust's error handling mechanisms.
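The error-handling theme above can be sketched as a small pipeline stage that returns `Result` and separates good records from failures. Everything here is hypothetical and chosen for illustration: the `sensor=value` line format, the `Reading` and `PipelineError` types, and the function names.

```rust
use std::fmt;

/// Errors a single pipeline stage can report, tagged with the 1-based line number.
#[derive(Debug, PartialEq)]
enum PipelineError {
    MissingSeparator { line: usize },
    BadNumber { line: usize, message: String },
}

impl fmt::Display for PipelineError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PipelineError::MissingSeparator { line } => {
                write!(f, "line {line}: expected `sensor=value`")
            }
            PipelineError::BadNumber { line, message } => {
                write!(f, "line {line}: {message}")
            }
        }
    }
}

impl std::error::Error for PipelineError {}

#[derive(Debug, PartialEq)]
struct Reading {
    sensor: String,
    value: i64,
}

/// Parse one `sensor=value` line; `?` returns early with the first error.
fn parse_reading(line_no: usize, line: &str) -> Result<Reading, PipelineError> {
    let (sensor, raw) = line
        .split_once('=')
        .ok_or(PipelineError::MissingSeparator { line: line_no })?;
    let value = raw.trim().parse::<i64>().map_err(|e| PipelineError::BadNumber {
        line: line_no,
        message: e.to_string(),
    })?;
    Ok(Reading { sensor: sensor.trim().to_string(), value })
}

/// Process every line, keeping valid readings and collecting errors
/// instead of aborting the whole run on the first bad record.
fn run_pipeline(input: &str) -> (Vec<Reading>, Vec<PipelineError>) {
    let mut good = Vec::new();
    let mut bad = Vec::new();
    for (i, line) in input.lines().enumerate() {
        match parse_reading(i + 1, line) {
            Ok(reading) => good.push(reading),
            Err(err) => bad.push(err),
        }
    }
    (good, bad)
}

fn main() {
    let (good, bad) = run_pipeline("a=1\nb=oops\nno-separator\nc= 7");
    println!("parsed {} readings", good.len());
    for err in &bad {
        eprintln!("skipped: {err}");
    }
}
```

Collecting errors alongside results is one design choice among several; a stricter pipeline could instead return `Result<Vec<Reading>, PipelineError>` and stop at the first failure.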