COURSE AIMS AND OBJECTIVES: To become familiar with advanced topics in database systems, data warehouses, the NoSQL paradigm, and Big Data processing.
COURSE DESCRIPTION AND SYLLABUS:
1. Expanding knowledge of relational databases. Complex data types. Distributed databases. Aims, purposes, advantages and drawbacks of data distribution. Structure of a distributed database, replication, fragmentation. Distributed transaction protocols.
2. NoSQL paradigm. Why NoSQL? Aggregate data model. NoSQL database properties. Consistency and the CAP theorem. Types of NoSQL databases: key-value, document, graph and column-family databases. Map/reduce algorithm in the NoSQL world.
3. Data warehouses. Aims and purposes. Data warehouse models. The process of data warehouse creation. OLAP. ETL (extract-transform-load) process.
4. Big Data. Distributed file systems. Google File System (GFS) and HDFS. Apache Hadoop ecosystem. Map/reduce algorithm. Distributed computation and analysis. Apache Spark.