Introduction to Big Data and Teradata Aster
This is a note for the Teradata Aster Basics 6.10 Exam, a.k.a. TACP (Teradata Aster Certified Professional).
The recommended courses are listed below; this note covers the second course.
- Teradata Certification, What’s New and How to Prepare
- Introduction to Big Data and Teradata Aster
- Introduction to Teradata Aster Analytics*
- Introduction to Teradata Aster Database Administrator*
SQL vs SQL-MR: SQL is better for standard transformations. SQL-MR is better for custom transformations (e.g. log extraction).
R creates multiple copies of data during processing, and doesn’t automatically run in parallel.
Aster R runs in parallel across the Aster MPP architecture.
FSE (Foreign Server Encapsulation): Supports remote data platforms other than Aster and Teradata (e.g. Oracle, Hadoop, DB2, etc.).
QueryGrid Aster-Teradata: Joins tables in Teradata and Aster Database.
QueryGrid Aster-Hadoop: Copies data from Hadoop to Aster and from Aster to Hadoop.
HCatalog: Table metastore service for Hive, Pig, and so on.
Deployment Options: Aster Appliance, Cloud, Software Only (RHEL), and Aster on Hadoop.
Data Preparation: IPGeo, Pivot, JsonParser, Apache Log Parser, and PSTParserAFS
Aster Analytics Portfolio
- Data Acquisition
- Data Preparation
- Advanced Analytics
- Analytic Engine
- Aster SQL-MR
- Aster SQL-GR (Based on Bulk Synchronous Processing)
- Aster R
- SNAP Framework
- Integrated Optimizer
- Integrated Executor
- Unified SQL Interface
- Common Storage System and Services
- Multi-Type Storage
- AFS(Aster File Store)
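The portfolio list notes that Aster SQL-GR is based on Bulk Synchronous Processing (BSP, also written Bulk Synchronous Parallel). The following is a minimal Python sketch of the BSP idea only, not SQL-GR code: in each superstep, active vertices send messages, a barrier ensures all messages are delivered, and then every vertex computes on its inbox. The graph, the starting values, and the function name bsp_max are hypothetical.

```python
# Toy BSP loop (illustration only, not Aster SQL-GR): every vertex ends up
# holding the maximum value found anywhere in its connected component.

def bsp_max(edges, values):
    """Run supersteps until no vertex changes; return (values, superstep count)."""
    values = dict(values)
    neighbors = {v: set() for v in values}
    for a, b in edges:  # treat edges as undirected
        neighbors[a].add(b)
        neighbors[b].add(a)

    active = set(values)  # vertices whose value changed in the previous superstep
    supersteps = 0
    while active:
        # Communication phase: each active vertex sends its value to its neighbors.
        inbox = {v: [] for v in values}
        for v in active:
            for n in neighbors[v]:
                inbox[n].append(values[v])
        # Barrier: all messages are delivered before any vertex computes.
        # Computation phase: a vertex keeps the largest value it has seen.
        active = set()
        for v, messages in inbox.items():
            if messages and max(messages) > values[v]:
                values[v] = max(messages)
                active.add(v)
        supersteps += 1
    return values, supersteps


graph_edges = [("a", "b"), ("b", "c"), ("c", "d")]  # hypothetical chain graph
start = {"a": 3, "b": 1, "c": 7, "d": 2}
final, steps = bsp_max(graph_edges, start)
```

The loop stops when a superstep produces no changes, which is the usual termination condition in vertex-centric BSP programs.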
Queen: Cluster Coordination, Distributed Query Planning, System Tables
Worker Node: Sends results back to the Queen
Loader: Loads data into Aster
- Aster username/password
- TD Wallet
Multi-Version Concurrency Control (MVCC): Eliminates the need for read locks while ensuring that the database maintains the key ACID properties (Atomicity, Consistency, Isolation, Durability).
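To make the "no read locks" point concrete, here is a minimal Python sketch of the general MVCC idea. It is not Aster's implementation: the class MVCCStore, its methods, and the integer timestamps are all hypothetical. Writers append new versions instead of overwriting, and a reader only sees versions committed at or before its snapshot.

```python
import itertools


class MVCCStore:
    """Toy multi-version store: writes append versions, reads use a snapshot."""

    def __init__(self):
        self._versions = {}               # key -> list of (commit_ts, value)
        self._clock = itertools.count(1)  # hypothetical logical commit clock
        self._last_commit = 0

    def write(self, key, value):
        """Append a new version; older versions stay readable."""
        ts = next(self._clock)
        self._versions.setdefault(key, []).append((ts, value))
        self._last_commit = ts
        return ts

    def snapshot(self):
        """Return the timestamp a new reader should use as its view."""
        return self._last_commit

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        visible = [v for ts, v in self._versions.get(key, []) if ts <= snapshot_ts]
        return visible[-1] if visible else None


store = MVCCStore()
store.write("balance", 100)
snap = store.snapshot()           # a reader starts here
store.write("balance", 250)       # a later write does not block the reader
old_view = store.read("balance", snap)
new_view = store.read("balance", store.snapshot())
```

The earlier reader keeps seeing 100 while a new reader sees 250, so neither side has to wait on a lock.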
Two Level Query Optimization
- Queen Global Optimizer: Rule Based
- v-Worker Optimizer: Cost Based. The cost is determined by the demographics of the v-Worker fragment of the distributed data.
Dynamic Workload Management
- User-based policies
- Time-based policies
- Object-based policies
- IP-based policies
- Periodic Re-evaluation
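The policy types above can be pictured as rules matched against an incoming query. The sketch below is a hypothetical Python model, not Aster's workload management syntax: the Query and Policy classes, the user and table names, the IP range, the priority labels, and the first-match-wins ordering are all assumptions made for illustration.

```python
from dataclasses import dataclass, replace
from datetime import time
from ipaddress import ip_address, ip_network
from typing import Callable


@dataclass(frozen=True)
class Query:
    user: str
    table: str
    client_ip: str
    submitted: time


@dataclass(frozen=True)
class Policy:
    name: str
    matches: Callable[[Query], bool]
    priority: str


# Hypothetical rules, one per policy type from the list above.
POLICIES = [
    Policy("user-based", lambda q: q.user == "etl_batch", "low"),
    Policy("time-based", lambda q: time(9, 0) <= q.submitted < time(18, 0), "high"),
    Policy("object-based", lambda q: q.table == "audit_log", "low"),
    Policy("ip-based",
           lambda q: ip_address(q.client_ip) in ip_network("10.0.0.0/8"),
           "medium"),
]


def classify(query, policies=POLICIES, default="medium"):
    """Return (rule name, priority) for the first policy that matches."""
    for policy in policies:
        if policy.matches(query):
            return policy.name, policy.priority
    return "default", default


night_query = Query("analyst", "sales", "10.1.2.3", time(22, 30))
first_result = classify(night_query)

# Periodic re-evaluation: the same query is classified again under new conditions.
day_query = replace(night_query, submitted=time(10, 0))
second_result = classify(day_query)
```

Re-running classify on a changed query stands in for periodic re-evaluation: a query's priority is not fixed at submission time.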
nCluster's columnar capability is a custom development by Aster; it is not part of PostgreSQL. Its limitation is that columnar tables are append-only (no updates or deletes).
Columnar advantages and limitations
- Use NOT NULL whenever possible
- Avoid variable length data
- Don't SELECT or ANALYZE columns unless necessary
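The advice to select only the columns you need follows from how columnar storage lays out data. A minimal Python sketch, assuming a toy in-memory layout (the sample rows and the helpers to_columns and append_row are hypothetical and say nothing about nCluster's on-disk format):

```python
# Row layout: every record carries all of its fields together.
rows = [
    {"id": 1, "country": "JP", "amount": 120},
    {"id": 2, "country": "US", "amount": 80},
    {"id": 3, "country": "JP", "amount": 200},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def append_row(columns, record):
    """Append-only write: add one value to the end of every column."""
    for name in columns:
        columns[name].append(record[name])


columns = to_columns(rows)

# An aggregate over one column reads only that column's values,
# while the row layout would have to walk every full record.
total = sum(columns["amount"])

append_row(columns, {"id": 4, "country": "DE", "amount": 50})
```

Appending is cheap because it only extends each column, whereas an update or delete would have to touch a position inside every column, which fits the append-only limitation noted above.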
Three compression levels
- Hot data: No or low compression
- Cold data: Medium or High compression
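The hot versus cold trade-off is between CPU time on access and storage size. The sketch below uses Python's zlib purely as a stand-in codec; Aster's actual compression algorithm is not stated in this note, and the mapping of "hot" and "cold" to zlib levels 1 and 9 is an assumption.

```python
import zlib

# Hypothetical repetitive log data, the kind that compresses well.
payload = b"2015-01-01,click,JP\n" * 5000

# Assumed mapping: frequently read data gets a fast, light setting;
# rarely read data gets the slowest, strongest setting.
LEVELS = {"hot": 1, "cold": 9}


def compress_for(temperature, data):
    """Compress data with the zlib level assigned to its temperature."""
    return zlib.compress(data, LEVELS[temperature])


hot_blob = compress_for("hot", payload)
cold_blob = compress_for("cold", payload)

sizes = {
    "raw": len(payload),
    "hot": len(hot_blob),
    "cold": len(cold_blob),
}
```

Both settings are lossless, so the choice only affects how much space is saved and how much CPU is spent reading and writing.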
Informatica has an Aster connector. Other tools use the nCluster loader.
Aqua Data Studio: http://www.aquafold.com/
Viewpoint portlets for Aster
- Aster Node Monitor
- Aster Completed Processes