cs161 2004 Lecture 13: Using Databases Why (and when) you may want to use relational databases. Why? Transactional (why aren't the logs we described good enough?) They've thought about searching (joining, indexing) Associated tools (schema changes, visualization, export, backup) You can worry about speedup later, high-level optimization That's where the data already is Multilanguage access Simple persistence Let someone else multiplex the disk (asynch is easy) Why not? Complicated - takes some learning, tuning, administrating Expensive - but some good ones are free Slow - sometimes gives you "better" demands than you need Closed - hard to know what's going on, change Highwind, Viaweb, Yahoo don't use databases. Rules of thumb: Implementing a well known protocol: db less useful Flexibility, or implementation speed important: db good Flat Files Jim Smith jsmith cs161 BHPS A Jim Smith jsmith cs157 Alg B Steve Brown sbrown cs161 BHPS B Steve Brown sbrown cs157 Alg B Greg Brady gregb cs161 BHPS A contains redundancy hard to query, insert, delete Hierarchal Model - based on trees eliminates some redundancy doesn't handle many-to-many well fragile cs161 BHPS Jim Smith jsmith A Steve Brown steve B Greg Brady gregb A cs157 Alg Jim Smith jsmith B Steve Brown steve B Network Model - based on set hard to use, change relationships? relied on knowing structure to find tables cs157 cs161 | /\ | \ jim steve greg Relational Model - based on "relations" declarative data access cs161 BHPS cs157 Alg Jim Smith jsmith Steve Brown steve Greg Brady gregb Key benefits All data is at the "top" - named, not located Eliminates redundancy - a correctness, not space, concern Eliminate Redundancy one subject have distinct fields no redundant data primary key so that the table can be related to others. Grades? cs161 jsmith A cs157 jsmith B cs161 steve B cs157 steve B cs161 gregb A Views Result of a query Logically just another table, or relation They are named and "stored" Data types Basic: CHAR(size), integers, floats, etc Dates, money, points, shapes NULL User-defined Indexes B-Trees are really hard to write yourself Once you know what you want fast, you can ask for indexes on those fields If you normally want to compute a GPA for student, index GRADES on username If you normally want to compute a class avg, index GRADES on courseno Database connections TCP, with a login procedure Usually, apps will want to maintain connections very amenable to our asynch model not very amenable to bulk data serving Stored procedures security - acls on procs, users can only run procs, not any old query performance - precompiled, query plan built