Focusing software projects on delivery RSS 2.0
 Sunday, May 18, 2008

In interviews I need to know how well the candidate knows databases and start off by going back to basics.  After establishing that the candidate knows that the 'R' in RDBMS stands for 'Relational', I follow it up by asking for a definition of 'Relation'.  I will accept answers along the lines of "... a set of tuples" or at a push, "... a table" (A table is a lay term for a relation, particularly when used in the context of SQL databases).  However, most times I receive an answer that starts something like "A relation is when you have another table with a foreign key...", at which point I ask "Are you talking about a relation or a relationship?", which is followed by stunned looks or muttered agreement.  I have interviewed hundreds of people over the years and only one or two haven't used a 'foreign key' based answer.

For some reason the concept of foreign keys in SQL databases have been confused with the relation.  I think it comes from the colloquial use of statements such as "Relate employee and employer on EmpNo" - the 'relate' really refers to a relationship but since the speaker may remember a single semester course in relational theory (apparently making him an expert) recalls that it sounds a lot like the 'relational' in relational database and thus the confusion starts.  Maybe it was the result of Bill Clinton's infamous statements where, I can imagine, during a brief period when mathematics was more practical than abstract, a group of exhausted mathematicians decreed "After much experimentation we can safely say that 'sexual relations' are not part of the relational model and probably refer more to the physical relationship between two somewhat consenting adults"

I'm no mathematician sitting in some old ivy covered building at an academic institution - I deliver systems based on SQL databases and (ADO) DataSets every day and am a big fan of the technologies that I use.  However, having had a background in more formal methodologies and working with great, intelligent people over the years - I have tried to understand the theoretical basis for what I use and a smattering of relational theory fits into that understanding - I think that some knowledge of relational theory is important to build systems on top of SQL Server.

With that as a background, hopefully you will understand why, when using ADO DataSets and Visual Studio, I cringe every time that I have to refer to the ADO version of 'relation'.  Even typing that last sentence makes my hair stand up.

DataSets don't seem to officially claim to have anything to do with relational theory and with that as a disclaimer they don't do anything wrong.  Just as SQL Server or Oracle don't make any claims about fully supporting the relational model for fear of upset computer scientists and mathematicians spray-painting integrals all over the Microsoft campus.  However, since DataSets talk to SQL Databases which in turn are (sort of) based on the relational model - the distance between DataSets and relational theory is not that far - the inference that the DataSet relation has something to do with the relational model's relation does exist.   The only 'official' association that I have found is in MSDN (Datasets in Visual Studio Overview) which states "The structure of a DataSet is similar to that of a relational database; it exposes a hierarchical object model of tables, rows, columns, constraints, and relationships." - Which should be enough to upset a few theorists.

The ADO Relation is simply incorrectly named and propagates the confusion that people have about thinking about relations and relationship interchangeably.  The relation is a fundamental construct in relational theory and has formal relational algebra, relational calculus and other mathematics behind it.  The ADO relation is simply a badly named foreign key.  It is a fundamental error and is like calling the steering wheel of a car a chassis.

Not only does Microsoft abuse the term, they also use it inconsistently.  Visual Studio developers that have used the DataSet designer are used to the term 'Relation' as it is in the IDE (right click | Add | Relation) and I have myself said to developers things like "Create a relation (cringe) between those two DataTables".  However, the actual class that is being created is a DataRelation class.  Thankfully, the class is not called a relation and DataRelation is sufficiently devoid of significant meaning (like calling the aforementioned steering wheel a DirectionChassis) that one can get away with it.  But why then does Visual Studio call it a relation, and why couldn't they just have used 'relationship' in Visual Studio? 

Unfortunately it doesn't stop at Visual Studio.  The DataSet class has a property DataSet.Relations which returns, not a collection of DataTables as someone with some understanding of the relational model would expect, but rather a collection of DataRelations (er... foreign keys?).  DataSet.Relations.Add() does not add a new table, it adds a new DataRelation(ship).

To add to the confusion, the msdata schema definition (xmlns:msdata) uses the term 'Relationship' (msdata:Relationship annotation) or the xs:keyref annotation for foreign keys.  Generating DataSet Relations from XML Schema in MSDN recommends this after creating more confusion by stating "In a DataSet, you form an association between two or more columns by creating a parent-child relation." (aaargh! cringe! What on earth is a 'parent-child relation'?)

I think that Microsoft's abuse of the term 'relation' is shocking and breaks thirty-odd years of sound theory as well as breaking a few object orientation rules along the way by not naming things clearly and consistently.  I don't think that it is going to be an easy problem to fix, but the easiest change would be to rename 'Relation' to 'Relationship' in Visual Studio - most developers wouldn't even notice.  The DataRelation class is more core to the framework but is thankfully so badly named that it has no meaning anyway so it could almost be left alone.  I don't know how easy it will be to change the Relations property on the DataSet which is horribly orphaned and should have at least be named DataRelations from the beginning.

The number of people reading this post can never be as much as the number of developers going - Right click | Add | Relation - perpetuating the confusion.  I don't propose a Relational Vigilante Group that spray paints notations of set operators across the Microsoft campus and I imagine (sadly) that the generation that developed and believed in relational theory - the basis for nearly all our business systems - are literally a dying breed.  The father of relational theory EF Codd died in 2003 at the age of 79 and CJ Date, a driver of the relational model, is probably feeling a bit old and won't be able to compete with millions of young, energetic Visual Studio developers churning out thousands of blog pages every day.

Caveat Lector is is a Latin phrase meaning "Let the reader beware".  All that I ask is that as a .NET developer that you be aware of the terms 'Relation', 'Relationship' and their respective meanings in relational theory, ADO, visual studio and I suppose, the social sciences.

Simon Munro

Originally posted here

Sunday, May 18, 2008 5:54:44 PM (South Africa Standard Time, UTC+02:00)  #    Comments [0] -
Archive | .net
Comments are closed.
Archive
<August 2008>
SunMonTueWedThuFriSat
272829303112
3456789
10111213141516
17181920212223
24252627282930
31123456
About the author/Disclaimer

Disclaimer
The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.

© Copyright 2008
Simon Munro
Sign In
Statistics
Total Posts: 12
This Year: 12
This Month: 0
This Week: 0
Comments: 2
Themes
Pick a theme:
All Content © 2008, Simon Munro
DasBlog theme 'Business' created by Christoph De Baene (delarou)