Datasets vs. Custom Entities

So you want to build your own entity objects? Maybe you are even purchasing or authoring a code-gen tool to do it for you. I like to use DataSets when possible, and people ask why I like them so much. To be fair, I'll also write a list of reasons not to use DataSets and to create your own entities, but for now this post is all about the pros of DataSets. I've been on a two-week sales pitch for DataSets with a client, so let me summarize.

  • They are very bindable.
    This is less of an issue for Web Forms, which don't support two-way data binding. But for Windows Forms, DataSets are a no-brainer. Before you say that custom classes are, or could be, just as bindable, go try implementing IListSource, IList, IBindingList and IEditableObject. Yes, you can make your own custom class just as bindable if you want to work at it.
  • Easy persistence.
    This is a huge one. Firstly, the DataAdapter is almost as important as the DataSet itself. You have full control over the Select, Insert, Update and Delete SQL and can use procs if you like. There are flavours for each database. There is a mappings collection that can isolate you from changes in names in your database. But that's not all that is required for persistence. What about optimistic concurrency? The DataSet takes care of remembering the original values of columns, so you can use that information in your WHERE clause to look for the record in the same state as when you retrieved it. But wait, there's more: the DataSet keeps track of the row state, so you know whether you have to issue deletes, inserts, or updates against that data. These are all things that you'd likely have to do yourself in your own custom class.
  • They are sortable.
    The DataView makes sorting DataTables very easy.
  • They are filterable.
    DataView to the rescue here as well. In addition to filtering on column value conditions - you can also filter on row states.
  • Strongly Typed Datasets defined by XSD's.
    Your own custom classes would probably be strongly typed too...but would they be code generated out of an XSD file? I've seen some strongly typed collection generators that use an XML file but that's not really the right type of document to define schema with.
  • Excellent XML integration.
    DataSets provide built-in XML serialization with the ReadXml and WriteXml methods. Not surprisingly, the XML conforms to the schema defined by the XSD file (if we are talking about a strongly typed dataset). You can also stipulate whether columns should be attributes or elements and whether related tables should be nested or not. This all becomes really nice when you start integrating with 3rd party (or 1st party) tools such as BizTalk or InfoPath. And finally, you can of course return a DataSet from a Web Service and the data is serialized with XML automatically.
  • Computed Columns
    You can add your own columns to a DataTable that are computed based on other values. This can even be a lookup on another DataTable or an aggregate of a child table.
  • Relations
    Speaking of child tables, yes, you can have complex DataSets with multiple tables in a master-detail hierarchy. This is pretty helpful in a number of ways. Both programmatically and visually through binding, you can navigate the relationship from a single record in the master table to the collection of child rows related to that parent. You can also enforce the referential integrity between the two without having to run to the database. And you can insert rows into the child based on the context of the parent record, so that the primary key is migrated down into the foreign key columns of the child automatically.
  • Data Validation
    DataSets help with this, although it's not typically thought of as an important feature. It is though. Simple validations can be done by the DataSet itself. Some simple checks include: Data Type, Not Null, Max Length, Referential Integrity, Uniqueness. The DataSet also provides an event model for column changing and row changing (adding & deleting), so you can trap these events and prevent data from getting into the DataSet programmatically. Finally, with the DataRow's RowError property and SetColumnError method you can mark elements in the DataSet with an error condition that can be queried or shown through binding with the ErrorProvider. You can do this for your own custom entities by implementing the IDataErrorInfo interface.
  • AutoIncrementing values
    Useful for columns mapped to identity columns or otherwise sequential values.
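
To make the persistence point concrete, here is a minimal sketch of the bookkeeping a custom entity would need in order to match it. It is written in Python purely for illustration and is not the ADO.NET API: the TrackedRow class, the RowState values and the build_command method are all hypothetical names. The row keeps its original values next to the current ones, tracks its state, and puts the original values in the WHERE clause so the statement only matches a record nobody else has changed.

```python
from enum import Enum


class RowState(Enum):
    UNCHANGED = "unchanged"
    ADDED = "added"
    MODIFIED = "modified"
    DELETED = "deleted"
    DETACHED = "detached"


class TrackedRow:
    """Hypothetical entity that remembers original values and its row state."""

    def __init__(self, values, is_new=False):
        self.current = dict(values)
        # Original values are what the database held when the row was read.
        self.original = {} if is_new else dict(values)
        self.state = RowState.ADDED if is_new else RowState.UNCHANGED

    def set(self, column, value):
        self.current[column] = value
        if self.state is RowState.UNCHANGED:
            self.state = RowState.MODIFIED

    def delete(self):
        # A row that was never saved needs no DELETE; it is simply dropped.
        if self.state is RowState.ADDED:
            self.state = RowState.DETACHED
        else:
            self.state = RowState.DELETED

    def build_command(self, table, key):
        """Return (sql, params) for this row, or None if there is nothing to do."""
        if self.state in (RowState.UNCHANGED, RowState.DETACHED):
            return None
        if self.state is RowState.ADDED:
            cols = list(self.current)
            sql = "INSERT INTO {} ({}) VALUES ({})".format(
                table, ", ".join(cols), ", ".join("?" for _ in cols))
            return sql, [self.current[c] for c in cols]
        # Optimistic concurrency: match the row only if every column still
        # holds the value we originally read. (Simplification: a real
        # implementation must compare NULLs with IS NULL, not "= ?".)
        where = " AND ".join("{} = ?".format(c) for c in self.original)
        where_params = list(self.original.values())
        if self.state is RowState.DELETED:
            return "DELETE FROM {} WHERE {}".format(table, where), where_params
        changed = [c for c in self.current
                   if c != key and self.current[c] != self.original.get(c)]
        if not changed:
            return None  # values were set back to what they already were
        sets = ", ".join("{} = ?".format(c) for c in changed)
        sql = "UPDATE {} SET {} WHERE {}".format(table, sets, where)
        return sql, [self.current[c] for c in changed] + where_params


row = TrackedRow({"id": 1, "name": "Ann", "city": "Oslo"})
row.set("city", "Rome")
print(row.build_command("customer", "id"))
```

The caller would still have to execute the statement and treat "zero rows affected" as a concurrency conflict, which is one more piece of plumbing to write by hand.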

This is not an exhaustive list but I'm already exhausted. In a future post, I'll make a case for custom entities and not DataSets, but I can tell you right now that it will be a smaller list.

Comments

  • Barry Gervin February 10, 2004 3:31 PM

    Personally, I dislike datasets. Hold on, I'll qualify that - I dislike datasets as a replacement for a domain model.

    For all the reasons you mention, datasets can be a fine solution for a single page. I do NOT think that they belong in the center of a system's architecture. Furthermore, they greatly impede one's ability to perform serious business logic and rules checking.

    Although datasets may be ok for one-off situations, I try to dissuade developers from relying on them too much.

  • Barry Gervin February 12, 2004 4:57 PM

    I think it would be most valuable to itemize the issues that lead you to this opinion...which is not uncommon. As I mentioned, I'm working on a list myself to present both cases for when and when not.

  • Barry Gervin March 3, 2004 3:57 PM

    Other advantages are

    - Other MS products know how to work with DataSets. One example is the new Infopath SR-1, in particular with diffgrams (try to make your custom entities support diffgrams!). Future Office versions will also know how to deal with them.

    - Third party control vendors usually provide very good support for datasets. They don't test them with your custom classes.

    About Udi's comments: that's the usual reason not to use DataSets. They make writing business logic hard.

    If you combine DataSets with a business rule engine, so you don't need to write business logic using the DataSets, then you have the best of both worlds. Take a look for example at Biztalk 2004's business rule engine.

    DataSets don't have very good press among OO heads. Now, if you change the word 'DataSet' to 'XML' then they start looking at you differently, even if it's the same idea. DataSets are no more necessarily bound to a database schema than XML files are. That's the usual way of using them, but it's not the only one.

    Also, using custom entities can provide the idea that can model your database with objects, and that's the most dangerous trap in enterprise development today, as you start dealing with lazy/eager loading, partial instantiation, object identity, etc, etc, etc.

  • Barry Gervin March 9, 2004 7:03 PM

    I'm keeping this list bookmarked, so I'm adding a couple of more advantages ;)

    - Using 'ExtendedProperties' you can easily add metadata to the dataset at runtime, and it can be easily consumed. You can also do this with attributes in custom classes, but it cannot be done at runtime, and attributes are more expensive and difficult to consume.

    - There is a lot of community support around DataSets (articles, doc, books, etc), but there is no support for your own flavor of custom entities. Each programmer that needs to use your business logic layer will need to learn to use your components. That does not happen with Datasets. Any .NET programmer should know how to deal with them.

  • Barry Gervin March 13, 2004 3:02 PM

    Would you use a dataset, strongly typed or otherwise, to represent a single "row of data"/entity ?

  • Barry Gervin March 18, 2004 11:52 AM

    A single row? I haven't yet used a dataset for this case...and I'm not sure I would. In fact, this may seem strange, but I don't think I've even had an entity or a business entity that has fallen into that category in the past 2-3 years. Perhaps that's a sign that I'm building systems that are way too complex - or that I'm not creating small enough or atomic enough entities. I do tend to think of entities as self-contained (somewhat transactional) documents that contain all the tables required as part of the business transaction. For example, an Order. That's too easy - it has Order Header and Order Detail. What about Customer? Sure, that sounds like a single record - but in those cases I've found that there always ends up being some "extra" info attached to the customer - like the list of their contacts/employees, or something else. Like I said - I haven't had a case where the customer is on its own.

    If I did have this case, now that I think of it, I'd probably be lulled into using a dataset out of habit - particularly if I'm writing a data access component and persisting it to the database. It's just too easy to use those data adapters to make this happen, complete with the table mappings collection - it's just too fast to do it otherwise.

  • Barry Gervin April 10, 2004 3:50 PM

    Andres,

    Biztalk 2004 Rules Engine works fine with custom entities.

    "using custom entities can provide the idea that can model your database with objects, and that's the most dangerous trap in enterprise development today" - the ways to model a DB are very well documented, and some base themselves on object models.

    Furthermore, entities are pretty much just structures which represent a collection of data in different contexts. A "customer" entity may contain an Id, first name, last name, and ssn. Entities don't contain logic - besides basic validation logic ( ssn must be a specific format ).

    The logic of the system is split into discrete services and each are used as needed. Notification is a fine example of a service, persistence is another. The BL as a single "mush" of all logic is distasteful, IMHO, I expound on this here: http://udidahan.weblogs.us/archives/017820.html

    "That does not happen with Datasets. Any .NET programmer should know how to deal with them" - Any .Net programmer can figure out how to pass a simple class ( like "customer" ) to a service, so I don't see any problems there.

    Barry,

    "haven't had a case where the customer is on it's own" - it all depends on how you define customer, of course. In my architectures, each service has its own definition. Most of the time, the definitions between services are compatible - for instance when the development of all the services is part of a single version of a system. When the definitions aren't compatible, you move to a common message bus architecture ( this has been discussed ad nasium and can be easily googled ).

    About the issue of complex systems - well, I've been part of > 100 man year DoD development projects that were VERY complex ( not to mention the complexity inherent in size ). During the architecture and design sessions, datasets were suggested, analyzed, and discussed, within the organization and with various experts ( MS as well ), and were found to be lacking in several critical respects - performance being one of them.

    I shall reiterate, datasets aren't my first choice on any project, but often prove useful for demo code. Just recently I implemented a small project in under a month, using all the techniques of SOA mentioned in my blog, without a single dataset, table, view, or adapter.

    "just to easy to use those data adapters to make this happen complete with the table mappings collection - it's just to fast to do it otherwise" - Development and basic testing was done in under 2 weeks. We went into alpha with 1 open bug. No bugs were uncovered during alpha ! The open bug was ( obviously ) closed. Beta testing went off without a single bug too ! The project was done for MS Israel and is now live for over 2 weeks without a single problem. I'm not saying that this couldn't be done with datasets et al, just that I haven't heard of it happening ( and I've seen and heard about quite a lot of projects ).

    These are just the thoughts of a single developer, and all anecdotal "evidence" should be taken with a grain of salt ( or two ), but those hooked on datasets should try a different technique, just to see what it's like. I used to use datasets ( et al ) for EVERYTHING; now, I use them for nearly nothing.

  • Barry Gervin April 10, 2004 6:14 PM

    There is no doubt you can use datasets or custom entities to make a successful project - or an unsuccessful one for that matter - so comparing project delivery dates doesn't teach us much about custom entities vs. datasets.

    Can you elaborate more on your evaluation of datasets - specifically the performance problems? Certainly there is a known issue regarding remoting datasets - even the binary serialization is done with XML, so it is way too verbose. There are some good code examples to be googled for customizing the dataset serialization to be faster - and this has been fixed in Whidbey. Any other performance issues I'm in the dark about, so please do elaborate on your eval.

  • Barry Gervin April 11, 2004 10:24 AM

    I'll get into the performance issues later ...

    just found Plip's analysis of this issue - its worth a look : http://weblogs.asp.net/Plip/archive/2004/04/11/111128.aspx

  • Barry Gervin April 11, 2004 9:28 PM

    Udi,

    If you are discussing which data structure you should use when exposing your data using a SOA, then custom classes are probably a better choice, as no one will use the DataSet's advantages, since the users of that layer probably don't understand DataSets.

    Anyway, I'm curious how you handle, for example, the update of an order in your scenario without a diffgram.

    I was discussing which data structure was better for entities that I need to use in my own application, and I want to bind, persist, filter, sort, etc

    >Any .Net programmer can figure out how to
    >pass a simple class ( like "customer" ) to a
    >service, so I don't see any problems there

    If I need to implement my own flavor of a data structure as capable as the dataset, then the user of my data structure needs to learn to use it.

  • Barry Gervin April 12, 2004 11:38 AM

    There was an interesting similar discussion on Andres Aguiar's weblog, at http://weblogs.asp.net/aaguiar/archive/2003/06/16/8757.aspx

    I agree both have advantages and disadvantages, but following OOP principles, encapsulation can provide both. After all, model objects are only a bunch of methods on top of a data container. The DataSet can be seen as a common "data ground" for these objects, enabling you to use dynamic data and binding when you need it, and throughout your BLL, use the custom entities, sparing the burden of "translating" your data between layers and components.

    That doesn't solve the performance issues, though (it even makes them worse), and it gives applications direct access to the DAL: if you want the binding advantages, you have to leave the dataset accessible, so you have to handle all of its events and do heavy validation. Small to medium applications could live with those issues. Has anyone ever tried this approach?

  • Barry Gervin April 13, 2004 12:04 AM

    Plip should definitely look at typed datasets - it looks like he's writing them himself. I've seen that often enough, but he seems to go to a greater extent than most people do.

    Compared to a dataset (typed, let's say - but it doesn't really matter), his storage of data is ok for one or two instances of your "car". For larger result sets, though, this will be inefficient "row"-based storage of your objects, not as efficient as the columnar storage of data in datasets, which use value type arrays like int[] for your "Id". There is some good performance to be had with column-based storage such as a dataset's, and if you are not using a dataset and are writing your own custom entities that may hold large collections of data, I would recommend implementing that approach.
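
The column-based layout described in the comment above can be sketched as follows. This is only an illustration in Python, not how the DataSet is actually implemented: CarTable and CarRow are hypothetical names, and Python's array module stands in for value type arrays such as int[]. The idea is that numeric columns live in one packed array each, and a "row" is nothing more than an index into those arrays.

```python
from array import array


class CarTable:
    """Hypothetical column-based store: one array per column."""

    def __init__(self):
        self.ids = array("i")      # packed C ints, standing in for int[]
        self.prices = array("d")   # packed C doubles
        self.makes = []            # strings remain object references

    def add(self, car_id, make, price):
        self.ids.append(car_id)
        self.makes.append(make)
        self.prices.append(price)
        return len(self.ids) - 1   # the new row's index

    def row(self, index):
        return CarRow(self, index)

    def __len__(self):
        return len(self.ids)


class CarRow:
    """A typed 'row' that holds no data itself, only a table and an index."""

    def __init__(self, table, index):
        self._table = table
        self._index = index

    @property
    def id(self):
        return self._table.ids[self._index]

    @property
    def make(self):
        return self._table.makes[self._index]

    @property
    def price(self):
        return self._table.prices[self._index]


cars = CarTable()
cars.add(1, "Volvo", 100.0)
cars.add(2, "Saab", 250.5)
print(cars.row(1).make, sum(cars.prices))
```

Whole-column operations such as the sum above walk a single contiguous array instead of visiting one object per row, which is where the saving for large result sets comes from.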

  • Barry Gervin April 13, 2004 2:01 PM

    I think that the .Net petshop v3 went with a hybrid approach - custom classes to represent single entities, and datasets/datatables to represent many entities. I'm not sure that I particularly like that since you can't really go from one to the other. But it's worth a look anyway.

  • TrackBack April 13, 2004 5:01 PM

    The lazycoder (http://www.lazycoder.com/weblog/index.php) puts up his take (http://www.lazycoder.com/weblog/index.php?p=51) on the whole custom classes versus datasets ( et al ) debate (http://objectsharp.com/Blogs/barry/archive/2004/02/10/273.aspx). He draws the line on making strongly typed collections, for no other reason than performance - rather, performance pertaining to the amount of memory utilization.

    The true fact of the matter is that the memory overhead you pay for using strongly typed collections is quite small. 900 Customer objects in a CustomerCollection is the same order ( as in big-O notation ) as 900 datarows in a datatable. If anything, your ability to control the memory utilization is greater when using the CustomerCollection since you control the footprint of each Customer. You have much less control over the memory footprint of the datarow.

    So, if memory utilization is better with strongly typed collections ( or even the same assuming a common case ), memory utilization does not appear to be a valid concern for choosing datasets & co over strongly typed collections.

    Development time is a valid concern, and is brought up quite often. From my experience, it does take less time to use datasets to get to the 80% functionality mark. However, that last 20% becomes that much harder because the ability to control datasets' behaviour is quite diminished. Thus, over the entire development lifecycle, I find that strongly typed everything works in my best interests.

    One final note - I'm not against auto-generated code. Quite the opposite, really. I try to automate any parts of the development process that I can. However, I am against relying on code generated by something not under my control, and am dead set against not understanding how that code works. Code breaks, automated code breaks automatically.

  • Barry Gervin April 14, 2004 10:38 AM

    This is great information, Barry, and I appreciate you sharing it.

    I also enjoy using typed datasets and have yet to build an application that could have been built more quickly or better using typed objects. That being said, I mainly build web applications and Intranet applications for small businesses, so I can't specifically speak to the needs of large applications.

    I can say, however, that with all the page caching and object caching currently built into .NET as well as what is coming in 2.0, my applications are just not hitting the database as much as they did in classic ASP. I also don't load 900 rows of anything for a web page, which has a short life anyway, so memory consumption has never been an issue. And, like you said, a quick Google will help one to find binary serialization solutions for DataSets if you need the added performance.

  • Barry Gervin April 14, 2004 11:05 AM

    Plip (http://weblogs.asp.net/plip) does a good job of showing a strongly typed ds alternative - a strongly typed custom entity. He addresses the sorting and filtering benefits I mentioned of a dataset, although he doesn't implement any support for updating (i.e. storing original values or a version/aka timestamp).

    Another point that came up in his feedback is support for nulls, and somebody suggested that datasets have the same need to implement some null value handling - so I should have added to my list of ds benefits the built-in support for null values.

    DataColumns do have an "AllowDBNull" property, which in the case of typed datasets is derived to be true with minOccurs=0 in the XSD. But furthermore, the DataRow provides an IsNull(column) set of overloads to ask a column in a given row if it's null. Typed datasets offer Is<columnname>Null() varieties on the typed DataRow as well. Internally, this is handled on each nullable column with a bit array to store the null/not null values for each row. This is superior to any imaginary or default value technique, which only works in a narrow set of business requirements or interpretations. I'll also mention that column value getters in datasets always check the bit array first to return null before going after the value array that stores the actual column value....a good technique worthy of emulation if you want to create your own custom entities.
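
For anyone wanting to emulate the technique just described in their own entities, a minimal sketch might look like the following. It is Python for illustration only, the class and method names are hypothetical, and it makes no claim to mirror the DataSet's real internals: one packed value array per column, plus one bit per row that records whether the value is null, with the getter consulting the bit first.

```python
from array import array


class NullableIntColumn:
    """Hypothetical nullable int column: value array plus a null bit array."""

    def __init__(self):
        self._values = array("i")
        self._nulls = bytearray()   # one bit per row, 1 means "null"

    def append(self, value):
        row = len(self._values)
        if row % 8 == 0:
            self._nulls.append(0)   # grow the bit array one byte at a time
        if value is None:
            self._nulls[row // 8] |= 1 << (row % 8)
            self._values.append(0)  # placeholder that is never returned
        else:
            self._values.append(value)

    def is_null(self, row):
        return bool(self._nulls[row // 8] & (1 << (row % 8)))

    def get(self, row):
        # Check the null bit first, then go after the value array.
        if self.is_null(row):
            return None
        return self._values[row]

    def set(self, row, value):
        mask = 1 << (row % 8)
        if value is None:
            self._nulls[row // 8] |= mask
            self._values[row] = 0
        else:
            self._nulls[row // 8] &= ~mask & 0xFF
            self._values[row] = value


ages = NullableIntColumn()
ages.append(0)      # a real zero
ages.append(None)   # no value at all
print(ages.get(0), ages.get(1))
```

The point of the separate bit is visible in the last lines: a genuine 0 and a missing value stay distinguishable, which a "magic default value" scheme cannot guarantee.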

  • Barry Gervin April 14, 2004 11:17 AM

    Just to get another thing down on paper.....

    One of the things mentioned about a custom entity in all of these discussions is the ability to do custom validation. You can implement a column value setter that checks the set value and performs some validation. A more general notion is the ability to add your own code to a custom entity, whereas a typed dataset has no room for that - unless you want it regen'd when you change your schema and lose your changes.

    Developing custom entities is going to get a lot easier in Whidbey with generics, and persistence with ObjectSpaces. Datasets get better in Whidbey too. With partial classes, there will be support for a generated dataset as a partial class. You will be able to add your own code in another file, which is the other part of the partial class, and at compile time they'll get compiled together. This is a good pattern for anybody doing code-gen - and of course the two files will also have their own individual source control histories, which is nice.

    In the meantime, datasets get a bad rap as a basis for a rich entity. If you don't want to throw out the baby with the bathwater, I've seen two techniques for adding your own code. The first technique, which I've seen used with some limited success, is to inherit from the typed dataset. You still have the problem that your ancestor is a dataset, so you can't bring in any extra code through inheritance....but you can extend, override, and hide/shadow stuff from the typed dataset and dataset parts of the hierarchy.

    The other option is to contain/host the dataset in your own entity class which could be derived from a base class (to acquire functionality through inheritance). This option lends itself more to handling events than overriding behaviour. You can still handle the various dataset events in a direct descendent of typed dataset however (like column & row changing/changed events) to perform validation and stop changes to your values.

    Datasets also have the option of getting uglier in Whidbey. You will be able to add data access right inside your typed dataset, which spawns the notion of a DbDataTable (a DataTable with Fill/Update methods and hence the seeds of its own data adapter built right in). I've been told that this is a RAD option, possibly for the non-enterprise scale developer...and clearly not for me. I really hope I can craft an enterprise template to disable that option.

  • Barry Gervin April 14, 2004 11:22 AM

    BTW - this is a great exercise which has lots more to discuss, and I'm looking forward to exploring more details of both sides. I just wanted to take a moment to thank the participants on both sides of this discussion - keep up the excellent work. On that note...are any of you guys going to TechEd? Correct me if I'm wrong, but this would make a great panel discussion or birds of a feather session. No?

  • Barry Gervin April 14, 2004 5:19 PM

    Adding another point of view, I think that the value of DataSets also depends on the kind of application you are building. If you are developing ASP.NET applications they have less value than if you are doing Windows Forms applications.

    One reason is that data binding in asp.net is much more primitive than in Windows Forms, and making it work as well as it does with datasets there is difficult.

    But the main reason is that in Windows Forms you need a disconnected architecture. You need to retrieve data, keep track of the changes, and then persist it. Doing this with custom classes is a lot of work.

    If I want to build a business logic layer, I want to be able to use it from asp.net apps and windows forms apps. Doing that with DataSets is easy. Doing it with custom classes is not.

    BTW, I'll go to TechEd, but I'm not sure if this topic is big enough for a BOF, but we can try ;)

  • Barry Gervin April 16, 2004 7:48 AM

    Barry, Andres,

    Have either of you developed an entire system using strongly typed / custom classes ?

  • Barry Gervin April 17, 2004 6:35 AM

    Udi,

    I don't like the tone of the question ;) (I can ask you if you ever built an app using DeKlarit's DataSets ;), but anyway I'll answer it.

    We built a custom IDE with a database backend. The database was an old c-tree ISAM database, and the data access is done using a managed C++ layer on top of C code. We are using custom classes for that product. We basically don't need any DataSet feature. We don't bind, etc. It's not a typical database-based business application.

  • Barry Gervin April 17, 2004 9:06 PM

    Udi,

    Good debating tactic - question the credentials of your opponent :)

    I've built systems using typed and untyped classes and datasets. That's 4 different scenarios. When I say "built", that can vary between being a consultant on an advisory basis, a contributing developer, and right up to lead designer/architect. Obviously I've only used datasets with .NET. Prior to .NET I had experience with both typed and untyped classes on a variety of platforms including Java, Delphi, and PowerBuilder (mostly untyped there).

    Most of my early work with .NET was untyped (including datasets), but I've increasingly seen more typed projects (both classes and typed datasets).

    Unfortunately, most of the time I see teams choose how they want to do things based on instincts, gut feel and anecdotal evidence...and the latter mostly in defence of the first two. My recent work revolves around defining best practices and roadmaps to the most appropriate .NET technology. The problem with bp's is that when they get communicated verbally they usually lose some of the fidelity of "when" or "why" to use or not to use, or "how" to use, a given technology....and that's what this thread is all about....filling in all of the when's & why's.

    So to say something like "Datasets have poor performance, so never use them" is a huge disservice if it's not qualified. Slow to load? Slow to remote? Slow to serialize? Slow to develop? Not scalable? Those blanks need to be filled in.

    So far this is where I am on the performance issue: Slow to load? Inconsequential as long as you load them with constraints & indexing turned off during load. Slow to remote? Yes, if you don't override the built-in binary serialization...or wait until Whidbey. Not scalable? I've got a case where datasets are actually more scalable than loading custom objects with a datareader. This is mostly an accidental/slippery slope problem, where a connection is being held open longer because it was more convenient for a developer to do some extra work during the datareader looping. The dataset solution doesn't give you the option to do extra work "during the load", so it's a bit safer in that situation.

    I don't consider this last problem a back breaker one way or the other, though. The point is you can make mistakes using either technique. You can't always blame your tools, and you need to have processes in place to performance & load test any solution you build....developers are human, so you have to surround them with processes like testing to validate the assumptions against expectations.

  • Barry Gervin April 18, 2004 1:39 PM

    Maybe we should have another thread about this.

    I think perhaps the main reason that some developers find strongly typed entities (datasets or otherwise) distasteful is not that they are strongly typed, but that there is no untyped access to their contents.

    Certainly a lot of strongly typed entity implementation patterns, samples and code generators don't offer dual typed and untyped access to their contents. Of course there is always reflection, but the thought of accessing data through reflection increases the distaste. Perhaps that is a reflection (pun intended) of the reflection API, not to mention its performance. That's another thread in itself.

    Strongly typed datasets are built on top of the untyped dataset. So without resorting to reflection it's easy to walk the tables, columns and rows to collect not only data but meta data.

    I think that is a good metaphor for any implementation of a Typed Entity - that within the implementation there is an untyped mechanism. I'm hesitant to say "storage" but I'm certainly thinking about that.

    One of the things I find with teams that use code generation for typed classes is that somewhere along the line they wish for some extra code to be inserted that they didn't think about when they first defined their templates. Improving the template can often mean regenerating existing classes. It's a clear requirement for effective code generators to allow for regeneration of modified classes without overwriting the modifications. Partial classes in Whidbey solve this problem, and in fact the new dataset generator and environment take advantage of this and allow me to add my own code to user files that get compiled together into one class.

    A lot of functionality required for an entity doesn't necessarily have to be code gen'ed, and again, typed datasets are a good demonstration of this. The generated classes are inherited from something else (in this case, System.Data.DataSet). By building on top of the untyped functionality in the DataSet, you get lots of things for free - and my point is that these freebies are only possible because at the core of this type of entity is untyped data. For example, other than wrapper functions, there is nothing in a Typed DataSet's code that handles null values - that is handled in the ancestor. Original values, modification state flags and error states are other biggies that come for free with a DataSet.

    I don't bring these up necessarily to advocate a dataset, but rather to demonstrate the power of untyped cores to your entities and the things you can add for free in your ancestor (or external helper classes if need be) if there is an abstract untyped way of getting to the core data.
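
The "untyped core" idea in the comment above can be sketched roughly as follows. This is a Python illustration under stated assumptions, not the typed DataSet's generated code: UntypedRow and CustomerRow are hypothetical names. The base class stores values by column name and supplies null checks and change detection once; the part a generator would emit is reduced to thin typed wrappers.

```python
class UntypedRow:
    """Hypothetical untyped core: values are reachable by column name."""

    columns = ()  # subclasses list their column names here

    def __init__(self, **values):
        self._values = {c: values.get(c) for c in self.columns}
        self._original = dict(self._values)

    def __getitem__(self, column):
        return self._values[column]

    def __setitem__(self, column, value):
        if column not in self._values:
            raise KeyError(column)
        self._values[column] = value

    # The "freebies": written once here, inherited by every typed entity.
    def is_null(self, column):
        return self._values[column] is None

    def changed_columns(self):
        return [c for c in self.columns
                if self._values[c] != self._original[c]]


class CustomerRow(UntypedRow):
    """What a generator would emit: wrappers only, no bookkeeping."""

    columns = ("id", "name", "phone")

    @property
    def name(self):
        return self["name"]

    @name.setter
    def name(self, value):
        self["name"] = value


customer = CustomerRow(id=1, name="Ann")
customer.name = "Bob"
# Generic code can walk the columns without reflection.
for column in customer.columns:
    print(column, customer[column], customer.is_null(column))
print(customer.changed_columns())
```

Because the generated subclass contains nothing but wrappers, regenerating it after a schema change cannot lose any of the shared behaviour.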

    I'm still thinking about how Plip's example is going to be elegantly refactored to support updates in the future. I can't see a good way out here....and in general, I'm trying to come up with some kind of universal requirements or best practices for custom entities.

  • Barry Gervin April 18, 2004 4:47 PM

    Andres,

    No tone. Just curious.

    Barry,

    Thanks, ( I actually wish I thought of it as a tactic ) but I'm apparently tactless :)

    I'm actually reflecting on why I so dislike datasets, strongly typed or otherwise, and haven't come up with an answer yet. It could be that I don't control how things get done ( that and the fact that I'm something of a control freak when it comes to my code ).

    Original Values, Modification State flags and error states aren't things that particularly matter to me, unless I'm working truly disconnected.

    I also find that my custom classes fit a lot better with my overall architecture point of view - SOA - than do datasets.

    Actually, I'm quite a misfit in the custom classes camp, advocating that these classes should contain only basic validation logic and having all work done in external services. Most advocates of the custom classes way of work are "Object Bigots" ( Fowler's words, not mine ) and, IMO, dislike datasets as well. This is about as far as we go in the same camp. Once I pipe up about logic being outside the classes, I hear a *gasp*, and some variation of "but, but, but - WHERE'S YOUR ENCAPSULATION !? ".

    I've taken something of a detour on my "Road To SOA" series on my blog, and I'll try to get back to it. Maybe then what I have to say about custom classes will make more sense.

  • Barry Gervin April 19, 2004 6:14 PM

    Udi,

    Just a quick question. How do you handle updates with custom classes? Do you have some kind of optimistic concurrency support?

  • Barry Gervin April 20, 2004 10:38 AM

    I'll tell you why I dislike DataSets: they are huge objects containing a bunch of stuff that I don't need 95% of the time. I like Andres's wrapper approach, but it still drags the DataSet along inside it. I'd rather have a small, strongly-typed class or struct and populate it using a DataReader.

    Ideally, I'd like MS to create an IDataSet interface with just enough defined members to allow us to fill our custom classes in the standard way.

  • Barry Gervin April 22, 2004 12:26 AM

    Scott,

    If you don't need the stuff 95% of the time then you should not use DataSets, of course.

    Anyway, it reminds me of a friend of mine, who is a C programmer, and uses that argument to not write code in .NET ;)

    Regards

  • Barry Gervin April 23, 2004 8:54 AM

    I'm doing some work on inventorying many of the MS Sample applications (IssueVision, TaskVision, FM Stocks, ShadowFax, Jaggle, PetShop, etc.) and what technologies (Win/WebForms, Web/Ent Services, etc), techniques (Datasets, custom classes, layers and partitions, etc), application blocks (DAAB, EMAB, etc.) they demonstrate.

    I stumbled onto one that was written by some folks at Infragistics. One of the design goals was to not use Datasets. I'll be doing some more review of this app over the coming weeks.

    http://windowsforms.net/articles/writingntierapps.aspx

  • Barry Gervin April 23, 2004 9:48 AM

    A couple of follow up comments on the Tracker above...it's not written by Infragistics, but it does use their winforms controls.

    More importantly, I had a quick peek at two common flaws in custom entity samples. Updateability with optimistic concurrency, and Handling of nulls.

    This app does use timestamps in the database and keeps those in the entity. Good practice if your db has them, but not a generic solution like a dataset's original values. The entities in this sample do not show any support for the concept of nullable values.

  • Barry Gervin May 4, 2004 11:51 PM

    Updates are done as follows:

    PersistenceService.Update(myCustomer);

    Concurrency I handle in various ways depending on the situation but it's usually one of the following:

    1. The sql of the update checks all values ( like what the data adapter generates ). I like this approach because I don't dirty my entities with unnecessary baggage.

    2. Add a datetime of last update value to the entity - simpler sql.

    3. Add a version number to the entity - simpler sql, like 2.

    I then simply check the number of rows updated, and if nothing has been updated, I raise an appropriate exception ( sometimes I call it a ConcurrencyException, most times I prefer an EntityHasNotBeenUpdatedException ).
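
    A minimal sketch of approach 3 plus the rows-updated check, so the idea is concrete. This is an assumption-laden illustration, not the actual PersistenceService: the Customer fields, the in-memory dictionary standing in for the table, and the exception name are all hypothetical. Against a real database the check would be a WHERE clause on the id and version, with the rows-affected count coming back from the command.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical entity carrying a version number.
public class Customer
{
    public int Id;
    public string Name;
    public int Version;
}

public class ConcurrencyException : Exception
{
    public ConcurrencyException(string message) : base(message) { }
}

// In-memory stand-in for the Customer table. The real statement would look like:
//   UPDATE Customer SET Name = @name, Version = Version + 1
//   WHERE Id = @id AND Version = @version
public class PersistenceService
{
    private readonly Dictionary<int, Customer> table = new Dictionary<int, Customer>();

    public void Insert(Customer c)
    {
        table[c.Id] = new Customer { Id = c.Id, Name = c.Name, Version = c.Version };
    }

    public void Update(Customer c)
    {
        int rowsAffected = 0;
        Customer stored;
        // The "WHERE" part: the row must exist and still have the version we read.
        if (table.TryGetValue(c.Id, out stored) && stored.Version == c.Version)
        {
            stored.Name = c.Name;
            stored.Version++;
            rowsAffected = 1;
        }

        // Nothing updated means someone else changed or deleted the row.
        if (rowsAffected == 0)
            throw new ConcurrencyException("Customer " + c.Id + " was changed or deleted by someone else.");

        c.Version++; // keep the caller's copy in step with the stored row
    }
}
```

    Two callers who both loaded version 1 can't both win: the first Update succeeds and bumps the version, the second matches no rows and gets the exception.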

  • Barry Gervin May 5, 2004 9:35 AM

    Udi,

    If you follow the first approach, then you need to have the old values available. Where do you store them?

    I mean, if you use ASMX to retrieve a Customer, change something, and send it back, how do you know the old values?

    Also, in a 'SOA', I could want to retrieve an 'Order' as a whole, add an OrderLine, remove another OrderLine, update the header, etc. How do you send those changes back to the middle tier?

    Regards,

    Andres

  • Barry Gervin May 5, 2004 4:07 PM

    Andres,

    Old values can be easily stored by holding a shallow copy of your data. It's usually in the cases where I get lazy ( which turns out to be quite often ) that I go with the other approaches.
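
    Roughly, the shallow copy could look like the sketch below. CustomerEntity, MarkLoaded, Original and IsDirty are names invented for this example; the only framework call is Object.MemberwiseClone. It assumes the fields are value types or strings, since a shallow copy shares any other reference-type members with the snapshot.

```csharp
using System;

public class CustomerEntity
{
    public int Id;
    public string Name;
    public string City;

    // Snapshot of the values as loaded; null until MarkLoaded is called.
    private CustomerEntity original;

    // Call once the entity has been filled from the data store.
    public void MarkLoaded()
    {
        original = (CustomerEntity)MemberwiseClone();
        original.original = null; // don't chain snapshots together
    }

    // The old values, available for an "all values" WHERE clause.
    public CustomerEntity Original
    {
        get { return original; }
    }

    public bool IsDirty
    {
        get
        {
            return original != null
                && (Name != original.Name || City != original.City);
        }
    }
}
```

    Note that the snapshot lives in a private field, so it only helps while the entity stays in process; it does nothing on its own for an entity that has been serialized out and back.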

    What you're looking for in your example is the ability to work disconnected. SOA doesn't try to solve that problem. A service might be something like AddOrderLineToOrder which would get the minimum needed information and coordinate all the work for you.

    Just to expound a little:

    public void AddOrderLineToOrder(int productId, int amount, double discount, int orderId)
    {
    // all of the following needs to be done in a transaction ( obviously )

    // fill a product entity using the productId - could use caching to improve performance
    // throw exception if no product exists

    // fill order entity using orderId - could use caching to improve performance
    // throw exception if no order exists

    // calculate new total sum of order using price found in product and discount and amount

    // create new orderline entity with all data now available
    // persist it and connect to order

    // apply any volume discounts or special offers - automatic 2 for ones for example
    // create new orderline entities as a result
    // persist them and connect them to the order too

    // update the new order
    // persist it

    // email the user about the wonderful specials he just got
    }

    There's enough to do one step at a time! god only knows what rules to employ and how to coordinate batch changes based on them.

    When all you've got to do for each action is a simple CRUD, then it makes sense to batch them. But when you've got heavy business logic/rules - or worse, those that change often - batching changes together to improve performance while creating more complexity is not the solution. That's IMO, of course :)

  • Barry Gervin May 6, 2004 11:25 AM

    Mm... That's quite a 'chatty' API, and you'll have a hard time adding all the lines in the same transaction (unless you use WS-Transactions).

    Also, how do you deal with updating a Customer? You need to send the shallow copy back to the server, unless you want to keep state there.

    Also, in my opinion, SOA does try to solve the 'work disconnected' scenario.

    Regards,

    Andres

  • TrackBack May 12, 2004 4:07 PM

    After several back-and-forths, a nice thread grew between myself, Barry Gervin, and Andres Aguiar on the subject - read it here. For anyone wondering which way to go, I can only say that my way rocks! But...

  • Barry Gervin May 12, 2004 9:41 PM

    I agree with Udi. Having an Order object which has got both implementation (e.g. how to persist itself to the data store) as well as the state (e.g. attributes for the Order) is not a good idea.

    The first criticism is that, what happens if you populate the Order object, pass it back to the presentation layer and then call the persist method? Although you could argue that still you want to persist the value to a data store but the way you persist your object on the server (i.e. SQL Server database) is different from the way you persist your object on your PocketPC (an XML file or SQL Server CE).

    So there is a clear difference between those classes that implement functionality (e.g. business and data layer) and those that represent business entities. You populate the business entities and then pass it between components. This helps you in looking at the interactions as "Message Passing", rather than RPC. It also helps make your business/data layer objects stateless.

    If you are concerned about transactions, you could either use COM+, or create a SqlTransaction object (thanks to ADO.NET) and then pass it to those methods that need to be part of the same transaction. Again, although the workflow method has some internal state (which is in fact the state of the transaction), but as soon as the method call for the workflow finishes, all of the state is lost, so the object and connection can be returned to the pool.

    If you have an Order object and want to add an Order Line, you first read the existing values (the same way you do with the DataSet) which includes the order + orderlines, do the modification (remember you are passing the business entity including order+orderlines around) and then pass it to a final method which persists the data back to the data store. It is all up to that object (which is your data access layer) on how to perform the update (i.e. resolve conflicts, etc).

  • TrackBack May 12, 2004 11:52 PM

    A Thread on DataSet vs. Custom Classes

  • Barry Gervin May 13, 2004 11:27 AM

    Hi all, interesting thread!

    Andres and Udi, about updating orders:

    Granted, it's a chatty API that Udi suggests, but it's quite probably the appropriate one, seeing as how the adding of order lines is prone to extensive business logic, just as Udi demonstrates in his example. (btw, Udi, I'm another 'odd camper' who prefers to keep my business logic in a separate Service layer and let it work on the custom entities)

    Still, in cases where batch updates are viable, why not just send the updated graph serialized as an xml document to a Web Service on the server, that accepts the whole document in a parameter?

    You could also send along the original xml document that the server gave you in another parameter, enabling the server to do optimistic concurrency. Alternatively, you could cache the original document in server-side Session state or similar.

    The thing is, there's really no difference if you use custom entities or datasets on the server as far as the client-server communication is concerned. It is going to be Document Oriented and based on SOA anyway, as long as you don't go about Remoting the custom entities (I'm not a fan of that approach).

    So, the question would be:

    (1) Is it easier to implement the Retrieve Services producing the xml documents using custom entities or datasets when the client requests data, and

    (2) is it easier to implement the Create/Update/Delete Services (both the fine-grained ones such as AddOrderLineToOrder() as well as coarse-grained services accepting whole updated documents) using custom entities or datasets when the client wants to update data?

    The answer would be - it depends!

    Of course, each approach (custom entities and datasets) takes a bit of getting used to. But in my experience it's worth getting to know both!

    Best Regards
    /Mats Helander

  • Barry Gervin May 13, 2004 11:57 AM

    Mehran,

    I'm not suggesting to have the persistence methods in the Order itself. The DataSets have a 'DataAdapter' that knows how to persist them.

    Mats,

    Some time ago I learned that the right answer for any question is 'it depends' ;).

    Anyway, the point here is that people who use custom entities do not see that they need to keep/send the old values and the new ones, or they use timestamps for optimistic locking. Once they find the limitations of that approach, they need to build something quite complex manually (that's what some O/R mappers usually do), and that's pre-built in the DataSet.

    About exchanging complete Orders or OrderLines, I still prefer working with the whole Order. That's the real 'entity', the one that needs to be consistent, that could require more complex concurrency checks (e.g., no one added another line to the order), etc.

  • Barry Gervin May 13, 2004 1:11 PM

    Hi Andres,

    I agree completely with you that optimistic concurrency is important, and like you I prefer to use the column-level approach rather than version columns, for higher scalability. While I can't argue with the fact that datasets support optimistic concurrency, I'll also have to say that I wouldn't use custom entities that didn't, so in my eyes that point is kind of moot.

    I also agree with you that sending whole entities or even graphs around as xml documents is very comfortable, and something I try to do whenever possible, but sometimes the fine-grained service is the way to go...as you say, 'it depends' is really an extremely useful phrase ;-)

    Another useful phrase is 'Ask the domain expert!' :-)

    /Mats

  • Barry Gervin May 14, 2004 3:48 PM

    I've done many projects with Typed Datasets and a few less without.
    I find the typed datasets easier to work with because third-party controls and apps, such as Crystal Reports, work with datasets. And the serialization is much easier.

    I only wish you could merge datasets. In one project we had a typed dataset for every data entry page, and we had customer and vendor, which were both company tables but with different fields updated. Maintaining the datasets became an issue.

  • Barry Gervin May 15, 2004 5:46 PM

    The service example I gave "AddOrderLineToOrder" would be called in-process on the server. Should the server expose a document exchange protocol ( as is quite common ), after receiving the document on the server, a number of calls would have to be made like "AddOrderLineToOrder".

    Therefore, the issue decomposes from (so-called) client-server to between services on the server.

    My argument for custom classes derives from the need to perform actual business logic on the server, and not just toss the changes into a database.

  • Barry Gervin May 16, 2004 8:50 AM

    Of course the document based services like "UpdateOrder" will be broken up into several in-process calls to methods like "AddOrderLineToOrder" on the server.

    That doesn't mean that it isn't sometimes necessary to expose fine-grained services like AddOrderLineToOrder to out of process clients as well.

    In any case, I agree that it is when you need to perform more serious business logic than just shuffling data in and out of the db that custom classes really start becoming useful - whether you opt to distribute your business logic over the custom classes or in a separate service layer.

    I was probably a bit unclear on this before. I realize that reading my earlier post with points (1) and (2) in it, it looks like I'm suggesting that all that is being done on the server is basic CRUD (when exchanging documents) and then I ask "can custom classes help you with this?"...

    What I really meant was that one should ask the question "Do I do anything /more/ than just the basic CRUD on the server?" and then, if the answer is "yes", perhaps custom classes can be an interesting option...

    I'll try to be more clear! ;-)

  • Barry Gervin May 16, 2004 9:36 PM

    Udi,

    OK, but when you send the Order from the client to the server, you need to know which rows were added. You need to have that information in your Order custom entity.

    I could write the same business logic using an 'OrderLineDataRow'. It is the same as using a custom class, _unless_ your custom classes are really a 'domain model', with inheritance, composition, etc. When I say a domain model I'm not talking about where you put your business logic but whether you are writing

    Order.Customer.Id

    or

    Order.CustomerId

    If you are doing the first, then you'll have a lot of problems moving to a service oriented architecture. If you are doing the second, you won't, but in that case writing business logic with the custom entity is not more difficult than using a DataRow.

  • Barry Gervin May 18, 2004 9:42 AM

    Andres,

    I do it the first way (Order.Customer.Id) and I don't have any problems with SOA. Quite the opposite, in fact, SOA and a Document Oriented approach are a very natural fit with the custom entities. A document usually corresponds to a subset of the domain model.



  • Barry Gervin May 18, 2004 12:25 PM

    Mats,

    It's a subset of the domain model, but the subset includes different views of 'Customer'. For one document the Customer could have some fields, and for other document, it could have other fields. When I'm retrieving an order I don't want all the Customer data. When I'm retrieving a Customer, I do.

    In that case, you need multiple custom entities for the Customer.

  • Barry Gervin May 18, 2004 4:36 PM

    Andres,

    That's quite right!

    I just posted an entry in my blog addressing this, perhaps you'll find it interesting:

    <a href="http://news.pragmatier.com/matshelander/Weblog/DisplayLogEntry.aspx?LogEntryID=6">Serializing Domain Objects</a>

    As you say, I use multiple (overlapping) sets of custom entities that I design using a Document Oriented perspective for just this purpose and that reside on top of the O/R Mapped custom entities.

  • Barry Gervin May 18, 2004 4:37 PM

    Oops, that link became very ugly. Sorry about that, my bad.

  • TrackBack May 19, 2004 12:15 PM

    Dataset/Custom Entity Discussion and My 2 Cents

  • Barry Gervin May 23, 2004 3:03 AM

    Andres,

    When sending documents from client to server, I prefer to view this problem as a connection between different systems issue.

    Firstly, I develop the "core" system "properly" and then consider how to connect it to other systems.

    In cases where there is a high speed connection between the systems, I prefer to expose functionality as I showed above and let the connecting system work it out (that is, unless I have to write the connecting system <grin />).

    The issue you raised above - Order.Customer.Id vs Order.CustomerId has too much background to cover in a comment, so I'll post on it later. Sometimes, these seemingly mundane issues make such a difference on how systems hang together that to let inexperienced programmers decide (as too often happens in bloated organizations, from my experience) can bring development to its knees as a result of bugs late in the lifecycle.

    No offense is intended, of course, against anyone here. I'm just venting a little :)

  • Barry Gervin June 8, 2004 11:51 AM

    Anyone have a good example using datasets, similar to Tracker or Pet Shop 3.0?

    So far, I like custom entities much better, but if someone can point me to a GOOD example of using a dataset across layers, I would be more than willing to give them a shot, but as of right now, I can't see how I could use them correctly.

    Thanks

  • Barry Gervin June 9, 2004 3:18 AM

    IssueVision uses a typed dataset although the sample is somewhat simplistic. So does TaskVision. http://www.windowsforms.net/default.aspx?tabIndex=7&tabId=44

  • Barry Gervin June 13, 2004 8:39 AM

    Yeah, I have looked at both and neither is that great of an example. TaskVision isn't 3 layered either. I was hoping to find one like Tracker except built with datasets.

  • Barry Gervin June 17, 2004 12:03 PM

    Well, I did find a really good example. It's got everything from mobile phone and the compact framework to a thick client app.

    http://www.learn247.net/werock247

  • Barry Gervin June 28, 2004 2:18 PM

    Barry, I think you might want to consider creating a direct link to this post right off your home. This is the best discussion I've seen so far, not only datasets vs. custom entities but also on general architecture best practices on DAL/BLL, etc.

  • Barry Gervin July 1, 2004 6:06 PM

    So... that took a while to finish reading all of the comments...

    Udi,
    (if you are still following) you talk about implementing optimistic concurrency by holding a shallow copy of the entity object. How do you do this?

    Mats,

    Why is the column-level approach (for concurrency control) more scalable than using the version number (or timestamp) approach?

    Barry,

    Can you explain this "columnar storage" vs. row-based storage idea? Or point me to the right resource where I might read more on the topic?

    And all,

    How would you work with an entity, for example, Account that contains Address entity and Contact entries? And each contact entry would also contain an Address entity as well?
    So in a disconnected situation, how would you handle adds/edits/deletes to the disconnected Account entity WRT persisting the changes to the database? I would gather this is rather easy with DataSet approach but how about for custom entities?

    Thanks

  • Barry Gervin July 1, 2004 8:08 PM

    Columnar storage concept is referenced in the last paragraph here: http://objectsharp.com/Blogs/barry/archive/2004/02/24/284.aspx

  • Barry Gervin July 2, 2004 9:25 AM

    Thanks for the link. Although the linked page - in Mike Pizzo's comment - explains the concept of columnar storage well enough, it really doesn't delve into the trade-offs made when possibly switching to a row-based storage. Any ideas?

    BTW, I am reading Designing Data Tier Components and Passing Data Through Tiers from the MS P&P team. I've only read the very beginning parts but I've got to say it's already cleared some of the confusions I had regarding the topic. I'd suggest that it be a required reading if you hadn't already. Also, I should mention that Mike Pizzo is a contributor of the document.

  • Barry Gervin July 2, 2004 1:07 PM

    I've found the following in the .NET Data Access Architecture Guide (another by MS P&P) on p.45:
    (sorry about the length...)

    Including All Columns in the WHERE Clause
    This option prevents you from overwriting changes made by other users between the time your code fetches the row and the time your code submits the pending change in the row. This option is the default behavior of both the Data Adapter Configuration Wizard and the SQL code generated by the SqlCommandBuilder. This approach is not a recommended practice for the following reasons:
    * If an additional column is added to the table, the query will need to be modified.
    * In general, databases do not let you compare two BLOB values because their large sizes make these comparisons inefficient. (Tools such as the CommandBuilder and the Data Adapter Configuration Wizard should not include BLOB columns in the WHERE clause.)
    * Comparing all columns within a table to all the columns in an updated row can create excessive overhead.

    Including Unique Key Columns and the Timestamp Columns
    With this option, the database updates the timestamp column to a unique value after each update of a row. (You must provide a timestamp column in your table.) Currently, neither the CommandBuilder nor the Data Adapter Configuration Wizard supports this option.

    Including Unique Key Columns and the Modified Columns
    In general, this option is not recommended because errors may result if your application logic relies on out-of-date data fields or even fields that it does not update. For example, if user A changes an order quantity and user B changes the unit price, it may be possible for the order total (quantity multiplied by price) to be incorrectly calculated.

    -------------
    Sounds to me like the best option is still using a timestamp (version number) column. My question is, does DataSet use option #1 or #3?

  • Barry Gervin August 6, 2004 4:51 PM

    I hear a lot of people saying how wonderful datasets are because you can bind, etc. Seems people just don't want to write code anymore and they think Microsoft has handled all the real work for you. Hogwash!!! Personally, give me an entity any day over a heavy dataset! I can write the code I need to fill a list box.

  • Barry Gervin August 24, 2004 6:12 AM

    I haven't been here in a while, but it seems that so many people arrive at my blog from here that I just HAD to return.

    One short comment for now, though -

    I recently began consulting on a project that made extensive use of datasets. Anyway, they had this teeny-tiny little problem that they weren't handling. You see, when several users used their rich client at the same time, every once in a while, a DbConcurrencyException popped up. So, the thing was that the system needed to support quite a few concurrent users for the next release, and they were worried that the fact that they just ignored (read, caught and didn't handle) the above exception would wreck their data integrity.

    Well, the clincher was that because the call to DataAdapter.Update would fail if they didn't insert rows in the correct order, update in (a different) correct order, and delete in (yet another different) correct order, that they decided to not use any foreign keys in the database.
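
    For reference, the ordering problem has a fairly well-known shape for one parent/child pair: child deletes first, then the parent's deletes, updates and inserts, then the child's updates and inserts. The sketch below is only an illustration of that order. OrderedUpdate and its delegate parameters are hypothetical; in a real app each delegate would wrap the matching adapter's Update(DataRow[]) call. DataTable.Select with a DataViewRowState is the standard way to pick rows by state.

```csharp
using System;
using System.Data;

public static class OrderedUpdate
{
    // Pushes pending changes for one parent/child table pair in an order
    // that keeps a foreign key from child to parent satisfied.
    public static void Apply(DataTable parent, DataTable child,
                             Action<DataRow[]> updateParent,
                             Action<DataRow[]> updateChild)
    {
        // 1. Children go first when deleting, so no orphans block the parent delete.
        updateChild(child.Select(null, null, DataViewRowState.Deleted));
        updateParent(parent.Select(null, null, DataViewRowState.Deleted));

        // 2. Parent updates and inserts, so new children have a row to point at.
        updateParent(parent.Select(null, null, DataViewRowState.ModifiedCurrent));
        updateParent(parent.Select(null, null, DataViewRowState.Added));

        // 3. Child updates and inserts last.
        updateChild(child.Select(null, null, DataViewRowState.ModifiedCurrent));
        updateChild(child.Select(null, null, DataViewRowState.Added));
    }
}
```

    With real adapters the call would look something like OrderedUpdate.Apply(orders, orderLines, rows => orderAdapter.Update(rows), rows => lineAdapter.Update(rows)), where the table and adapter names are again made up. Deeper hierarchies need the same idea applied level by level.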

    I could go on and on, but I guess that my point here is that because the use of datasets hid so much of the mechanics of what went on down there, the developers had dug themselves into a hole they couldn't dig their way out of.

    Bottom line, I do NOT suggest beginner developers use datasets because they won't know how (and why) to tweak the default behaviors to handle all the weird and wonderful cases that occur in the real world.

    For the experienced developers out there, you could make any way work, so it's more a matter of taste than anything. Personally, I prefer the taste of a lean mean entity to the bloated dataset.

  • Barry Gervin August 24, 2004 8:42 AM

    I guess you don't use strings either - just char arrays right? Better yet, why not bit arrays for everything.

    There are good technical reasons for and against any technical architecture. The above is not one of them. Otherwise, if we took this advice, we'd all be programming in assembler language.

    See the Law of Leaky Abstractions (http://www.joelonsoftware.com/articles/LeakyAbstractions.html).

  • Barry Gervin August 27, 2004 1:19 AM

    I like a combination of all methods mentioned so far. I build DataSets and DataTables for my Cached data, and use custom DataManager objects for "doing stuff" to my data.

    When a user requests a list(s) of his Stuff, I dump it into a DataSet/Table, and then manage it through a custom StuffManager class.

    Make sense?

  • Barry Gervin September 8, 2004 8:08 AM

    Barry,

    The law of leaky abstractions is exactly what I'm talking about. However, your comments don't speak to my point. Don't use abstractions you don't understand - when developers don't understand the issues surrounding concurrency, then using an abstraction like datasets may leave them in the dark when a DbConcurrencyException pops up.

  • Barry Gervin September 9, 2004 5:54 AM

    Lots of good discussions about custom entities and datasets. Can anyone comment on using XML to "pass data between tiers", esp. that SQLXML is evolving?

  • Barry Gervin September 13, 2004 1:25 AM

    When you say "tiers" are you referring to logical layers or to actual deployment tiers?

  • Barry Gervin September 13, 2004 7:51 AM

    I meant logical layers.

  • Barry Gervin September 14, 2004 1:57 AM

    Then why use plain xml (which is in essence the lowest common denominator) when you can use a much richer paradigm? If you're talking about passing data between deployment tiers, that's a different story. But, since the issue is moving data between object A that's in memory to object B that's also in memory, both running on the same platform and in the same process, possibly even in the same AppDomain, I see no advantage to xml besides possibly to ease the move to a distributed deployment model. But in that case, there are so many other critical issues to take care of that the fact that you're passing around xml will have a negligible impact.

  • Barry Gervin October 13, 2004 5:06 PM

    Fantastic debate..... I would like to pose a question for all of you but first some background on my working environment. I am part of a group of .NET architects at a large (very very large) company of about 80k users so perf and scale are immensely important. We are in the midst of building a framework that has the notion of a business entity. A business entity is intended to be the "over the wire" structure so footprint is very important. Today our framework doesn't dictate what the BE is. In other words it could be a datagram or a custom object collection (regardless of 1 or 1+) or a dataset (typed or not). Now all of these "work" for forms clients or web clients with varying degrees of work. Datasets are easy to use for all the reasons you guys have articulated but their footprint is large, which means the over the wire size is bigger and serialization takes more time than with a custom object collection. The downside to the custom object collection is binding. Binding can be done but it's not "free" as it is with a DS. Furthermore, a DS holds state so updates are easy. We use the following approach to determine if a custom object needs to be persisted. If identity (id field) is not present it is an insert; if id is present and the delete bit (inherited from BE base) is not set it is an update. If id is present and the delete bit is set it is a delete. Optimistic locking is done using datetime stamps carried on the object.
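
    In code, that decision rule is roughly the following. This is a sketch only: BusinessEntity, PersistAction, IsDeleted and OrderEntity are stand-in names for illustration, not the framework's real types, and a nullable int is assumed for the identity field.

```csharp
using System;

public enum PersistAction { Insert, Update, Delete }

// Hypothetical base class for business entities.
public abstract class BusinessEntity
{
    public int? Id;        // null means no identity assigned yet
    public bool IsDeleted; // the "delete bit"

    // No id: insert. Id without the delete bit: update. Id with the delete bit: delete.
    public PersistAction GetPersistAction()
    {
        if (!Id.HasValue)
            return PersistAction.Insert;

        return IsDeleted ? PersistAction.Delete : PersistAction.Update;
    }
}

// Hypothetical concrete entity, only here so the sketch can be exercised.
public class OrderEntity : BusinessEntity
{
    public DateTime LastUpdated; // the datetime stamp used for optimistic locking
}
```

    As written, an entity with no id and the delete bit set still counts as an insert, which follows the rule literally; a real implementation would probably want to skip such an entity instead.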

    Now onto our other problem. What to do about SOA. Our basic approach is our middle tier is exposed via web services such that we can serve as many clients as possible in a loosely coupled way. If your goal is true SOA what type of Business entity do you use? Well datasets don't make sense unless you are in a pure MSFT environment, which we are not. Custom objects work great if you are ok with losing your behaviour which really means you are using datagrams not rich objects. We have also successfully prototyped co-locating the custom object assemblies on the middle tier and client at the expense of loose coupling in favor of truly rich objects (data and behaviour) by dinking with the proxy class. To date we still struggle with what to recommend to our developers. Bottom line is it depends. If you are really SOA, datasets are out because you cannot assume anything about your consumer except that they can consume a SOAP message. I tend to lean towards the custom object approach and the persistence scheme mentioned above.

    So my question to all of you is... If you are doing SOA, and I assume some of you are doing it in a heterogeneous vendor environment, what are you using as your over the wire structure? And why?

  • Barry Gervin October 16, 2004 4:19 PM

    It makes sense (almost by definition) that your BE's are over the wire.....but no matter what your BE implementation you can have multiple formats for "over the wire".

    What exactly do you mean by "Over the wire"? Remoting? DCOM (via COM+)? SOAP?

    With SOAP we are likely looking at an XML format. Custom classes will serialize naturally and in a pretty straightforward way. So will Datasets, but you need to think about what support the entity requires. Do you need diffgram functionality? If so, you are likely going to want a dataset (or something pretty close). If not, then certainly a dataset is going to be a bloated format....unless you do something special.

    When you say the dataset has a big footprint - you probably mean the wire format - the diffgram XML serialization. The internal representation is actually pretty good for the functionality, and you'd likely not implement concurrency support and null value support within your own custom entity format any smaller.

    It is possible to GetXml and strip the diffgram fluff and end up with an identical lightweight XML document that you'd get out of an XML serialized custom class. If your system was the host, you'd be doing this transformation in the Service Interface layer, and if you were in the client tier you'd reconstitute a dataset (if you wanted) in the service agent.
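
    A small sketch of the difference between the two formats. WireFormats and the sample Customer table are made up for illustration; DataSet.GetXml and DataSet.WriteXml with XmlWriteMode.DiffGram are the standard calls.

```csharp
using System.Data;
using System.IO;

public static class WireFormats
{
    // One customer row, loaded as "Acme" and then renamed to "Zenith".
    public static DataSet BuildSample()
    {
        var ds = new DataSet("Customers");
        DataTable t = ds.Tables.Add("Customer");
        t.Columns.Add("Id", typeof(int));
        t.Columns.Add("Name", typeof(string));
        t.Rows.Add(1, "Acme");
        ds.AcceptChanges();
        t.Rows[0]["Name"] = "Zenith";
        return ds;
    }

    // Diffgram: current values plus the original values and change markers.
    public static string AsDiffGram(DataSet ds)
    {
        using (var writer = new StringWriter())
        {
            ds.WriteXml(writer, XmlWriteMode.DiffGram);
            return writer.ToString();
        }
    }

    // Plain document: current values only, no schema and no diffgram wrapper.
    public static string AsPlainXml(DataSet ds)
    {
        return ds.GetXml();
    }
}
```

    The plain document carries only the current "Zenith" row, much like an XML-serialized custom class would, while the diffgram also carries the original "Acme" values that optimistic concurrency needs.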

    You should NEVER - and I mean NEVER expose business methods (even for just returning entities) out of your system/service without wrapping them in a service interface. Likewise, you should never talk to an external service without wrapping the calls in a service agent (similar notion as a data access class).

    So the point I'm getting at - is that I don't think the wire format footprint is a key factor in choosing how you structure your business entities....if your systems talk to each other through layers - which is what you should be doing to maximize your investment and help mitigate future interop risks.

    If you do things this way - you can provide multiple service interfaces for different systems that want to talk to you - you might have a SOAP/.NET client that