dslreports logo
site
 
    All Forums Hot Topics Gallery
spc

spacer




how-to block ads


Search Topic:
uniqs
626
share rss forum feed


drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

Data Load Question

This is a long one that's half blog post, half question. Check for the bold for the question.

We are re-designing an existing information system at work. This IS is a work mangement tool that ends up combining several items "local" and "corporate" databases into a single database that can then be assigned/tracked/nickel/dimed/death by people who do such things.

This is our first "Agile" project and I'm enjoying the accountability and buy-in from our "customer" (who also happens to be another department) as they were a major part of the architecting the new system from a conceptual standpoint. We just finished Iteration 0 and one of the tasks for Iteration 1 is to get the first data interface into our application's database. Below you will find some parts of the domain model detailed, however the names have been changed as appropriate.

Our technology stack is: C#, .NET 4.0, EntityFramework 4.3, ASP.NET MVC 4 with Razor, SQL Server 2008 R2. I recognize this may not be your flavor of technology, but this is where we are heading as a shop and I don't need to hear about how great technology X or Y is for a particular task. We won't be using it, sorry.

This is our "MASTER" work item. There are some abstract classes out there that Task inherits from, but ignore those for the sake of our discussion.
cs code:
public class Task
{
public int Id { get; set; }
public string Name { get; set; }
public string Description { get; set; }
public DateTime? MasterCompletionDate { get; set; }
}

This is an external database work item that will contain all of the properties of Task, but has a few of its own properties that are unique to it.

cs code:
public class RfvTask : Task
{
public string InternalNumber { get; set; } //yes, this is a string, remember, these are databases we have no control over
public string InternalNumberRevision { get; set; }
}

All work items of type RfvTask are going to be provided by the external database named Rfv. We plan on using either SSIS packages or T-SQL stored procedures to fill RfvTask items into the database.

Using Table-Per-Type (TPT) Inheritance, we are storing RfvTask work items in the Task AND RfvTask tables using the following code

cs code:
modelBuilder.Entity<Task>().ToTable("Task");
modelBuilder.Entity<RfvTask>().ToTable("RfvTask");

This means the project's database has the following tables with the following properties: *Note: DataAnnotations have been placed in the project but I have not replicated them here, so please ignore the inefficient table pages*


Table: Task
Id (int, not null, PK)
Name (nvarchar(###), null)
Description (nvarchar(####), null)
MasterCompletionDate (datetime, null) **

Table: RfvTask
Id (int, not null, PK)
InternalNumber (nvarchar(##), null)
InternalNumberRevision (nvarchar(#), null)


**: If anyone has any tips as to why the DateTime property gets created as datetime in SQL server instead of datetime2 even when we've used Fluent API to tell it to map the property to datetime2, let me know.

The RfvTask.Id property is going to also be the FK for the Task table. This means that when I'm returning an object of type RfvTask, RfvTask.Name is going to come from the Task table and RfvTask.InternalNumber is going to come from the RfvTask table.

The question
How would you implement the nightly load of data into the system from the Rfv database? Some concerns:
•As the IDENTITY (Id) is generated by the SQL server, how would you use SSIS to insert the records? Our PK will *never* come from the data sources.
•We are going to need to insert new records from Rfv (and all other data sources we end up implementing) and also UPDATE existing records. If Description is updated in Rfv, it needs to get updated in this new application during the next interface run.
•Remember: there are going to be at least 8 different external interfaces.

I really appreciate the discussion here. I am extremely green and this project is well outside my comfort zone.

--
flickr | 'Cause I've been waiting, all my life just waiting
For you to shine, shine your light on me


DC DSL
There's a reason I'm Command.
Premium
join:2000-07-30
Washington, DC
kudos:2

said by drew:

The question
How would you implement the nightly load of data into the system from the Rfv database? Some concerns:
•As the IDENTITY (Id) is generated by the SQL server, how would you use SSIS to insert the records? Our PK will *never* come from the data sources.
•We are going to need to insert new records from Rfv (and all other data sources we end up implementing) and also UPDATE existing records. If Description is updated in Rfv, it needs to get updated in this new application during the next interface run.
•Remember: there are going to be at least 8 different external interfaces.

I really appreciate the discussion here. I am extremely green and this project is well outside my comfort zone.

Personally, I avoid the EF completely and roll my own bulk loaders/updaters in the BLL and DAL calls. A bit more work but it gives far greater flexibility and absolute control over processing and exception/error handling.

This article should give you a bit of insight as to how to go about it. Reading it makes it apparent that the amount of effort is better put into just rolling your own.

--
"Dance like the photo isn't being tagged; love like you've never been unfriended; and tweet like nobody is following."


drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

We aren't planning on using EF to do the inserts from external sources.

I am just really surprised that SSIS doesn't make this brain-dead simple.



DC DSL
There's a reason I'm Command.
Premium
join:2000-07-30
Washington, DC
kudos:2

Ah. You mentioned EF in your stack so I figured you were beholden to it. Avoid it as much as you can. Preferably, ditch it in its entirety and just create your own classes/objects. I use my own stock base classes and interfaces, and manager framework, that are beaucoup more flexible and less baggage-laden.

SSIS is a tool that allows for generic, app-agnostic ETL for DBAs who aren't developers. That works fine, unless you need to perform transformations or BR enforcement beyond basic validation/sanity checking. It also just plain sucks when you have parent-child relationships where SS is generating an Identity PK. Just give it to a developer to write.
--
"Dance like the photo isn't being tagged; love like you've never been unfriended; and tweet like nobody is following."



drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

I *am* one of the developers :P

My particular task this iteration is to get this working.

I'm trying to vastly simplify this as easily as I can. Start small, work my way up.




You can see how this particular data item needs to be broken out. Considering the PK/FK relationship in RfvTask how do you propose doing the insert query? I've spent far too long googling this and all of the examples using the OUTPUT parameter fall short for my use case. SCOPE_IDENTITY() is also problematic.

I'm not sure why I'm failing at this so bad!

--
flickr | 'Cause I've been waiting, all my life just waiting
For you to shine, shine your light on me


DC DSL
There's a reason I'm Command.
Premium
join:2000-07-30
Washington, DC
kudos:2

No matter how you slice it, you have to capture SCOPE_IDENTITY() from the INSERT and return it to the caller. This is quite easy.

CREATE PROCEDURE sprocInsertParent
(
    @ident int output = -1,
    @fld1 varchar(20)
)
AS
INSERT INTO parentTbl
(
    column_uno
)
VALUES (
    @fld1
);
SELECT @ident = SCOPE_IDENTITY();

---

CREATE PROCEDURE sprocInsertChild
(
    @childIdent int output = -1,
    @parentIdent int,
    @childFld1 int,
    @childFld2 varchar(50)
)
AS
INSERT INTO childTbl
(
    column_parent_id,
    column_child_uno,
    column_child_due
)
VALUES
(
    @parentID,
    @childFld1,
    @childFld2
);
SELECT @childIdent = SCOPE_IDENTITY();

Your business objects would be like:

Public cParent : someBase
{
//greatly over-simplified
public int systemID {get; set; }
public string fieldUno {get; set; }
public list kiddies = new list();
}

Public cChild : someBase
//also greatly over-simplified
{
public int childSystemID {get; set; }
public int parentSystemID { get; set; }
public int fieldUno {get; set; }
public string fieldDue {get; set; }
}

Default the ID values to Integer.MinValue so you can differentiate a "new" item.

Have a list sourceRecs;
Iterate through the source data, create a parent record, then this.kiddies.add all of it's child records.
sourceRecs.add this;

To move them to the db, create a SQLCommand for the parent insert and another for the child insert sprocs. Be sure you have a SQLParameter with .Direction=Output for the ID.

Iterate through the sourceRecs collection.
Create a SQLTransaction
Assign the parent values to the .SqlParameter("whatever").Value of each parameter.
parentInsertProcCmd.Execute
Assign the return parameter's value to the parent's ID property.
If it's a positive, non-zero, iterate through your child collection assign the same value to each member's parent ID
assign the child values to the childInsert proc parameters
childInsertProcCmd.Execute
assign the child's id returned to the child member.
commit the transaction.

(Sorry about how this may look...I'm doing it via phone.)

That's just the gist of it.

In SSIS script, you would be doing the same thing but you would have to replicate whatever validation or business rules you may have...rather than simply relying on the app's functionality.
--
"Dance like the photo isn't being tagged; love like you've never been unfriended; and tweet like nobody is following."


drew
Automatic
Premium
join:2002-07-10
Port Orchard, WA
kudos:6

That's the approach that a co-worker, who originally architected the domain model before moving off the project, suggested - except using another project inside our solution; ensuring we used our repositories for doing the heavy lifting.

As the solution hadn't been downcoverted to use EF4 yet, I couldn't really make it work. So, I spent the morning/afternoon figuring out how to do it in strict T-SQL.

The gist of how I solved the problem is:
•Insert the data coming from RfvMasterDbTaskTable into a table variable with its own IDENTITY column.
•Use the table variable's Id field to use a WHILE loop and iterate through the data using two separate INSERT INTO statements that SELECT from the table variable.

The approach I took ended up dealing very well with updates vs. inserts as I already had all the data I needed to compare a new record vs. an old one.

I will try to sanitize the T-SQL I wrote and post it here in case some poor soul comes across it.
--
flickr | 'Cause I've been waiting, all my life just waiting
For you to shine, shine your light on me