I use SQL Server 2016 and I have a very busy Data Flow Task. In my Data Flow Task, I use a Multicast component for some reason. After creating a new flow in my Data Flow, I need to delete some of the columns in the new flow because they are useless.
Just for more information, I need to do that because I have more than 200 columns in my flow and I need less than 10 of those columns.
How can I delete the columns in a Data Flow Task in SSIS?
1
You can add an extra component of some sort. However, this will never reduce complexity or improve performance. Just thinking about it, logically, you are adding an additional interface that needs to be maintained. Performance-wise, anything that will eliminate columns means copying one set of rows from one buffer to a whole other buffer. This is called an asynchronous transformation, and it is better described here and here. You can imagine that copying rows is less efficient than updating them in place.
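To make the buffer argument concrete, here is a minimal sketch in plain Python. It is not SSIS code and uses no SSIS API; the function names (`update_in_place`, `project_columns`) and the sample rows are hypothetical and exist only to model the difference between changing rows inside the existing buffer and copying them into a new, narrower one.

```python
from typing import Dict, List

Row = Dict[str, object]


def update_in_place(buffer: List[Row]) -> List[Row]:
    """Synchronous style: rows stay in the same buffer and are modified there."""
    for row in buffer:
        row["name"] = str(row["name"]).upper()
    return buffer  # the same list object; no rows were copied


def project_columns(buffer: List[Row], keep: List[str]) -> List[Row]:
    """Asynchronous style: every row is copied into a new, narrower buffer."""
    return [{col: row[col] for col in keep} for row in buffer]


# Hypothetical sample data: three columns, of which only two are wanted.
source = [
    {"id": 1, "name": "a", "unused": "x"},
    {"id": 2, "name": "b", "unused": "y"},
]

same = update_in_place(source)                    # reuses the existing buffer
narrow = project_columns(source, ["id", "name"])  # allocates a second buffer
```

In this model, dropping columns always costs one extra copy of every row, which is the overhead the answer is pointing at.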
Here are some recommendations for reducing complexity, which will, in turn, improve performance:
These guidelines will get you headed in the general direction, but do post more questions for tuning specific performance problems.
1
I believe that you can pass just one data flow path to a Union All component to remove columns from that single data flow.
Take the single data flow path that you would like to remove columns from and pass it to a Union All component. Then open the Union All component, right-click the column(s) you would like to remove from that path, and select Delete.
Usually I think the source of the data should be altered so that it does not send out the unwanted columns, but your case is special: one path out of the Multicast needs all of the columns from the source, while the other does not.
1
First of all, I don't think that what you are asking for will give better performance, because the data is loaded from the source, then multiplied by the Multicast, and then passed to the component that reduces the number of columns ...
You can do this in multiple ways:
If you can create another Data Flow Task with a reduced-column source (e.g. an OLE DB command that selects only specific columns), that is better.
You can add a Script Component with an asynchronous output (as shown in the image below), add the specific columns to the output, and map them using a VB.NET or C# script, something like this:
Output0Buffer.AddRow()
Output0Buffer.OutColumn = Row.inColumn
Add a UNION ALL component and select the columns you need.
Side Note: It is good to test the performance of each scenario and choose the better one.