c# - The union of the intersects of the 2 set combinations of a sequence of sequences -


how can find set of items occur in 2 or more sequences in sequence of sequences?

in other words, want distinct values occur in @ least 2 of passed in sequences.

note: not intersect of sequences rather, union of intersect of pairs of sequences.

note 2: not include pair, or 2 combination, of sequence itself. silly.

i have made attempt myself,

public static ienumerable<t> unionofintersects<t>(                                   ienumerable<ienumerable<t>> source) {     var pairs =             s1 in source             s2 in source             select new { s1 , s2 };      var intersects = pairs         .where(p => p.s1 != p.s2)         .select(p => p.s1.intersect(p.s2));      return intersects.selectmany(i => i).distinct(); } 

but i'm concerned might sub-optimal, think includes intersects of pair a, b , pair b, seems inefficient. think there might more efficient way compound sets iterated.


i include example input , output below:

{ { 1, 1, 2, 3, 4, 5, 7 }, { 5, 6, 7 }, { 2, 6, 7, 9 } , { 4 } } 

returns

{ 2, 4, 5, 6, 7 } 

and

{ { 1, 2, 3} } or { {} } or { } 

returns

{ } 

i'm looking best combination of readability , potential performance.


edit

i've performed initial testing of current answers, my code here. output below.

original valid:true doomeroneline valid:true doomersqllike valid:true svinja valid:true adricadar valid:true schmelter valid:true original 100000 iterations in 82ms doomeroneline 100000 iterations in 58ms doomersqllike 100000 iterations in 82ms svinja 100000 iterations in 1039ms adricadar 100000 iterations in 879ms schmelter 100000 iterations in 9ms 

at moment, looks if tim schmelter's answer performs better @ least order of magnitude.

you can try approach, might more efficient , allows specify minimum intersection-count , comparer used:

public static ienumerable<t> unionofintersects<t>(this ienumerable<ienumerable<t>> source      , int minintersectioncount     , iequalitycomparer<t> comparer = null) {     if (comparer == null) comparer = equalitycomparer<t>.default;     foreach (t item in source.selectmany(s => s).distinct(comparer))     {         int containedinhowmanysequences = 0;         foreach (ienumerable<t> seq in source)         {             bool contained = seq.contains(item, comparer);             if (contained) containedinhowmanysequences++;             if (containedinhowmanysequences == minintersectioncount)             {                 yield return item;                 break;             }         }     } } 

some explaining words:

  • it enumerates unique items in sequences. since distinct using set should pretty efficient. can speed in case of many duplicates in sequences.
  • the inner loop looks every sequence if unique item contained. thefore uses enumerable.contains stops execution 1 item found(so duplicates no issue).
  • if intersection-count reaches minum intersection count item yielded , next (unique) item checked.

Comments