Array Design (week of 8/17/06)
Arrays in ASAP
Dense or sparse
regular or irregular strides
We should probably implement these each as different classes but provide methods to convert from one to another as contents change
It seems like everyone has efficient techniques for organizing arrays on disk, but few address the in-memory problem
Is it too simple/already solved implicitly?
What do we want to be the driving feature of this design? Search-able? Chunk-able? Efficient I/O?
For the simple (regular stride, dense) case we could adopt something like this: http://www.boost.org/libs/multi_array/doc/user.html and extend and optimize it for the other cases.
It supports some stride features
Blitz++ seems like the best option for simple but optimized array management
I have successfully built it with the default configuration on the CS machines
We may need to modify this library for partial evaluation when we break arrays into chunks
Although POOMA might be even closer.
Its inherent parallelism might do too much of our work for us / interfere
Borealis
Need to add arrays in two phases
Expressions - start with simple numeric ones with math ops, then progress to supporting more custom types like strings and irregular strides.
What is the best way to pass around these arrays in memory? Smart pointers? Heap allocated?
How do we want to store sparse arrays wrt generic arrays?
Boxes
Handle serialization of arrays?
Make boxes co-located for streams?
This is where most of the optimization will happen, but will require expression implementations as a building block
Will have to coincide with distributed chunking through partial evaluation
Need to handle chunking as well as reuniting the pieces later
Would need to be divided up by array characteristics for ASAP return types
What do we need to do differently with our db operators?
Will need to write wrappers for the Blitz operators in the form of expressions of type array
Operators needed:
+, -, *, / (for vectors and scalars)
==, !=, <, <=, >, >=
dot product
cross product -half implemented in Blitz
dimension getter
[]
transpose
invert -not implemented by Blitz
Catalog management
Borealis Expression Library
Aggregate -Parse and handle aggregate box. StandardAggregate contains most of the implementation. This will be minimally impacted by arrays in the expression phase because it only uses expressions in its arguments.
EString -Basic string type for standard functions. May make a user-defined type for Arrays.
EStringPool -manages allocation and string management. Will not be impacted by arrays.
ExprLexer -Lexical analyzer for string expressions. Invoked in expression parse method. Takes input streams and breaks them into tokens. Will need to add previously unimplemented operators like the dot product and brackets.
ExprParser -Builds an expression tree from lexer output. If we build the expressions properly this will probably require minimal modification, only for new ops
ExprParserTokenTypes.txt -Defines strings to be regarded as tokens. Will need to add our new ops and type to this list.
Will we need to have an input method in XML for comparing arrays (i.e. a filter predicate that could ask if an array is 3x3 and filled with ones)?
Expr.g -Add support for dot, [], etc. I am leaning against making arrays declarable in our initial pass (as in the above example).
ExprUtil -Support functions
Will need to add appropriate field types for arrays
Handles math and relational ops and precedence in generic, templated way
This part may be fine without modification
ExprWalker -This generates and evaluates(?) expressions from SQL-style queries. Do boxes translate themselves into this in their respective classes? This will require no modification if everything is implemented in the standard fashion.
Expression -Base class that all expressions are built from. May want to add a hook to this base class to differentiate between array and scalar contents for memory management purposes. Will need to implement is_impl and evalImpl methods for our standard functions.
EvalContext -Tells the expression which tuples it refers to
MathFunctions -Implementations of various templated math ops. Will need TypedArray versions of this. These will be wrappers for Blitz equivalent library calls
Get arg(n) of expression and evaluate it in the given context
Return appropriate function applied to its children
expression walker and Function use makeExpression to generate expressions for later evaluation
NArgs - This enforces that an expression has the correct number of arguments. It also has the base class for Function. The Function factories will need to be modified to create array funcs.
SQL* - Manage and access tables of expressions. This will require minimal to no intervention for arrays.
Arrays should probably be like TypedExpression<>.
TypedArrayExpression<T>
Fields (arrays) can derive from this
StandardAggregate -Window management and other standard functions. Should not be impacted by arrays
StandardFunctions -Generic functions such as min/max/strlen/int. Mostly helpers. Will need to add a few of our own here.
modules/common/DataType - Will need to add our type(s) to this. Is there any convenient way to make this templatable, i.e. Array<int>[3][3][3]? It does not look like it at first glance. This will make the from-string method not have the correct length.
Boxes
May do array allocation and memory management here
What is a table wrt QBoxes?
Qbox - Constructs streams and matches with schemas. Will need to interface to catalog for type checking and validation of streams
Will need to deal with the Tuple creation to allocate arrays.
How/where is that done?
Tuples are treated as char *s
The notes seem to indicate that this is managed by the node object -Check out AuroraNode.*
Tuple I/O is managed here, but what about allocs?
Use join as an example b/c it requires allocation to create a new tuple of a different type
check out produce_dont_emit...
Check out the construction and reception of tuples in example clients
Perhaps the Marshal code handles this
Basic Design Ideas
Initially basic class will be just an array declaration in Blitz
What type of wrapping will it need to be a tuple?
Headers need the following: (b/c everything in the tuple passing around is cast to being string or void *)
dimensions, size and type info
sub-array location wrt larger array for partial evaluation
Sensor location?
flags to denote array characteristics (dense/sparse, irregular/regular, etc.)
Partial eval implementation strategy will largely depend on how the data is chunked
Perhaps do n-dimensional chunks in Stonebraker manner just to get started
This can be added later as a type of operator
Will have another class for irregular strides which will be a user-defined templated Blitz array type
Should this be a standard array or 1-D that we can search on to get values? (The answer may depend on sparse/dense call)
How will this work for math ops? We could require that all indices match, but that will make type checking take O(n)
Similar problems for sparse arrays
I know we do not plan to allocate for cells not filled, so this may need to be a search-able vector of values
Array Precedence order:
Dense, irregular
Dense, regular
Sparse, irregular
Sparse, regular
Size may be used to determine when/how to chunk
Will declare arrays as one of these types and write operators to make them compatible with each other
Distinction may be lower level (i.e. all derive from GenericArray, which they are cast to for passing around, but operators look for flags in order to evaluate most efficiently/compatibly)
Will probably need to modify the way the queryprocessor receives data so it can be serialized for allocation as *FieldExpressions
Ask Brad about the place to do this
Most of our first pass will be writing wrappers for basic Blitz operations and appropriate headers for data being passed around
Arrays will be declared as ArrayFieldExpressions and heap allocated like all other concrete field expressions
will need arguments indicating how much memory to allocate
Trace down how output / intermediate results are calculated for further memory details
Basic Data Flow
Array and meta-data arrive, either apart or together
Meta-data is the array size, dimensions, base type, sequence number in stream, sub-array info, etc.
Timestamp when it gets there for windowing purposes
Array is simply values, one after another which will be cast into a basic array before we run the Blitz constructor on it
Blitz object is inserted into a table
This is a many-to-one relationship (many ids-one blitz obj)
hash pairs are ids-to-ptr-to-Blitz object
ID is a hash of stream id, sequence number and timestamp (to facilitate windows)
Acceptor then notifies all boxes that are interested in the stream of the array that just arrived
Will need to use/extend enqueue methods of traditional boxes (or more likely create entirely new boxes)
Boxes need to support scalars as well as arrays
Extending current boxes
+ Can capitalize on previous work and code
+ Most of the data evaluation will be the same
- Data load mechanism will probably be different
- Foundation may already be overextended
- Learning curve
Create new boxes
+ Fresh, potentially cleaner design
+ Can benefit from lessons learned
- More work/investment/development time
If a box can modify a tuple or create a new one (i.e. join or map) then it creates a new entry in the table as its out stream
May have an option for deltas if they are of the same schema
Else if a box cannot modify a tuple (i.e. filter) then it creates a new id in the hash table that points to the previous Blitz entry and the acceptor will notify anyone listening on its out stream
Brad: How does Borealis know when to send a stream out across the network? I.e. when the out stream is listened to by a box on another node.
May need to build a mini-router to get out-streams to the right ArrayAcceptor
This will hinge on previous work
When an out stream is generated the ArrayAcceptor will also check with other nodes to see if they are interested in it
If so, strip out Blitz stuff and pass along in the same way it was received
Determine which is a bigger bottleneck: transportation time or bookkeeping
If we choose to break up the arrays and tuples, how will we make them synchronize?
Sequence number
Does the ArrayAcceptor take both of them or should the meta-data go directly to the box(es)?
I think that the acceptor needs both of them b/c meta-data is necessary for construction
The box should just take the hash id from the ArrayAcceptor
The ArrayAcceptor sits at the node level, as does the table
Brad: Could this be part of AuroraNode or a side method?
What does AuroraNode encompass now?
Basic Construction
Predicates will be cell-based, but may allow wild carding in the form of comparison to scalar
Schema builder will need [] implemented
ExpressionHandler may need to be rewritten to handle different array types as well as partial evaluation
Handle multi-host eval as well as multiple procs?
Ops will be polymorphic
Prime areas for optimization
Caching strategies:
when does it make more sense to have arrays on disk?
Should table contents be in-memory only or could larger aggregates be supported by not limiting ourselves?
Disk usage could also be crucial for very large arrays
Statefulness and Windowing:
How big do we make our table?
When do we hold on to values and when does it make sense to drop them from this table cache?
Would it make sense to designate which streams have aggregates associated with them and hold on to more than one value only for those?
Size cut-off? i.e. if a stream is holding more than its fair share of the table space
What is the best place (node-wise) to store this historical data we eventually want to integrate?
We could divide it up categorically (i.e. spatial location on a map, or where it lies on a timeline)
It would be nice to have some breakdown that does not require centralized look up
When we do access it should we regard it as a stream or treat it entirely differently?
The ArrayAcceptor could be a good way to encapsulate this.
Having the ArrayAcceptor/TableManager be its own module could be sensible b/c it will be doing a variety of things
Historical data loader
Caching optimizer
Stream notifier / router (push-based)
I need to start thinking about finding a camera or something for an initial data stream.
Push vs. Pull
Pull
+ consistent polling, perhaps less work for the node
- may be more complex; have more overhead
Push
+ simple hand-off, easier traceability
+ seems more intuitive from a stream processing perspective
- may require more up-front setup
- puts more burden on the node
Table vs. Array Part of Tuple (2 pipes vs. 1)
Table
+ Minimizes duplication
+ Encapsulation of memory management / optimization techniques
+ May be more extensible b/c we are designing from the ground up
- More initial time / planning investment
- May break previous structures
Tuples with Arrays
+ Simpler design, faster implementation
+ Less apt to break previous team assumptions
- May stretch existing infrastructure a little far
Do scaling or delta image work for original demo
Proof out xml this week
Non-uniform scaling requires a lot of conditionals for boundaries of the image as well as for scaling up versus down. In traditional Borealis boxes this will require a lot of them. Potential implementation strategies:
Tons and Tons of Boxes
+ Conforms to standard DB operators
+ Will flex the majority of our new capability
- Ugly to code
- Harder to debug
Implementing Scale as an arithmetic operator
+ Simpler design
+ Code essentially already done
- Seems like a hack
- Will not be a good showcase for SPE
Making a single custom box
+ Easier optimization
- Is this splitting hairs with the arithmetic op one?
Hybrid Solution: specialized boxes such as vertical_scale_up, with traditional filters to route them to the right boxes
+ Still shows stream processing / db ops
+ Simplifies out boundary conditions and allows faux statefulness (i.e. calculating ratio of old size to new size)
- Will need to implement custom boxes
Do we want to have each query chain have its own copy of the array to operate on in place or is it preferable for them all to share one copy?
- Going with in-place stream operations right now.
Does it make sense to break the arrays up in cases like filter that have partial matches? How can we handle these break ups? Designate output by indices or ? Do images make sense in arbitrary partial form? Might it make more sense that if we have a match to the predicate we return a sub-array?
Box design
Needs to:
Take in array via loading mechanism
Will take in array header info and pointer to Blitz object
Optionally apply transformations to it in place
Be reliant on new expressions, fields and grammar
Enqueue the array header and pointer to the next box in its query evaluation chain
Should be sending roughly the same chunk of memory (may be partially re-alloc'd to make it smaller or larger data for the box at hand)
Array header will conform to XML schema
How does a box get its XML schema?
Will our designed loading mechanism have any major pitfalls w/the current box implementation?
Does it make more sense to create new array boxes or can the current ones be extended to accept these Blitz pointers?
Blitz may not work b/c it needs to know dimensions at compile time. Unfortunately boost is no better on that front. (May require custom expression evaluator regardless of partial eval after all).
Can use *=, +=, etc. to avoid unnecessary copies in Blitz?
Boxes
All: need to adapt to looking up pointers to arrays.
Should boxes have a way of indicating to the ArrayAcceptor/Manager how many tuples in a stream it may need?
Filter: When acting on an array it will pass on cells that match and set the rest of cells to null in its out stream
Map: Apply transformations in place to matrices and send to out stream
Union: Will probably not change much
Bsort: Will do comparisons and emissions in place. Need to make sure that table keeps n instances of the array.
Aggregate: The way we handle this depends on both the aggregating operation as well as the window. For min(..), max(..) we can just hang on to the appropriate answer and compare it with the rest of the inputs. For average, we will need to hang on to <window size> number of arrays. Is there a better way to do this? Are there other operators that we should implement for aggregate?
Join: Will hopefully not have to buffer too many arrays at once, may want to artificially limit number or work out caching scheme if it is too many. Will have to do the same with aggregate. Caching scheme may have been required anyway (as opposed to having all the tuples in memory at once).
Resample: This will use the solution to the problems of aggregate operators
Custom Boxes: What will we need for input information? What are the requirements of custom box coding now?
Expression Evaluation
If we are using Blitz we need to pass in the array dimensions at allocation time
We should allocate for the largest the array will get in its query chain because re-allocation will be tricky, if possible
Wildcarding
Will need to be done right --> left
E.g. int[*][*][3] < int[3]
Do we want to make the * an explicit or implicit part of the grammar?
For now I am leaning toward implicit (i.e. scale example)
Arrays can be compared cell by cell to a scalar as well
Implementation note: may want to build using matrix and vector data types where appropriate to capitalize on previous optimizations? Check this out
Arithmetic Evaluation
This will be done primarily in Blitz++
See above notes about implementation requirements imposed by Blitz
Custom boxes will be needed for some of our ASAP ops
These may just use Blitz math ops, or go proprietary based on domain specific optimizations
Some of these (i.e. scale) may benefit from being composites of previous ops
Does it make more sense to build an entirely new expression evaluator? Diverge from the previous one and integrate later? Extend the previous one?
What lessons have we learned implementation-wise from our last expression evaluator?
Model for Generic Expression Evaluation (we may modify/optimize for specific apps)
Recursive Descent Parsing?
Leave it open for piecewise evaluation by making tokens more accessible and evaluatable from outside the tree
We may have to do this anyway due to limitations on tuple size
Random access look up?
Assembly at the end after potential time slicing or farming out?
Alternatives to basic tree struct?
Memory management / caching schemes?
The expression manager needs to
Parse expression strings
Create symbolic representations of arithmetic and comparison expressions
Efficiently evaluate these expressions when values arrive
Have an easy-to-use API for programmers to create, modify, and access expressions in their code
Composable?
Extensible (i.e. setup for partial evaluation, varying types of arrays, etc.)
The expression library may be the same, with slicing and dicing handled in pre and post processing
May need a base type of evaluation for when we know nothing about the array/scalar value that our more specialized evals can derive from as necessary (for sparse or clumped arrays)
Reflection in Field creation and evaluation
This will enable us to both see what templated types the instances of FieldExpressions are and determine what ops are available to us on the basis of type
Blitz offers more ops for vectors and arrays, so shortcutting to those for 1 and 2-d arrays would be nice
Need improved visibility of expressions
Should be able to modify them without rebuilding / creating new objects?
Who is going to primarily be using this library:
Box implementers
Prediction implementers
Needs to be fairly generic, specific apps will build on basic arithmetic and comparison ops, few to no optimizations will be transformations
Reflection Libraries
+ Seems relatively lightweight and complete
- templating implementation seems hazy
- no guaranteed support for gcc-4.*
Seems to capitalize on RTTI checking mostly
Seems to have a relatively simple API
Do we need to rewrite the whole thing or just the FieldExpressions, acceptable template types, etc.?
Do we need to rewrite it all so that expression components are more accessible?
Our current expression model is extremely obfuscated.
We do need to retain the context model and perhaps expression walker.
Major modules:
ExpressionParser - takes in string, produces tree form of expression (as well as perhaps other access methods)
ExpressionEvaluator - evaluates expressions when a tuple comes in that is of the relevant stream
Expression itself will need to at least add functions for concatenation, append, etc. as well as defining common math ops and iterator functions
May want to allow binding operator, although I am not sure how this will fit into our current XML paradigm
Should we make the context more loosely coupled with the expressions themselves?
If an entire expression is regarded as one object how should it be accessed?
Root of a tree for traversal?
.as_string() type of function?
A combination of both? (or multiple access methods?)
Most of the programmer's access will be for debugging purposes or possibly composite functions in the case of predictions
Proposal Notes
coordinate info - int32 (coordinate must be declared like a schema) (coordinates are loaded into a table as declared with the XML) Coordinate systems exist in categories/classes. A coordinate system declaration should include the coordinate system type (i.e. GPS, or X, Y, Z for airplanes) and two functions: one to transform it to the user-defined generic format of its coordinate type, and the other to get data of its type into its format.
Wire Format
Header Info:
int n = num_dimensions
enum base_type (can be int, float, etc.)
int dim[n] the size of each dimension
int32 coordinate_system;
bool partition
bool dense
bool regular_stride
float stride_size[n]
partition dimensions[n]
partition number - tells us which partition we are dealing with in row-major order
float sensor_location[3]
Dense Array, Regular Stride Representation:
one element after another in row-major form; 0/null for blank values
Sparse Array (stride does not matter, n-dimensional array):
n floats, one for each dimension coordinate followed by the cell value; repeated in row major form for each value in the array
Dense Array; irregular stride
Each cell in row-major format prefaced with its coordinate
Tuples within Borealis
When arrays arrive on the server/Borealis side, a module called an ArrayAcceptor will intercept them and convert the serialized data into a Blitz++ object. It will then wrap it in a smart pointer which will be put into any tuple referring to it on that host. The tuple itself will contain the header information structure. Arrays will be operated on in place by each box wherever possible. The only time an array should need to be copied is when a stream diverges down two different paths that will modify it. The majority of the modifications between this and our current version of Borealis will revolve around memory management and rewriting most or all of our expression library.
Expression Wildcarding
When arrays are compared in expressions or predicates they do not have to match the dimension of what they are being compared to. They do need to correspond right to left though. For example, an int[640][480][3] can be compared to an int[3] or int[480][3] but not int[640]. Arrays can also be compared to scalars on a cell by cell basis.
Box Semantics
Aggregate – will accept a window of tuples, with window length being defined by either tuples or window size. Slide will be defined similarly. Windowing will be managed by dynamic reference counting on each array object. Each time an array is added to a window the reference count is increased (just as it is when it is accessed by any non-windowing box). When it is removed from the window the reference count is decremented. When the reference count goes to zero the array's destructor is automatically called. Aggregate will continue to operate with the standard functions it has now (min, max, average, etc.). The results will be a composite result based on each cell (i.e. a min of a 640x480 image will return the minimum value for each cell over all of the frames in that window).
Join - Joins will have a windowing scheme similar to aggregates. If we are joining two arrays we will create a new one with an extra dimension with pointers to the previous two arrays.
Filter – Filter will compare arrays or scalars. In the case of a superset being compared to a subset (i.e. int[640][480][3] < int[3]) filter will iterate through and null out values that do not match the predicate.
Map - Map will apply operators found in our rewritten expression library. It will not differ much from our previous implementation.
Concatenate – This will basically be a special case of join. It will check to see if the dimensions of the two incoming arrays are of compatible dimensions and throw an exception if they are not.
Locate(Array1, Array2, function, slide[n]) – The box will first check to see that the two arrays are roughly compatible. The function will need to define the size of chunks in an XML schema before it can be passed into a locate box. This does not necessarily have to be the size of Array2, the function may scale, rotate the feature, etc. Like our windowing scheme, this may benefit from defining both the sub-array sizes that are passed into the function as well as slide/overlap (in case the feature is between two blocks). Slide will tell us how much to move in each dimension of Array1 (Array1 is n-dimensional). After all of this is established, Locate will iterate through Array1 in row-major order a slide unit each iteration. It will collect the function returns as an array which will be its output. Each function return will be in the form of one array of size 2n. The first n values will be the lower left hand corner of the location of the found feature. The second n values will be the size of the match in each dimension.
Pivot(array, attribute_list_1, attribute_list_2) – Pivot will substitute the contents of the dimensions specified in the first list with the values in the second list. This will operate on the array in place.
Regrid – This seems like it will be a special case of Map, with more extensibility regarding dimensions. Or should map support the in and out array having different dimensions?
Schema Examples
A video frame and scaling vector:
<schema name="frame_and_scale">
<field name="time" type="int32"/>
<field name="frame" type="int[800][600][3]"/>
<field name="vector" type="float[2]"/>
</schema>
Marshal would translate to a header of:
struct frame_and_scale_frameField_header
{
    int num_dimensions = 3;
    enum base_type = int32; // may have to make this more flexible for user-defined types later
    int dim[n] = (800, 600, 3);
    int32 coordinate_system = 0;
    bool partition = 0;
    bool dense = 1;
    bool regular_stride = 1;
    float stride_size[n] = (1, 1, 1);
    partition_dimensions[n] = (0, 0, 0);
    partition_number = 0;
    float sensor_location[3] = (0, 0, 0);
}
When the client sends a tuple it will pass in a basic C++ array (i.e. int[800][600][3]); Blitz is not necessary on this side of the equation. The client side will flatten the array and send it over the wire. The ArrayAcceptor will de-serialize it, etc. The rest of this is described above.

Basic Architecture Layout
9/29 - Brad added a 1-bit flag to the tuple header to designate whether or not it contains arrays
FilterArray Add – created it in the src/modules/queryprocessor/boxes directory, conformed to standard form
Modified Files
src/modules/queryprocessor/boxes/Makefile.in – added it to the list of boxes to build
src/modules/queryprocessor/boxes/Makefile.am – added it to the list of boxes to build
src/modules/catalog/CatalogBox.cc – made it use the infer_filter_out function to get out stream; set is_invertible ;
src/modules/catalog/CatalogValidate.cc – added to list of intrinsic boxes, validated that it had just one in and out stream, integrated it into the expression parser
src/modules/queryprocessor/runtime/QBox.cc – added it to the instantiation code