Array Design (week of 8/17/06)
Arrays in ASAP
Dense or sparse
regular or irregular strides
We should probably implement these each as different classes but provide methods to convert from one to another as contents change
It seems like everyone has efficient techniques for organizing arrays on disk, but few address the in-memory problem
Is it too simple/already solved implicitly?
What do we want to be the driving feature of this design? Search-able? Chunk-able? Efficient I/O?
For the simple (regular stride, dense) case we could adopt something like this: http://www.boost.org/libs/multi_array/doc/user.html and extend and optimize it for the other cases.
It supports some stride features
Blitz++ seems like the best option for simple but optimized array management
I have successfully built it with the default configuration on the CS machines
We may need to modify this library for partial evaluation when we break arrays into chunks
Although POOMA might be even closer.
Its inherent parallelism might do too much of our work for us / interfere
Borealis
Need to add arrays in two phases
Expressions - start with simple numeric ones with math ops, then progress to supporting more custom types like strings and irregular strides.
What is the best way to pass around these arrays in memory? Smart pointers? Heap allocated?
How do we want to store sparse arrays wrt generic arrays?
Boxes
Handle serialization of arrays?
Make boxes co-located for streams?
This is where most of the optimization will happen, but will require expression implementations as a building block
Will have to coincide with distributed chunking through partial evaluation
Need to handle chunking as well as reuniting the pieces later
Would need to be divided up by array characteristics for ASAP return types
What do we need to do differently with our db operators?
Will need to write wrappers for the Blitz operators in the form of expressions of type array
Operators needed:
+, -, *, / (for vectors and scalars)
==, !=, <, <=, >, >=
dot product
cross product -half implemented in Blitz
dimension getter
[]
transpose
invert -not implemented by Blitz
Catalog management
Borealis Expression Library
Aggregate -Parse and handle aggregate box. StandardAggregate contains most of the implementation. This will be minimally impacted by arrays in the expression phase because it only uses expressions in its arguments.
EString -Basic string type for standard functions. May make a user-defined type for Arrays.
EStringPool -manages allocation and string management. Will not be impacted by arrays.
ExprLexer -Lexical analyzer for string expressions. Invoked in expression parse method. Takes input streams and breaks them into tokens. Will need to add previously unimplemented operators like the dot product and brackets.
ExprParser -Builds an expression tree from lexer output. If we build the expressions properly this will probably require minimal modification, only for new ops
ExprParserTokenTypes.txt -Defines strings to be regarded as tokens. Will need to add our new ops and type to this list.
Will we need to have an input method in XML for comparing arrays (i.e. a filter predicate that could ask if an array is 3x3 and filled with ones)?
Expr.g -Add support for dot, [], etc. I am leaning against making arrays declarable in our initial pass (as in the above example).
ExprUtil -Support functions
Will need to add appropriate field types for arrays
Handles math and relational ops and precedence in generic, templated way
This part may be fine without modification
ExprWalker -This generates and evaluates(?) expressions from SQL-style queries. Do boxes translate themselves into this in their respective classes? This will require no modification if everything is implemented in the standard fashion.
Expression -Base class that all expressions are built from. May want to add a hook to this base class to differentiate between array and scalar contents for memory management purposes. Will need to implement is_impl and evalImpl methods for our standard functions.
EvalContext -Tells the expression which tuples it refers to
MathFunctions -Implementations of various templated math ops. Will need TypedArray versions of this. These will be wrappers for Blitz equivalent library calls
Get arg(n) of expression and evaluate it in the given context
Return appropriate function applied to its children
expression walker and Function use makeExpression to generate expressions for later evaluation
NArgs - This enforces that an expression has the correct number of arguments. It also has the base class for Function. The Function factories will need to be modified to create array funcs.
SQL* - Manage and access tables of expressions. This will require minimal to no intervention for arrays.
Arrays should probably be like TypedExpression<>.
TypedArrayExpression<T>
Fields (arrays) can derive from this
StandardAggregate -Window management and other standard functions. Should not be impacted by arrays
StandardFunctions -Generic functions such as min/max/strlen/int. Mostly helpers. Will need to add a few of our own here.
modules/common/DataType - Will need to add our type(s) to this. Is there any convenient way to make this templatable, i.e. Array<int>[3][3][3]? It does not look like it at first glance. This will make the from-string method not have the correct length.
Boxes
May do array allocation and memory management here
What is a table wrt QBoxes?
Qbox - Constructs streams and matches with schemas. Will need to interface to catalog for type checking and validation of streams
Will need to deal with the Tuple creation to allocate arrays.
How/where is that done?
Tuples are treated as char *s
The notes seem to indicate that this is managed by the node object -Check out AuroraNode.*
Tuple I/O is managed here, but what about allocs?
Use join as an example b/c it requires allocation to create a new tuple of a different type
check out produce_dont_emit...
Check out the construction and reception of tuples in example clients
Perhaps the Marshal code handles this
Basic Design Ideas
Initially basic class will be just an array declaration in Blitz
What type of wrapping will it need to be a tuple?
Headers need the following: (b/c everything in the tuple passing around is cast to being string or void *)
dimensions, size and type info
sub-array location wrt larger array for partial evaluation
Sensor location?
flags to denote array characteristics (dense/sparse, irregular/regular, etc.)
Partial eval implementation strategy will largely depend on how the data is chunked
Perhaps do n-dimensional chunks in Stonebraker manner just to get started
This can be added later as a type of operator
Will have another class for irregular strides which will be a user-defined templated Blitz array type
Should this be a standard array or 1-D that we can search on to get values? (The answer may depend on sparse/dense call)
How will this work for math ops? We could require that all indices match, but that will make type checking take O(n)
Similar problems for sparse arrays
I know we do not plan to allocate for cells not filled, so this may need to be a search-able vector of values
Array Precedence order:
Dense, irregular
Dense, regular
Sparse, irregular
Sparse, regular
Size may be used to determine when/how to chunk
Will declare arrays as one of these types and write operators to make them compatible with each other
Distinction may be lower level (i.e. all derive from GenericArray, which they are cast to for passing around, but operators look for flags in order to evaluate most efficiently/compatibly)
Will probably need to modify the way the queryprocessor receives data so it can be serialized for allocation as *FieldExpressions
Ask Brad about the place to do this
Most of our first pass will be writing wrappers for basic Blitz operations and appropriate headers for data being passed around
Arrays will be declared as ArrayFieldExpressions and heap allocated like all other concrete field expressions
will need arguments indicating how much memory to allocate
Trace down how output / intermediate results are calculated for further memory details
Basic Data Flow
Array and meta-data arrive, either apart or together
Meta-data is the array size, dimensions, base type, sequence number in stream, sub-array info, etc.
Timestamp when it gets there for windowing purposes
Array is simply values, one after another which will be cast into a basic array before we run the Blitz constructor on it
Blitz object is inserted into a table
This is a many-to-one relationship (many ids-one blitz obj)
hash pairs are ids-to-ptr-to-Blitz object
ID is a hash of stream id, sequence number and timestamp (to facilitate windows)
Acceptor then notifies all boxes that are interested in the stream of the array that just arrived
Will need to use/extend enqueue methods of traditional boxes (or more likely create entirely new boxes)
Boxes need to support scalars as well as arrays
Extending current boxes
+ Can capitalize on previous work and code
+ Most of the data evaluation will be the same
- Data load mechanism will probably be different
- Foundation may already be overextended
- Learning curve
Create new boxes
+ Fresh, potentially cleaner design
+ Can benefit from lessons learned
- More work/investment/development time
If a box can modify a tuple or create a new one (i.e. join or map) then it creates a new entry in the table as its out stream
May have an option for deltas if they are of the same schema
Else if a box cannot modify a tuple (i.e. filter) then it creates a new id in the hash table that points to the previous Blitz entry and the acceptor will notify anyone listening on its out stream
Brad: How does Borealis know when to send a stream out across the network? I.e. when the out stream is listened to by a box on another node.
May need to build a mini-router to get out-streams to the right ArrayAcceptor
This will hinge on previous work
When an out stream is generated the ArrayAcceptor will also check with other nodes to see if they are interested in it
If so, strip out Blitz stuff and pass along in the same way it was received
Determine which is a bigger bottleneck: transportation time or bookkeeping
If we choose to break up the arrays and tuples, how will we make them synchronize?
Sequence number
Does the ArrayAcceptor take both of them or should the meta-data go directly to the box(es)?
I think that the acceptor needs both of them b/c meta-data is necessary for construction
The box should just take the hash id from the ArrayAcceptor
The ArrayAcceptor sits at the node level, as does the table
Brad: Could this be part of AuroraNode or a side method?
What does AuroraNode encompass now?
Basic Construction
Predicates will be cell-based, but may allow wild carding in the form of comparison to scalar
Schema builder will need [] implemented
ExpressionHandler may need to be rewritten to handle different array types as well as partial evaluation
Handle multi-host eval as well as multiple procs?
Ops will be polymorphic
Prime areas for optimization
Caching strategies:
when does it make more sense to have arrays on disk?
Should table contents be in-memory only or could larger aggregates be supported by not limiting ourselves?
Disk usage could also be crucial for very large arrays
Statefulness and Windowing:
How big do we make our table?
When do we hold on to values and when does it make sense to drop them from this table cache?
Would it make sense to designate which streams have aggregates associated with them and hold on to more than one value only for those?
Size cut-off? i.e. if a stream is holding more than its fair share of the table space
What is the best place (node-wise) to store this historical data we eventually want to integrate?
We could divide it up categorically (i.e. spatial location on a map, or where it lies on a timeline)
It would be nice to have some breakdown that does not require centralized look up
When we do access it should we regard it as a stream or treat it entirely differently?
The ArrayAcceptor could be a good way to encapsulate this.
Having the ArrayAcceptor/TableManager be its own module could be sensible b/c it will be doing a variety of things
Historical data loader
Caching optimizer
Stream notifier / router (push-based)
I need to start thinking about finding a camera or something for an initial data stream.
Push vs. Pull
Pull
+ consistent polling, perhaps less work for the node
- may be more complex; have more overhead
Push
+ simple hand-off, easier traceability
+ seems more intuitive from a stream processing perspective
- may require more up-front setup
- puts more burden on the node
Table vs. Array Part of Tuple (2 pipes vs. 1)
Table
+ Minimizes duplication
+ Encapsulation of memory management / optimization techniques
+ May be more extensible b/c we are designing from the ground up
- More initial time / planning investment
- May break previous structures
Tuples with Arrays
+ Simpler design, faster implementation
+ Less apt to break previous team assumptions
- May stretch existing infrastructure a little far
Do scaling or delta image work for original demo
Proof out xml this week
Non-uniform scaling requires a lot of conditionals for boundaries of the image as well as for scaling up versus down. In traditional Borealis boxes this will require a lot of them. Potential implementation strategies:
Tons and Tons of Boxes
+ Conforms to standard DB operators
+ Will flex the majority of our new capability
- Ugly to code
- Harder to debug
Implementing Scale as an arithmetic operator
+ Simpler design
+ Code essentially already done
- Seems like a hack
- Will not be a good showcase for SPE
Making a single custom box
+ Easier optimization
- Is this splitting hairs with the arithmetic op one?
Hybrid Solution: specialized boxes such as vertical_scale_up, with traditional filters to route them to the right boxes
+ Still shows stream processing / db ops
+ Simplifies out boundary conditions and allows faux statefulness (i.e. calculating ratio of old size to new size)
- Will need to implement custom boxes
Do we want to have each query chain have its own copy of the array to operate on in place or is it preferable for them all to share one copy?
- Going with in-place stream operations right now.
Does it make sense to break the arrays up in cases like filter that have partial matches? How can we handle these break ups? Designate output by indices or ? Do images make sense in arbitrary partial form? Might it make more sense that if we have a match to the predicate we return a sub-array?
Box design
Needs to:
Take in array via loading mechanism
Will take in array header info and pointer to Blitz object
Optionally apply transformations to it in place
Be reliant on new expressions, fields and grammar
Enqueue the array header and pointer to the next box in its query evaluation chain
Should be sending roughly the same chunk of memory (may be partially re-alloc'd to make it smaller or larger data for the box at hand)
Array header will conform to XML schema
How does a box get its XML schema?
Will our designed loading mechanism have any major pitfalls w/the current box implementation?
Does it make more sense to create new array boxes or can the current ones be extended to accept these Blitz pointers?
Blitz may not work b/c it needs to know dimensions at compile time. Unfortunately boost is no better on that front. (May require custom expression evaluator regardless of partial eval after all).
Can use *=, +=, etc. to avoid unnecessary copies in Blitz?
Boxes
All: need to adapt to looking up pointers to arrays.
Should boxes have a way of indicating to the ArrayAcceptor/Manager how many tuples in a stream it may need?
Filter: When acting on an array it will pass on cells that match and set the rest of cells to null in its out stream
Map: Apply transformations in place to matrices and send to out stream
Union: Will probably not change much
Bsort: Will do comparisons and emissions in place. Need to make sure that table keeps n instances of the array.
Aggregate: The way we handle this depends on both the aggregating operation as well as the window. For min(..), max(..) we can just hang on to the appropriate answer and compare it with the rest of the inputs. For average, we will need to hang on to <window size> number of arrays. Is there a better way to do this? Are there other operators that we should implement for aggregate?
Join: Will hopefully not have to buffer too many arrays at once, may want to artificially limit number or work out caching scheme if it is too many. Will have to do the same with aggregate. Caching scheme may have been required anyway (as opposed to having all the tuples in memory at once).
Resample: This will use the solution to the problems of aggregate operators
Custom Boxes: What will we need for input information? What are the requirements of custom box coding now?
Expression Evaluation
If we are using Blitz we need to pass in the array dimensions at allocation time
We should allocate for the largest the array will get in its query chain because re-allocation will be tricky, if possible
Wildcarding
Will need to be done right --> left
E.g. int[*][*][3] < int[3]
Do we want to make the * an explicit or implicit part of the grammar?
For now I am leaning toward implicit (i.e. scale example)
Arrays can be compared cell by cell to a scalar as well
Implementation note: may want to build using matrix and vector data types where appropriate to capitalize on previous optimizations? Check this out
Arithmetic Evaluation
This will be done primarily in Blitz++
See above notes about implementation requirements imposed by Blitz
Custom boxes will be needed for some of our ASAP ops
These may just use Blitz math ops, or go proprietary based on domain specific optimizations
Some of these (i.e. scale) may benefit from being composites of previous ops
Does it make more sense to build an entirely new expression evaluator? Diverge from the previous one and integrate later? Extend the previous one?
What lessons have we learned implementation-wise from our last expression evaluator?
Model for Generic Expression Evaluation (we may modify/optimize for specific apps)
Recursive Descent Parsing?
Leave it open for piecewise evaluation by making tokens more accessible and evaluatable from outside the tree
We may have to do this anyway due to limitations on tuple size
Random access look up?
Assembly at the end after potential time slicing or farming out?
Alternatives to basic tree struct?
Memory management / caching schemes?
The expression manager needs to
Parse expression strings
Create symbolic representations of arithmetic and comparison expressions
Efficiently evaluate these expressions when values arrive
Have an easy-to-use API for programmers to create, modify, and access expressions in their code
Composable?
Extensible (i.e. setup for partial evaluation, varying types of arrays, etc.)
The expression library may be the same, with slicing and dicing handled in pre and post processing
May need a base type of evaluation for when we know nothing about the array/scalar value that our more specialized evals can derive from as necessary (for sparse or clumped arrays)
Reflection in Field creation and evaluation
This will enable us to both see what templated types the instances of FieldExpressions are and determine what ops are available to us on the basis of type
Blitz offers more ops for vectors and arrays, so shortcutting to those for 1 and 2-d arrays would be nice
Need improved visibility of expressions
Should be able to modify them without rebuilding / creating new objects?
Who is going to primarily be using this library:
Box implementers
Prediction implementers
Needs to be fairly generic, specific apps will build on basic arithmetic and comparison ops, few to no optimizations will be transformations
Reflection Libraries
+ Seems relatively lightweight and complete
- templating implementation seems hazy
- no guaranteed support for gcc-4.*
Seems to capitalize on RTTI checking mostly
Seems to have a relatively simple API
Do we need to rewrite the whole thing or just the FieldExpressions, acceptable template types, etc.?
Do we need to rewrite it all so that expression components are more accessible?
Our current expression model is extremely obfuscated.
We do need to retain the context model and perhaps expression walker.
Major modules:
ExpressionParser - takes in string, produces tree form of expression (as well as perhaps other access methods)
ExpressionEvaluator - evaluates expressions when a tuple comes in that is of the relevant stream
Expression itself will need to at least add functions for concatenation, append, etc. as well as defining common math ops and iterator functions
May want to allow binding operator, although I am not sure how this will fit into our current XML paradigm
Should we make the context more loosely coupled with the expressions themselves?
If an entire expression is regarded as one object how should it be accessed?
Root of a tree for traversal?
.as_string() type of function?
A combination of both? (or multiple access methods?)
Most of the programmer's access will be for debugging purposes or possibly composite functions in the case of predictions
Proposal Notes
coordinate info - int32 (coordinate must be declared like a schema) (coordinates are loaded into a table as declared with the XML) Coordinate systems exist in categories/classes. A coordinate system declaration should include the coordinate system type (i.e. GPS, or X, Y, Z for airplanes) and two functions: one to transform it to the user-defined generic format of its coordinate type, and the other to get data of its type into its format.
Wire Format
Header Info:
int n = num_dimensions
enum base_type (can be int, float, etc.)
int dim[n] the size of each dimension
int32 coordinate_system;
bool partition
bool dense
bool regular_stride
float stride_size[n]
partition dimensions[n]
partition number - tells us which partition we are dealing with in row-major order
float sensor_location[3]
Dense Array, Regular Stride Representation:
one element after another in row-major form; 0/null for blank values
Sparse Array (stride does not matter, n-dimensional array):
n floats, one for each dimension coordinate followed by the cell value; repeated in row major form for each value in the array
Dense Array; irregular stride
Each cell in row-major format prefaced with its coordinate
Tuples within Borealis
When arrays arrive on the server/Borealis side, a module called an ArrayAcceptor will intercept them and convert the serialized data into a Blitz++ object. It will then wrap it in a smart pointer which will be put into any tuple referring to it on that host. The tuple itself will contain the header information structure. Arrays will be operated on in place by each box wherever possible. The only time an array should need to be copied is when a stream diverges down two different paths that will modify it. The majority of the modifications between this and our current version of Borealis will revolve around memory management and rewriting most or all of our expression library.
Expression Wildcarding
When arrays are compared in expressions or predicates they do not have to match the dimension of what they are being compared to. They do need to correspond right to left though. For example, an int[640][480][3] can be compared to an int[3] or int[480][3] but not int[640]. Arrays can also be compared to scalars on a cell by cell basis.
Box Semantics
Aggregate – will accept a window of tuples, with window length being defined by either tuples or window size. Slide will be defined similarly. Windowing will be managed by dynamic reference counting on each array object. Each time an array is added to a window the reference count is increased (just as it is when it is accessed by any non-windowing box). When it is removed from the window the reference count is decremented. When the reference count goes to zero the array's destructor is automatically called. Aggregate will continue to operate with the standard functions it has now (min, max, average, etc.). The results will be a composite result based on each cell (i.e. a min of a 640x480 image will return the minimum value for each cell over all of the frames in that window).
Join - Joins will have a windowing scheme similar to aggregates. If we are joining two arrays we will create a new one with an extra dimension with pointers to the previous two arrays.
Filter – Filter will compare arrays or scalars. In the case of a superset being compared to a subset (i.e. int[640][480][3] < int[3]) filter will iterate through and null out values that do not match the predicate.
Map - Map will apply operators found in our rewritten expression library. It will not differ much from our previous implementation.
Concatenate – This will basically be a special case of join. It will check to see if the dimensions of the two incoming arrays are of compatible dimensions and throw an exception if they are not.
Locate(Array1, Array2, function, slide[n]) – The box will first check to see that the two arrays are roughly compatible. The function will need to define the size of chunks in an XML schema before it can be passed into a locate box. This does not necessarily have to be the size of Array2, the function may scale, rotate the feature, etc. Like our windowing scheme, this may benefit from defining both the sub-array sizes that are passed into the function as well as slide/overlap (in case the feature is between two blocks). Slide will tell us how much to move in each dimension of Array1 (Array1 is n-dimensional). After all of this is established, Locate will iterate through Array1 in row-major order a slide unit each iteration. It will collect the function returns as an array which will be its output. Each function return will be in the form of one array of size 2n. The first n values will be the lower left hand corner of the location of the found feature. The second n values will be the size of the match in each dimension.
Pivot(array, attribute_list_1, attribute_list_2) – Pivot will substitute the contents of the dimensions specified in the first list with the values in the second list. This will operate on the array in place.
Regrid – This seems like it will be a special case of Map, with more extensibility regarding dimensions. Or should map support the in and out array having different dimensions?
Schema Examples
A video frame and scaling vector:
<schema name="frame_and_scale">
<field name="time" type="int32"/>
<field name="frame" type="int[800][600][3]"/>
<field name="vector" type="float[2]"/>
</schema>
Marshal would translate to a header of:
struct frame_and_scale_frameField_header
{
    int num_dimensions = 3;
    enum base_type = int32; // may have to make this more flexible for user-defined types later
    int dim[n] = (800, 600, 3);
    int32 coordinate_system = 0;
    bool partition = 0;
    bool dense = 1;
    bool regular_stride = 1;
    float stride_size[n] = (1, 1, 1);
    partition_dimensions[n] = (0, 0, 0);
    partition_number = 0;
    float sensor_location[3] = (0, 0, 0);
}
When the client sends a tuple it will pass in a basic C++ array (i.e. int[800][600][3]); Blitz is not necessary on this side of the equation. The client side will flatten the array and send it over the wire. The ArrayAcceptor will de-serialize it, etc. The rest of this is described above.

Basic Architecture Layout
9/29 - Brad added a 1-bit flag to the tuple header to designate whether or not it contains arrays
FilterArray Add – created it in the src/modules/queryprocessor/boxes directory, conformed to standard form
Modified Files
src/modules/queryprocessor/boxes/Makefile.in – added it to the list of boxes to build
src/modules/queryprocessor/boxes/Makefile.am – added it to the list of boxes to build
src/modules/catalog/CatalogBox.cc – made it use the infer_filter_out function to get out stream; set is_invertible ;
src/modules/catalog/CatalogValidate.cc – added to list of intrinsic boxes, validated that it had just one in and out stream, integrated it into the expression parser
src/modules/queryprocessor/runtime/QBox.cc – added it to the instantiation code