January 11, 2006 NGVM meeting, Albuquerque

 

Attendees:

 

Hans Boehm (HP),

Steve Blackburn (Intel),

Michal Cierniak (Google),

Cliff Click (Azul),

Manuel Hermenegildo (UNM),

Tony Hosking (Purdue),

Kathryn McKinley (U Texas),

Eliot Moss (U Mass),

Marcus Lagergren (BEA),

Suresh Srinivas (Intel),

Darko Stefanovic (UNM),

Greg Wright (Sun),

Mario Wolczko (Sun)

 

 

First Session:

 

Steve Blackburn - Introduction & Background

 

This meeting - what we are doing

Last meeting - brainstorming

 

Introductions

 

Greg Wright, Mario Wolczko, Jan Vitek, Michal Cierniak, Eliot Moss, Tony Hosking,

Manuel Hermenegildo, Marcus Lagergren, Cliff Click, Kathryn McKinley,

Darko Stefanovic, Hans Boehm, Steve Blackburn

 

Openness: want to incubate ideas and keep a low profile for now

 

Apache Harmony: Effort to build a JVM as an Apache incubator

project (under an Apache license, required),

        more than a JVM, one or more?

        libraries, etc.

        a more conventional goal of open sourcing existing VM technology

        problem: mailing list development not ideal

        NGVM is on the side of this project

 

Jikes RVM:

 

JIU - Java innovation unit

 

Existing VMs, e.g., HotSpot, Jikes RVM, etc. are dinosaurs

 

Aims and Goals:

 

NGVM: Platform for VMs, so multiple VMs. Product quality: flexibility,

extensibility, robustness, performance (competitive or better)

 

Are we building just for Java, arbitrary runtimes, e.g., C, C#, CLR?

Action item:

        list technical challenges that this choice encompasses

 

Draw on experience of designers

 

Portability, plug and play

 

Eliot: use descriptions to generate components, e.g., burg in Jikes

 

Need: developers, the research community (academic & industry), end

users, embedded devices, servers

 

Cliff: drop embedded devices

 

Dave: J9 is a single system that generates a family of VMs, server and

embedded

 

Intel speak: "stretch goal" (ambitious)

 

VM core: the part of the VM with the scheduler, memory model, no JIT

or GC or class libraries (these are additional components you can plug

and play your own)
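One way to picture this split is a core that accepts the compiler and collector as plugged-in components. The following is only a minimal sketch: `Compiler`, `Collector`, `VmCore` and their methods are hypothetical names invented for illustration, not part of any NGVM design discussed at the meeting.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical component interfaces; the notes only say that the JIT, GC and
// class libraries are plugged in, not what the interfaces look like.
interface Compiler {
    String name();
}

interface Collector {
    String name();
}

// Hypothetical core: it owns no compiler or collector of its own and simply
// records whatever components were plugged in.
final class VmCore {
    private final List<Compiler> compilers = new ArrayList<>();
    private Collector collector;

    VmCore withCompiler(Compiler c) {
        compilers.add(c);
        return this;
    }

    VmCore withCollector(Collector gc) {
        collector = gc;
        return this;
    }

    // Describe the configured VM, e.g. for a startup banner.
    String describe() {
        StringBuilder sb = new StringBuilder("core");
        for (Compiler c : compilers) {
            sb.append(" + ").append(c.name());
        }
        if (collector != null) {
            sb.append(" + ").append(collector.name());
        }
        return sb.toString();
    }
}
```

A configuration would then read `new VmCore().withCompiler(() -> "baseline-jit").withCollector(() -> "semispace-gc")`, with the component names again being placeholders.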

 

Attacking the tensions between flexibility and performance

 

Approach

 

create a development framework from scratch

 

What Steve has been doing

 

More on why and how it will be used

 

Management model, Developers (N), Advisors (10), Steering Committee (3)

 

Cliff: interpreters in assembly (generated from higher-level form), HotSpot ports by using the baseline

 

Ertle & Gregg: portability to multiple targets with automated tools

 

Generation technology: make vs self build not necessarily easily componentized

 

Code persistence: boot image AOT, JIT persistence, code cache, IR vs assembly

Marcus: use IR, changing assembly is a nightmare

Michal & Cliff: don't store anything persistent

 

Isolates JiJ app vs user VM separation

 

Clean system programming (in Java): breaks Java safety rules

 

Michal: debugging environment.  If a generator produces the assembly, is the assembly trusted?  How can we move more into the trusted code base?

 

Cliff: Perfection is too hard, too much context for optimizations with broad scope.

 

Carry semantic properties as far through the system as possible

 

Reduce the size of the trusted code base as much as possible

 

Jan: Help for correctness, e.g., type the low level IR

 

Second Session:

 

          Experience talks:

 

Cliff Click

2. Reasons for HS decisions, why big VMs not small VMs

3. Native ABIs vs custom, argument shuffling

4. Portability comes from porting - write at least 2 from the start.

JIT to JIT, vs. JIT/interpreter calls

5. Grove - interpreter register convention

Moss - asm parts, or whole thing in interpreter?

7. MMTk fixes all this (?)

8. Cliff convinced by cooperative safepoints now [no disagreement from the room]

9. Boehm - report SP+PC of each thread from a signal handler in each thread? Click - not sure...

10. Moss - transactional memory coming, may or may not take off - unsure what to do right now for lock scaling.

Click - this will still be an issue, lots of code will still use locks

14. Moss - very important to generate these argument shuffling routines, they break frequently.

Click - they do something like this at Azul (somewhat duplicated in register allocator)

18. Moss - got to raise the semantic level here!

Click - there are lots of these

19. Boehm - cost of safepoints?  Click - as small as you like. (unroll the loop)

20. Hosking - why does graph-coloring work well? Click - another day

24. Moss - word length? Click - a pain, I was hoping to dodge this. Boehm - also memory model, a real portability test.

25. Moss - in general, tracking assumptions made when generating code. Click - atomic boundary here (code running in middle of method when you need to blow it away - general deoptimization problem)

31. Moss - could also maintain constant references in code with a non-moving area in heap
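The exchange on safepoint cost (items 8 and 19) can be illustrated with a small sketch of cooperative polling. This is an assumption-laden illustration, not HotSpot's or Azul's mechanism: the flag, the `poll` method and the chunked loop are hypothetical, and the chunking stands in for what a compiler would achieve by unrolling the loop so that one poll covers many iterations.

```java
// Sketch of cooperative safepoint polling; all names are hypothetical.
final class SafepointDemo {
    // Set by the VM (here: by the caller) when it wants threads to stop.
    static volatile boolean safepointRequested = false;
    // Counts polls so the cost can be observed; single-threaded demo only.
    static int pollsExecuted = 0;

    static void poll() {
        pollsExecuted++;
        if (safepointRequested) {
            // A real VM would block here until the GC or deoptimization
            // finished; the sketch just acknowledges the request.
            safepointRequested = false;
        }
    }

    // Sum an array, polling once per `stride` elements instead of once per
    // element. A larger stride means fewer polls and a longer worst-case
    // delay before the thread notices a request.
    static long sum(int[] data, int stride) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be >= 1");
        }
        long total = 0;
        int i = 0;
        while (i < data.length) {
            int end = Math.min(i + stride, data.length);
            for (; i < end; i++) {
                total += data[i];   // poll-free inner body
            }
            poll();                 // one poll per chunk
        }
        return total;
    }
}
```

With 1000 elements and a stride of 100 the loop executes 10 polls rather than 1000, which is the sense in which the cost can be made "as small as you like".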

 

 

Dave Grove, Object models

Jikes 1-word object header:  Click - really mask low bits off every time you use the class ptr?  Grove - yes, not as bad as you think.  Single word header is going too far, 2 (32-bit) words is probably right. (Or one 64-bit word)
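The masking Click asks about can be sketched as follows. The bit widths and the use of an `int` as a stand-in for a 32-bit header word are assumptions made for illustration; this is not the actual Jikes RVM header layout.

```java
// Sketch: a one-word header that packs status bits into the low bits of an
// aligned class pointer. Bit widths are assumptions, not Jikes RVM's layout.
final class HeaderWord {
    static final int STATUS_BITS = 2;                       // assumes 4-byte-aligned class pointers
    static final int STATUS_MASK = (1 << STATUS_BITS) - 1;  // 0b11

    static int pack(int classPointer, int status) {
        if ((classPointer & STATUS_MASK) != 0) {
            throw new IllegalArgumentException("unaligned class pointer");
        }
        if ((status & ~STATUS_MASK) != 0) {
            throw new IllegalArgumentException("status out of range");
        }
        return classPointer | status;
    }

    // Every use of the class pointer pays for this mask.
    static int classPointer(int header) {
        return header & ~STATUS_MASK;
    }

    static int status(int header) {
        return header & STATUS_MASK;
    }
}
```

With a two-word header the class pointer gets a word to itself and `classPointer` becomes a plain load, which is the trade-off Grove describes.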

 

Moss - Jikes has an annoying assumption that "class" info in header is 1-1 with a language class.

 

Cierniak - Have to change JIT to go from one model to another? Grove - only bits of JITs know about the model, get inlined in IR. (May be duplicated in two different compilers with different IRs)

 

Interfaces: Click - use guarded call for interfaces, works very well. Grove - WebSphere has lots of polymorphic interfaces.
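A guarded call can be sketched by hand in Java as below. This only mimics what a JIT might emit after profiling one dominant receiver class; `Shape`, `Square`, `Circle` and `GuardedCall` are hypothetical names, and the counters exist only to make the two paths observable.

```java
// Sketch of a guarded (speculatively inlined) interface call.
interface Shape {
    double area();
}

final class Square implements Shape {
    final double side;
    Square(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class Circle implements Shape {
    final double radius;
    Circle(double radius) { this.radius = radius; }
    public double area() { return Math.PI * radius * radius; }
}

final class GuardedCall {
    static int fastPathHits = 0;
    static int slowPathHits = 0;

    // Hand-written equivalent of the code for `s.area()` when the profile
    // says the receiver is almost always a Square.
    static double area(Shape s) {
        if (s.getClass() == Square.class) {   // guard: cheap exact-class test
            fastPathHits++;
            Square sq = (Square) s;
            return sq.side * sq.side;         // inlined body of Square.area()
        }
        slowPathHits++;
        return s.area();                      // fall back to full interface dispatch
    }
}
```

The guard pays off when one receiver class dominates a call site; a site that sees many receiver classes, as in Grove's polymorphic case, falls through to the slow path most of the time.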

 

Moss - in Jikes, pieces of code to deal with memory model are scattered around. We need to deal with this.  Grove - aspect-oriented? Cliff - e.g. invocation: runtime, JIT, class loader, ...

 

? - what's the nastiest thing to do?  Grove - real-time Java, tried to use arraylets.
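For readers unfamiliar with arraylets, the idea is to store an array as a spine of fixed-size chunks rather than one contiguous block. The sketch below is a plain-Java illustration under assumed parameters (the class name and the tiny chunk size are invented); it is not the real-time Java implementation Grove refers to.

```java
// Sketch of an arraylet-style int array: a spine of fixed-size chunks
// instead of one contiguous block. The chunk size is an arbitrary assumption.
final class ArrayletIntArray {
    static final int CHUNK = 4;   // tiny so the example is easy to follow
    private final int[][] spine;
    private final int length;

    ArrayletIntArray(int length) {
        if (length < 0) {
            throw new NegativeArraySizeException();
        }
        this.length = length;
        int chunks = (length + CHUNK - 1) / CHUNK;   // round up
        spine = new int[chunks][];
        for (int c = 0; c < chunks; c++) {
            int remaining = length - c * CHUNK;
            spine[c] = new int[Math.min(CHUNK, remaining)];  // last chunk may be short
        }
    }

    int length() { return length; }

    int chunkCount() { return spine.length; }

    // Every access goes through the spine: one extra indirection.
    int get(int i) {
        check(i);
        return spine[i / CHUNK][i % CHUNK];
    }

    void set(int i, int value) {
        check(i);
        spine[i / CHUNK][i % CHUNK] = value;
    }

    private void check(int i) {
        if (i < 0 || i >= length) {
            throw new ArrayIndexOutOfBoundsException(i);
        }
    }
}
```

Because the elements are no longer contiguous, such an array cannot be handed to native code as a single buffer without copying.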

 

byte/char arrays interact with libc, don't have as much freedom

Wolczko - But we were trapped by writing in C before now. Can we break away from that?

 

Moss - another VM assumption: subclasses extend superclass layout. (customize superclass code when compiling the subclass)

 

 

Jan Vitek, OVM

Used GNU Classpath libs - a problem, don't implement things, ...

 

Cierniak - movingGC [example on the slide] is an interface or class? Very cross-cutting (generational, etc.), may be lots of these interfaces, not linear.

 

McKinley - Object mem models implemented with mixins?

Vitek - No. started with idea of parallel trees of interfaces and possible implementations. Horrible (overdesign for generality, obfuscation). When tight coupling needed between components it's horrible to keep downcasting to specific types.

Blackburn - Pushed on this with recent versions of MMTk, have something that works.

Vitek - Tried aspects, some compiler support for inlining. Eventually that compiler option was dropped, and no debugging support, etc, so moved away from it. General problem is composing VM out of bunch of components. AOP is one story - Eclipse now has better support.

 

Transactional example [on slide]: preRunThreadHook compiled away during image generation - generated C code controlled by stitcher spec. (e.roots->vals[97] is the static singleton)

 

Domains (replace isolates): separate kernel from other code.  Partly by bytecode rewriting.

[Discussion about difference between domains and isolates]

 

 

Marcus Lagergren - BEA JRockit

[Lots of discussion about debuggers, calling convention dependence (etc).]

Click - HotSpot has lots of functions to print internal data structures.

Moss - generate these along with the VM?

 

Third Session:

 

Systems Programming

 

- unboxed types:

  Controlled layout

  Richer types as in C#?

  traced versus untraced?

  Need a proposal!

 

- operators:

  Preprocess syntactic sugar to get intrinsic calls.

  Operator name-space?

  New syntax or overload existing syntax or annotations?

 

Kernel/user separation

 

- JinJ => separation between user and kernel

- name-space separation

- cross-boundary pointers?

- not user/user separation!

 

Minimizing Trusted Code Base (TCB)

 

- strongly-typed IR to build trust among components and isolate bugs

- circumscribing failure (e.g., compiler fails, VM continues)

- use of automatic generators from formal descriptions (generate stylized

  outputs that are more easily validated)

- assertions!
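The "compiler fails, VM continues" point above can be sketched as a fallback from compiled to interpreted execution. Everything here is hypothetical: the `compile` and `interpret` methods are stand-ins that return the same trivial function, and the failure is simulated by name.

```java
import java.util.function.IntUnaryOperator;

// Sketch of circumscribing a compiler failure; all names are hypothetical.
final class CompileOrInterpret {
    static int compileFailures = 0;

    // Stand-in "JIT": fails for some inputs to simulate a compiler bug.
    static IntUnaryOperator compile(String method) {
        if (method.contains("weird")) {
            throw new IllegalStateException("compiler bug in " + method);
        }
        return x -> x + 1;
    }

    // Stand-in interpreter: slower in a real VM, but always available.
    static IntUnaryOperator interpret(String method) {
        return x -> x + 1;
    }

    // Contain the failure: record it and keep running interpreted, so a bug
    // in the compiler does not take the whole VM down.
    static IntUnaryOperator prepare(String method) {
        try {
            return compile(method);
        } catch (RuntimeException e) {
            compileFailures++;
            return interpret(method);
        }
    }
}
```

This only contains failures that surface as exceptions; a compiler that silently emits wrong code is the case the typed IR and assertions are meant to catch.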

 

Portability

- Architecture

  CISC/RISC

  register windows?

  64/32-bit and endianness

  multi-processor (essential)

  code patching: cache coherence

- Memory models

  coherence

  fences

- GC:

  barriers

  moving

  concurrent

- HW (OS) stacks:

  direction

  contiguous/split?

- OS:

  Threading model & scheduling (priorities?)

  Address translation?

  Memory contiguous/sparse?

  CPU affinity

  Page protection?

 

Object models

- Flexibility: more than one, but not dynamically

- Encapsulating and abstracting models

- Who determines it:

  GC

  locking

  dispatch

  hash

  type ops

- Who uses it:

  interpreter/JIT

  GC

  locking

  boot image

  debugger

  JVMTI etc

  VM reflection (boot image, JNI)

  class libraries

- Experimentation support

  e.g., adding fields

- Generated from descriptions

  negotiated layout
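A negotiated layout could work roughly as follows: each component that determines the object model states how many header bits it needs, and a generator assigns offsets. This is a sketch of the idea only; `HeaderLayout`, its methods and the bit counts in the usage note are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of "negotiated layout": components request header bits and a
// generator assigns bit offsets in request order. All names are hypothetical.
final class HeaderLayout {
    private final Map<String, int[]> fields = new LinkedHashMap<>(); // name -> {offset, width}
    private final int capacityBits;
    private int nextBit = 0;

    HeaderLayout(int capacityBits) {
        if (capacityBits < 1 || capacityBits > 64) {
            throw new IllegalArgumentException("capacity must be 1..64 bits");
        }
        this.capacityBits = capacityBits;
    }

    // Reserve `width` bits for a component, failing if the header is full.
    HeaderLayout request(String component, int width) {
        if (width < 1 || nextBit + width > capacityBits) {
            throw new IllegalArgumentException("cannot fit " + component);
        }
        if (fields.containsKey(component)) {
            throw new IllegalArgumentException("duplicate request from " + component);
        }
        fields.put(component, new int[] { nextBit, width });
        nextBit += width;
        return this;
    }

    int offsetOf(String component) {
        return fields.get(component)[0];
    }

    // Mask selecting this component's bits within the header word.
    long maskOf(String component) {
        int[] f = fields.get(component);
        return (-1L >>> (64 - f[1])) << f[0];
    }

    int bitsUsed() {
        return nextBit;
    }
}
```

For example, `new HeaderLayout(32).request("gc", 2).request("lock", 20).request("hash", 10)` fills a 32-bit header, and a further request fails at layout-generation time rather than at run time.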

 

Kernel/library dependence

- How much of Java can we depend on?

  bootstrap and more generally

- What constrains this?

  bootstrapping, space constraints

- What do we need?

  memory allocation

  hash tables

  basic I/O

  strings

  floating point

- Implementation strategy: make home-baked versions as compatible as possible

- Self-hosting (see Appel)

 

Multiple languages?

- Bytecodes are a problem

  More expressive than Java, difficult to work with

- Continuations in the runtime

 

Extra references (Thanks to Kathryn and Michal):

 

  1. Axiomatic Bootstrapping: A guide for compiler hackers, Andrew W. Appel, ACM

Transactions on Programming Languages and Systems, vol. 16, number 6, pp.

1699-1718, November 1994. Older version without diagrams, in ACM SIGPLAN

Workshop on ML and its Applications, Orlando, Florida, June 25, 1994, ©1994 ACM.

http://www.cs.princeton.edu/~appel/papers/

 

  2. About "GC at every instruction":

James Stichnoth, Guei-Yuan Lueh and Michal Cierniak. Support for

Garbage Collection at Every Instruction in a Java Compiler. In

Proceedings of the SIGPLAN'99 Conference on Programming Language

Design and Implementation (PLDI), Atlanta, GA, May 1999.

http://portal.acm.org/citation.cfm?id=301652

 

  3. About a language to abstract away the object model from the JIT, GC

and other JVM components:

Michal Cierniak, Neal Glew, Spyridon Triantafyllis, Marsha Eng, Brian

Lewis and James Stichnoth. Object-Model Independence via Code Implants.

In Proceedings of the 2003 Workshop on Multiparadigm Programming

with OO Languages (MPOOL'03), Anaheim, CA, October 2003.

http://blogs.msdn.com/michaljc/archive/2005/02/23/379329.aspx

 

  4. About the use of machine learning to find bugs:

Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken and Michael I.

Jordan.  Scalable Statistical Bug Isolation, PLDI 2005

http://www.cs.wisc.edu/~liblit/pldi-2005/

 

Fourth Session:

 

Management and Governance

 

Contrast between building a cutting-edge VM and a feature-rich application

 

Product quality demands good leadership and good coding

 

Apache model: veto power from committer == code writer

            -> bottom up evolution / direction

VM needs more coherent direction, while remaining open source

 

A model:

            Developers commit code

            Advisory board gives expertise, does not commit code

            Steering committee (small) gives direction (has veto on code)

 

Key issues:

            Coherent direction

            Getting existing expertise to actual programmers

           

Suggestion: also have architects: veto over developers, make choices, help enforce process

How does bottom-up style enforce process?  (testing, etc)

 

Steve proposes:

            Start this as an Apache-owned project (also under Apache license)

            Committers (initially) being authors + participants at this & prior meeting (plus Intel engineers already working with Steve)

            This bootstraps the culture

                        Has own mailing list

                        Has well defined charter to maintain focus

            Need plan for answering email before starting mailing list

                        Needs someone who will be backup to catch things that go unanswered for a while.  May require employer commitment of employee time.

            Steve will send out invitations.