January 11, 2006 NGVM meeting, Albuquerque
Attendees:
Hans Boehm (HP),
Steve Blackburn (Intel),
Michal Cierniak (Google),
Cliff Click (Azul),
Manuel Hermenegildo (UNM),
Tony Hosking (Purdue),
Kathryn McKinley (U Texas),
Eliot Moss (U Mass),
Marcus Lagergren (BEA),
Suresh Srinivas (Intel),
Darko Stefanovic (UNM),
Greg Wright (Sun),
Mario Wolczko (Sun)
First Session:
Steve Blackburn - Introduction & Background
This meeting - what we are doing
Last meeting - brainstorming
Introductions
Greg Wright, Mario Wolczko, Jan Vitek, Michal Cierniak, Eliot Moss, Tony Hosking,
Manuel Hermenegildo, Marcus Lagergren, Cliff Click, Kathryn McKinley,
Darko Stefanovic, Hans Boehm, Steve Blackburn
Openness: want to incubate ideas and keep a low profile for now
Apache Harmony: Effort to build a JVM that is an incubator for an
Apache project, (under an Apache license, required),
more than a JVM, one or more?
libraries, etc.
a more conventional goal of open sourcing existing VM technology
problem: mailing list development not ideal
NGVM is on the side of this project
Jikes RVM:
JIU - Java innovation unit
Existing VMs, e.g., HotSpot, Jikes RVM, etc. are dinosaurs
Aims and Goals:
NGVM: Platform for VMs, so multiple VMs
Product quality: flexibility, extensibility, robustness, performance
(competitive or better)
Are we building just for Java, arbitrary runtimes, e.g., C, C#, CLR?
Action item:
list technical challenges that this choice encompasses
Draw on experience of designers
Portability, plug and play
Eliot: use descriptions to generate components, e.g., burg in Jikes
Need: developers, the research community (academic & industry), end
users, embedded devices, servers
Cliff: drop embedded devices
Dave: J9 is a single system that generates family of VMs, server and
embedded,
Intel speak: "stretch goal" ambitious
VM core: the part of the VM with the scheduler, memory model, no JIT
or GC or class libraries (these are additional components you can plug
and play your own)
Attacking the tensions between flexibility and performance
Approach
create a development framework from scratch
What Steve has been doing
More on why and how it will be used
Management model, Developers (N), Advisors (10), Steering Committee (3)
Cliff: interpreters in assembly (generated from a higher-level form), HotSpot ports by using the baseline
Ertl & Gregg: portability to multiple targets with automated tools
Generation technology: make vs self build not necessarily easily componentized
Code persistence: boot image AOT, JIT persistence, code cache, IR vs assembly
Marcus: use IR, changing assembly is a nightmare
Michal & Cliff: don't store anything persistently
Isolates JiJ app vs user VM separation
Clean system programming (in Java): breaks Java safety rules
Michal: debugging environment. If something generates the assembly, is the assembly trusted? How can we move more into the trusted code base?
Cliff: Perfection is too hard, too much context for optimizations with broad scope.
Carry semantic properties as far through the system as possible
Reduce the size of the trusted code base as much as possible
Jan: Help for correctness, e.g., type the low level IR
Second Session:
Experience talks:
Cliff Click
2. Reasons for HS decisions, why big VMs not small VMs
3. Native ABIs vs custom, argument shuffling
4. Portability comes from porting - write at least 2 from the start.
JIT to JIT, vs. JIT/interpreter calls
5. Grove - interpreter register convention
Moss - asm parts, or whole thing in interpreter?
7. MMTk fixes all this (?)
8. Cliff convinced by cooperative safepoints now [no disagreement from the room]
9. Boehm - report SP+PC of each thread from a signal handler in each thread? Click - not sure...
10. Moss - transactional memory coming, may or may not take off - unsure what to do right now for lock scaling.
Click - this will still be an issue, lots of code will still use locks
14. Moss - very important to generate these argument shuffling routines, they break frequently.
Click - they do something like this at Azul (somewhat duplicated in register allocator)
18. Moss - got to raise the semantic level here!
Click - there are lots of these
19. Boehm - cost of safepoints? Click - as small as you like. (unroll the loop)
20. Hosking - why does graph-coloring work well? Click - another day
24. Moss - word length? Click - a pain, I was hoping to dodge this. Boehm - also memory model, a real portability test.
25. Moss - in general, tracking assumptions made when generating code. Click - atomic boundary here (code running in middle of method when you need to blow it away - general deoptimization problem)
31. Moss - could also maintain constant references in code with a non-moving area in heap
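A minimal sketch of the safepoint cost point in item 19, assuming a cooperative scheme in which mutator code polls a flag. Strip-mining (or unrolling) the loop lets one poll cover many iterations, which is one reading of "as small as you like". The class, the flag, and the poll counter are hypothetical and only for illustration; a real VM would block the thread inside the poll.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Sketch of a cooperative safepoint poll; all names are hypothetical. */
final class SafepointSketch {
    // Set by the VM (for example, the GC) when it wants mutator threads to stop.
    static final AtomicBoolean safepointRequested = new AtomicBoolean(false);
    // Counts executed polls; single-threaded illustration only.
    static long polls = 0;

    static void poll() {
        polls++;
        if (safepointRequested.get()) {
            // A real VM would park the thread here until the operation completes.
            safepointRequested.set(false);
        }
    }

    /** Naive version: one poll per loop iteration. */
    static long sumPollEveryIteration(int[] a) {
        long s = 0;
        for (int i = 0; i < a.length; i++) {
            s += a[i];
            poll();
        }
        return s;
    }

    /** Strip-mined version: one poll per chunk of up to `stride` iterations. */
    static long sumPollPerChunk(int[] a, int stride) {
        if (stride <= 0) throw new IllegalArgumentException("stride must be positive");
        long s = 0;
        for (int base = 0; base < a.length; base += stride) {
            int end = Math.min(base + stride, a.length);
            for (int i = base; i < end; i++) {
                s += a[i];          // inner loop body carries no poll
            }
            poll();                 // poll cost is amortized over the chunk
        }
        return s;
    }
}
```

The trade-off is latency: a larger stride means fewer polls but a longer wait before a thread notices a safepoint request.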
Dave Grove, Object models
Jikes 1-word object header: Click - really mask low bits off every time you use the class ptr? Grove - yes, not as bad as you think. Single word header is going too far, 2 (32-bit) words is probably right. (Or one 64-bit word)
Moss - Jikes has an annoying assumption that "class" info in header is 1-1 with a language class.
Cierniak - Have to change JIT to go from one model to another? Grove - only bits of JITs know about the model, get inlined in IR. (May be duplicated in two different compilers with different IRs)
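A rough sketch of what "mask low bits off every time you use the class ptr" means for a one-word header. The bit assignment below (8-byte-aligned type pointer, 3 low status bits) is an assumption for illustration and is not taken from Jikes RVM; all names are hypothetical.

```java
/** Sketch of a one-word object header; the bit assignment is an assumption. */
final class HeaderWordSketch {
    // Assume type-information pointers are 8-byte aligned, leaving 3 free low bits.
    static final long STATUS_MASK = 0x7L;
    static final long POINTER_MASK = ~STATUS_MASK;

    /** Combine an aligned type pointer and a small status value into one word. */
    static long pack(long typePointer, int statusBits) {
        if ((typePointer & STATUS_MASK) != 0) {
            throw new IllegalArgumentException("type pointer must be 8-byte aligned");
        }
        if ((statusBits & ~STATUS_MASK) != 0) {
            throw new IllegalArgumentException("status does not fit in 3 bits");
        }
        return typePointer | statusBits;
    }

    /** Every use of the type pointer pays for this mask. */
    static long typePointer(long header) {
        return header & POINTER_MASK;
    }

    static int statusBits(long header) {
        return (int) (header & STATUS_MASK);
    }

    /** Replace the status bits while keeping the type pointer. */
    static long withStatus(long header, int statusBits) {
        return pack(typePointer(header), statusBits);
    }
}
```

With a two-word header the type pointer would get a word of its own and the mask in typePointer() would disappear, at the cost of extra space per object.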
Interfaces: Click - use guarded call for interfaces, works very well. Grove - WebSphere has lots of polymorphic interfaces.
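A hand-written sketch of the shape of a guarded interface call, assuming the JIT has predicted a single receiver class at the call site. A real JIT would emit this test in compiled code; the types and counters here are hypothetical and exist only to show the fast and slow paths.

```java
/** Hypothetical interface and implementations used only for this sketch. */
interface SketchShape { double area(); }

final class SketchCircle implements SketchShape {
    final double r;
    SketchCircle(double r) { this.r = r; }
    public double area() { return Math.PI * r * r; }
}

final class SketchSquare implements SketchShape {
    final double side;
    SketchSquare(double side) { this.side = side; }
    public double area() { return side * side; }
}

final class GuardedCallSketch {
    static int fastPathHits = 0;
    static int slowPathHits = 0;

    /** Call site specialized for SketchCircle, with a fallback for everything else. */
    static double areaGuarded(SketchShape s) {
        if (s.getClass() == SketchCircle.class) {   // guard: cheap exact-class test
            fastPathHits++;
            SketchCircle c = (SketchCircle) s;
            return Math.PI * c.r * c.r;             // body of SketchCircle.area(), inlined by hand
        }
        slowPathHits++;
        return s.area();                            // full interface dispatch
    }
}
```

Grove's remark applies to the slow path: if a call site sees many receiver classes, the guard misses often and the fallback dispatch dominates.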
Moss - in Jikes, pieces of code to deal with memory model are scattered around. We need to deal with this. Grove - aspect-oriented? Cliff - e.g. invocation: runtime, JIT, class loader, ...
? - what's the nastiest thing to do? Grove - real-time Java, tried to use arraylets.
byte/char arrays interact with libc, don't have as much freedom
Wolczko - But we were trapped by writing in C before now. Can we break away from that?
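For readers unfamiliar with arraylets, a minimal sketch: a logical array is split into fixed-size chunks reached through a spine, so no large contiguous allocation is needed. The chunk size and class name are assumptions for illustration, not the design Grove described. The sketch also shows why libc interaction is awkward: the elements are no longer contiguous, so a byte or char array cannot be handed to native code as a single buffer.

```java
/** Sketch of an int array stored as arraylets; the chunk size is an assumption. */
final class ArrayletIntArray {
    static final int LOG_CHUNK = 10;             // 1024 elements per arraylet
    static final int CHUNK = 1 << LOG_CHUNK;
    static final int MASK = CHUNK - 1;

    private final int[][] spine;                 // one entry per arraylet
    private final int length;

    ArrayletIntArray(int length) {
        if (length < 0) throw new NegativeArraySizeException();
        this.length = length;
        int chunks = (length + CHUNK - 1) >>> LOG_CHUNK;
        spine = new int[chunks][];
        for (int c = 0; c < chunks; c++) {
            int remaining = length - (c << LOG_CHUNK);
            spine[c] = new int[Math.min(CHUNK, remaining)];   // last arraylet may be short
        }
    }

    int length() { return length; }

    int chunkCount() { return spine.length; }

    int get(int i) {
        check(i);
        return spine[i >>> LOG_CHUNK][i & MASK]; // extra indirection on every access
    }

    void set(int i, int v) {
        check(i);
        spine[i >>> LOG_CHUNK][i & MASK] = v;
    }

    private void check(int i) {
        if (i < 0 || i >= length) throw new ArrayIndexOutOfBoundsException(i);
    }
}
```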
Moss - another VM assumption: subclasses extend superclass layout. (customize superclass code when compiling the subclass)
Jan Vitek, OVM
Used GNU Classpath libs - a problem, don't implement things, ...
Cierniak - movingGC [example on the slide] is an interface or class? Very cross-cutting (generational, etc.), may be lots of these interfaces, not linear.
McKinley - Object mem models implemented with mixins?
Vitek - No. started with idea of parallel trees of interfaces and possible implementations. Horrible (overdesign for generality, obfuscation). When tight coupling needed between components it's horrible to keep downcasting to specific types.
Blackburn - Pushed on this with recent versions of MMTk, have something that works.
Vitek - Tried aspects, some compiler support for inlining. Eventually that compiler option was dropped, and no debugging support, etc, so moved away from it. General problem is composing VM out of bunch of components. AOP is one story - Eclipse now has better support.
Transactional example [on slide]: preRunThreadHook compiled away during image generation - generated C code controlled by stitcher spec. (e.roots->vals[97] is the static singleton)
Domains (replace isolates): separate kernel from other code. Partly by bytecode rewriting.
[Discussion about difference between domains and isolates]
Marcus Lagergren - BEA JRockit
[Lots of discussion about debuggers, calling convention dependence (etc).]
Click - HotSpot has lots of functions to print internal data structures.
Moss - generate these along with the VM?
Third Session:
Systems Programming
- unboxed types:
Controlled layout
Richer types as in C#?
traced versus untraced?
Need a proposal!
- operators:
Preprocess syntactic sugar to get intrinsic calls.
Operator name-space?
New syntax or overload existing syntax or annotations?
Kernel/user separation
- JinJ => separation between user and kernel
- name-space separation
- cross-boundary pointers?
- not user/user separation!
Minimizing Trusted Code Base (TCB)
- strongly-typed IR to build trust among components and isolate bugs
- circumscribing failure (eg, compiler fails, VM continues)
- use of automatic generators from formal descriptions (generate stylized
outputs that are more easily validated)
- assertions!
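A small sketch of "circumscribing failure", assuming the optimizing compiler sits behind a narrow interface so that an internal bug or a failed assertion in the compiler only costs performance. The interface and names are hypothetical; generated code is stood in for by IntUnaryOperator.

```java
import java.util.function.IntUnaryOperator;

/** Sketch: a compiler failure falls back to the interpreter instead of killing the VM. */
final class CompilerFallbackSketch {
    /** Hypothetical optimizing-compiler interface; may throw on an internal bug. */
    interface Compiler {
        IntUnaryOperator compile(String methodName);
    }

    // Number of contained compiler failures, for illustration only.
    static int compileFailures = 0;

    static IntUnaryOperator codeFor(String methodName,
                                    Compiler optimizing,
                                    IntUnaryOperator interpreted) {
        try {
            IntUnaryOperator compiled = optimizing.compile(methodName);
            if (compiled != null) {
                return compiled;
            }
        } catch (RuntimeException | AssertionError e) {
            // The failure stays inside the compiler component; the VM continues.
            compileFailures++;
        }
        return interpreted;   // slower, but always available
    }
}
```

This only contains failures that surface as exceptions. A compiler that silently emits wrong code is the harder case, which is where the typed IR and generated components in the list above come in.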
Portability
- Architecture
CISC/RISC
register windows?
64/32-bit and endianness
multi-processor (essential)
code patching: cache coherence
- Memory models
coherence
fences
- GC:
barriers
moving
concurrent
- HW (OS) stacks:
direction
contiguous/split?
- OS:
Threading model & scheduling (priorities?)
Address translation?
Memory contiguous/sparse?
CPU affinity
Page protection?
Object models
- Flexibility: more than one, but not dynamically
- Encapsulating and abstracting models
- Who determines it:
GC
locking
dispatch
hash
type ops
- Who uses it:
interpreter/JIT
GC
locking
boot image
debugger
JVMTI etc
VM reflection (boot image, JNI)
class libraries
- Experimentation support
eg, adding fields
- Generated from descriptions
negotiated layout
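One possible reading of "generated from descriptions / negotiated layout", sketched under assumptions: each component that determines the object model (GC, locking, hash) declares how many header bits it needs, and a small allocator assigns bit fields. The component names, bit counts, and first-come packing policy are all hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: header bit fields assigned from per-component requests. */
final class HeaderLayoutSketch {
    /** A contiguous run of bits inside the header word. */
    static final class Field {
        final int shift;
        final int bits;

        Field(int shift, int bits) { this.shift = shift; this.bits = bits; }

        long maxValue() { return bits == 64 ? -1L : (1L << bits) - 1; }

        long mask() { return maxValue() << shift; }

        long get(long header) { return (header & mask()) >>> shift; }

        long set(long header, long value) {
            if ((value & ~maxValue()) != 0) {
                throw new IllegalArgumentException("value does not fit in " + bits + " bits");
            }
            return (header & ~mask()) | (value << shift);
        }
    }

    /** Pack requests in iteration order; fail if they exceed the header size. */
    static Map<String, Field> negotiate(Map<String, Integer> requests, int headerBits) {
        if (headerBits < 1 || headerBits > 64) {
            throw new IllegalArgumentException("header must be 1..64 bits");
        }
        Map<String, Field> layout = new LinkedHashMap<>();
        int next = 0;
        for (Map.Entry<String, Integer> e : requests.entrySet()) {
            int bits = e.getValue();
            if (bits <= 0 || next + bits > headerBits) {
                throw new IllegalArgumentException("cannot fit " + e.getKey());
            }
            layout.put(e.getKey(), new Field(next, bits));
            next += bits;
        }
        return layout;
    }
}
```

The users of the model (interpreter/JIT, GC, debugger, and so on) would then go through the returned fields rather than hard-coding shifts and masks, which is what makes experiments such as adding a field cheap.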
Kernel/library dependence
- How much of Java can we depend on?
bootstrap and more generally
- What constrains this??
bootstrapping, space constraints
- What do we need?
memory allocation
hash tables
basic I/O
strings
floating point
- Implementation strategy: make home-baked versions as compatible as possible
- Self-hosting (see Appel)
Multiple languages?
- Bytecodes are a problem
More expressive than Java, difficult to work with
- Continuations in the runtime
Extra references (Thanks to Kathryn and Michal):
Andrew W. Appel. Axiomatic Bootstrapping: A Guide for Compiler Hackers. ACM
Transactions on Programming Languages and Systems, vol. 16, number 6, pp.
1699-1718, November 1994. Older version without diagrams, in ACM SIGPLAN
Workshop on ML and its Applications, Orlando, Florida, June 25, 1994, ©1994 ACM.
http://www.cs.princeton.edu/~appel/papers/
James Stichnoth, Guei-Yuan Lueh and Michal Cierniak. Support for
Garbage Collection at Every Instruction in a Java Compiler. In
Proceedings of the SIGPLAN'99 Conference on Programming Language
Design and Implementation (PLDI), Atlanta, GA, May 1999.
http://portal.acm.org/citation.cfm?id=301652
and other JVM components:
Michal Cierniak, Neal Glew, Spyridon Triantafyllis, Marsha Eng, Brian
Lewis and James Stichnoth. Object-Model Independence via Code Implants.
In Proceedings of the 2003 Workshop on Multiparadigm Programming
with OO Languages (MPOOL'03), Anaheim, CA, October 2003.
http://blogs.msdn.com/michaljc/archive/2005/02/23/379329.aspx
Ben Liblit, Mayur Naik, Alice X. Zheng, Alex Aiken and Michael I.
Jordan. Scalable Statistical Bug Isolation, PLDI 2005
http://www.cs.wisc.edu/~liblit/pldi-2005/
Fourth Session:
Management and Governance
Contrast between building a cutting edge VM and a feature rich application
Product quality demands good leadership and good coding
Apache model: veto power from committer == code writer
-> bottom up evolution / direction
VM needs more coherent direction, while remaining open source
A model:
Developers commit code
Advisory board gives expertise, does not commit code
Steering committee (small) gives direction (has veto on code)
Key issues:
Coherent direction
Getting existing expertise to actual programmers
Suggestion: also have architects: veto over developers, make choices, help enforce process
How does bottom-up style enforce process? (testing, etc)
Steve Proposes:
Start this as an Apache-owned project (also under Apache license)
Committers (initially) being authors + participants at this & prior meeting (plus Intel engineers already working with Steve)
This bootstraps the culture
Has own mailing list
Has well defined charter to maintain focus
Need plan for answering email before starting mailing list
Needs someone who will be backup to catch things that go unanswered for a while. May require employer commitment of employee time.
Steve will send out invitations.