aboutsummaryrefslogtreecommitdiffstats
path: root/drivers
Commit message (Collapse)AuthorAgeFilesLines
* Remove old lguest bus and drivers.Rusty Russell2007-10-2310-1385/+0
| | | | | | | This gets rid of the lguest bus, drivers and DMA mechanism, to make way for a generic virtio mechanism. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Virtio helper routines for a descriptor ringbuffer implementationRusty Russell2007-10-233-0/+319
| | | | | | | | | | | | | | | | | | | | These helper routines supply most of the virtqueue_ops for hypervisors which want to use a ring for virtio. Unlike the previous lguest implementation: 1) The rings are variable sized (2^n-1 elements). 2) They have an unfortunate limit of 65535 bytes per sg element. 3) The page numbers are always 64 bit (PAE anyone?) 4) They no longer place used[] on a separate page, just a separate cacheline. 5) We do a modulo on a variable. We could be tricky if we cared. 6) Interrupts and notifies are suppressed using flags within the rings. Users need only get the ring pages and provide a notify hook (KVM wants the guest to allocate the rings, lguest does it sanely). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com>
* Module autoprobing support for virtio drivers.Rusty Russell2007-10-231-0/+18
| | | | | | | This adds the logic to convert the virtio ids into module aliases, and includes a modalias entry in sysfs and the env var to make probing work. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Virtio console driverRusty Russell2007-10-233-0/+230
| | | | | | | | | | This is an hvc-based virtio console driver. It's suboptimal becuase hvc expects to have raw access to interrupts and virtio doesn't assume that, so it currently polls. There are two solutions: expose hvc's "kick" interface, or wean off hvc. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Block driver using virtio.Rusty Russell2007-10-233-0/+315
| | | | | | | | | | | | | | | | | | | | The block driver uses scatter-gather lists with sg[0] being the request information (struct virtio_blk_outhdr) with the type, sector and inbuf id. The next N sg entries are the bio itself, then the last sg is the status byte. Whether the N entries are in or out depends on whether it's a read or a write. We accept the normal (SCSI) ioctls: they get handed through to the other side which can then handle it or reply that it's unsupported. It's not clear that this actually works in general, since I don't know if blk_pc_request() requests have an accurate rq_data_dir(). Although we try to reply -ENOTTY on unsupported commands, ioctl(fd, CDROMEJECT) returns success to userspace. This needs a separate patch. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jens Axboe <jens.axboe@oracle.com>
* Net driver using virtioRusty Russell2007-10-233-0/+442
| | | | | | | | | | | | | | | | The network driver uses two virtqueues: one for input packets and one for output packets. This has nice locking properties (ie. we don't do any for recv vs send). TODO: 1) Big packets. 2) Multi-client devices (maybe separate driver?). 3) Resolve freeing of old xmit skbs (Christian Borntraeger) Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: netdev@vger.kernel.org
* Virtio interfaceRusty Russell2007-10-236-0/+191
| | | | | | | | | | | | | | | | This attempts to implement a "virtual I/O" layer which should allow common drivers to be efficiently used across most virtual I/O mechanisms. It will no-doubt need further enhancement. The virtio drivers add buffers to virtio queues; as the buffers are consumed the driver "interrupt" callbacks are invoked. There is also a generic implementation of config space which drivers can query to get setup information from the host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Dor Laor <dor.laor@qumranet.com> Cc: Arnd Bergmann <arnd@arndb.de>
* Boot with virtual == physical to get closer to native Linux.Rusty Russell2007-10-236-32/+62
| | | | | | | | | | | | | | | | | | | | 1) This allows us to get alot closer to booting bzImages. 2) It means we don't have to know page_offset. 3) The Guest needs to modify the boot pagetables to create the PAGE_OFFSET mapping before jumping to C code. 4) guest_pa() walks the page tables rather than using page_offset. 5) We don't use page_offset to figure out whether to emulate: it was always kinda quesationable, and won't work for instructions done before remapping (bzImage unpacking in particular). 6) We still want the kernel address for tlb flushing: have the initial hypercall give us that, too. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Allow guest to specify syscall vector to use.Rusty Russell2007-10-234-11/+75
| | | | | | | | | | (Based on Ron Minnich's LGUEST_PLAN9_SYSCALL patch). This patch allows Guests to specify what system call vector they want, and we try to reserve it. We only allow one non-Linux system call vector, to try to avoid DoS on the Host. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Rename "cr3" to "gpgdir" to avoid x86-specific naming.Rusty Russell2007-10-232-12/+12
| | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Pagetables to use normal kernel typesMatias Zabaljauregui2007-10-233-141/+98
| | | | | | | | This is my first step in the migration of page_tables.c to the kernel types and functions/macros (2.6.23-rc3). Seems to be working OK. Signed-off-by: Matias Zabaljauregui <matias.zabaljauregui@cern.ch> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Move register setup into i386_core.cJes Sorensen2007-10-233-36/+38
| | | | | | | | Move setup_regs() to lguest_arch_setup_regs() in i386_core.c given that this is very architecture specific. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Change example launcher to use unsigned long not u32Jes Sorensen2007-10-231-15/+16
| | | | | | | | | | | | | Apply Clue 2x4 to lguest userland<->kernel handling code and the lguest launcher. Pointers are not to be passed in u32's! Basic rule of thumb: Anything passing u32's back and forth should be passing unsigned longs to be portable to 64 bit archs. For those who forgotten already, I repeat: NO POINTERS IN u32! Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Make hypercalls arch-independent.Jes Sorensen2007-10-233-76/+94
| | | | | | | | | | | | | | Clean up the hypercall code to make the code in hypercalls.c architecture independent. First process the common hypercalls and then call lguest_arch_do_hcall() if the call hasn't been handled. Rename struct hcall_ring to hcall_args. This patch requires the previous patch which reorganize the layout of struct lguest_regs on i386 so they match the layout of struct hcall_args. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Introduce "hcall" pointer to indicate pending hypercall.Rusty Russell2007-10-234-38/+34
| | | | | | | | | | | | | | | | | | | Currently we look at the "trapnum" to see if the Guest wants a hypercall. But once the hypercall is done we have to reset trapnum to a bogus value, otherwise if we exit to userspace and return, we'd run the same hypercall twice (that was a nasty bug to find!). This has two main effects: 1) When Jes's patch changes the hypercall args to be a generic "struct hcall_args" we simply change the type of "lg->hcall". It's set by arch code, so if it has to copy args or something it can do so, and point "hcall" into lg->arch somewhere. 2) Async hypercalls only get run when an actual hypercall is pending. This simplfies the code a little and is a more logical semantic. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Reorder guest saved regs to match hyperall orderJes Sorensen2007-10-231-2/+2
| | | | | | | | | | | | | Move eax next to ebx/ecx/edx in struct lguest_regs on i386, so they will be located together and allow it to map directly to a struct hcall_ring entry (which will be renamed struct hcall_args as in a subsequent patch). This is in preparation for making the code hcall code architecture independent. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Move i386 part of core.c to x86/core.c.Jes Sorensen2007-10-237-522/+525
| | | | | | | | Separate i386 architecture specific from core.c and move it to x86/core.c and add x86/lguest.h header file to match. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Make shadow IDT a complete IDT with 256 entries.Rusty Russell2007-10-232-32/+20
| | | | | | | This simplifies the code a little, in preparation for allowing alternate system call vectors in guests (Plan 9 uses 0x40). Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Remove fixed limit on number of guests, and lguests array.Rusty Russell2007-10-236-43/+14
| | | | | | | | | | | Back when we had all the Guest state in the switcher, we had a fixed array of them. This is no longer necessary. If we switch the network code to using random_ether_addr (46 bits is enough to avoid clashes), we can get rid of the concept of "guest id" altogether. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Introduce guest mem offset, static link example launcherRusty Russell2007-10-236-38/+50
| | | | | | | | | | | In order to avoid problematic special linking of the Launcher, we give the Host an offset: this means we can use any memory region in the Launcher as Guest memory rather than insisting on mmap() at 0. The result is quite pleasing: a number of casts are replaced with simple additions. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Rename switcher.S to x86/switcher_32.SRusty Russell2007-10-232-3/+5
| | | | | | | | lguest uses a "switcher" shim mapped high to bounce between host and guest. As lguest becomes less i386-centric, we separate this code into a subdir. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Move lguest guest support to arch/x86.Rusty Russell2007-10-233-1201/+2
| | | | | | | | | Lguest has two sides: host support (to launch guests) and guest support (replacement boot path and paravirt_ops). This moves the guest side to arch/x86/lguest where it's closer to related code. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de>
* Clocksource is continuous regardless of the state of the host's TSC.Tony Breeds2007-10-231-3/+2
| | | | | | | | | | | Currently lguest will spend a lot of of time waking up the host, as it cannot go tickless (if the [host] TSC has been marked unstable). On my laptop I was getting ~40% of wakeups from lguest. With this patch applied, my laptop is much happier! Signed-off-by: Tony Breeds <tony@bakeyournoodle.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* lguest_devices belongs in lguest_bus.c: it's not i386-specific.Rusty Russell2007-10-232-1/+2
| | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Lguest currently depends on 32-bit x86, not just x86.Rusty Russell2007-10-231-1/+1
| | | | Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Use copy_to_user() not put_user for struct timespecJes Sorensen2007-10-231-1/+1
| | | | | | | | | | Use copy_to_user() when copying a struct timespec to the guest - put_user() cannot handle two long's in one go on a 64bit arch. Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Jes Sorensen <jes@sgi.com> Cc: Al Viro <viro@ftp.linux.org.uk>
* Remove binfmts.h include from lg.hRusty Russell2007-10-231-1/+0
| | | | | | It wasn't needed since a very early prototype of lguest. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
* Consolidate host virtualization support under Virtualization menuRusty Russell2007-10-232-2/+4
| | | | | | | Move lguest under the virtualization menu. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Avi Kivity <avi@qumranet.com>
* Normalize config options for guest supportRusty Russell2007-10-231-2/+1
| | | | | | | | | | | | | 1) Group all the "guest OS" support options together, under a PARAVIRT_GUEST menu. 2) Make those options select CONFIG_PARAVIRT, as suggested by Andi. 3) Make kconfig help titles consistent. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Andi Kleen <ak@suse.de> Cc: Zach Amsden <zach@vmware.com> Cc: Jeremy Fitzhardinge <jeremy@goop.org> Cc: Chris Wright <chrisw@sous-sol.org>
* Merge branch 'for-linus' of ↵Linus Torvalds2007-10-226-40/+54
|\ | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: appletouch - apply idle reset logic to all touchpads Input: usbtouchscreen - add support for GoTop tablet devices Input: bf54x-keys - return real error when request_irq() fails Input: i8042 - export i8042_command()
| * Input: appletouch - apply idle reset logic to all touchpadsAnton Ekblad2007-10-221-14/+11
| | | | | | | | | | | | | | | | | | | | Not only Geyser 3 but also Geyser 1 need to be reset after they become idle to stop them from needlessly waking up the kernel. Do idle reset on all touchpads, regardless of their version - if we see 10 empty packets the touchpad needs to be reset; good touchpads should not send empty packets anyway. Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * Input: usbtouchscreen - add support for GoTop tablet devicesJerrold Jones2007-10-222-2/+40
| | | | | | | | | | | | | | | | Add support for GoTop Super_Q2/GogoPen/PenPower tablets to usbtouchscreen. Protocol discovery was done by Yick Yan Lam. Signed-off-by: Daniel Ritz <daniel.ritz@gmx.ch> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * Input: bf54x-keys - return real error when request_irq() failsMichael Hennerich2007-10-221-1/+0
| | | | | | | | | | | | Signed-off-by: Michael Hennerich <michael.hennerich@analog.com> Signed-off-by: Bryan Wu <bryan.wu@analog.com> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
| * Input: i8042 - export i8042_command()Márton Németh2007-10-222-23/+3
| | | | | | | | | | | | | | | | | | Export the i8042_command() function which manages the mutual exclusion with the help of the i8042_lock spinlock. This allows to access i8042 safely from other parts of the kernel. Signed-off-by: Márton Németh <nm127@freemail.hu> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
* | Merge branch 'for-linus' of ↵Linus Torvalds2007-10-225-68/+103
|\ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/avi/kvm: KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs() sched: don't clear PF_VCPU in scheduler KVM: Improve local apic timer wraparound handling KVM: Fix local apic timer divide by zero KVM: Move kvm_guest_exit() after local_irq_enable() KVM: x86 emulator: fix access registers for instructions with ModR/M byte and Mod = 3 KVM: VMX: Force vm86 mode if setting flags during real mode KVM: x86 emulator: implement 'movnti mem, reg' KVM: VMX: Reset mmu context when entering real mode KVM: VMX: Handle NMIs before enabling interrupts and preemption KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte() KVM: x86 emulator: fix repne/repnz decoding KVM: x86 emulator: fix merge screwup due to emulator split
| * | KVM: Use new smp_call_function_mask() in kvm_flush_remote_tlbs()Laurent Vivier2007-10-221-23/+3
| | | | | | | | | | | | | | | | | | | | | | | | In kvm_flush_remote_tlbs(), replace a loop using smp_call_function_single() by a single call to smp_call_function_mask() (which is new for x86_64). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: Improve local apic timer wraparound handlingKevin Pedretti2007-10-221-10/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Better handle wrap-around cases when reading the APIC CCR (current count register). Also, if ICR is 0, CCR should also be 0... previously reading CCR before setting ICR would result in a large kinda-random number. Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: Fix local apic timer divide by zeroKevin Pedretti2007-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | kvm_lapic_reset() was initializing apic->timer.divide_count to 0, which could potentially lead to a divide by zero error in apic_get_tmcct(). Any guest that reads the APIC's CCR (current count) register before setting DCR (divide configuration) would trigger a divide by zero exception in the host kernel, leading to a host-OS crash. This patch results in apic->timer.divide_count being initialized to 2 at reset, eliminating the bug (DCR=0 at reset, meaning divide by 2). Signed-off-by: Kevin Pedretti <kevin.pedretti@gmail.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: Move kvm_guest_exit() after local_irq_enable()Laurent Vivier2007-10-221-1/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to make sure that the timer interrupt happens before we clear PF_VCPU, so the accounting code actually sees guest mode. http://lkml.org/lkml/2007/10/15/114 Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: x86 emulator: fix access registers for instructions with ModR/M byte ↵Aurelien Jarno2007-10-221-0/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | and Mod = 3 The patch belows changes the access type to register from memory for instructions that are declared as SrcMem or DstMem, but have a ModR/M byte with Mod = 3. It fixes (at least) the lmsw and smsw instructions on an AMD64 CPU, which are needed for FreeBSD. Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: VMX: Force vm86 mode if setting flags during real modeAvi Kivity2007-10-221-0/+2
| | | | | | | | | | | | | | | | | | | | | When resetting from userspace, we need to handle the flags being cleared even after we are in real mode. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: x86 emulator: implement 'movnti mem, reg'Sheng Yang2007-10-221-1/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | Implement emulation of instruction: movnti m32/m64, r32/r64 opcode: 0x0f 0xc3 Signed-off-by: Sheng Yang <sheng.yang@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: VMX: Reset mmu context when entering real modeEddie Dong2007-10-222-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Resetting an SMP guest will force AP enter real mode (RESET) with paging enabled in protected mode. While current enter_rmode() can only handle mode switch from nonpaging mode to real mode which leads to SMP reboot failure. Fix by reloading the mmu context on entering real mode. Signed-off-by: Yaozu (Eddie) Dong <eddie.dong@intel.com> Signed-off-by: Qing He <qing.he@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: VMX: Handle NMIs before enabling interrupts and preemptionAvi Kivity2007-10-221-4/+9
| | | | | | | | | | | | | | | | | | | | | This makes sure we handle NMI on the current cpu, and that we don't service maskable interrupts before non-maskable ones. Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: MMU: Set shadow pte atomically in mmu_pte_write_zap_pte()Izik Eidus2007-10-221-1/+1
| | | | | | | | | | | | | | | | | | | | | Setting shadow page table entry should be set atomicly using set_shadow_pte(). Signed-off-by: Izik Eidus <izike@qumranet.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: x86 emulator: fix repne/repnz decodingLaurent Vivier2007-10-221-2/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The repnz/repne instructions must set rep_prefix to 1 like rep/repe/repz. This patch correct the disk probe problem met with OpenBSD. This issue appears with commit e70669abd4e60dfea3ac1639848e20e2b8dd1255 because before it, the decoding was done internally to kvm and after it is done by x86_emulate.c (which doesn't do it correctly). Signed-off-by: Laurent Vivier <Laurent.Vivier@bull.net> Signed-off-by: Avi Kivity <avi@qumranet.com>
| * | KVM: x86 emulator: fix merge screwup due to emulator splitNitin A Kamble2007-10-221-25/+26
| | | | | | | | | | | | | | | | | | | | | | | | This code has gone to wrong place in the file. Moving it back to right location. Signed-off-by: Nitin A Kamble <nitin.a.kamble@intel.com> Signed-off-by: Avi Kivity <avi@qumranet.com>
* | | Merge branch 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6Linus Torvalds2007-10-226-44/+41
|\ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'for-linus' of git://git390.osdl.marist.edu/pub/scm/linux-2.6: [S390] 4level-fixup cleanup [S390] Cleanup page table definitions. [S390] Introduce follow_table in uaccess_pt.c [S390] Remove unused user_seg from thread structure. [S390] tlb flush fix. [S390] kernel: Fix dump on panic for DASDs under LPAR. [S390] struct class_device -> struct device conversion. [S390] cio: Fix incomplete commit for uevent suppression. [S390] cio: Use to_channelpath() for device to channel path conversion. [S390] Add per-cpu idle time / idle count sysfs attributes. [S390] Update default configuration.
| * | | [S390] struct class_device -> struct device conversion.Cornelia Huck2007-10-224-38/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Convert struct class_device users under drivers/s390/char to use struct device. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
| * | | [S390] cio: Fix incomplete commit for uevent suppression.Cornelia Huck2007-10-221-0/+9
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit fa1a8c23eb7d3ded8a3c6d0e653339a2bc7fca9e intended to introduce uevent suppression for subchannels, but half of it was lost somewhere. Now, we end up with two uevents for every registered subchannel :( So we should better add the missing part from http://marc.info/?l=linux-kernel&m=117515953113974&w=2. Signed-off-by: Cornelia Huck <cornelia.huck@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>