Merge tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-nextgrafted

Pull networking updates from Jakub Kicinski: "Core: - Add dedicated kmem_cache for typical/small skb->head, avoid having to access struct page at kfree time, and improve memory use. - Introduce sysctl to set default RPS configuration for new netdevs. - Define Netlink protocol specification format which can be used to describe messages used by each family and auto-generate parsers. Add tools for generating kernel data structures and uAPI headers. - Expose all net/core sysctls inside netns. - Remove 4s sleep in netpoll if carrier is instantly detected on boot. - Add configurable limit of MDB entries per port, and port-vlan. - Continue populating drop reasons throughout the stack. - Retire a handful of legacy Qdiscs and classifiers. Protocols: - Support IPv4 big TCP (TSO frames larger than 64kB). - Add IP_LOCAL_PORT_RANGE socket option, to control local port range on socket by socket basis. - Track and report in procfs number of MPTCP sockets used. - Support mixing IPv4 and IPv6 flows in the in-kernel MPTCP path manager. - IPv6: don't check net.ipv6.route.max_size and rely on garbage collection to free memory (similarly to IPv4). - Support Penultimate Segment Pop (PSP) flavor in SRv6 (RFC8986). - ICMP: add per-rate limit counters. - Add support for user scanning requests in ieee802154. - Remove static WEP support. - Support minimal Wi-Fi 7 Extremely High Throughput (EHT) rate reporting. - WiFi 7 EHT channel puncturing support (client & AP). BPF: - Add a rbtree data structure following the "next-gen data structure" precedent set by recently added linked list, that is, by using kfunc + kptr instead of adding a new BPF map type. - Expose XDP hints via kfuncs with initial support for RX hash and timestamp metadata. - Add BPF_F_NO_TUNNEL_KEY extension to bpf_skb_set_tunnel_key to better support decap on GRE tunnel devices not operating in collect metadata. - Improve x86 JIT's codegen for PROBE_MEM runtime error checks. - Remove the need for trace_printk_lock for bpf_trace_printk and bpf_trace_vprintk helpers. - Extend libbpf's bpf_tracing.h support for tracing arguments of kprobes/uprobes and syscall as a special case. - Significantly reduce the search time for module symbols by livepatch and BPF. - Enable cpumasks to be used as kptrs, which is useful for tracing programs tracking which tasks end up running on which CPUs in different time intervals. - Add support for BPF trampoline on s390x and riscv64. - Add capability to export the XDP features supported by the NIC. - Add __bpf_kfunc tag for marking kernel functions as kfuncs. - Add cgroup.memory=nobpf kernel parameter option to disable BPF memory accounting for container environments. Netfilter: - Remove the CLUSTERIP target. It has been marked as obsolete for years, and we still have WARN splats wrt races of the out-of-band /proc interface installed by this target. - Add 'destroy' commands to nf_tables. They are identical to the existing 'delete' commands, but do not return an error if the referenced object (set, chain, rule...) did not exist. Driver API: - Improve cpumask_local_spread() locality to help NICs set the right IRQ affinity on AMD platforms. - Separate C22 and C45 MDIO bus transactions more clearly. - Introduce new DCB table to control DSCP rewrite on egress. - Support configuration of Physical Layer Collision Avoidance (PLCA) Reconciliation Sublayer (RS) (802.3cg-2019). Modern version of shared medium Ethernet. - Support for MAC Merge layer (IEEE 802.3-2018 clause 99). Allowing preemption of low priority frames by high priority frames. - Add support for controlling MACSec offload using netlink SET. - Rework devlink instance refcounts to allow registration and de-registration under the instance lock. Split the code into multiple files, drop some of the unnecessarily granular locks and factor out common parts of netlink operation handling. - Add TX frame aggregation parameters (for USB drivers). - Add a new attr TCA_EXT_WARN_MSG to report TC (offload) warning messages with notifications for debug. - Allow offloading of UDP NEW connections via act_ct. - Add support for per action HW stats in TC. - Support hardware miss to TC action (continue processing in SW from a specific point in the action chain). - Warn if old Wireless Extension user space interface is used with modern cfg80211/mac80211 drivers. Do not support Wireless Extensions for Wi-Fi 7 devices at all. Everyone should switch to using nl80211 interface instead. - Improve the CAN bit timing configuration. Use extack to return error messages directly to user space, update the SJW handling, including the definition of a new default value that will benefit CAN-FD controllers, by increasing their oscillator tolerance. New hardware / drivers: - Ethernet: - nVidia BlueField-3 support (control traffic driver) - Ethernet support for imx93 SoCs - Motorcomm yt8531 gigabit Ethernet PHY - onsemi NCN26000 10BASE-T1S PHY (with support for PLCA) - Microchip LAN8841 PHY (incl. cable diagnostics and PTP) - Amlogic gxl MDIO mux - WiFi: - RealTek RTL8188EU (rtl8xxxu) - Qualcomm Wi-Fi 7 devices (ath12k) - CAN: - Renesas R-Car V4H Drivers: - Bluetooth: - Set Per Platform Antenna Gain (PPAG) for Intel controllers. - Ethernet NICs: - Intel (1G, igc): - support TSN / Qbv / packet scheduling features of i226 model - Intel (100G, ice): - use GNSS subsystem instead of TTY - multi-buffer XDP support - extend support for GPIO pins to E823 devices - nVidia/Mellanox: - update the shared buffer configuration on PFC commands - implement PTP adjphase function for HW offset control - TC support for Geneve and GRE with VF tunnel offload - more efficient crypto key management method - multi-port eswitch support - Netronome/Corigine: - add DCB IEEE support - support IPsec offloading for NFP3800 - Freescale/NXP (enetc): - support XDP_REDIRECT for XDP non-linear buffers - improve reconfig, avoid link flap and waiting for idle - support MAC Merge layer - Other NICs: - sfc/ef100: add basic devlink support for ef100 - ionic: rx_push mode operation (writing descriptors via MMIO) - bnxt: use the auxiliary bus abstraction for RDMA - r8169: disable ASPM and reset bus in case of tx timeout - cpsw: support QSGMII mode for J721e CPSW9G - cpts: support pulse-per-second output - ngbe: add an mdio bus driver - usbnet: optimize usbnet_bh() by avoiding unnecessary queuing - r8152: handle devices with FW with NCM support - amd-xgbe: support 10Mbps, 2.5GbE speeds and rx-adaptation - virtio-net: support multi buffer XDP - virtio/vsock: replace virtio_vsock_pkt with sk_buff - tsnep: XDP support - Ethernet high-speed switches: - nVidia/Mellanox (mlxsw): - add support for latency TLV (in FW control messages) - Microchip (sparx5): - separate explicit and implicit traffic forwarding rules, make the implicit rules always active - add support for egress DSCP rewrite - IS0 VCAP support (Ingress Classification) - IS2 VCAP filters (protos, L3 addrs, L4 ports, flags, ToS etc.) - ES2 VCAP support (Egress Access Control) - support for Per-Stream Filtering and Policing (802.1Q, 8.6.5.1) - Ethernet embedded switches: - Marvell (mv88e6xxx): - add MAB (port auth) offload support - enable PTP receive for mv88e6390 - NXP (ocelot): - support MAC Merge layer - support for the the vsc7512 internal copper phys - Microchip: - lan9303: convert to PHYLINK - lan966x: support TC flower filter statistics - lan937x: PTP support for KSZ9563/KSZ8563 and LAN937x - lan937x: support Credit Based Shaper configuration - ksz9477: support Energy Efficient Ethernet - other: - qca8k: convert to regmap read/write API, use bulk operations - rswitch: Improve TX timestamp accuracy - Intel WiFi (iwlwifi): - EHT (Wi-Fi 7) rate reporting - STEP equalizer support: transfer some STEP (connection to radio on platforms with integrated wifi) related parameters from the BIOS to the firmware. - Qualcomm 802.11ax WiFi (ath11k): - IPQ5018 support - Fine Timing Measurement (FTM) responder role support - channel 177 support - MediaTek WiFi (mt76): - per-PHY LED support - mt7996: EHT (Wi-Fi 7) support - Wireless Ethernet Dispatch (WED) reset support - switch to using page pool allocator - RealTek WiFi (rtw89): - support new version of Bluetooth co-existance - Mobile: - rmnet: support TX aggregation" * tag 'net-next-6.3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1872 commits) page_pool: add a comment explaining the fragment counter usage net: ethtool: fix __ethtool_dev_mm_supported() implementation ethtool: pse-pd: Fix double word in comments xsk: add linux/vmalloc.h to xsk.c sefltests: netdevsim: wait for devlink instance after netns removal selftest: fib_tests: Always cleanup before exit net/mlx5e: Align IPsec ASO result memory to be as required by hardware net/mlx5e: TC, Set CT miss to the specific ct action instance net/mlx5e: Rename CHAIN_TO_REG to MAPPED_OBJ_TO_REG net/mlx5: Refactor tc miss handling to a single function net/mlx5: Kconfig: Make tc offload depend on tc skb extension net/sched: flower: Support hardware miss to tc action net/sched: flower: Move filter handle initialization earlier net/sched: cls_api: Support hardware miss to tc action net/sched: Rename user cookie and act cookie sfc: fix builds without CONFIG_RTC_LIB sfc: clean up some inconsistent indentings net/mlx4_en: Introduce flexible array to silence overflow warning net: lan966x: Fix possible deadlock inside PTP net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). ...
author: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
committer: Linus Torvalds <torvalds@linux-foundation.org> 2023-02-21 18:24:12 -0800
commit: 5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree: cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/x86/math-emu/fpu_trig.c
download: linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz
linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip
1 files changed, 1649 insertions, 0 deletions
diff --git a/arch/x86/math-emu/fpu_trig.c b/arch/x86/math-emu/fpu_trig.c
new file mode 100644
index 000000000..990d847ae
--- /dev/null
+++ b/arch/x86/math-emu/fpu_trig.c
@@ -0,0 +1,1649 @@
+// SPDX-License-Identifier: GPL-2.0
+/*---------------------------------------------------------------------------+
+ |  fpu_trig.c                                                               |
+ |                                                                           |
+ | Implementation of the FPU "transcendental" functions.                     |
+ |                                                                           |
+ | Copyright (C) 1992,1993,1994,1997,1999                                    |
+ |                       W. Metzenthen, 22 Parker St, Ormond, Vic 3163,      |
+ |                       Australia.  E-mail   billm@melbpc.org.au            |
+ |                                                                           |
+ |                                                                           |
+ +---------------------------------------------------------------------------*/
+
+#include "fpu_system.h"
+#include "exception.h"
+#include "fpu_emu.h"
+#include "status_w.h"
+#include "control_w.h"
+#include "reg_constant.h"
+
+static void rem_kernel(unsigned long long st0, unsigned long long *y,
+		       unsigned long long st1, unsigned long long q, int n);
+
+#define BETTER_THAN_486
+
+#define FCOS  4
+
+/* Used only by fptan, fsin, fcos, and fsincos. */
+/* This routine produces very accurate results, similar to
+   using a value of pi with more than 128 bits precision. */
+/* Limited measurements show no results worse than 64 bit precision
+   except for the results for arguments close to 2^63, where the
+   precision of the result sometimes degrades to about 63.9 bits */
+static int trig_arg(FPU_REG *st0_ptr, int even)
+{
+	FPU_REG tmp;
+	u_char tmptag;
+	unsigned long long q;
+	int old_cw = control_word, saved_status = partial_status;
+	int tag, st0_tag = TAG_Valid;
+
+	if (exponent(st0_ptr) >= 63) {
+		partial_status |= SW_C2;	/* Reduction incomplete. */
+		return -1;
+	}
+
+	control_word &= ~CW_RC;
+	control_word |= RC_CHOP;
+
+	setpositive(st0_ptr);
+	tag = FPU_u_div(st0_ptr, &CONST_PI2, &tmp, PR_64_BITS | RC_CHOP | 0x3f,
+			SIGN_POS);
+
+	FPU_round_to_int(&tmp, tag);	/* Fortunately, this can't overflow
+					   to 2^64 */
+	q = significand(&tmp);
+	if (q) {
+		rem_kernel(significand(st0_ptr),
+			   &significand(&tmp),
+			   significand(&CONST_PI2),
+			   q, exponent(st0_ptr) - exponent(&CONST_PI2));
+		setexponent16(&tmp, exponent(&CONST_PI2));
+		st0_tag = FPU_normalize(&tmp);
+		FPU_copy_to_reg0(&tmp, st0_tag);
+	}
+
+	if ((even && !(q & 1)) || (!even && (q & 1))) {
+		st0_tag =
+		    FPU_sub(REV | LOADED | TAG_Valid, (int)&CONST_PI2,
+			    FULL_PRECISION);
+
+#ifdef BETTER_THAN_486
+		/* So far, the results are exact but based upon a 64 bit
+		   precision approximation to pi/2. The technique used
+		   now is equivalent to using an approximation to pi/2 which
+		   is accurate to about 128 bits. */
+		if ((exponent(st0_ptr) <= exponent(&CONST_PI2extra) + 64)
+		    || (q > 1)) {
+			/* This code gives the effect of having pi/2 to better than
+			   128 bits precision. */
+
+			significand(&tmp) = q + 1;
+			setexponent16(&tmp, 63);
+			FPU_normalize(&tmp);
+			tmptag =
+			    FPU_u_mul(&CONST_PI2extra, &tmp, &tmp,
+				      FULL_PRECISION, SIGN_POS,
+				      exponent(&CONST_PI2extra) +
+				      exponent(&tmp));
+			setsign(&tmp, getsign(&CONST_PI2extra));
+			st0_tag = FPU_add(&tmp, tmptag, 0, FULL_PRECISION);
+			if (signnegative(st0_ptr)) {
+				/* CONST_PI2extra is negative, so the result of the addition
+				   can be negative. This means that the argument is actually
+				   in a different quadrant. The correction is always < pi/2,
+				   so it can't overflow into yet another quadrant. */
+				setpositive(st0_ptr);
+				q++;
+			}
+		}
+#endif /* BETTER_THAN_486 */
+	}
+#ifdef BETTER_THAN_486
+	else {
+		/* So far, the results are exact but based upon a 64 bit
+		   precision approximation to pi/2. The technique used
+		   now is equivalent to using an approximation to pi/2 which
+		   is accurate to about 128 bits. */
+		if (((q > 0)
+		     && (exponent(st0_ptr) <= exponent(&CONST_PI2extra) + 64))
+		    || (q > 1)) {
+			/* This code gives the effect of having p/2 to better than
+			   128 bits precision. */
+
+			significand(&tmp) = q;
+			setexponent16(&tmp, 63);
+			FPU_normalize(&tmp);	/* This must return TAG_Valid */
+			tmptag =
+			    FPU_u_mul(&CONST_PI2extra, &tmp, &tmp,
+				      FULL_PRECISION, SIGN_POS,
+				      exponent(&CONST_PI2extra) +
+				      exponent(&tmp));
+			setsign(&tmp, getsign(&CONST_PI2extra));
+			st0_tag = FPU_sub(LOADED | (tmptag & 0x0f), (int)&tmp,
+					  FULL_PRECISION);
+			if ((exponent(st0_ptr) == exponent(&CONST_PI2)) &&
+			    ((st0_ptr->sigh > CONST_PI2.sigh)
+			     || ((st0_ptr->sigh == CONST_PI2.sigh)
+				 && (st0_ptr->sigl > CONST_PI2.sigl)))) {
+				/* CONST_PI2extra is negative, so the result of the
+				   subtraction can be larger than pi/2. This means
+				   that the argument is actually in a different quadrant.
+				   The correction is always < pi/2, so it can't overflow
+				   into yet another quadrant. */
+				st0_tag =
+				    FPU_sub(REV | LOADED | TAG_Valid,
+					    (int)&CONST_PI2, FULL_PRECISION);
+				q++;
+			}
+		}
+	}
+#endif /* BETTER_THAN_486 */
+
+	FPU_settag0(st0_tag);
+	control_word = old_cw;
+	partial_status = saved_status & ~SW_C2;	/* Reduction complete. */
+
+	return (q & 3) | even;
+}
+
+/* Convert a long to register */
+static void convert_l2reg(long const *arg, int deststnr)
+{
+	int tag;
+	long num = *arg;
+	u_char sign;
+	FPU_REG *dest = &st(deststnr);
+
+	if (num == 0) {
+		FPU_copy_to_regi(&CONST_Z, TAG_Zero, deststnr);
+		return;
+	}
+
+	if (num > 0) {
+		sign = SIGN_POS;
+	} else {
+		num = -num;
+		sign = SIGN_NEG;
+	}
+
+	dest->sigh = num;
+	dest->sigl = 0;
+	setexponent16(dest, 31);
+	tag = FPU_normalize(dest);
+	FPU_settagi(deststnr, tag);
+	setsign(dest, sign);
+	return;
+}
+
+static void single_arg_error(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	if (st0_tag == TAG_Empty)
+		FPU_stack_underflow();	/* Puts a QNaN in st(0) */
+	else if (st0_tag == TW_NaN)
+		real_1op_NaN(st0_ptr);	/* return with a NaN in st(0) */
+#ifdef PARANOID
+	else
+		EXCEPTION(EX_INTERNAL | 0x0112);
+#endif /* PARANOID */
+}
+
+static void single_arg_2_error(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	int isNaN;
+
+	switch (st0_tag) {
+	case TW_NaN:
+		isNaN = (exponent(st0_ptr) == EXP_OVER)
+		    && (st0_ptr->sigh & 0x80000000);
+		if (isNaN && !(st0_ptr->sigh & 0x40000000)) {	/* Signaling ? */
+			EXCEPTION(EX_Invalid);
+			if (control_word & CW_Invalid) {
+				/* The masked response */
+				/* Convert to a QNaN */
+				st0_ptr->sigh |= 0x40000000;
+				push();
+				FPU_copy_to_reg0(st0_ptr, TAG_Special);
+			}
+		} else if (isNaN) {
+			/* A QNaN */
+			push();
+			FPU_copy_to_reg0(st0_ptr, TAG_Special);
+		} else {
+			/* pseudoNaN or other unsupported */
+			EXCEPTION(EX_Invalid);
+			if (control_word & CW_Invalid) {
+				/* The masked response */
+				FPU_copy_to_reg0(&CONST_QNaN, TAG_Special);
+				push();
+				FPU_copy_to_reg0(&CONST_QNaN, TAG_Special);
+			}
+		}
+		break;		/* return with a NaN in st(0) */
+#ifdef PARANOID
+	default:
+		EXCEPTION(EX_INTERNAL | 0x0112);
+#endif /* PARANOID */
+	}
+}
+
+/*---------------------------------------------------------------------------*/
+
+static void f2xm1(FPU_REG *st0_ptr, u_char tag)
+{
+	FPU_REG a;
+
+	clear_C1();
+
+	if (tag == TAG_Valid) {
+		/* For an 80486 FPU, the result is undefined if the arg is >= 1.0 */
+		if (exponent(st0_ptr) < 0) {
+		      denormal_arg:
+
+			FPU_to_exp16(st0_ptr, &a);
+
+			/* poly_2xm1(x) requires 0 < st(0) < 1. */
+			poly_2xm1(getsign(st0_ptr), &a, st0_ptr);
+		}
+		set_precision_flag_up();	/* 80486 appears to always do this */
+		return;
+	}
+
+	if (tag == TAG_Zero)
+		return;
+
+	if (tag == TAG_Special)
+		tag = FPU_Special(st0_ptr);
+
+	switch (tag) {
+	case TW_Denormal:
+		if (denormal_operand() < 0)
+			return;
+		goto denormal_arg;
+	case TW_Infinity:
+		if (signnegative(st0_ptr)) {
+			/* -infinity gives -1 (p16-10) */
+			FPU_copy_to_reg0(&CONST_1, TAG_Valid);
+			setnegative(st0_ptr);
+		}
+		return;
+	default:
+		single_arg_error(st0_ptr, tag);
+	}
+}
+
+static void fptan(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st_new_ptr;
+	int q;
+	u_char arg_sign = getsign(st0_ptr);
+
+	/* Stack underflow has higher priority */
+	if (st0_tag == TAG_Empty) {
+		FPU_stack_underflow();	/* Puts a QNaN in st(0) */
+		if (control_word & CW_Invalid) {
+			st_new_ptr = &st(-1);
+			push();
+			FPU_stack_underflow();	/* Puts a QNaN in the new st(0) */
+		}
+		return;
+	}
+
+	if (STACK_OVERFLOW) {
+		FPU_stack_overflow();
+		return;
+	}
+
+	if (st0_tag == TAG_Valid) {
+		if (exponent(st0_ptr) > -40) {
+			if ((q = trig_arg(st0_ptr, 0)) == -1) {
+				/* Operand is out of range */
+				return;
+			}
+
+			poly_tan(st0_ptr);
+			setsign(st0_ptr, (q & 1) ^ (arg_sign != 0));
+			set_precision_flag_up();	/* We do not really know if up or down */
+		} else {
+			/* For a small arg, the result == the argument */
+			/* Underflow may happen */
+
+		      denormal_arg:
+
+			FPU_to_exp16(st0_ptr, st0_ptr);
+
+			st0_tag =
+			    FPU_round(st0_ptr, 1, 0, FULL_PRECISION, arg_sign);
+			FPU_settag0(st0_tag);
+		}
+		push();
+		FPU_copy_to_reg0(&CONST_1, TAG_Valid);
+		return;
+	}
+
+	if (st0_tag == TAG_Zero) {
+		push();
+		FPU_copy_to_reg0(&CONST_1, TAG_Valid);
+		setcc(0);
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+
+	if (st0_tag == TW_Denormal) {
+		if (denormal_operand() < 0)
+			return;
+
+		goto denormal_arg;
+	}
+
+	if (st0_tag == TW_Infinity) {
+		/* The 80486 treats infinity as an invalid operand */
+		if (arith_invalid(0) >= 0) {
+			st_new_ptr = &st(-1);
+			push();
+			arith_invalid(0);
+		}
+		return;
+	}
+
+	single_arg_2_error(st0_ptr, st0_tag);
+}
+
+static void fxtract(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st_new_ptr;
+	u_char sign;
+	register FPU_REG *st1_ptr = st0_ptr;	/* anticipate */
+
+	if (STACK_OVERFLOW) {
+		FPU_stack_overflow();
+		return;
+	}
+
+	clear_C1();
+
+	if (st0_tag == TAG_Valid) {
+		long e;
+
+		push();
+		sign = getsign(st1_ptr);
+		reg_copy(st1_ptr, st_new_ptr);
+		setexponent16(st_new_ptr, exponent(st_new_ptr));
+
+	      denormal_arg:
+
+		e = exponent16(st_new_ptr);
+		convert_l2reg(&e, 1);
+		setexponentpos(st_new_ptr, 0);
+		setsign(st_new_ptr, sign);
+		FPU_settag0(TAG_Valid);	/* Needed if arg was a denormal */
+		return;
+	} else if (st0_tag == TAG_Zero) {
+		sign = getsign(st0_ptr);
+
+		if (FPU_divide_by_zero(0, SIGN_NEG) < 0)
+			return;
+
+		push();
+		FPU_copy_to_reg0(&CONST_Z, TAG_Zero);
+		setsign(st_new_ptr, sign);
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+
+	if (st0_tag == TW_Denormal) {
+		if (denormal_operand() < 0)
+			return;
+
+		push();
+		sign = getsign(st1_ptr);
+		FPU_to_exp16(st1_ptr, st_new_ptr);
+		goto denormal_arg;
+	} else if (st0_tag == TW_Infinity) {
+		sign = getsign(st0_ptr);
+		setpositive(st0_ptr);
+		push();
+		FPU_copy_to_reg0(&CONST_INF, TAG_Special);
+		setsign(st_new_ptr, sign);
+		return;
+	} else if (st0_tag == TW_NaN) {
+		if (real_1op_NaN(st0_ptr) < 0)
+			return;
+
+		push();
+		FPU_copy_to_reg0(st0_ptr, TAG_Special);
+		return;
+	} else if (st0_tag == TAG_Empty) {
+		/* Is this the correct behaviour? */
+		if (control_word & EX_Invalid) {
+			FPU_stack_underflow();
+			push();
+			FPU_stack_underflow();
+		} else
+			EXCEPTION(EX_StackUnder);
+	}
+#ifdef PARANOID
+	else
+		EXCEPTION(EX_INTERNAL | 0x119);
+#endif /* PARANOID */
+}
+
+static void fdecstp(void)
+{
+	clear_C1();
+	top--;
+}
+
+static void fincstp(void)
+{
+	clear_C1();
+	top++;
+}
+
+static void fsqrt_(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	int expon;
+
+	clear_C1();
+
+	if (st0_tag == TAG_Valid) {
+		u_char tag;
+
+		if (signnegative(st0_ptr)) {
+			arith_invalid(0);	/* sqrt(negative) is invalid */
+			return;
+		}
+
+		/* make st(0) in  [1.0 .. 4.0) */
+		expon = exponent(st0_ptr);
+
+	      denormal_arg:
+
+		setexponent16(st0_ptr, (expon & 1));
+
+		/* Do the computation, the sign of the result will be positive. */
+		tag = wm_sqrt(st0_ptr, 0, 0, control_word, SIGN_POS);
+		addexponent(st0_ptr, expon >> 1);
+		FPU_settag0(tag);
+		return;
+	}
+
+	if (st0_tag == TAG_Zero)
+		return;
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+
+	if (st0_tag == TW_Infinity) {
+		if (signnegative(st0_ptr))
+			arith_invalid(0);	/* sqrt(-Infinity) is invalid */
+		return;
+	} else if (st0_tag == TW_Denormal) {
+		if (signnegative(st0_ptr)) {
+			arith_invalid(0);	/* sqrt(negative) is invalid */
+			return;
+		}
+
+		if (denormal_operand() < 0)
+			return;
+
+		FPU_to_exp16(st0_ptr, st0_ptr);
+
+		expon = exponent16(st0_ptr);
+
+		goto denormal_arg;
+	}
+
+	single_arg_error(st0_ptr, st0_tag);
+
+}
+
+static void frndint_(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	int flags, tag;
+
+	if (st0_tag == TAG_Valid) {
+		u_char sign;
+
+	      denormal_arg:
+
+		sign = getsign(st0_ptr);
+
+		if (exponent(st0_ptr) > 63)
+			return;
+
+		if (st0_tag == TW_Denormal) {
+			if (denormal_operand() < 0)
+				return;
+		}
+
+		/* Fortunately, this can't overflow to 2^64 */
+		if ((flags = FPU_round_to_int(st0_ptr, st0_tag)))
+			set_precision_flag(flags);
+
+		setexponent16(st0_ptr, 63);
+		tag = FPU_normalize(st0_ptr);
+		setsign(st0_ptr, sign);
+		FPU_settag0(tag);
+		return;
+	}
+
+	if (st0_tag == TAG_Zero)
+		return;
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+
+	if (st0_tag == TW_Denormal)
+		goto denormal_arg;
+	else if (st0_tag == TW_Infinity)
+		return;
+	else
+		single_arg_error(st0_ptr, st0_tag);
+}
+
+static int f_sin(FPU_REG *st0_ptr, u_char tag)
+{
+	u_char arg_sign = getsign(st0_ptr);
+
+	if (tag == TAG_Valid) {
+		int q;
+
+		if (exponent(st0_ptr) > -40) {
+			if ((q = trig_arg(st0_ptr, 0)) == -1) {
+				/* Operand is out of range */
+				return 1;
+			}
+
+			poly_sine(st0_ptr);
+
+			if (q & 2)
+				changesign(st0_ptr);
+
+			setsign(st0_ptr, getsign(st0_ptr) ^ arg_sign);
+
+			/* We do not really know if up or down */
+			set_precision_flag_up();
+			return 0;
+		} else {
+			/* For a small arg, the result == the argument */
+			set_precision_flag_up();	/* Must be up. */
+			return 0;
+		}
+	}
+
+	if (tag == TAG_Zero) {
+		setcc(0);
+		return 0;
+	}
+
+	if (tag == TAG_Special)
+		tag = FPU_Special(st0_ptr);
+
+	if (tag == TW_Denormal) {
+		if (denormal_operand() < 0)
+			return 1;
+
+		/* For a small arg, the result == the argument */
+		/* Underflow may happen */
+		FPU_to_exp16(st0_ptr, st0_ptr);
+
+		tag = FPU_round(st0_ptr, 1, 0, FULL_PRECISION, arg_sign);
+
+		FPU_settag0(tag);
+
+		return 0;
+	} else if (tag == TW_Infinity) {
+		/* The 80486 treats infinity as an invalid operand */
+		arith_invalid(0);
+		return 1;
+	} else {
+		single_arg_error(st0_ptr, tag);
+		return 1;
+	}
+}
+
+static void fsin(FPU_REG *st0_ptr, u_char tag)
+{
+	f_sin(st0_ptr, tag);
+}
+
+static int f_cos(FPU_REG *st0_ptr, u_char tag)
+{
+	u_char st0_sign;
+
+	st0_sign = getsign(st0_ptr);
+
+	if (tag == TAG_Valid) {
+		int q;
+
+		if (exponent(st0_ptr) > -40) {
+			if ((exponent(st0_ptr) < 0)
+			    || ((exponent(st0_ptr) == 0)
+				&& (significand(st0_ptr) <=
+				    0xc90fdaa22168c234LL))) {
+				poly_cos(st0_ptr);
+
+				/* We do not really know if up or down */
+				set_precision_flag_down();
+
+				return 0;
+			} else if ((q = trig_arg(st0_ptr, FCOS)) != -1) {
+				poly_sine(st0_ptr);
+
+				if ((q + 1) & 2)
+					changesign(st0_ptr);
+
+				/* We do not really know if up or down */
+				set_precision_flag_down();
+
+				return 0;
+			} else {
+				/* Operand is out of range */
+				return 1;
+			}
+		} else {
+		      denormal_arg:
+
+			setcc(0);
+			FPU_copy_to_reg0(&CONST_1, TAG_Valid);
+#ifdef PECULIAR_486
+			set_precision_flag_down();	/* 80486 appears to do this. */
+#else
+			set_precision_flag_up();	/* Must be up. */
+#endif /* PECULIAR_486 */
+			return 0;
+		}
+	} else if (tag == TAG_Zero) {
+		FPU_copy_to_reg0(&CONST_1, TAG_Valid);
+		setcc(0);
+		return 0;
+	}
+
+	if (tag == TAG_Special)
+		tag = FPU_Special(st0_ptr);
+
+	if (tag == TW_Denormal) {
+		if (denormal_operand() < 0)
+			return 1;
+
+		goto denormal_arg;
+	} else if (tag == TW_Infinity) {
+		/* The 80486 treats infinity as an invalid operand */
+		arith_invalid(0);
+		return 1;
+	} else {
+		single_arg_error(st0_ptr, tag);	/* requires st0_ptr == &st(0) */
+		return 1;
+	}
+}
+
+static void fcos(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	f_cos(st0_ptr, st0_tag);
+}
+
+static void fsincos(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st_new_ptr;
+	FPU_REG arg;
+	u_char tag;
+
+	/* Stack underflow has higher priority */
+	if (st0_tag == TAG_Empty) {
+		FPU_stack_underflow();	/* Puts a QNaN in st(0) */
+		if (control_word & CW_Invalid) {
+			st_new_ptr = &st(-1);
+			push();
+			FPU_stack_underflow();	/* Puts a QNaN in the new st(0) */
+		}
+		return;
+	}
+
+	if (STACK_OVERFLOW) {
+		FPU_stack_overflow();
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		tag = FPU_Special(st0_ptr);
+	else
+		tag = st0_tag;
+
+	if (tag == TW_NaN) {
+		single_arg_2_error(st0_ptr, TW_NaN);
+		return;
+	} else if (tag == TW_Infinity) {
+		/* The 80486 treats infinity as an invalid operand */
+		if (arith_invalid(0) >= 0) {
+			/* Masked response */
+			push();
+			arith_invalid(0);
+		}
+		return;
+	}
+
+	reg_copy(st0_ptr, &arg);
+	if (!f_sin(st0_ptr, st0_tag)) {
+		push();
+		FPU_copy_to_reg0(&arg, st0_tag);
+		f_cos(&st(0), st0_tag);
+	} else {
+		/* An error, so restore st(0) */
+		FPU_copy_to_reg0(&arg, st0_tag);
+	}
+}
+
+/*---------------------------------------------------------------------------*/
+/* The following all require two arguments: st(0) and st(1) */
+
+/* A lean, mean kernel for the fprem instructions. This relies upon
+   the division and rounding to an integer in do_fprem giving an
+   exact result. Because of this, rem_kernel() needs to deal only with
+   the least significant 64 bits, the more significant bits of the
+   result must be zero.
+ */
+static void rem_kernel(unsigned long long st0, unsigned long long *y,
+		       unsigned long long st1, unsigned long long q, int n)
+{
+	int dummy;
+	unsigned long long x;
+
+	x = st0 << n;
+
+	/* Do the required multiplication and subtraction in the one operation */
+
+	/* lsw x -= lsw st1 * lsw q */
+	asm volatile ("mull %4; subl %%eax,%0; sbbl %%edx,%1":"=m"
+		      (((unsigned *)&x)[0]), "=m"(((unsigned *)&x)[1]),
+		      "=a"(dummy)
+		      :"2"(((unsigned *)&st1)[0]), "m"(((unsigned *)&q)[0])
+		      :"%dx");
+	/* msw x -= msw st1 * lsw q */
+	asm volatile ("mull %3; subl %%eax,%0":"=m" (((unsigned *)&x)[1]),
+		      "=a"(dummy)
+		      :"1"(((unsigned *)&st1)[1]), "m"(((unsigned *)&q)[0])
+		      :"%dx");
+	/* msw x -= lsw st1 * msw q */
+	asm volatile ("mull %3; subl %%eax,%0":"=m" (((unsigned *)&x)[1]),
+		      "=a"(dummy)
+		      :"1"(((unsigned *)&st1)[0]), "m"(((unsigned *)&q)[1])
+		      :"%dx");
+
+	*y = x;
+}
+
+/* Remainder of st(0) / st(1) */
+/* This routine produces exact results, i.e. there is never any
+   rounding or truncation, etc of the result. */
+static void do_fprem(FPU_REG *st0_ptr, u_char st0_tag, int round)
+{
+	FPU_REG *st1_ptr = &st(1);
+	u_char st1_tag = FPU_gettagi(1);
+
+	if (!((st0_tag ^ TAG_Valid) | (st1_tag ^ TAG_Valid))) {
+		FPU_REG tmp, st0, st1;
+		u_char st0_sign, st1_sign;
+		u_char tmptag;
+		int tag;
+		int old_cw;
+		int expdif;
+		long long q;
+		unsigned short saved_status;
+		int cc;
+
+	      fprem_valid:
+		/* Convert registers for internal use. */
+		st0_sign = FPU_to_exp16(st0_ptr, &st0);
+		st1_sign = FPU_to_exp16(st1_ptr, &st1);
+		expdif = exponent16(&st0) - exponent16(&st1);
+
+		old_cw = control_word;
+		cc = 0;
+
+		/* We want the status following the denorm tests, but don't want
+		   the status changed by the arithmetic operations. */
+		saved_status = partial_status;
+		control_word &= ~CW_RC;
+		control_word |= RC_CHOP;
+
+		if (expdif < 64) {
+			/* This should be the most common case */
+
+			if (expdif > -2) {
+				u_char sign = st0_sign ^ st1_sign;
+				tag = FPU_u_div(&st0, &st1, &tmp,
+						PR_64_BITS | RC_CHOP | 0x3f,
+						sign);
+				setsign(&tmp, sign);
+
+				if (exponent(&tmp) >= 0) {
+					FPU_round_to_int(&tmp, tag);	/* Fortunately, this can't
+									   overflow to 2^64 */
+					q = significand(&tmp);
+
+					rem_kernel(significand(&st0),
+						   &significand(&tmp),
+						   significand(&st1),
+						   q, expdif);
+
+					setexponent16(&tmp, exponent16(&st1));
+				} else {
+					reg_copy(&st0, &tmp);
+					q = 0;
+				}
+
+				if ((round == RC_RND)
+				    && (tmp.sigh & 0xc0000000)) {
+					/* We may need to subtract st(1) once more,
+					   to get a result <= 1/2 of st(1). */
+					unsigned long long x;
+					expdif =
+					    exponent16(&st1) - exponent16(&tmp);
+					if (expdif <= 1) {
+						if (expdif == 0)
+							x = significand(&st1) -
+							    significand(&tmp);
+						else	/* expdif is 1 */
+							x = (significand(&st1)
+							     << 1) -
+							    significand(&tmp);
+						if ((x < significand(&tmp)) ||
+						    /* or equi-distant (from 0 & st(1)) and q is odd */
+						    ((x == significand(&tmp))
+						     && (q & 1))) {
+							st0_sign = !st0_sign;
+							significand(&tmp) = x;
+							q++;
+						}
+					}
+				}
+
+				if (q & 4)
+					cc |= SW_C0;
+				if (q & 2)
+					cc |= SW_C3;
+				if (q & 1)
+					cc |= SW_C1;
+			} else {
+				control_word = old_cw;
+				setcc(0);
+				return;
+			}
+		} else {
+			/* There is a large exponent difference ( >= 64 ) */
+			/* To make much sense, the code in this section should
+			   be done at high precision. */
+			int exp_1, N;
+			u_char sign;
+
+			/* prevent overflow here */
+			/* N is 'a number between 32 and 63' (p26-113) */
+			reg_copy(&st0, &tmp);
+			tmptag = st0_tag;
+			N = (expdif & 0x0000001f) + 32;	/* This choice gives results
+							   identical to an AMD 486 */
+			setexponent16(&tmp, N);
+			exp_1 = exponent16(&st1);
+			setexponent16(&st1, 0);
+			expdif -= N;
+
+			sign = getsign(&tmp) ^ st1_sign;
+			tag =
+			    FPU_u_div(&tmp, &st1, &tmp,
+				      PR_64_BITS | RC_CHOP | 0x3f, sign);
+			setsign(&tmp, sign);
+
+			FPU_round_to_int(&tmp, tag);	/* Fortunately, this can't
+							   overflow to 2^64 */
+
+			rem_kernel(significand(&st0),
+				   &significand(&tmp),
+				   significand(&st1),
+				   significand(&tmp), exponent(&tmp)
+			    );
+			setexponent16(&tmp, exp_1 + expdif);
+
+			/* It is possible for the operation to be complete here.
+			   What does the IEEE standard say? The Intel 80486 manual
+			   implies that the operation will never be completed at this
+			   point, and the behaviour of a real 80486 confirms this.
+			 */
+			if (!(tmp.sigh | tmp.sigl)) {
+				/* The result is zero */
+				control_word = old_cw;
+				partial_status = saved_status;
+				FPU_copy_to_reg0(&CONST_Z, TAG_Zero);
+				setsign(&st0, st0_sign);
+#ifdef PECULIAR_486
+				setcc(SW_C2);
+#else
+				setcc(0);
+#endif /* PECULIAR_486 */
+				return;
+			}
+			cc = SW_C2;
+		}
+
+		control_word = old_cw;
+		partial_status = saved_status;
+		tag = FPU_normalize_nuo(&tmp);
+		reg_copy(&tmp, st0_ptr);
+
+		/* The only condition to be looked for is underflow,
+		   and it can occur here only if underflow is unmasked. */
+		if ((exponent16(&tmp) <= EXP_UNDER) && (tag != TAG_Zero)
+		    && !(control_word & CW_Underflow)) {
+			setcc(cc);
+			tag = arith_underflow(st0_ptr);
+			setsign(st0_ptr, st0_sign);
+			FPU_settag0(tag);
+			return;
+		} else if ((exponent16(&tmp) > EXP_UNDER) || (tag == TAG_Zero)) {
+			stdexp(st0_ptr);
+			setsign(st0_ptr, st0_sign);
+		} else {
+			tag =
+			    FPU_round(st0_ptr, 0, 0, FULL_PRECISION, st0_sign);
+		}
+		FPU_settag0(tag);
+		setcc(cc);
+
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+	if (st1_tag == TAG_Special)
+		st1_tag = FPU_Special(st1_ptr);
+
+	if (((st0_tag == TAG_Valid) && (st1_tag == TW_Denormal))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TAG_Valid))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TW_Denormal))) {
+		if (denormal_operand() < 0)
+			return;
+		goto fprem_valid;
+	} else if ((st0_tag == TAG_Empty) || (st1_tag == TAG_Empty)) {
+		FPU_stack_underflow();
+		return;
+	} else if (st0_tag == TAG_Zero) {
+		if (st1_tag == TAG_Valid) {
+			setcc(0);
+			return;
+		} else if (st1_tag == TW_Denormal) {
+			if (denormal_operand() < 0)
+				return;
+			setcc(0);
+			return;
+		} else if (st1_tag == TAG_Zero) {
+			arith_invalid(0);
+			return;
+		} /* fprem(?,0) always invalid */
+		else if (st1_tag == TW_Infinity) {
+			setcc(0);
+			return;
+		}
+	} else if ((st0_tag == TAG_Valid) || (st0_tag == TW_Denormal)) {
+		if (st1_tag == TAG_Zero) {
+			arith_invalid(0);	/* fprem(Valid,Zero) is invalid */
+			return;
+		} else if (st1_tag != TW_NaN) {
+			if (((st0_tag == TW_Denormal)
+			     || (st1_tag == TW_Denormal))
+			    && (denormal_operand() < 0))
+				return;
+
+			if (st1_tag == TW_Infinity) {
+				/* fprem(Valid,Infinity) is o.k. */
+				setcc(0);
+				return;
+			}
+		}
+	} else if (st0_tag == TW_Infinity) {
+		if (st1_tag != TW_NaN) {
+			arith_invalid(0);	/* fprem(Infinity,?) is invalid */
+			return;
+		}
+	}
+
+	/* One of the registers must contain a NaN if we got here. */
+
+#ifdef PARANOID
+	if ((st0_tag != TW_NaN) && (st1_tag != TW_NaN))
+		EXCEPTION(EX_INTERNAL | 0x118);
+#endif /* PARANOID */
+
+	real_2op_NaN(st1_ptr, st1_tag, 0, st1_ptr);
+
+}
+
+/* ST(1) <- ST(1) * log ST;  pop ST */
+static void fyl2x(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st1_ptr = &st(1), exponent;
+	u_char st1_tag = FPU_gettagi(1);
+	u_char sign;
+	int e, tag;
+
+	clear_C1();
+
+	if ((st0_tag == TAG_Valid) && (st1_tag == TAG_Valid)) {
+	      both_valid:
+		/* Both regs are Valid or Denormal */
+		if (signpositive(st0_ptr)) {
+			if (st0_tag == TW_Denormal)
+				FPU_to_exp16(st0_ptr, st0_ptr);
+			else
+				/* Convert st(0) for internal use. */
+				setexponent16(st0_ptr, exponent(st0_ptr));
+
+			if ((st0_ptr->sigh == 0x80000000)
+			    && (st0_ptr->sigl == 0)) {
+				/* Special case. The result can be precise. */
+				u_char esign;
+				e = exponent16(st0_ptr);
+				if (e >= 0) {
+					exponent.sigh = e;
+					esign = SIGN_POS;
+				} else {
+					exponent.sigh = -e;
+					esign = SIGN_NEG;
+				}
+				exponent.sigl = 0;
+				setexponent16(&exponent, 31);
+				tag = FPU_normalize_nuo(&exponent);
+				stdexp(&exponent);
+				setsign(&exponent, esign);
+				tag =
+				    FPU_mul(&exponent, tag, 1, FULL_PRECISION);
+				if (tag >= 0)
+					FPU_settagi(1, tag);
+			} else {
+				/* The usual case */
+				sign = getsign(st1_ptr);
+				if (st1_tag == TW_Denormal)
+					FPU_to_exp16(st1_ptr, st1_ptr);
+				else
+					/* Convert st(1) for internal use. */
+					setexponent16(st1_ptr,
+						      exponent(st1_ptr));
+				poly_l2(st0_ptr, st1_ptr, sign);
+			}
+		} else {
+			/* negative */
+			if (arith_invalid(1) < 0)
+				return;
+		}
+
+		FPU_pop();
+
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+	if (st1_tag == TAG_Special)
+		st1_tag = FPU_Special(st1_ptr);
+
+	if ((st0_tag == TAG_Empty) || (st1_tag == TAG_Empty)) {
+		FPU_stack_underflow_pop(1);
+		return;
+	} else if ((st0_tag <= TW_Denormal) && (st1_tag <= TW_Denormal)) {
+		if (st0_tag == TAG_Zero) {
+			if (st1_tag == TAG_Zero) {
+				/* Both args zero is invalid */
+				if (arith_invalid(1) < 0)
+					return;
+			} else {
+				u_char sign;
+				sign = getsign(st1_ptr) ^ SIGN_NEG;
+				if (FPU_divide_by_zero(1, sign) < 0)
+					return;
+
+				setsign(st1_ptr, sign);
+			}
+		} else if (st1_tag == TAG_Zero) {
+			/* st(1) contains zero, st(0) valid <> 0 */
+			/* Zero is the valid answer */
+			sign = getsign(st1_ptr);
+
+			if (signnegative(st0_ptr)) {
+				/* log(negative) */
+				if (arith_invalid(1) < 0)
+					return;
+			} else if ((st0_tag == TW_Denormal)
+				   && (denormal_operand() < 0))
+				return;
+			else {
+				if (exponent(st0_ptr) < 0)
+					sign ^= SIGN_NEG;
+
+				FPU_copy_to_reg1(&CONST_Z, TAG_Zero);
+				setsign(st1_ptr, sign);
+			}
+		} else {
+			/* One or both operands are denormals. */
+			if (denormal_operand() < 0)
+				return;
+			goto both_valid;
+		}
+	} else if ((st0_tag == TW_NaN) || (st1_tag == TW_NaN)) {
+		if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) < 0)
+			return;
+	}
+	/* One or both arg must be an infinity */
+	else if (st0_tag == TW_Infinity) {
+		if ((signnegative(st0_ptr)) || (st1_tag == TAG_Zero)) {
+			/* log(-infinity) or 0*log(infinity) */
+			if (arith_invalid(1) < 0)
+				return;
+		} else {
+			u_char sign = getsign(st1_ptr);
+
+			if ((st1_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+
+			FPU_copy_to_reg1(&CONST_INF, TAG_Special);
+			setsign(st1_ptr, sign);
+		}
+	}
+	/* st(1) must be infinity here */
+	else if (((st0_tag == TAG_Valid) || (st0_tag == TW_Denormal))
+		 && (signpositive(st0_ptr))) {
+		if (exponent(st0_ptr) >= 0) {
+			if ((exponent(st0_ptr) == 0) &&
+			    (st0_ptr->sigh == 0x80000000) &&
+			    (st0_ptr->sigl == 0)) {
+				/* st(0) holds 1.0 */
+				/* infinity*log(1) */
+				if (arith_invalid(1) < 0)
+					return;
+			}
+			/* else st(0) is positive and > 1.0 */
+		} else {
+			/* st(0) is positive and < 1.0 */
+
+			if ((st0_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+
+			changesign(st1_ptr);
+		}
+	} else {
+		/* st(0) must be zero or negative */
+		if (st0_tag == TAG_Zero) {
+			/* This should be invalid, but a real 80486 is happy with it. */
+
+#ifndef PECULIAR_486
+			sign = getsign(st1_ptr);
+			if (FPU_divide_by_zero(1, sign) < 0)
+				return;
+#endif /* PECULIAR_486 */
+
+			changesign(st1_ptr);
+		} else if (arith_invalid(1) < 0)	/* log(negative) */
+			return;
+	}
+
+	FPU_pop();
+}
+
+static void fpatan(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st1_ptr = &st(1);
+	u_char st1_tag = FPU_gettagi(1);
+	int tag;
+
+	clear_C1();
+	if (!((st0_tag ^ TAG_Valid) | (st1_tag ^ TAG_Valid))) {
+	      valid_atan:
+
+		poly_atan(st0_ptr, st0_tag, st1_ptr, st1_tag);
+
+		FPU_pop();
+
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+	if (st1_tag == TAG_Special)
+		st1_tag = FPU_Special(st1_ptr);
+
+	if (((st0_tag == TAG_Valid) && (st1_tag == TW_Denormal))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TAG_Valid))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TW_Denormal))) {
+		if (denormal_operand() < 0)
+			return;
+
+		goto valid_atan;
+	} else if ((st0_tag == TAG_Empty) || (st1_tag == TAG_Empty)) {
+		FPU_stack_underflow_pop(1);
+		return;
+	} else if ((st0_tag == TW_NaN) || (st1_tag == TW_NaN)) {
+		if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) >= 0)
+			FPU_pop();
+		return;
+	} else if ((st0_tag == TW_Infinity) || (st1_tag == TW_Infinity)) {
+		u_char sign = getsign(st1_ptr);
+		if (st0_tag == TW_Infinity) {
+			if (st1_tag == TW_Infinity) {
+				if (signpositive(st0_ptr)) {
+					FPU_copy_to_reg1(&CONST_PI4, TAG_Valid);
+				} else {
+					setpositive(st1_ptr);
+					tag =
+					    FPU_u_add(&CONST_PI4, &CONST_PI2,
+						      st1_ptr, FULL_PRECISION,
+						      SIGN_POS,
+						      exponent(&CONST_PI4),
+						      exponent(&CONST_PI2));
+					if (tag >= 0)
+						FPU_settagi(1, tag);
+				}
+			} else {
+				if ((st1_tag == TW_Denormal)
+				    && (denormal_operand() < 0))
+					return;
+
+				if (signpositive(st0_ptr)) {
+					FPU_copy_to_reg1(&CONST_Z, TAG_Zero);
+					setsign(st1_ptr, sign);	/* An 80486 preserves the sign */
+					FPU_pop();
+					return;
+				} else {
+					FPU_copy_to_reg1(&CONST_PI, TAG_Valid);
+				}
+			}
+		} else {
+			/* st(1) is infinity, st(0) not infinity */
+			if ((st0_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+
+			FPU_copy_to_reg1(&CONST_PI2, TAG_Valid);
+		}
+		setsign(st1_ptr, sign);
+	} else if (st1_tag == TAG_Zero) {
+		/* st(0) must be valid or zero */
+		u_char sign = getsign(st1_ptr);
+
+		if ((st0_tag == TW_Denormal) && (denormal_operand() < 0))
+			return;
+
+		if (signpositive(st0_ptr)) {
+			/* An 80486 preserves the sign */
+			FPU_pop();
+			return;
+		}
+
+		FPU_copy_to_reg1(&CONST_PI, TAG_Valid);
+		setsign(st1_ptr, sign);
+	} else if (st0_tag == TAG_Zero) {
+		/* st(1) must be TAG_Valid here */
+		u_char sign = getsign(st1_ptr);
+
+		if ((st1_tag == TW_Denormal) && (denormal_operand() < 0))
+			return;
+
+		FPU_copy_to_reg1(&CONST_PI2, TAG_Valid);
+		setsign(st1_ptr, sign);
+	}
+#ifdef PARANOID
+	else
+		EXCEPTION(EX_INTERNAL | 0x125);
+#endif /* PARANOID */
+
+	FPU_pop();
+	set_precision_flag_up();	/* We do not really know if up or down */
+}
+
+static void fprem(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	do_fprem(st0_ptr, st0_tag, RC_CHOP);
+}
+
+static void fprem1(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	do_fprem(st0_ptr, st0_tag, RC_RND);
+}
+
+static void fyl2xp1(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	u_char sign, sign1;
+	FPU_REG *st1_ptr = &st(1), a, b;
+	u_char st1_tag = FPU_gettagi(1);
+
+	clear_C1();
+	if (!((st0_tag ^ TAG_Valid) | (st1_tag ^ TAG_Valid))) {
+	      valid_yl2xp1:
+
+		sign = getsign(st0_ptr);
+		sign1 = getsign(st1_ptr);
+
+		FPU_to_exp16(st0_ptr, &a);
+		FPU_to_exp16(st1_ptr, &b);
+
+		if (poly_l2p1(sign, sign1, &a, &b, st1_ptr))
+			return;
+
+		FPU_pop();
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+	if (st1_tag == TAG_Special)
+		st1_tag = FPU_Special(st1_ptr);
+
+	if (((st0_tag == TAG_Valid) && (st1_tag == TW_Denormal))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TAG_Valid))
+	    || ((st0_tag == TW_Denormal) && (st1_tag == TW_Denormal))) {
+		if (denormal_operand() < 0)
+			return;
+
+		goto valid_yl2xp1;
+	} else if ((st0_tag == TAG_Empty) | (st1_tag == TAG_Empty)) {
+		FPU_stack_underflow_pop(1);
+		return;
+	} else if (st0_tag == TAG_Zero) {
+		switch (st1_tag) {
+		case TW_Denormal:
+			if (denormal_operand() < 0)
+				return;
+			fallthrough;
+		case TAG_Zero:
+		case TAG_Valid:
+			setsign(st0_ptr, getsign(st0_ptr) ^ getsign(st1_ptr));
+			FPU_copy_to_reg1(st0_ptr, st0_tag);
+			break;
+
+		case TW_Infinity:
+			/* Infinity*log(1) */
+			if (arith_invalid(1) < 0)
+				return;
+			break;
+
+		case TW_NaN:
+			if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) < 0)
+				return;
+			break;
+
+		default:
+#ifdef PARANOID
+			EXCEPTION(EX_INTERNAL | 0x116);
+			return;
+#endif /* PARANOID */
+			break;
+		}
+	} else if ((st0_tag == TAG_Valid) || (st0_tag == TW_Denormal)) {
+		switch (st1_tag) {
+		case TAG_Zero:
+			if (signnegative(st0_ptr)) {
+				if (exponent(st0_ptr) >= 0) {
+					/* st(0) holds <= -1.0 */
+#ifdef PECULIAR_486		/* Stupid 80486 doesn't worry about log(negative). */
+					changesign(st1_ptr);
+#else
+					if (arith_invalid(1) < 0)
+						return;
+#endif /* PECULIAR_486 */
+				} else if ((st0_tag == TW_Denormal)
+					   && (denormal_operand() < 0))
+					return;
+				else
+					changesign(st1_ptr);
+			} else if ((st0_tag == TW_Denormal)
+				   && (denormal_operand() < 0))
+				return;
+			break;
+
+		case TW_Infinity:
+			if (signnegative(st0_ptr)) {
+				if ((exponent(st0_ptr) >= 0) &&
+				    !((st0_ptr->sigh == 0x80000000) &&
+				      (st0_ptr->sigl == 0))) {
+					/* st(0) holds < -1.0 */
+#ifdef PECULIAR_486		/* Stupid 80486 doesn't worry about log(negative). */
+					changesign(st1_ptr);
+#else
+					if (arith_invalid(1) < 0)
+						return;
+#endif /* PECULIAR_486 */
+				} else if ((st0_tag == TW_Denormal)
+					   && (denormal_operand() < 0))
+					return;
+				else
+					changesign(st1_ptr);
+			} else if ((st0_tag == TW_Denormal)
+				   && (denormal_operand() < 0))
+				return;
+			break;
+
+		case TW_NaN:
+			if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) < 0)
+				return;
+		}
+
+	} else if (st0_tag == TW_NaN) {
+		if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) < 0)
+			return;
+	} else if (st0_tag == TW_Infinity) {
+		if (st1_tag == TW_NaN) {
+			if (real_2op_NaN(st0_ptr, st0_tag, 1, st0_ptr) < 0)
+				return;
+		} else if (signnegative(st0_ptr)) {
+#ifndef PECULIAR_486
+			/* This should have higher priority than denormals, but... */
+			if (arith_invalid(1) < 0)	/* log(-infinity) */
+				return;
+#endif /* PECULIAR_486 */
+			if ((st1_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+#ifdef PECULIAR_486
+			/* Denormal operands actually get higher priority */
+			if (arith_invalid(1) < 0)	/* log(-infinity) */
+				return;
+#endif /* PECULIAR_486 */
+		} else if (st1_tag == TAG_Zero) {
+			/* log(infinity) */
+			if (arith_invalid(1) < 0)
+				return;
+		}
+
+		/* st(1) must be valid here. */
+
+		else if ((st1_tag == TW_Denormal) && (denormal_operand() < 0))
+			return;
+
+		/* The Manual says that log(Infinity) is invalid, but a real
+		   80486 sensibly says that it is o.k. */
+		else {
+			u_char sign = getsign(st1_ptr);
+			FPU_copy_to_reg1(&CONST_INF, TAG_Special);
+			setsign(st1_ptr, sign);
+		}
+	}
+#ifdef PARANOID
+	else {
+		EXCEPTION(EX_INTERNAL | 0x117);
+		return;
+	}
+#endif /* PARANOID */
+
+	FPU_pop();
+	return;
+
+}
+
+static void fscale(FPU_REG *st0_ptr, u_char st0_tag)
+{
+	FPU_REG *st1_ptr = &st(1);
+	u_char st1_tag = FPU_gettagi(1);
+	int old_cw = control_word;
+	u_char sign = getsign(st0_ptr);
+
+	clear_C1();
+	if (!((st0_tag ^ TAG_Valid) | (st1_tag ^ TAG_Valid))) {
+		long scale;
+		FPU_REG tmp;
+
+		/* Convert register for internal use. */
+		setexponent16(st0_ptr, exponent(st0_ptr));
+
+	      valid_scale:
+
+		if (exponent(st1_ptr) > 30) {
+			/* 2^31 is far too large, would require 2^(2^30) or 2^(-2^30) */
+
+			if (signpositive(st1_ptr)) {
+				EXCEPTION(EX_Overflow);
+				FPU_copy_to_reg0(&CONST_INF, TAG_Special);
+			} else {
+				EXCEPTION(EX_Underflow);
+				FPU_copy_to_reg0(&CONST_Z, TAG_Zero);
+			}
+			setsign(st0_ptr, sign);
+			return;
+		}
+
+		control_word &= ~CW_RC;
+		control_word |= RC_CHOP;
+		reg_copy(st1_ptr, &tmp);
+		FPU_round_to_int(&tmp, st1_tag);	/* This can never overflow here */
+		control_word = old_cw;
+		scale = signnegative(st1_ptr) ? -tmp.sigl : tmp.sigl;
+		scale += exponent16(st0_ptr);
+
+		setexponent16(st0_ptr, scale);
+
+		/* Use FPU_round() to properly detect under/overflow etc */
+		FPU_round(st0_ptr, 0, 0, control_word, sign);
+
+		return;
+	}
+
+	if (st0_tag == TAG_Special)
+		st0_tag = FPU_Special(st0_ptr);
+	if (st1_tag == TAG_Special)
+		st1_tag = FPU_Special(st1_ptr);
+
+	if ((st0_tag == TAG_Valid) || (st0_tag == TW_Denormal)) {
+		switch (st1_tag) {
+		case TAG_Valid:
+			/* st(0) must be a denormal */
+			if ((st0_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+
+			FPU_to_exp16(st0_ptr, st0_ptr);	/* Will not be left on stack */
+			goto valid_scale;
+
+		case TAG_Zero:
+			if (st0_tag == TW_Denormal)
+				denormal_operand();
+			return;
+
+		case TW_Denormal:
+			denormal_operand();
+			return;
+
+		case TW_Infinity:
+			if ((st0_tag == TW_Denormal)
+			    && (denormal_operand() < 0))
+				return;
+
+			if (signpositive(st1_ptr))
+				FPU_copy_to_reg0(&CONST_INF, TAG_Special);
+			else
+				FPU_copy_to_reg0(&CONST_Z, TAG_Zero);
+			setsign(st0_ptr, sign);
+			return;
+
+		case TW_NaN:
+			real_2op_NaN(st1_ptr, st1_tag, 0, st0_ptr);
+			return;
+		}
+	} else if (st0_tag == TAG_Zero) {
+		switch (st1_tag) {
+		case TAG_Valid:
+		case TAG_Zero:
+			return;
+
+		case TW_Denormal:
+			denormal_operand();
+			return;
+
+		case TW_Infinity:
+			if (signpositive(st1_ptr))
+				arith_invalid(0);	/* Zero scaled by +Infinity */
+			return;
+
+		case TW_NaN:
+			real_2op_NaN(st1_ptr, st1_tag, 0, st0_ptr);
+			return;
+		}
+	} else if (st0_tag == TW_Infinity) {
+		switch (st1_tag) {
+		case TAG_Valid:
+		case TAG_Zero:
+			return;
+
+		case TW_Denormal:
+			denormal_operand();
+			return;
+
+		case TW_Infinity:
+			if (signnegative(st1_ptr))
+				arith_invalid(0);	/* Infinity scaled by -Infinity */
+			return;
+
+		case TW_NaN:
+			real_2op_NaN(st1_ptr, st1_tag, 0, st0_ptr);
+			return;
+		}
+	} else if (st0_tag == TW_NaN) {
+		if (st1_tag != TAG_Empty) {
+			real_2op_NaN(st1_ptr, st1_tag, 0, st0_ptr);
+			return;
+		}
+	}
+#ifdef PARANOID
+	if (!((st0_tag == TAG_Empty) || (st1_tag == TAG_Empty))) {
+		EXCEPTION(EX_INTERNAL | 0x115);
+		return;
+	}
+#endif
+
+	/* At least one of st(0), st(1) must be empty */
+	FPU_stack_underflow();
+
+}
+
+/*---------------------------------------------------------------------------*/
+
+static FUNC_ST0 const trig_table_a[] = {
+	f2xm1, fyl2x, fptan, fpatan,
+	fxtract, fprem1, (FUNC_ST0) fdecstp, (FUNC_ST0) fincstp
+};
+
+void FPU_triga(void)
+{
+	(trig_table_a[FPU_rm]) (&st(0), FPU_gettag0());
+}
+
+static FUNC_ST0 const trig_table_b[] = {
+	fprem, fyl2xp1, fsqrt_, fsincos, frndint_, fscale, fsin, fcos
+};
+
+void FPU_trigb(void)
+{
+	(trig_table_b[FPU_rm]) (&st(0), FPU_gettag0());
+}
author	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
committer	Linus Torvalds <torvalds@linux-foundation.org>	2023-02-21 18:24:12 -0800
commit	5b7c4cabbb65f5c469464da6c5f614cbd7f730f2 (patch)
tree	cc5c2d0a898769fd59549594fedb3ee6f84e59a0 /arch/x86/math-emu/fpu_trig.c
download	linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.tar.gz linux-5b7c4cabbb65f5c469464da6c5f614cbd7f730f2.zip