I really appreciated this piece. Thank you to OP for writing and submitting it.
The thing that piqued my interest was the side remark that the Dirac delta is a “distribution”, and that this is an unfortunate name clash with the same concept in probability (measure theory).
My training (in EE) used both Dirac delta “functions” (in signal processing) and distributions in the sense of measure theory (in estimation theory). Really two separate forks of coursework.
I had always thought that the use of delta functions in convolution integrals (signal processing) was ultimately justified by measure theory — the same machinery as I learned (with some effort) when I took measure theoretic probability.
But, as flagged by the OP, that is not the case! Mind blown.
Some of this is the result of the way these concepts are taught. There is some hand waving both in signal processing, and in estimation theory, when these difficult functions and integrals come up.
I’m not aware of signal processing courses (probably graduate level) in which convolution against delta “functions” uses the distribution concept. There are indeed words to the effect of either,
- Dirac delta is not a function, but think of it as a limit of increasingly-concentrated Gaussians;
- use of Dirac delta is ok, because we don’t need to represent it directly, only the result of an inner product against a smooth function (i.e., a convolution)
But these excuses are not rigorously justified, even at the graduate level, in my experience.
*
Separately from that, I wonder if OP has ever seen the book Radically Elementary Probability Theory, by Edward Nelson (https://web.math.princeton.edu/~nelson/books/rept.pdf). It uses nonstandard analysis to get around a lot of the (elegant) fussiness of measure theory.
The preface alone is fun to read.
> But these excuses are not rigorously justified, even at the graduate level, in my experience.
Imo, the informal use is already pretty close to the formal definition. Formally, a distribution is defined purely by its inner products against certain smooth functions (usually the ones with compact support), which is what the OP alluded to when he said:
> The formal definition of a generalized function is: an element of the continuous dual space of a space of smooth functions.
That "element of the continuous dual space" is just a function that takes in a smooth function with compact support f, and returns what we take to be the inner product of f with our generalized function.
So (again, imo) "we don’t need to represent it directly, only the result of an inner product against a smooth function" isn't that distant to the formal definition.
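To make that concrete, here is a minimal sketch (my own toy code, not from the article): treat a distribution as a Python function that takes a test function and returns a number, so the Dirac delta and its derivative are a couple of lines each.

    # A distribution, computationally, is a functional: it eats a test
    # function and returns a number. The Dirac delta "paired with" f is f(0).
    # (Sketch only; real distribution theory also requires linearity and
    # continuity on the space of test functions.)

    def dirac_delta(f):
        """<delta, f> = f(0)."""
        return f(0.0)

    def shifted_delta(a):
        """<delta_a, f> = f(a), the delta concentrated at a."""
        return lambda f: f(a)

    def dirac_delta_prime(f, h=1e-6):
        """<delta', f> = -f'(0); f'(0) approximated by a central difference."""
        return -(f(h) - f(-h)) / (2 * h)

    import math
    print(dirac_delta(math.cos))              # 1.0, since cos(0) = 1
    print(shifted_delta(math.pi)(math.sin))   # ~0.0, since sin(pi) = 0
    print(dirac_delta_prime(math.sin))        # ~-1.0, since -cos(0) = -1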
I hear you, and I admit I'm drawing a fuzzy line (is the conventional approach “rigorous”?).
Here are two “test functions”:
- we learned much about impulse responses, and sometimes considered responses to dipoles, etc. However, if I read the Wikipedia article correctly (it’s not great…), the theory implies that a distribution (in the technical sense) has derivatives of any order. I’m not sure I really knew that I could count on that. A rigorous treatment would have given me that assurance.
- if I understand correctly, the concept of introducing an impulse to a system that has an identity impulse response, which implies an inner product of delta with itself, is not well-defined. Again, I’m not sure if we covered that concept. (Admittedly, it’s been a long time.)
oops, I realize I completely mis-stated the second point. What it should say is:
- If delta(x) is OK, why is delta^2(x) not OK?
While the limit of increasingly concentrated Gaussians does result in a Dirac delta, it is not the only way the Dirac delta comes about, and it is probably not the correct way to think about it in the context of signal processing.
When we are doing signal processing the Dirac delta primarily comes about as the Fourier transform of a constant function, and if you work out the math this is roughly equivalent to a sinc function where the oscillations become infinitely fast. This distinction is important because the concentrated Gaussian limit has the function going to 0 as we move away from the origin, but the sinc function never goes to 0, it just oscillates really fast. This becomes a Dirac delta because any integral of a function multiplied by this sinc function has cancelling components from the fast oscillations.
The poor behavior of this limit (primarily numerically) is closely related to the reasons why we have things like the Gibbs phenomenon.
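A quick numerical illustration of that cancelling-oscillations picture (my own sketch; the test function, window, and values of N are arbitrary choices): integrating a smooth function against sin(Nx)/(pi x) on a fixed window approaches the value at 0 as N grows, even though the kernel never decays away from the origin.

    import numpy as np

    def f(x):
        return np.exp(-x**2) * np.cos(x)    # smooth test function, f(0) = 1

    x = np.linspace(-10, 10, 200_001)       # fine grid to resolve the oscillations
    dx = x[1] - x[0]

    for N in (10, 100, 1000):
        kernel = (N / np.pi) * np.sinc(N * x / np.pi)   # = sin(N x)/(pi x)
        print(N, np.sum(f(x) * kernel) * dx)            # tends to f(0) = 1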
The Dirac delta is a unit vector when represented in a basis of which it is a component.
I don't know what kind of justification you expect. There's a Dirac-delta-sized "hole" in linear algebra that mathematicians need a name for. It's not like we can just leave it there, unfilled.
Thanks! And yeah I’m familiar with Nelson
Differentiation turns out to be a deeper subject than most people expect even if you just stick to the ordinary real numbers rather than venturing into things like hyperreals.
I once saw in an elementary calculus book a note after the proof of a theorem about differentiation that the converse of the theorem was also true but needed more advanced techniques than were covered in the book.
I checked the advanced calculus and real analysis books I had and they didn't have the proof.
I then did some searching and found mention of a book titled "Differentiation" (or something similar) and found a site that had scans of the first chapter of that book. It proved the theorem on something like page 6 and I couldn't understand it at all. Starting from the beginning, I think I got through maybe a page or two before it got too deep for my mere-bachelor's-degree-in-mathematics level of preparation.
I kind of wish I'd bought a copy of that book. I've never since been able to find it. I've found other books with the same or similar title but they weren't it.
Do you remember what the theorem was?
Nope.
One minor nit: A function can be differentiable at a and discontinuous at a even with the standard definition of the derivative. A trivial example would be the function f(x) = (x²-1)/(x-1) which is undefined at x=1, but f'(1)=1 (in fact derivatives have exactly this sort of discontinuity in them which is why they’re defined via limits). In complex analysis, this sort of “hole” in the function is called a removable singularity¹ which is one of three types of singularities that show up in complex functions.
⸻
1. Yes, this is mathematically the reason why black holes are referred to as singularities.
I'm not understanding what you're saying. The standard definition of the derivative of f at c is
f'(c) = lim_{h → 0} (f(c + h) - f(c))/h
The definition would not make sense if f wasn't defined at c (note the "f(c)" in the numerator). For instance, it can't be applied to your f(x) = (x² - 1)/(x - 1) at x = 1, because f(1) is not defined.
And it's a standard result (even stated in Calc 1 classes) that if a function is differentiable at a point, then it's continuous there. For example:
5.2 Theorem. Let f be defined on [a, b]. If f is differentiable at a point x ∈ [a, b], then f is continuous at x.
(Walter Rudin, "Principles of Mathematical Analysis", 3rd edition, p. 104)
Or:
Theorem 2.1 If f is differentiable at x = a, then f is continuous at x = a.
(Robert Smith and Roland Minton, "Calculus - Early Transcendentals", 4th edition, p. 140)
It's true that your f(x) = (x² - 1)/(x - 1) has a removable discontinuity at x = 1, since if we define g(x) = f(x) for x ≠ 1 and g(1) = 2, then g is continuous. Was this what you meant?
This is correct. You cannot have a discontinuity with any accepted definition of a derivative (and your definition is explicit about this: the value f(c) must exist). Allowing the limits on both sides to be equal already has a mathematical definition, namely that of a functional limit, the function in this case being (f(x) - flim(c))/(x - c), where flim(c) is the value of a (different) functional limit of f(x) as x -> c (since f(c) doesn't exist).
And yes, defining a new function with that hole explicitly filled in with a value that makes it continuous is the typical prescription. It does not imply the derivative exists for the original function, as the other post posits.
https://en.m.wikipedia.org/wiki/Classification_of_discontinu... is responsive and quite accessible. It notes that there doesn't have to be an undefined point for a function to be discontinuous (and that terminology often conflates the two), and matches what I recall of determining that if the limit of the derivative from both sides of the discontinuity exists and is equal, the derivative exists.
> ... there doesn't have to be an undefined point for a function to be discontinuous.
That's right. In the example f(x) = (x² - 1)/(x - 1) for x ≠ 1, if we further define f(1) = 0, the function is now defined at x = 1, but discontinuous there.
> ... if the limit of the derivative from both sides of the discontinuity exists and is equal, the derivative exists.
(You probably mean "both sides of the point", since if there's a discontinuity there the derivative can't exist.) Your point that, if the left and right-hand limits both exist and are equal, then the derivative exists (and equals their common value) is true for all limits.
Also, there's a difference between the use of the word "continuous" in calc courses and in topology. In calc courses where functions tend to take real numbers to real numbers, a function may be said to be "not continuous" at a point where it isn't defined. So f(x) = 1/(x - 2) is "not continuous at 2". But in topology, you only consider continuity for points in the domain of the function. So since the (natural) domain of f(x) = 1/(x - 2) is x ≠ 2, the function is continuous everywhere (that it's defined).
I was actually aiming for the situation where a function is defined on all reals but still discontinuous (e.g. the piecewise function in the wiki article for the removable discontinuity). So there's a discontinuity (x=1), however the function is defined everywhere.
The standard definition of the derivative at c involves the assumption that f is defined at c.
However, you could also (probably) define the derivative as lim_{h->0} (f(c+h) - f(c-h))/2h, so without needing f(c) to be defined. But that's not standard.
> However, you could also (probably) define the derivative as lim_{h->0} (f(c+h) - f(c-h))/2h, so without needing f(c) to be defined. But that's not standard.
Although this gives the right answer whenever f is differentiable at c, it can wrongly think that a function is differentiable when it isn't, as for the absolute-value function at c = 0.
Good point. So this is probably one of the reasons why the version I stated isn't used.
It is used, just with the caveat in mind that it may exist when the derivative doesn't. It is usually called the symmetric derivative (https://en.wikipedia.org/wiki/Symmetric_derivative).
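A tiny numerical illustration of that caveat (my own toy example, not from the linked page): for f(x) = |x| at 0 the symmetric quotient happily returns 0, while the one-sided quotients disagree, so the ordinary derivative does not exist.

    def symmetric_quotient(f, c, h):
        return (f(c + h) - f(c - h)) / (2 * h)

    def one_sided_quotient(f, c, h):
        return (f(c + h) - f(c)) / h

    f = abs
    for h in (0.1, 0.01, 0.001):
        print(h,
              symmetric_quotient(f, 0.0, h),    # 0.0 for every h
              one_sided_quotient(f, 0.0, h),    # +1.0 from the right
              one_sided_quotient(f, 0.0, -h))   # -1.0 from the left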
> this sort of “hole” in the function is called a removable singularity
It's called "removable" because it can be removed by a continuous extension - the original function itself is still formally discontinuous (of course, one would often "morally" treat these as the same function, but strictly speaking they're not). An important theorem in complex analysis is that any continuous extension at a single point is automatically a holomorphic (= complex differentiable) extension too.
I don't think it makes sense to allow derivatives of a function f to have a larger domain than the domain of f.
> which is why they’re defined via limits
They're defined via studying f(x+h) - f(x) with a limit h -> 0. But, your example is taking two limits, h->0 and x->1, simultaneously. This is not the same thing.
You are wrong. In order for you to make sense of what you are saying, you first must REDEFINE f(x) to be f(x) = (x^2 - 1)/(x - 1) when x != 1 and define f(1) = 2. Of course, then f will be continuous at x = 1 also.
A function is continuous at x = a if it is differentiable at x = a.
You do understand the concept, but your precision in the definitions is lacking.
I think you can get a generalisation of autodiff using this idea of "nonstandard real numbers": You just need a computable field with infinitesimals in it. The Levi-Civita field looks especially convenient because it's real-closed. You might be able to get an auto-limit algorithm from it by evaluating a program infinitely close to a limit. I'm not sure if there's a problem with numerical stability when something like division by infinitesimals gets done. Does this have something to do with how Mathematica and other CASes take limits of algebraic expressions?
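The first-order shadow of that idea is just ordinary dual-number forward-mode autodiff; a minimal sketch (class and function names are mine), with the caveat that the Levi-Civita field proper carries whole truncated series in eps, including negative powers, rather than a single nilpotent eps:

    # Forward-mode autodiff with one infinitesimal eps satisfying eps^2 = 0.
    # A real Levi-Civita implementation would store truncated power series
    # in eps (and 1/eps); this is only the simplest special case.

    class Dual:
        def __init__(self, re, eps=0.0):
            self.re, self.eps = re, eps           # standard part + eps-coefficient

        def _coerce(self, other):
            return other if isinstance(other, Dual) else Dual(other)

        def __add__(self, other):
            other = self._coerce(other)
            return Dual(self.re + other.re, self.eps + other.eps)
        __radd__ = __add__

        def __mul__(self, other):
            other = self._coerce(other)
            # (a + b*eps)(c + d*eps) = ac + (ad + bc)*eps because eps^2 = 0
            return Dual(self.re * other.re,
                        self.re * other.eps + self.eps * other.re)
        __rmul__ = __mul__

    def derivative(f, x):
        """Exact eps-coefficient of f(x + eps), i.e. f'(x) for polynomial-ish f."""
        return f(Dual(x, 1.0)).eps

    print(derivative(lambda x: x * x * x + 2 * x, 3.0))   # 3*3^2 + 2 = 29.0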
-----
Concerning the Dirac delta example: I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta. Terry Tao has some nice blog posts where he shows that a lot of NSA can be translated into sequences, either in a high-powered way using ultrafilters, or in an elementary way using passage to convergent subsequences where necessary.
An interesting question is: What does distribution theory really accomplish? Why is it useful? I have an idea myself but I think it's an interesting question.
> I think this is probably a pleasant way of using a sequence of better and better approximations to the Dirac delta.
That can give wrong answers because the derivative of the limit is not always the limit of the derivatives.
When modeling phenomena with the Dirac delta, I think the question becomes: do I really need a discontinuity to have a useful model, or can I get away with smoothing the discontinuity out?
Distribution theory has lots of applications in physics. The charge density of a point particle is the delta function.
Also when Fourier transforming over the whole real line (not just an interval where the function is periodic), one has identities that involve delta functions. E.g. \int dx e^(i * k1 * x) e^(-i * k2 * x) = 2 * pi * delta (k1 - k2).
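One way to see where that identity comes from (a sketch of mine, cutting the integral off at ±L; it ties back to the sinc picture upthread):

    \int_{-L}^{L} e^{i k_1 x}\, e^{-i k_2 x}\, dx
      = \frac{2 \sin\big( (k_1 - k_2) L \big)}{k_1 - k_2},
    \qquad
    \lim_{L \to \infty} \int \frac{2 \sin\big( (k_1 - k_2) L \big)}{k_1 - k_2}\,
      \varphi(k_1)\, dk_1 = 2 \pi\, \varphi(k_2).

The kernel never decays in k_1, but its total mass stays 2*pi and the oscillations cancel everything away from k_1 = k_2, which is exactly the statement that it tends to 2*pi*delta(k_1 - k_2) as a distribution.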
The article showed that Dirac deltas could be defined WITHOUT distributions. You ignored the article when answering my question.
The question is why distribution theory is a particularly good approach to notions like the Dirac delta.
That's fascinating about charge density of a particle being a dirac delta function. Is that a mathematical convenience or something deeper in the theory?
Well, if we assume that a point particle is an infinitely small thing with all of its charge concentrated in one point, the Dirac delta function is obviously the correct way to describe that. Of course, there is not really a way to find out whether that is true. Still, the delta function makes sense if something is so small that we do not know its size. This idealization has, however, led to problems in classical electrodynamics: https://en.wikipedia.org/wiki/Abraham%E2%80%93Lorentz_force. Search for 'preacceleration' in this page. This particular problem was ultimately solved by realizing that in this context quantum electrodynamics is the theory that applies. But, then again, using point particles also causes problems, namely the need to renormalize the theory, which is or is not a problem depending on your point of view.
Thanks a bunch for pointing me towards the Levi-Civita field. Where can I learn more? Any pedagogic text?
See my code at the end. The Wikipedia article is pretty good too. I can send you more if you like.
Found it, thanks.
I've personally always thought of the Dirac delta function as being the limit of a Gaussian with variance approaching 0. From this perspective, the Heaviside step function is a limit of the error function. I feel the error function and logistic function approaches should be equivalent, though I haven't worked through the math to show it rigorously.
All these would be infinitely close in the nonstandard characterization. I just picked the logistic because it was easy, and the step function is discontinuous, so it shows off the approach’s power. If I had started with the delta instead, I would have done a Gaussian, integrated that, and ended up with erf.
It is, in a way. The whole point of distributions is to extend the space of functions to one where more operations are permitted.
The limit of the Gaussian function as variance goes to 0 is not a function, but it is a distribution, the Dirac distribution.
Some distributions appear in intermediate steps while solving differential equations, and then disappear in the final solution. This is analogous to complex numbers sometimes appearing while computing the roots of a cubic function, but not being present in the roots themselves.
Hm. Back when I was working on game physics engines this might have been useful.
In impulse/constraint mechanics, when two objects collide, their momentum changes in zero time. An impulse is an infinite force applied over zero time with finite momentum transfer. You have to integrate over that to get the new velocity. This is done as a special case. It is messy for multi-body collisions, and it is hard to make work with a friction model. This is why large objects in video games bounce like small ones, changing direction in zero time.
I wonder if nonstandard analysis might help.
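For reference, a bare-bones version of that special case as it is usually coded (a frictionless, two-body, translation-only sketch with made-up names, not the parent's engine): the impulse magnitude comes from the restitution law and is applied as an instantaneous velocity jump.

    import numpy as np

    def resolve_collision(m1, v1, m2, v2, n, restitution=0.8):
        """n is the unit contact normal pointing from body 1 to body 2."""
        v_rel = np.dot(v2 - v1, n)           # relative velocity along the normal
        if v_rel > 0:                        # already separating: no impulse
            return v1, v2
        # Impulse magnitude from the restitution law; the "infinite force over
        # zero time" is collapsed into this single scalar.
        j = -(1 + restitution) * v_rel / (1 / m1 + 1 / m2)
        return v1 - (j / m1) * n, v2 + (j / m2) * n

    v1, v2 = resolve_collision(1.0, np.array([ 1.0, 0.0]),
                               1.0, np.array([-1.0, 0.0]),
                               np.array([1.0, 0.0]))
    print(v1, v2)   # equal masses, head-on: each rebounds at 0.8x its speed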
The following is just my opinion:
Integration can be done with its own special arithmetic: Interval arithmetic. I base this suggestion on the fact that this is apparently the only way of automatically getting error bounds on integrals. It's cool that it works.
NSA does not work with a computable field so it's not directly useful. But at the end of the article, there's a link to some code that uses the Levi-Civita field, which is a "nice" approximation to NSA because it's computable and still real-closed. You might be able to do an "auto-limit" using it, in a kind of generalisation of automatic differentiation. This might for instance turn one numerical algorithm, like Householder QR, into another one, like Gaussian elimination, by taking an appropriate limit.
I don't know if these two things interact well in practice: Levi-Civita for algebraic limits and interval arithmetic for integrals. They might! This might suggest rather provocatively that integration is only clumsily interpreted as a limit of some function. Finally tbh, I'm not sure if this is the best solution to the friction/collision detection problem you're describing.
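To make the interval-arithmetic point concrete, a toy sketch (the class, function, and example integrand are all mine): evaluate the integrand with interval operations on each slice, and the accumulated lower and upper sums rigorously bracket the integral (a serious implementation would also round outward, which this ignores).

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __mul__(self, other):
            ps = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
            return Interval(min(ps), max(ps))

    def bound_integral(f, a, b, n=1000):
        """Return (lo, hi) guaranteed to contain the integral of f on [a, b]."""
        width = (b - a) / n
        lo = hi = 0.0
        for i in range(n):
            y = f(Interval(a + i * width, a + (i + 1) * width))  # enclosure on slice
            lo += width * y.lo
            hi += width * y.hi
        return lo, hi

    f = lambda x: x * x + Interval(1.0, 1.0)     # integrand x^2 + 1
    print(bound_integral(f, 0.0, 2.0))           # brackets the true value 14/3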
Making it work in finite but short time should fix that. A large object generally can deform a larger distance. This makes all collisions inelastic, with large ones being different than small ones.
If you can get realistic billiards breaks, you're on the right track.
Nonstandard analysis is the mathematical description of your special case. Same thing.
Wow, it never occurred to me that the step function and the Dirac delta are related in this way! But now that I see it, it's obvious!
I've never learnt this level of maths formally, but it's been an interest of mine on and off. And this post explained it very well, and pretty understandably for the layman.
> The Number of Pieces an Integral is Cut Into
> You’re probably familiar with the idea that each piece has infinitesimal width, but what about the question of ‘how MANY pieces are there?’. The answer to that is a hypernatural number. Let’s call it N again.
Is that right? I thought there was an important theorem specifying that no matter the infinitesimal width of an integral slice, the total area will be in the neighborhood of (= infinitely close to) the same real number, which is the value of the integral. That's why we don't have to specify the value of dx when integrating over dx... right?
The number N in question will adjust with dx (up to infinitesimal error, anyway). So if dx is halved, N will double. But each retains its character: dx as infinitesimal, N as hyperfinite.
But they don't retain their status as hypernaturals! dx does not need to evenly divide the interval over which the integral is taken. Whenever it doesn't, the number of slices in the integral will fail to be a hypernatural number, because one of the slices will extend beyond the interval boundary.
The theorem tells us that the area of the extended interval that uses a hypernatural number of slices has the same real part as the area of the exact interval. It doesn't tell us that the exact interval contains a hypernatural number of slices.
That's what "up to an infinitesimal error anyway" meant.
Yes, but that's also the entire thing I was questioning. The essay says that an integral necessarily contains a hypernatural number of infinitesimal slices. I don't think that's true.
It is an interesting piece, but to claim that no heavy machinery is used is a bit disingenuous at best. You have defined some purely algebraic operation “differentiation”. This operation involves a choice of infinitesimal. Is it trivial to show that the definition is independent of the infinitesimal, especially if we are differentiating at a hyperreal point? I doubt it, and likely you would need to do more complicated set-theoretic limits rather than analytic limits. How do you calculate the integral of this function? Or even define it? Or rather functions, since it’s an infinite family of logistic functions? To even properly define this space you need to go quite heavily into set theory, and I doubt many would find it simpler, even than working with distributions.
The machinery of mathematics goes arbitrarily deep. I think the interesting thing here is that with relatively little training you can start to compute with these numbers, which is definitely not the case with analysis on distributions.
Or put differently - here you can kinda ignore the deeper formalities and still be productive, whereas with distributions you actually need to sit down and pore over them before you can do anything.
That said, I'm curious why infinitesimals never took off in physics. This kind of quick, shut-up-and-calculate approach seems right up their alley.
> I think the interesting thing here is that with relatively little training you can start to compute with these numbers, which is definitely not the case with analysis on distributions.
I don’t know, this feels like a math “hold my beer” moment. Math is infinitely deep and interconnected, but you have to start somewhere, on solid ground.
I was not being facetious above - the issues that I mentioned are actual problems when you make calculations. But let’s ignore those issues for a second.
So you found the “derivative” of a single, arbitrarily chosen representative of an infinite family of functions. What if you chose (tanh(Nx)+1)/2? What if you chose Logistic(N^2 x) instead of Logistic(N x)? You’d get different derivatives. In fact, any function (up to an additive constant) whose integral over a neighborhood of 0 is 1 would work there. What use are the values you are calculating if they reflect your choice and not anything inherent to the problem?
As for distributions, I picked up and read a small 100-page Penguin “leaflet” from the library during my undergrad that went through the subject rigorously (and with plenty of examples). It’s not that different from working rigorously with probability or real analysis. And at the end, in applications we are indeed usually interested in integrals, not derivatives, which we have not even defined. At the end of the day, you have a [X = weak L^infinity(R)] function (Heaviside). You look at the dual space, and since we established we don't really need the deep theory, believe me when I tell you that the correct space is the space of test functions on R (X’ = infinitely smooth, compact support, bounded integral). Each of those conditions is simple for our simple example of R. The inner product is via an integral.
Formally speaking elements of X are equivalence classes of sequences of functions and are not really defined pointwise, but neither was the NSA example. There we had to choose an arbitrary representative hyperreal function and here we may identify pointwise defined functions with the classes of the constant sequences of those functions.
Using integration by parts it is simple to show that <F,G’> = -<F’,G> if F is continuously differentiable on G’s support. Let us formally define in this way the weak derivative for functions that are not traditionally differentiable, if such an element exists and is unique that satisfies all the integral relations. However, note that differentiation is a linear isomorphism on the space of test functions, and so the weak derivative indeed exists and is unique. Furthermore,
we can also define elements of X pointwise by identifying F(x) with the limit <Txn,F> as n grows, if it exists and is independent of the sequence Txn, where Txn is a sequence of functions with support tending to {x} and constant integral 1. It is a simple exercise to show that for “normal” functions this holds, and by the above we can pointwise define derivatives this way as well.
What about our H(x)? It is an exercise to check that pointwise we get what we should outside of 0. What about the derivative at 0? Well, do the exercise above with <T0n’,H> and we see that it is Penrose undefined. Decidedly not even necessarily infinite, just undefined. However, integration by parts shows that <T,DH> = T(0), i.e., the Dirac delta at 0.
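Spelled out, that last integration-by-parts step (my notation, with H the Heaviside function, T a compactly supported test function, and the sign convention <DH,T> = -<H,T’>):

    \langle DH, T \rangle = -\langle H, T' \rangle
      = -\int_{-\infty}^{\infty} H(x)\, T'(x)\, dx
      = -\int_{0}^{\infty} T'(x)\, dx
      = T(0) - \lim_{x \to \infty} T(x) = T(0),

since T has compact support the boundary term vanishes, and pairing with T just evaluates it at 0 - the Dirac delta.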
Aside from all the theory that I kinda gave handwavingly, much like OP in the post, the mechanics are simple integration by parts to get the only stuff that’s “real” here, which are the integrals. In NSA we haven’t even defined those. How will knowing what infinity I will get at 0, given an arbitrarily chosen representative for H, help me?
Do your results depend on ZFC? Stronger axioms? At what level of infinity do we stop? You can brush aside the formalities, but then how is this approach any better than the physicists'?
> So you found the “derivative” of a single, arbitrary chosen representative of an infinite family of functions. What if you chose (tanh(Nx)+1)/2? What if you chose Logistic(N^2 x) instead of Logistic(N x)? You’d get different derivatives.
They all differ pointwise by an infinitesimal! This flexibility is a feature, not a bug.
Of course, I agree with you that mathematical progress should proceed on rigorous grounds, and there is a lot to be proven here. But my point is mainly that this is so easy you can just go and see what happens in these cases yourself without much trouble. For applications you really don't have to care.
I've done a course in analysis that covered distributions, but your reply made me chuckle. You've told me that distributions are just as simple and then proceeded to dump paragraphs of jargon at me. L^infinity? Dual space? Support? Penrose defined? Inner product via integrals?
(to be clear, I know what you're talking about, but our hypothetical high school student will have a lot more luck moving infinitesimals around, I guarantee it)
> Do your results depend on ZFC? stronger axioms? At what level of infinity do we stop? You can brush aside the formalities but then what better is this approach than physicists?
For physics, just pick a representative, compute and do some sanity checks. Who is ZFC? =P
Just kidding. Anyway, as you probably know the deeper theory of nonstandard analysis has been worked out in detail already, so if you want to get stuck in the weeds there are answers out there.
This answer is my favorite. You got it =)
> They all differ pointwise by an infinitesimal! This flexibility is a feature, not a bug.
No, they do not differ by an infinitesimal. You picked an arbitrary infinite N and found the derivative to be N/4. What if you picked N^2? Or 2^N? Or some upper limit set whose existence is stronger than choice? You get a different derivative every time, and they all differ from one another by an infinite amount. Good luck explaining that to high school students.
Moreover, working with equivalence relations is never a feature of any theory. Having to prove independence from representative at every step is not a feature, as you clearly demonstrate by making the mistake above.
> I've done a course in analysis that covered distributions, but your reply made me chuckle. You've told me that distributions are just as simple and then proceeded to dump paragraphs of jargon at me. L^infinity? Dual space? Support? Penrose defined? Inner product via integrals?
All concepts that are simple to define and understand. The majority of physicists likely understand them well. Those that don’t, could.
Paragraphs of jargon? I’ve rigorously proven and justified my further assertions, at a similar level to OP and above what I’ve seen in some physics lectures.
I considered defining the above “jargon” terms, but deliberately decided against it to avoid extending an already long comment. I decided this because they are simple, and a curious mind could quickly understand them by browsing Wikipedia.
You’re welcome to ignore them and to go and compute derivatives and integrals just as mechanistically as in NSA (and I repeat, we haven’t even mentioned integrals in NSA. Good luck defining what measurable functions on the hyperreals are to your hypothetical AP high school students).
And to boot we never have to deal with any quantities that are not real measurable numbers. Anything we care about we can compute, more easily (integrals? integrals??) this way.
That is not to say that this isn’t an interesting theory that should be studied - just that it is quite the opposite of a “simplified” approach to general functions
Look you're coming at this from a mathematics perspective and worrying about every single detail. There's value in this, obviously, but it's unnecessary in practical use, in the same way that I don't have to explain Dedekind cuts to kids before they start to work with real numbers. Nor do I explain measure theory to beginners before they start integrating stuff.
> What if you picked N^2? or 2^N? or some upper limit set whose existence is stronger than choice?
I'm not sure what you mean about picking N to be an upper limit set. N is a hyperreal here, not an ordinal. There aren't really set theoretic difficulties, you can easily construct a model of the hyperreals in ZFC.
It doesn't matter what representative you pick for your Heaviside function - so long as it differs pointwise from the standard Heaviside function by infinitesimals you will get a delta function by differentiating it in NSA. And continuing to differentiate it will give you the higher multiple moments. This is what I meant in my previous response.
It's useful to have the choice because depending on what you want to model you can have non-standard functions that "go to infinity" twice as fast as other functions, for instance. Taking the equivalence class destroys that information, which is sometimes useful, and sometimes not. If all you care about is a small computation you can just pick a representative and move on, I don't think it's a big deal.
Anyway, I'm going to leave this discussion now - we're not really bickering over anything important to my mind. Use the tool you like! Personally I'm having fun playing with NSA right now. Thanks for your time.
My point is that, as a practical tool to simplify calculations, I see no value in NSA.
The specific infinite results you get depend on the representative you choose and if you want to calculate integrals which you have yet to define, you need to work harder. On the other hand, if you care about modeling specific infinities for applications you are very likely also someone who needs and wants the actual theory.
As someone with a mathematical background, I repeat that I think this theory is interesting and worth studying - just not that it simplifies anything.
Yeah, you make some good points. I think at the end of the day both can work - anything you can do in the standard analysis setting can in principle be done in NSA too by the transfer theorem, it's just less travelled ground and maybe the requisite techniques aren't as well known.
Integration is very similar, by the way - you just do Riemann sums with a hyperfinite number of partitions of infinitesimal width. It sounds weird, and it is, but it makes it very easy to understand why integrating f(x) against a delta function gives you f(0), for instance, without having to justify it with limits or a bunch of deeper theory.
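A finite stand-in for that picture (my own sketch; in the hyperfinite version N would be an infinite hypernatural and dx an infinitesimal): Riemann-sum a test function against the derivative of Logistic(N x) and watch the result hug f(0).

    import numpy as np

    def logistic_bump(x, N):
        """d/dx Logistic(N x), an approximate delta; written to avoid overflow."""
        t = np.exp(-np.abs(N * x))
        return N * t / (1.0 + t) ** 2       # = N * sigma(N x) * (1 - sigma(N x))

    f = lambda x: np.cos(x) + x**3          # smooth test function, f(0) = 1

    dx = 1e-5
    x = np.arange(-5.0, 5.0, dx)            # finite window, fine partition
    for N in (10, 100, 1000):
        print(N, np.sum(f(x) * logistic_bump(x, N)) * dx)   # -> f(0) = 1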
Even just defining the hyperreals and showing why statements about them are also valid for the reals needs to go through either ultrafilters (which are some rather abstract objects) or model theory. Of course you can just handwave all of that away but then I guess you can also do that with standard analysis.
There are theories like SPOT and Internal Set Theory that don’t require filters.
Plus the ancient mathematicians did very well with just their intuition. And more to the point, I cared much more about building (hyper)number sense than some New Math “let’s learn ultrafilters before we’ve even done arithmetic”.
> Plus the ancient mathematicians did very well with just their intuition.
They did. But they also got things wrong, such as thinking that pointwise limits are enough to carry over continuity (see here for this and other examples: https://mathoverflow.net/a/35558). Anyway, mathematics has changed as a discipline, we now have strong axiomatic foundations and they mean that we can, in principle, always verify whether a proof is correct.
Certainly I'm glad we have better tools now, and know the rules. Sure beats the old days. As for verification, well I'm big into Lean 4 and it plays well with NSA since transferring theory and proofs between finite and infinite saves a lot of labor on the computer.
Oh I wasn't trying to say that NSA is in any way non-rigorous, only that if you make it rigorous it does require some machinery that is itself not terribly straightforward.
As for whether that's worth it or not, I have no strong opinion.
Related to the Hyperreal numbers mentioned in the article is the class of Surreal numbers which have many fun properties. There's a nice book describing them authored by Don Knuth.
The hyperreals and surreals are actually isomorphic under a mild strengthening of the axiom of choice (NBG).
https://mathoverflow.net/questions/91646/surreal-numbers-vs-...
See Ehrlich’s answer.
> We’ll use the hyperreal numbers from the unsexily named field of nonstandard analysis
There it is.